Hong Quan Nguyen, Duc Dang Khoi Nguyen, Tan Duy Le, An Mai and Kha Tu Huynh*

* Corresponding author (hktu@hcmiu.edu.vn)

Abstract

This paper proposes an approach for building a career prediction system by applying the eXtreme Gradient Boosting (XGBoost) decision tree model to the academic results of graduates of Ho Chi Minh International University's School of Computer Science and Engineering over the past five years. First, the dataset is cleaned and normalized in Python 3 so that the prediction algorithm can use it. It is then split into two subsets: 80 percent for training and 20 percent for testing. Next, the algorithm uses the training subset to build the classification model. Finally, the testing subset is fed into the model to predict each student's career path from the corresponding inputs, and hyperparameter tuning is applied to improve the model's accuracy. With this solution, the problem of predicting students' future career paths from their academic performance during their years at the university can be adequately addressed.
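The cleaning, normalization and 80/20 split described above can be sketched in plain Python 3. This is a minimal illustration under stated assumptions, not the authors' code: the record layout, the course names (`programming`, `databases`), the `career` label, mean imputation of missing grades and min-max scaling are all hypothetical, because the abstract does not say how cleaning or normalization was performed.

```python
import random
from statistics import mean

# Hypothetical schema (assumption, not from the paper): each record maps course
# names to a numeric grade (or None when missing) and stores the career label.
records = [
    {"programming": 8.0, "databases": 6.0, "career": "software engineer"},
    {"programming": None, "databases": 9.0, "career": "data analyst"},
    {"programming": 6.0, "databases": None, "career": "software engineer"},
    {"programming": 10.0, "databases": 7.0, "career": "data analyst"},
    {"programming": 7.0, "databases": 8.0, "career": "tester"},
    {"programming": 9.0, "databases": 5.0, "career": None},  # unlabeled, dropped
]


def clean_and_normalize(records, feature_names, label_name):
    """Drop unlabeled records, mean-impute missing grades, min-max scale to [0, 1]."""
    labeled = [r for r in records if r.get(label_name) is not None]

    # Per-feature (mean, min, max) computed from the observed, non-missing grades.
    stats = {}
    for f in feature_names:
        observed = [r[f] for r in labeled if r.get(f) is not None]
        if observed:
            stats[f] = (mean(observed), min(observed), max(observed))
        else:
            stats[f] = (0.0, 0.0, 0.0)

    rows = []
    for r in labeled:
        features = []
        for f in feature_names:
            fill, lo, hi = stats[f]
            value = fill if r.get(f) is None else r[f]
            # A constant column carries no information; map it to 0.0.
            features.append((value - lo) / (hi - lo) if hi > lo else 0.0)
        rows.append((features, r[label_name]))
    return rows


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle with a fixed seed, then return (train, test); 80/20 by default."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = clean_and_normalize(records, ["programming", "databases"], "career")
train, test = train_test_split(rows)
print(len(train), len(test))  # 4 1
```

Under these assumptions, the training rows would then be passed to an XGBoost classifier, for instance `xgboost.XGBClassifier` (recent versions expect class labels encoded as integers from 0 to n-1), with hyperparameters such as `max_depth`, `learning_rate` and `n_estimators` tuned, ideally by cross-validation on the training subset so that the 20 percent test subset is reserved for the final accuracy estimate. In practice, scikit-learn's `sklearn.model_selection.train_test_split(..., test_size=0.2)` performs the same split.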

Keywords: Career path prediction, decision making, deep learning, tree classification, XGBoost decision
