Thanh Hai Nguyen*, Phuong Le, Tuyen Thanh Thi Nguyen, and Anh Kim Su

* Corresponding author (nthai@cit.ctu.edu.vn)

Abstract

High student dropout rates can harm both the development of educational institutions and the personal growth of students. Consequently, many institutions focus on identifying the key factors that contribute to dropout and on implementing strategies to mitigate them. This study predicts student dropout in higher education using classical machine learning algorithms and analyzes the key factors that influence the outcome. The dataset combines demographic, socioeconomic, and academic information from various sources. The study also applies Local Interpretable Model-Agnostic Explanations (LIME) to explain individual predictions, giving a clearer view of the factors that drive them. Identifying these influential factors can support earlier intervention strategies and better-targeted policies in educational settings, with the ultimate aim of reducing dropout rates.

Keywords: dropout prediction, machine learning, explanation
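
The idea behind a LIME-style explanation, as mentioned in the abstract, can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use the `lime` library: it perturbs one student record, weights the perturbed samples by their proximity to the original, and fits a weighted ridge-regularized linear surrogate whose coefficients serve as local feature importances. The `black_box` function, the three feature names, the perturbation scales, and all numeric settings are hypothetical and chosen only for illustration.

```python
import math
import random


def black_box(x):
    """Hypothetical stand-in for a trained dropout classifier; returns P(dropout).

    x = [approved_units, tuition_up_to_date, age_at_enrollment] (illustrative features).
    """
    z = 2.0 - 0.6 * x[0] - 1.5 * x[1] + 0.05 * (x[2] - 20.0)
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def explain_locally(predict, instance, scales, n_samples=3000,
                    kernel_width=0.75, ridge=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around `instance`.

    Returns (intercept, weights); weights are per one `scales[j]` change in feature j.
    """
    rng = random.Random(seed)
    d = len(instance)
    # Accumulate the normal equations (X^T W X) beta = X^T W y, with an intercept column.
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # perturbation in standardized units
        sample = [instance[j] + z[j] * scales[j] for j in range(d)]
        dist_sq = sum(v * v for v in z)
        weight = math.exp(-dist_sq / kernel_width ** 2)  # closer samples count more
        row = [1.0] + z
        y = predict(sample)
        for i in range(d + 1):
            xtwy[i] += weight * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += weight * row[i] * row[j]
    # Ridge penalty on the feature coefficients only (not on the intercept).
    for i in range(1, d + 1):
        xtwx[i][i] += ridge
    coef = solve(xtwx, xtwy)
    return coef[0], coef[1:]


if __name__ == "__main__":
    names = ["approved_units", "tuition_up_to_date", "age_at_enrollment"]
    student = [4.0, 1.0, 22.0]
    intercept, weights = explain_locally(black_box, student, scales=[2.0, 0.5, 3.0])
    for name, w in sorted(zip(names, weights), key=lambda p: -abs(p[1])):
        print(f"{name}: {w:+.4f}")
```

A negative weight means that increasing the feature lowers the predicted dropout probability near this student; a positive weight means the opposite. Treating a binary feature such as tuition status as continuous when perturbing is a simplification made to keep the sketch short.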
