A multivariate analysis of the early dropout using classical machine learning and local interpretable model-agnostic explanations
Abstract
High student dropout rates can harm both the development of educational institutions and the personal growth of students. Consequently, many institutions seek to identify the key factors that contribute to dropout and to implement strategies that mitigate them. This study predicts student dropout in higher education using classical machine learning algorithms and analyzes the key factors that influence these outcomes. The dataset combines demographic, socioeconomic, and academic information from various sources. The study also applies Local Interpretable Model-Agnostic Explanations (LIME) to explain individual predictions, giving a clearer view of the factors that drive each predicted dropout. This knowledge helps identify influential factors and, more importantly, supports earlier intervention strategies and policies in educational settings, with the goal of reducing dropout rates.
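To make the idea of a local, model-agnostic explanation concrete, the following is a minimal sketch of a LIME-style surrogate written from scratch in plain Python. It is not the study's pipeline and does not use the LIME library: the feature names, the stand-in `black_box_dropout_prob` function, its coefficients, the per-feature scales, and the simplified exponential kernel are all assumptions made for illustration. The sketch perturbs one student's record, weights each perturbed sample by its proximity to the original, and fits a weighted linear model whose coefficients act as the local explanation.

```python
import math
import random

# Hypothetical feature names, invented for illustration only.
FEATURES = ["approved_units", "admission_grade", "age_at_enrollment"]


def black_box_dropout_prob(x):
    """Stand-in for a trained classifier: returns an invented dropout probability."""
    units, grade, age = x
    score = 2.0 - 0.6 * units - 0.04 * (grade - 120.0) + 0.05 * (age - 20.0)
    return 1.0 / (1.0 + math.exp(-score))


def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def explain_locally(predict, instance, scales, n_samples=2000,
                    kernel_width=0.75, ridge=1e-3, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance.

    Returns one coefficient per feature, expressed per standardized unit
    (one assumed standard deviation) of that feature.
    """
    rng = random.Random(seed)
    d = len(instance)
    width_sq = (kernel_width ** 2) * d  # simplified kernel, an assumption
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # standardized offsets
        x = [instance[j] + z[j] * scales[j] for j in range(d)]
        dist_sq = sum(v * v for v in z)
        weights.append(math.exp(-dist_sq / width_sq))  # closer samples count more
        rows.append([1.0] + z)  # leading 1.0 is the intercept column
        targets.append(predict(x))
    # Weighted ridge regression: (X^T W X + ridge * I) w = X^T W y
    p = d + 1
    xtwx = [[sum(wt * r[i] * r[j] for wt, r in zip(weights, rows))
             for j in range(p)] for i in range(p)]
    for i in range(1, p):  # the intercept is left unpenalized
        xtwx[i][i] += ridge
    xtwy = [sum(wt * r[i] * y for wt, r, y in zip(weights, rows, targets))
            for i in range(p)]
    return solve_linear_system(xtwx, xtwy)[1:]  # drop the intercept


student = [4.0, 130.0, 20.0]  # invented record: units, grade, age
scales = [2.0, 15.0, 5.0]     # assumed per-feature standard deviations
weights = explain_locally(black_box_dropout_prob, student, scales)
for name, w in sorted(zip(FEATURES, weights), key=lambda t: -abs(t[1])):
    print(f"{name}: {w:+.3f}")
```

A negative coefficient means that increasing the feature lowers the predicted dropout probability near this student, and a positive one means it raises it. Because the surrogate only queries `predict`, the same function could in principle wrap any classifier that returns a probability, which is what makes the approach model-agnostic.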
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.