Fine-tuned PhoBERT for sentiment analysis of Vietnamese phone reviews
Abstract
This paper explores sentiment analysis of Vietnamese phone reviews using the PhoBERT model. Although sentiment analysis has advanced considerably for English and other widely spoken languages, Vietnamese remains relatively underinvestigated. Our study addresses this gap by constructing a dataset that combines the UIT-ViSFD dataset with reviews collected through web scraping. We experimented with several models, including Naive Bayes, Support Vector Machine, and PhoBERT, each paired with multiple data preprocessing techniques. PhoBERT, a pre-trained language model designed specifically for Vietnamese, outperformed the other models. The final PhoBERT model with optimized preprocessing achieved an accuracy of 92.74% on sentiment classification.
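As a rough illustration of the kind of baseline the abstract mentions, the sketch below implements a small multinomial Naive Bayes classifier with add-one smoothing and an accuracy measure. It is not the authors' pipeline: the four training reviews, the two test reviews, the label names, the whitespace tokenization, and the function names (`train_naive_bayes`, `predict`, `accuracy`) are all assumptions made up for this example, and the toy data has no connection to UIT-ViSFD or to the reported 92.74%.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # assumption: plain whitespace tokenization
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior: share of training documents with this label.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothed word likelihood.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Hypothetical toy reviews, invented for illustration only.
train_docs = [
    "pin trâu máy mượt",
    "máy đẹp chụp ảnh tốt",
    "pin yếu máy nóng",
    "máy lag màn hình lỗi",
]
train_labels = ["positive", "positive", "negative", "negative"]
test_docs = ["pin trâu chụp ảnh đẹp", "máy nóng lag"]
test_labels = ["positive", "negative"]

model = train_naive_bayes(train_docs, train_labels)
test_accuracy = accuracy(model, test_docs, test_labels)
print(f"accuracy on toy test set: {test_accuracy:.2%}")
```

Accuracy here is simply the share of test reviews whose predicted label matches the gold label, which is the metric the abstract reports for the final PhoBERT model.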
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.