Ngo Minh Tan, Ngo Ba Hung* and Stuchilin Vladimir Valerievich

* Corresponding author (nbhung@ctu.edu.vn)

Abstract

This paper explores sentiment analysis of Vietnamese phone reviews using the PhoBERT model. While sentiment analysis has advanced significantly for English and other widely spoken languages, Vietnamese remains relatively under-investigated. Our study addresses this gap by constructing a comprehensive dataset that combines the UIT-ViSFD dataset with data collected through web scraping. We experimented with several models, including Naive Bayes, Support Vector Machine, and PhoBERT, under multiple data preprocessing techniques. PhoBERT, a state-of-the-art pre-trained language model designed specifically for Vietnamese, outperformed the other models. The final PhoBERT model with optimized preprocessing achieved an accuracy of 92.74%, demonstrating its effectiveness at identifying sentiment.
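As a rough illustration of the simplest baseline named above, the following sketch trains a multinomial Naive Bayes classifier with add-one smoothing and scores it by accuracy. It is not the authors' implementation: the function names, the whitespace tokenizer, and the toy reviews are all hypothetical and stand in for the real dataset and preprocessing.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and tokens per class (whitespace tokens)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token in vocab:  # ignore tokens never seen in training
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy reviews, for illustration only (not from the paper's dataset).
train_docs = [
    "pin trâu máy mượt",
    "camera đẹp rất tốt",
    "máy nóng pin yếu",
    "màn hình lỗi rất tệ",
]
train_labels = ["positive", "positive", "negative", "negative"]
test_docs = ["pin tốt máy mượt", "máy lỗi pin yếu"]
test_labels = ["positive", "negative"]

model = train_naive_bayes(train_docs, train_labels)
predictions = [predict(model, doc) for doc in test_docs]
print(accuracy(test_labels, predictions))
```

The accuracy function here is the same metric reported for the final model, computed as the share of reviews whose predicted sentiment matches the label.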

Keywords: Fine-tuned PhoBERT, natural language processing, sentiment analysis, text classification, Vietnamese language
