Lam Thanh Toan*, Nguyen Xuan Ha Giang, and Nguyen Hoang Thuan

* Corresponding author (lttoan@ctuet.edu.vn)

Abstract

Electronic commerce (e-commerce) gives businesses a major advantage by letting them sell products through multiple online shops. However, companies find it difficult to monitor the prices that different retail shops set for their products on e-commerce platforms. To address this difficulty, we propose a method that identifies and predicts products sold at incorrect prices using a machine learning model combined with price analysis. The study compares four machine learning models, K-Nearest Neighbor (KNN), Random Forest (RF), Support Vector Machine (SVM), and Multinomial Naive Bayes (MNB), and two text-based information extraction methods, Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF), to find the best combination. The results show that the RF model with BoW information extraction achieves higher average accuracy than the other models. On the filter dataset, the average accuracy over 10 runs is RF: 98.06%, SVM: 83.92%, MNB: 92.21%, KNN: 94.06%. On the product dataset, the accuracy is RF: 83.02%, SVM: 55%, MNB: 79.33%, KNN: 79.36%.

Keywords: Machine Learning, Random Forest, Price supervision
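
To make the two text representations named in the abstract concrete, here is a minimal sketch of BoW and TF-IDF features feeding a small KNN classifier. It is not the authors' implementation: the product titles, labels, whitespace tokenizer, cosine similarity, k = 3, and the idf formula ln(N/df) + 1 are all assumptions chosen for illustration, and the abstract does not describe how price analysis is combined with the classifier, so that step is omitted.

```python
import math
from collections import Counter

# Hypothetical toy data: (product title, product class). Not from the study.
TRAIN = [
    ("wireless mouse black", "mouse"),
    ("optical mouse usb", "mouse"),
    ("gaming mouse rgb", "mouse"),
    ("mechanical keyboard rgb", "keyboard"),
    ("wireless keyboard slim", "keyboard"),
    ("usb keyboard black", "keyboard"),
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def build_vocab(docs_tokens):
    """Sorted list of every distinct token in the training documents."""
    return sorted({tok for toks in docs_tokens for tok in toks})

def bow_vector(tokens, vocab):
    """Bag-of-Words: raw count of each vocabulary term in one document."""
    counts = Counter(tokens)
    return [float(counts[term]) for term in vocab]

def idf_weights(docs_tokens, vocab):
    """idf = ln(N / df) + 1, so terms found in every document keep weight 1."""
    n_docs = len(docs_tokens)
    doc_sets = [set(toks) for toks in docs_tokens]
    return [math.log(n_docs / sum(term in s for s in doc_sets)) + 1.0
            for term in vocab]

def tfidf_vector(tokens, vocab, idf):
    """TF-IDF: Bag-of-Words counts scaled by the per-term idf weight."""
    return [c * w for c, w in zip(bow_vector(tokens, vocab), idf)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(query, train_vecs, train_labels, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

docs = [tokenize(title) for title, _ in TRAIN]
labels = [label for _, label in TRAIN]
vocab = build_vocab(docs)
idf = idf_weights(docs, vocab)

query = tokenize("wireless gaming mouse")
bow_pred = knn_predict(bow_vector(query, vocab),
                       [bow_vector(d, vocab) for d in docs], labels)
tfidf_pred = knn_predict(tfidf_vector(query, vocab, idf),
                         [tfidf_vector(d, vocab, idf) for d in docs], labels)
print(bow_pred, tfidf_pred)
```

The only difference between the two runs is the feature vector: BoW treats every word equally, while TF-IDF down-weights words such as "mouse" or "keyboard" that appear in many titles. Swapping KNN for RF, SVM, or MNB would leave the feature extraction step unchanged.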
