Vol. 15 No. Special issue: ISDS (2023) | CTU Journal of Innovation and Sustainable Development

Impact of dimensionality reduction techniques on student performance prediction using machine learning

Koushik Roy, Huu-Hoa Nguyen, Dewan Md. Farid

This study addresses the crucial issue of predicting student performance in educational data mining (EDM) by proposing an Adaptive Dimensionality Reduction Algorithm (ADRA). ADRA efficiently reduces the dimensionality of student data, encompassing various academic, demographic, behavioral, social, and health-related features. It achieves this by iteratively selecting the most relevant features based on a combined normalized mean rank of five feature ranking methods. This reduction in dimensionality enhances the performance of predictive models and provides valuable insights into the key factors influencing student performance. The study evaluates ADRA using four different student performance datasets and six machine learning algorithms, comparing it to three existing dimensionality reduction methods. The results show that ADRA achieves an average dimensionality reduction factor of 6.2 while maintaining accuracy comparable to that of the other methods.
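
The idea of a combined normalized mean rank can be illustrated with a minimal sketch. This is not the authors' ADRA implementation: the function names, the toy feature scores, the use of three ranking methods instead of five, and the single non-iterative selection step are all assumptions made for illustration.

```python
from statistics import mean

def normalized_ranks(scores):
    """Map each feature to a rank in [0, 1]; 0 is the best (highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    denom = max(len(ordered) - 1, 1)
    return {feat: pos / denom for pos, feat in enumerate(ordered)}

def combined_mean_rank(score_tables):
    """Average the normalized rank of each feature across all ranking methods."""
    rank_tables = [normalized_ranks(table) for table in score_tables]
    features = rank_tables[0].keys()
    return {f: mean(ranks[f] for ranks in rank_tables) for f in features}

def select_top(score_tables, k):
    """Keep the k features with the lowest (best) combined mean rank."""
    combined = combined_mean_rank(score_tables)
    return sorted(combined, key=combined.get)[:k]

# Hypothetical relevance scores from three ranking methods (the paper uses five).
method_a = {"gpa": 0.9, "absences": 0.6, "age": 0.1, "travel_time": 0.3}
method_b = {"gpa": 0.8, "absences": 0.7, "age": 0.2, "travel_time": 0.1}
method_c = {"gpa": 0.5, "absences": 0.9, "age": 0.3, "travel_time": 0.4}

selected = select_top([method_a, method_b, method_c], k=2)
print(selected)
```

Normalizing ranks to [0, 1] before averaging keeps the methods comparable even when their raw scores live on different scales, which is presumably why a rank-based combination is attractive here.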

The Face and number plate recognition for car anti-theft

Quoc Bao Truong, Tan-Loc Tran, Tan-Kiet Thanh Nguyen, Huu-Cuong Nguyen

Smart parking systems, together with the continuous development of new technologies, are widely applied to improve our lives. New technologies with advanced functions can be added to them, turning them into multi-functional management systems. Thanks to installed anti-theft technologies, modern cars are significantly more difficult to steal than they once were. However, electrical systems can still experience issues and malfunction at some point. This paper suggests using video image recognition technology at car parks and parking lots as an anti-theft solution that raises an alert when someone other than the vehicle’s owner is in the driver’s seat. The system automatically predicts whether the driver is valid for the registered number plate. The image of the car is captured by a camera at the entrance gates of the parking lot. The proposed algorithm includes face recognition in images, building a deep learning convolutional network that classifies faces (subscribers’ images), using a Cascade trainer to train number plate object recognition, and vehicle number recognition through a character recognition technique. The system can perform recognition in real time through a personal computer connected to the camera at the scene, or from photos and video files. As a result, the model can recognize a face and match it to a license plate in a moment.

Automatic identification of Dong Son antique glass artifacts using evolving learning

Ngo-Ho Anh-Khoi, Bui Hoang-Bac, Pham Van-Trieu

Regarding the Dong Son culture, given the diverse range of artifacts discovered, we propose the use of an artificial intelligence system for the automated and comprehensive identification of Dong Son glass jewelry through SEM gemological analysis. This approach, which has gained prominence in archaeology worldwide over the past five years, aims to integrate advanced technology into Vietnamese archaeology. Our research is motivated by the unique conditions present in archaeology, where we seek to apply evolving learning algorithms to archaeological databases, comparing candidate models and selecting the one best suited to the archaeological dataset. We have developed the Recognition Automatic System for Dong Son Antique Glasses (RAS-DSA), which is capable of accurately distinguishing between Dong Son and non-Dong Son glass ornaments and is freely distributed to experts and archaeologists. This collaborative research involves Nam Can Tho University, Hanoi University of Mining and Geology, the Vietnamese Institute of Archaeology, and the UNESCO Center for Research and Conservation of Vietnamese Antiquities.

An environment monitoring system for a rice production model adaptive to climate change in the Mekong Delta of Viet Nam

Truong Minh Thai, Nguyen Dai Nghia, Trinh Minh Qui, Nguyen Khoi Nghia, Nguyen Huu Thien, Huynh Quy Khang, Luu Ca, Vo Huynh Tram

This article presents an environment monitoring model for agricultural production, applied for farmers in the Mekong Delta, Viet Nam. The proposed model's special feature is a system that combines techniques such as IoT, agent-based approaches, sensor networks, and data warehouses. Successful deployment of the model to monitor the water level and the environmental factors of the rice soil, such as NPK, pH, EC, temperature, and soil moisture, not only helps managers and farmers monitor environmental quality indicators in real time, but also helps experts analyze data and issue warnings and appropriate response solutions for agricultural production, in line with the pivot strategy towards Agriculture 4.0 proposed by the Minister of Agriculture and Rural Development of Viet Nam.

Application of IoT technology on control system and monitoring for Cucumis Melo L. grown in greenhouse

Lam Minh Dung, Truong Quoc Bao, Nguyen Hoang Dung, Lam Van Vu

In this paper, we propose a system for optimally watering Cucumis Melo L. crops based on an Internet of Things (IoT) application. The system has three components: hardware, a web application, and a mobile application. The first component is control box hardware, designed and implemented to collect crop data. Soil moisture sensors connected to the control box are used to monitor the greenhouse. The second component is a web-based application designed and implemented to manipulate the crop data and field information. This component applies data mining to analyze the data and predict suitable temperature, humidity, and soil moisture for optimal future management of crop growth. The final component is a mobile application on a smartphone, used mainly to control crop watering. It allows either automatic or manual control by the user. The automatic control uses data from the soil moisture sensors to decide when to water. The results showed the implementation to be useful in agriculture. The moisture content of the soil was maintained appropriately for Cucumis Melo L. growth, reducing costs and increasing agricultural productivity. Moreover, this work represents improvements to agriculture through digital innovation.

Detecting Vietnamese fake news

Vo Duc Vinh, Do Phuc

This paper focuses on constructing a dataset consisting of both fake news and factual news in the Vietnamese language. We employ Deep Learning models, namely Long Short-Term Memory, bidirectional Long Short-Term Memory, and Convolutional Neural Network - bidirectional Long Short-Term Memory, to identify Vietnamese fake news. The performance evaluation assesses the Area Under the Curve (AUC) of each model and provides insights into their computational efficiency. Additionally, these three models are used to evaluate the contribution of deep learning techniques to fake news detection and to emphasize the potential of exploring interconnections between neural networks for automatic Vietnamese fake news detection.

Exploring MediaPipe optimization strategies for real-time sign language recognition

Nguyen Phuoc Thanh, Nguyen Thanh Hoang, Hoang Ngoc Xuan Nguyen, Phan Huynh Thanh Binh, Vu Hoang Son Hai, Huynh Hieu Nhan

This study investigates optimization strategies for real-time sign language recognition (SLR) using the MediaPipe framework. We introduce a multi-modal method that combines four distinct Long Short-Term Memory (LSTM) models dedicated to processing skeletal coordinates extracted by MediaPipe. Evaluations were carried out on established sign language datasets. The results show that the multi-modal approach significantly improves the accuracy of the SLR model while preserving its real-time capability. In comparisons with prevalent MediaPipe-based models, our multi-modal strategy consistently achieved better performance metrics. A distinguishing characteristic of this approach is its adaptability: the LSTM layers can be modified, making it suitable for a wide range of problems and data types. Integrating the MediaPipe framework with real-time SLR markedly improves recognition precision, marking a significant advance in the field.

Analysis of printed document identification based on Deep Learning

Nguyen Dinh Thong, Phu Quang Nguyen, Mai Hoang Bao An

In this study, we investigate the effectiveness of ResNet, a deep neural network architecture, as a deep learning approach to the problem of printed document identification. ResNet is known for its ability to handle the vanishing gradient problem and learn highly representative features. Multiple variations of ResNet are applied, including ResNet50, ResNet101, and ResNet152, which provide the backbone architecture of our classification model and are trained on a comprehensive dataset of microscopic printed images containing microscopic printing patterns from various source printers. We also incorporate Mix-up augmentation, a technique that generates virtual training samples by interpolating pairs of images and labels, to further enhance the performance and generalization capability of the model. The experimental results showed that the ResNet101 and ResNet152 variants performed best at accurately distinguishing printer sources based on microscopic printed patterns. We developed a mobile app to test the feasibility of our findings in practice. In conclusion, this study aims to lay the groundwork for creating a pre-trained model with sufficiently accurate identification performance that can be deployed on mobile devices to detect the source printers of documents.
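
The interpolation behind Mix-up augmentation can be sketched in a few lines. The abstract does not give the authors' settings, so the Beta parameter `alpha=0.2`, the toy 2x2 "images", and the function names below are assumptions chosen for illustration only.

```python
import random

def mixup_pair(x1, y1, x2, y2, lam):
    """Blend two samples: lam * first + (1 - lam) * second, for inputs and labels alike."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y

def sample_lambda(alpha=0.2, rng=random):
    """Draw the mixing coefficient from Beta(alpha, alpha); alpha is an assumed value."""
    return rng.betavariate(alpha, alpha)

# Toy data: two flattened 2x2 "images" with one-hot labels for two printer classes.
img_a, label_a = [0.0, 0.2, 0.4, 0.6], [1.0, 0.0]
img_b, label_b = [1.0, 0.8, 0.6, 0.4], [0.0, 1.0]

# A fixed coefficient makes the arithmetic easy to follow.
mixed_img, mixed_label = mixup_pair(img_a, label_a, img_b, label_b, lam=0.75)
print(mixed_img, mixed_label)

# In training, the coefficient would be drawn afresh for each pair or batch.
random_img, random_label = mixup_pair(img_a, label_a, img_b, label_b, sample_lambda())
```

Because the labels are blended with the same coefficient as the pixels, the virtual sample carries a soft label, which is what lets the technique act as a regularizer.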

Detection of Crowd concentrations with YOLO V3

Ba Duy Nguyen, Thanh Nhan Dinh, Thanh Bach Nguyen, Quoc Dinh Truong

Crowd detection using street cameras has attracted a lot of research in recent years. In this paper, we propose a simple, fast, and effective method using the YOLOv3 model for crowd detection. In image frames extracted from surveillance video, pedestrian objects are detected and counted, and a warning signal is sent out when a crowd occurs. The results obtained on test data extracted from two datasets, STCrowd and SmartCity, and from our self-collected dataset confirm the feasibility of the proposed method.
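
The count-and-warn step that follows detection can be sketched as below. The abstract does not say how a crowd is defined, so the `crowd_size` and `min_confidence` thresholds are assumptions, and the `(label, confidence)` tuples are a hypothetical stand-in for the per-frame output of a detector such as YOLOv3 rather than any real library's API.

```python
def count_pedestrians(detections, min_confidence=0.5):
    """Count detections labelled 'person' whose confidence meets the threshold."""
    return sum(
        1 for label, confidence in detections
        if label == "person" and confidence >= min_confidence
    )

def crowd_alerts(frames, crowd_size=3, min_confidence=0.5):
    """Return the indices of frames whose pedestrian count reaches crowd_size."""
    return [
        index for index, detections in enumerate(frames)
        if count_pedestrians(detections, min_confidence) >= crowd_size
    ]

# Hypothetical detector output for three video frames.
frames = [
    [("person", 0.9), ("car", 0.8)],
    [("person", 0.9), ("person", 0.7), ("person", 0.6), ("person", 0.3)],
    [("person", 0.95), ("person", 0.4), ("bicycle", 0.9)],
]

alerts = crowd_alerts(frames)
print(alerts)
```

Only the second frame triggers a warning here: it contains three confident pedestrian detections, while the fourth falls below the assumed confidence threshold.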

An interpretable approach for trustworthy intrusion detection systems against evasion samples

Ngoc Tai Nguyen, Hien Do Hoang, Phan The Duy, Van-Hau Pham

In recent years, Deep Neural Networks (DNN) have demonstrated remarkable success in various domains, including Intrusion Detection Systems (IDS). The ability of DNN to learn complex patterns from large datasets has significantly improved IDS performance, leading to more accurate and efficient threat detection. Despite their effectiveness, DNN models are vulnerable to adversarial attacks, where malicious inputs are specifically crafted to deceive the models and evade detection. This paper provides insights into the effectiveness of deep learning-based IDS (DL-IDS) against adversarial example (AE) attacks. We tackle the weaknesses of DNN in detecting adversarial attacks by proposing a Convolutional Neural Network (CNN) that serves as an AE detector. We also use SHAP, an explainable AI (XAI) technique, to enhance the transparency of the AE detector. Our results show that the AE detector is clearly effective at detecting adversarial examples and achieves 99.46% accuracy in our experimental environment.

A practical blockchain-based framework for anti-counterfeiting and traceability

Hung Ho Dac, Vo Van Len, Nguyen The Bao, Nguyen Cao Hoai Phuong, Tran Van Huu

Blockchain has features that help systems overcome the inherent limitations of traditional centralized approaches. Integrating blockchain technologies into systems to improve security, privacy, and transparency is a prominent trend. Currently, consumers are very concerned about the origin and legitimacy of products, so it is necessary to strengthen anti-counterfeiting and traceability in product management. Many theoretical approaches have been proposed recently. In this study, we propose a practical framework based on blockchain that supports anti-counterfeiting and traceability. Based on this framework, we implemented a pattern as a proof of concept of the approach. We conducted experiments at Thu Dau Mot University, a pioneer in the production and transfer of biotechnology. The results show that the proposed framework is feasible and performs well. Moreover, we have published our code base on GitHub under an open license.

Career path prediction using XGBoost Model and students’ academic results

Hong Quan Nguyen, Duc Dang Khoi Nguyen, Tan Duy Le, An Mai, Kha Tu Huynh

This paper proposes an approach for constructing a career prediction system by applying the eXtreme Gradient Boosting (XGBoost) Decision Tree model to the academic results of Ho Chi Minh International University’s School of Computer Science and Engineering graduates over the past 5 years. Initially, the dataset is cleaned and normalized with the Python 3 programming language so that it is usable by the prediction algorithm. It is then split into 2 subsets: one for training (80 percent) and the other for testing (20 percent). After that, the algorithm uses the training subset to build the classification model. Finally, the testing subset is loaded into the model to predict each student’s career path based on the respective inputs, and hyperparameter tuning is employed to boost the model’s accuracy. With this solution, the problem of predicting students’ future career paths based on their performance throughout their years at the university can be adequately addressed.

New data about library service quality and convolution prediction

Nguyen Minh Tuan, Phayung Meesad, Duong Van Hieu, Maleerat Maliyaem

Library service quality, one of the key performance indicators of service quality in universities, has been given serious consideration in management strategies as part of the Fourth Industrial Revolution, especially after the Covid-19 pandemic. We conducted a survey of freshmen and sophomores at universities in Ho Chi Minh City and at Tien Giang University, Vietnam, to assess library service quality with the aim of improving learning service quality. Machine learning was deployed to predict library service quality and was successfully used to depict the assessment results. To evaluate the effectiveness of the data, Convolution Bidirectional Long Short-Term Memory (Conv-BiLSTM) and Convolution Bidirectional Gated Recurrent Unit (Conv-BiGRU) models were used. The models showed appropriate performance, providing sufficient accuracy in predicting the output.

Topic based document modeling for information filtering

Nguyen Tran Diem Hanh

Information Filtering (IF), which has been widely studied in recent years, is one of the areas that applies document retrieval techniques to deal with huge amounts of information. In IF systems, modelling the user’s interest and filtering relevant documents are the major components. Various approaches have been proposed for the first component. In this study, we used a topic-modelling technique, Latent Dirichlet Topic Modelling, to model the user’s interest for IF systems. In particular, we proposed an extended model for representing the user’s interest, named Latent Dirichlet Topic Modelling with high Frequency Occurrences and abbreviated as LDA_HF, with the intention of enhancing the retrieval performance of IF systems. The new model was then compared with existing methods for modelling the user’s interest, such as BM25, pLSA, and LDA_IF, on the large benchmark datasets RCV1 and R8. The results of extensive experiments showed that the proposed model outperformed all of these state-of-the-art baseline models according to four major metrics: Top20, B/P, MAP, and F1. Hence, LDA_HF is a promising and reliable method for enhancing the performance of IF systems.

Predicting graduation grades using Machine Learning: A case study of Can Tho University students

Nguyen Minh Khiem, Huynh Van Tu, Nguyen Hung Dung

A number of factors influence a student's attainment of graduation. Besides scholastic performance within the academic curriculum, other variables such as living circumstances, gender, and choice of major significantly contribute to the probability of achieving graduation. The capacity to forecast academic performance at the time of graduation holds profound importance for universities, especially in discerning the influential factors that contribute to a student's successful completion of their educational pursuits. This study employs multiple machine learning algorithms, including K-nearest neighbor, Neural network, Decision tree, Random forest, and Gradient boosting, to prognosticate the graduation outcomes of 7,837 undergraduate students from Can Tho University during the academic year 2022. These selected students were enrolled in 16 colleges and institutes affiliated with Can Tho University. The efficacy of the employed algorithms was assessed through performance evaluation metrics encompassing accuracy, precision, recall, and F-measure. Furthermore, a 15-fold cross-validation technique was employed for validation. The findings revealed that the Random forest model yielded the most reliable predictions. The factors that significantly impact graduation grades comprise GPA, training point, residential address, college, major, and gender. Based on the experimental findings, these factors were ranked to ascertain their effects on student graduation.
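
The four evaluation metrics named above can be computed from prediction counts as in the following minimal sketch. It assumes a binary simplification (one "positive" class), whereas the study's graduation grades are likely multi-class; the labels and the function name are hypothetical, and the F-measure shown is the balanced F1 variant.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels: 1 = graduated with a good grade, 0 = otherwise.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a k-fold cross-validation setting, these metrics would be computed on each held-out fold and then averaged across folds.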

Factorizing social advanced aspects in modern life on human health

Phan Yen Ngan Nguyen, Dung Hai Dinh, Ngoc Hong Tran

The emergence of technology has brought about a dramatic shift in different aspects of life, especially in societal advancement. As people adapt to this transformation, their lifestyles undoubtedly change as well. However, whether such changes harm an individual's well-being, and whether there is a relationship between daily habits and health conditions, are the questions this research focuses on. Meanwhile, data analysis has proven beneficial for understanding statistical problems. Accordingly, this work uses data analysis to investigate the relationship between lifestyle and health conditions, working with big data sets in the R programming language. The procedure from gathering to interpreting data is introduced, and the results are then compared with related existing studies. The results accurately reflect the topics and thus provide evidence for the potential of analyzing the connection between habitual practices and health status.

Similarity join over multiple time series under Dynamic Time Warping

Bui Cong Giao

Similarity join over multiple time series is an interesting data mining task. The task aims to identify pairs of similar subsequences from multiple time series, where the two subsequences may have any length and be at any position in the time series. However, the task is extremely challenging because the computational time needed to search for pairs of similar subsequences in two time series is very large. Moreover, the two subsequences must be normalized before a distance measure is applied to them to assess the degree of similarity of the original subsequences. To address the problem, this paper proposes a method of similarity join over two time series under Dynamic Time Warping (DTW), supporting z-score normalization. The proposed method uses both a suite of state-of-the-art techniques for computing the DTW distance and a technique of incremental z-score normalization to reduce the computational costs. The method employs multithreading to improve runtime performance. When similar subsequences from two time series should not be paired because they are too far apart, the method can use a sliding window to constrain the scope for coupling similar subsequences. Experiments show that the method returns similar subsequences quickly and incurs no false dismissals.
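
The two building blocks mentioned here, z-score normalization and the DTW distance, can be sketched as follows. This is a plain textbook version under stated assumptions: it normalizes each subsequence in one batch rather than incrementally, uses squared point-wise costs, and applies none of the pruning or multithreading techniques the paper relies on. The function names and sample series are hypothetical.

```python
from math import sqrt

def z_normalize(seq):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    n = len(seq)
    mu = sum(seq) / n
    sigma = sqrt(sum((v - mu) ** 2 for v in seq) / n)
    if sigma == 0:
        return [0.0] * n  # constant sequence: no shape information left
    return [(v - mu) / sigma for v in seq]

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW with squared point costs."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = (x - y) ** 2
            # Extend the cheapest of: insertion, match, deletion.
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return sqrt(prev[len(b)])

# Same shape at different scales: normalization makes them comparable.
small = [1, 2, 3, 4, 5]
large = [10, 20, 30, 40, 50]
scaled_gap = dtw_distance(z_normalize(small), z_normalize(large))

# Same shape shifted in time: warping absorbs the shift.
early = [0, 0, 1, 2, 1, 0]
late = [0, 1, 2, 1, 0, 0]
shifted_gap = dtw_distance(early, late)
print(scaled_gap, shifted_gap)
```

Both distances come out at (or numerically very near) zero, which illustrates why normalization must precede the distance computation and why DTW tolerates shifts that a point-by-point Euclidean comparison would penalize.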
