Published: 2025-10-16
Cover & Content
Component-based ensemble cluster analysis
Abstract | PDF

Ensemble clustering leverages multiple methods to identify diverse patterns and, instead of depending on a single approach, generates a more dependable and accurate clustering solution. This methodology mitigates bias and noise in intricate, high-dimensional data, enabling the grouping of biological and genomic big data. Component-based ensemble clustering divides data into subsets, applies several algorithms, and then aggregates the outcomes to increase performance. This method analyzes each data subset independently, facilitating the recognition of varied patterns while minimizing noise and bias. This paper proposes two novel clustering methods that integrate multiple algorithms, including Agglomerative Hierarchical Clustering (AHC), K-Means Clustering, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), Ordering Points to Identify the Clustering Structure (OPTICS), Improved Density-Based Spatial Clustering of Applications with Noise (IDBSCAN), and Density-Based Spatial Clustering of Applications with Noise Plus Plus (DBSCAN++). The second method, termed Ensemble Clustering with Each Subset (ECES), employs both ‘with-replacement’ and ‘without-replacement’ sampling to increase diversity, minimize redundancy, and improve generalization. The key distinction lies in the ensemble step of the second strategy, which divides datasets into equal subsets, ensuring fairness, comparability, and controlled diversity within the ensemble while reducing bias, redundancy, and overlap.
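To make the component-based idea concrete, the following is a minimal sketch, not the paper's implementation: several clusterers are run on equal-sized subsets drawn without replacement, their agreements are accumulated in a co-association matrix, and a consensus clustering is extracted from it. It assumes scikit-learn >= 1.3 (for HDBSCAN); the subset count, subset size, and consensus step are illustrative choices.

```python
# Illustrative sketch only: subset-based co-association ensemble clustering.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering, OPTICS, HDBSCAN

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
n = len(X)
rng = np.random.default_rng(0)

coassoc = np.zeros((n, n))   # how often two points share a cluster
counts = np.zeros((n, n))    # how often two points land in the same subset

algos = [
    lambda: KMeans(n_clusters=3, n_init=10, random_state=0),
    lambda: AgglomerativeClustering(n_clusters=3),
    lambda: OPTICS(min_samples=10),
    lambda: HDBSCAN(min_cluster_size=10),
]

# Cluster ten equal-sized subsets drawn without replacement.
for _ in range(10):
    idx = rng.choice(n, size=n // 2, replace=False)
    for make_algo in algos:
        labels = make_algo().fit_predict(X[idx])
        same = (labels[:, None] == labels[None, :]) & (labels[:, None] != -1)
        coassoc[np.ix_(idx, idx)] += same
        counts[np.ix_(idx, idx)] += 1

# Consensus step: agglomerative clustering on the averaged co-association distances.
sim = np.divide(coassoc, counts, out=np.zeros_like(coassoc), where=counts > 0)
dist = 1.0 - sim
np.fill_diagonal(dist, 0.0)
final_labels = AgglomerativeClustering(
    n_clusters=3, metric="precomputed", linkage="average"
).fit_predict(dist)
```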
BagViT: Bagged vision transformers for classifying chest X-ray images
Abstract | PDF

In this paper, we propose a novel ensemble method, termed Bagged Vision Transformers (BagViT), to enhance the classification accuracy for Chest X-ray (CXR) images. BagViT constructs an ensemble of independent Vision Transformer (ViT) models, each of which is trained on a bootstrap sample (sampling with replacement) drawn from the original training dataset. To enhance model diversity, we use MixUp to generate synthetic training examples and introduce training randomness by varying the number of training epochs and selectively fine-tuning the top layers of each model. Final predictions are obtained through majority voting. Experimental results on a real-world dataset collected from Chau Doc Hospital (An Giang, Vietnam) demonstrate that BagViT significantly outperforms fine-tuned baselines such as VGG16, ResNet, DenseNet, and ViT. Our BagViT achieves a classification accuracy of 72.25%, highlighting the effectiveness of ensemble learning with transformer architectures in scenarios with complex CXR images.
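The bagging ingredients described above can be sketched as follows; this is not the authors' code, and the timm checkpoint name, MixUp parameters, and number of unfrozen blocks are assumptions made for illustration. Each member would be fine-tuned on its own bootstrap sample for a randomly chosen number of epochs, and the final label is the majority vote over the members' hard predictions.

```python
# Illustrative sketch of the bagging, MixUp, and voting pieces (not the paper's code).
import numpy as np
import torch
import timm

def bootstrap_indices(n, rng):
    """One bag: n training indices sampled with replacement."""
    return rng.integers(0, n, size=n)

def mixup(x, y, alpha=0.2):
    """MixUp: blend a batch with a shuffled copy of itself.
    The loss becomes lam * CE(pred, y_a) + (1 - lam) * CE(pred, y_b)."""
    lam = float(np.random.beta(alpha, alpha))
    perm = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[perm], y, y[perm], lam

def build_member(num_classes=2, trainable_top_blocks=2):
    """One ensemble member: a pretrained ViT with only its top blocks unfrozen."""
    model = timm.create_model("vit_base_patch16_224", pretrained=True,
                              num_classes=num_classes)
    for p in model.parameters():
        p.requires_grad = False
    for blk in model.blocks[-trainable_top_blocks:]:   # fine-tune top layers only
        for p in blk.parameters():
            p.requires_grad = True
    for p in model.head.parameters():
        p.requires_grad = True
    return model

def majority_vote(all_preds):
    """all_preds: (n_models, n_samples) array of predicted class indices."""
    all_preds = np.asarray(all_preds)
    return np.array([np.bincount(col).argmax() for col in all_preds.T])
```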
NaviBlind: A multimodal AI assistant for visually impaired users to identify product information from images and speech
Abstract | PDF

People with visual impairments often face significant challenges in identifying and accessing product information in their daily lives, particularly when visual cues such as packaging details, labels, or expiration dates are inaccessible. In this paper, we present NaviBlind, a multimodal AI-powered assistive system designed to help visually impaired individuals understand key product details through natural interactions. Our system combines image understanding using Gemini Flash vision models with Vietnamese speech recognition powered by PhoWhisper for extracting information needs directly from user voice commands. After the user uploads an image of the product and speaks what kind of information is needed, such as name, color, type, or expiry date, the system analyzes the image and returns a concise, structured textual description, which is then converted into Vietnamese speech. To ensure reliability, we incorporate mechanisms to detect uncertain or hallucinated outputs from the vision model, especially in cases of low-quality images. The system is deployed as a user-friendly web application, enabling real-time accessibility for users with limited visual capabilities. Experimental evaluation demonstrates the potential of NaviBlind in promoting autonomy and independence for the visually impaired in everyday shopping and product recognition tasks.
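A hedged sketch of such a pipeline is shown below; it is not the deployed system, and the Gemini model name, the vinai/PhoWhisper-small checkpoint, the prompt, and the use of gTTS for Vietnamese speech output are all assumptions made for illustration.

```python
# Illustrative pipeline sketch only (not the authors' code).
from transformers import pipeline
import google.generativeai as genai
from PIL import Image
from gtts import gTTS

asr = pipeline("automatic-speech-recognition", model="vinai/PhoWhisper-small")
genai.configure(api_key="YOUR_API_KEY")
vision = genai.GenerativeModel("gemini-1.5-flash")   # assumed model name

def answer_product_query(image_path, audio_path):
    # 1. Vietnamese speech -> text (what the user wants to know).
    request_text = asr(audio_path)["text"]
    # 2. Image + request -> concise structured description, with an instruction
    #    to flag unreliable answers on low-quality images.
    prompt = ("Describe only the requested product attribute, in Vietnamese. "
              "If the image is too unclear to answer reliably, say so. "
              f"Request: {request_text}")
    reply = vision.generate_content([prompt, Image.open(image_path)]).text
    # 3. Text -> Vietnamese speech (assumed output step).
    gTTS(reply, lang="vi").save("answer.mp3")
    return reply
```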
Non-destructive detection of ovary regions in live female mud crabs by machine-learning assisted multispectral imaging
Abstract | PDF

Ovarian fullness of female mud crabs (Scylla paramamosain) is a key determinant of market value but is still assessed subjectively by hand. Spectrometry offers an objective alternative, and our previous studies under in vitro and semi-in vivo conditions demonstrated the potential of spectrometric features for discrimination of crab tissues (meat, ovary, hepatopancreas, and shell). However, it was still challenging to apply under in vivo conditions. This study aims to detect the ovary region in live mud crabs while keeping the ‘in vivo’ condition by combining a custom multispectral-imaging system and simple machine-learning (ML) techniques. The system comprises a dedicated optical setup and a compact multispectral camera, designed to acquire transmission images through the intact carapace under practical conditions in crab-farming fields. The ovary region was predicted pixel-wise and patch-wise using conventional classifiers (Logistic Regression, Random Forest, Gradient Boosting, k-NN, and SVM) and Convolutional Neural Networks (CNN), enhanced by Principal Component Analysis (PCA) for feature transformation. The patch-wise random forest model with PCA (7×7 patches) achieved superior performance, with an accuracy of 0.872 and an F1-score of 0.872, outperforming other methods. These findings mark a significant advancement in the application of multispectral imaging for automated, non-destructive quality assessment in live aquaculture specimens.
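As a rough illustration of the patch-wise route (not the study's pipeline), the sketch below extracts 7×7 patches from a multispectral cube, reduces them with PCA, and classifies them with a random forest; the data, band count, and hyperparameters are synthetic placeholders.

```python
# Illustrative patch-wise PCA + random forest sketch on synthetic data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

def extract_patches(cube, mask, size=7):
    """cube: (H, W, bands) multispectral image; mask: (H, W) ovary/non-ovary labels."""
    h, w, _ = cube.shape
    half = size // 2
    X, y = [], []
    for i in range(half, h - half):
        for j in range(half, w - half):
            patch = cube[i - half:i + half + 1, j - half:j + half + 1, :]
            X.append(patch.ravel())
            y.append(mask[i, j])
    return np.array(X), np.array(y)

# Synthetic stand-in for one calibrated transmission image and its annotation.
rng = np.random.default_rng(0)
cube = rng.random((64, 64, 8))                     # 8 spectral bands (assumed)
mask = (rng.random((64, 64)) > 0.7).astype(int)    # placeholder ovary mask

X, y = extract_patches(cube, mask, size=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = make_pipeline(PCA(n_components=10), RandomForestClassifier(random_state=0))
model.fit(X_tr, y_tr)
print("accuracy:", model.score(X_te, y_te))
```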
An acoustic-mechanical sensing system with multimodal machine learning techniques for in-line quality grading of watermelons
Abstract | PDF

A complete assessment of both internal and external quality parameters of watermelons is essential for export. However, small and medium-sized watermelon export enterprises often face challenges in accessing cost-effective and integrated grading systems. This study proposes an acoustic-mechanical sensing system for classifying watermelons based on both sweetness and weight. By combining weight measurement and sweetness estimation through acoustic analysis at a single station, the proposed system achieves a compact design and reduces data acquisition time. Additionally, a multimodal machine learning approach is applied to classify watermelon quality accurately. Among the tested models, the K-Nearest Neighbors model achieves the highest classification performance, with an accuracy of 97.3% and a precision of 96.6%. With its strong classification ability, integrated design, and low cost, the proposed system shows great potential for automated in-line quality grading of watermelons and other agricultural products. Unlike conventional large-scale systems that cascade individual grading functions, the integrated and cost-effective design of this system makes it suitable for small and medium-sized watermelon export enterprises to deploy at each distributed shipping facility during intensive periods.
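A minimal sketch of the multimodal fusion idea follows; it is not the authors' system, and the acoustic features (dominant frequency and spectral centroid), sampling rate, and synthetic data are assumptions chosen only to show how acoustic descriptors and a weight measurement can be fed jointly to a K-Nearest Neighbors classifier.

```python
# Illustrative sketch: fusing acoustic features with weight before a KNN classifier.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def acoustic_features(signal, fs):
    """Toy spectral features from one tap response (assumed representation)."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    dominant = freqs[spectrum.argmax()]                   # dominant resonance frequency
    centroid = (freqs * spectrum).sum() / spectrum.sum()  # spectral centroid
    return [dominant, centroid]

# One sample = acoustic features + weight (kg); labels are synthetic quality grades.
rng = np.random.default_rng(0)
X, y = [], []
for grade in range(3):
    for _ in range(30):
        sig = np.sin(2 * np.pi * (150 + 30 * grade) * np.arange(2048) / 8000)
        sig += 0.1 * rng.standard_normal(2048)
        X.append(acoustic_features(sig, fs=8000) + [6.0 + grade + rng.normal(0, 0.3)])
        y.append(grade)

clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
clf.fit(np.array(X), np.array(y))
```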
Traffic flow prediction using adaptive graph convolutional networks and long short-term memory
Abstract | PDF

Traffic congestion is becoming an increasingly serious and challenging issue in major urban areas. This problem not only causes a waste of time and increased fuel consumption but also contributes to environmental pollution and deterioration of residents’ quality of life. In this study, we propose a new method for predicting the average speed reported by traffic sensors across the city. The method builds on two core models: Graph Convolutional Networks and Long Short-Term Memory. A YOLO model is used to analyze images and video during data collection. By leveraging Graph Convolutional Networks’ ability to capture spatial information, Long Short-Term Memory’s capacity to model temporal dynamics, and YOLO’s strength in visual object detection, our integrated framework enhances the accuracy of traffic flow predictions at specific locations and time intervals. This comprehensive approach aims to support real-world applications such as adaptive traffic light control, traffic planning support, and congestion alerts. The proposed method outperforms other methods on the Caltrans PeMS dataset.
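A compact sketch of the spatio-temporal part (not the paper's model) is given below: a single graph convolution over the sensor graph followed by an LSTM over time, in PyTorch; the layer sizes and the normalized adjacency matrix are placeholders.

```python
# Illustrative GCN + LSTM sketch for per-sensor speed prediction.
import torch
import torch.nn as nn

class GCNLSTM(nn.Module):
    def __init__(self, in_dim=1, gcn_dim=16, lstm_dim=32):
        super().__init__()
        self.gcn_weight = nn.Linear(in_dim, gcn_dim)
        self.lstm = nn.LSTM(gcn_dim, lstm_dim, batch_first=True)
        self.head = nn.Linear(lstm_dim, 1)          # predicted average speed

    def forward(self, x, adj_norm):
        # x: (batch, time, nodes, in_dim); adj_norm: normalized adjacency (nodes, nodes)
        h = torch.einsum("ij,btjf->btif", adj_norm, x)     # aggregate neighbor features
        h = torch.relu(self.gcn_weight(h))                  # (batch, time, nodes, gcn_dim)
        b, t, n, f = h.shape
        h = h.permute(0, 2, 1, 3).reshape(b * n, t, f)      # one sequence per node
        out, _ = self.lstm(h)
        pred = self.head(out[:, -1])                        # last time step
        return pred.view(b, n)

# Example: 5 sensors, 12 past time steps, batch of 2; identity adjacency as placeholder.
adj = torch.eye(5)
speeds = GCNLSTM()(torch.randn(2, 12, 5, 1), adj)
```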
Towards robust visual recognition for smart cities and remote sensing: A survey of regression losses in rotated object detection
Abstract | PDF

Rotated object detection (ROD), often termed oriented object detection, is essential for numerous practical tasks, including remote sensing, self-driving systems, urban surveillance, and text recognition in natural scenes. Unlike conventional object detection, ROD must estimate object orientation, making angle regression and loss function design crucial to model performance. This paper presents a comprehensive survey of regression loss functions used in ROD, categorized into coordinate-based, approximated rotated IoU-based, and Gaussian-based approaches. We analyze their theoretical foundations, practical trade-offs, and effectiveness in addressing core challenges including angle periodicity, edge ambiguity, and metric inconsistency. Representative loss functions are benchmarked on standard datasets to evaluate their suitability for various detection frameworks. By emphasizing application contexts such as smart city monitoring and environmental analysis, this survey offers practical guidance for designing robust and efficient ROD systems that support sustainable development goals.
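As one concrete example from the Gaussian-based family (notation ours, included purely for illustration), a rotated box (x, y, w, h, θ) can be modeled as a 2D Gaussian and two boxes compared with the squared 2-Wasserstein distance:

```latex
% Illustrative only (notation ours): the Gaussian modelling step shared by
% Gaussian-based ROD losses such as GWD-style formulations.
\[
  m = (x, y)^\top, \qquad
  \Sigma = R(\theta)\,\mathrm{diag}\!\left(\tfrac{w^2}{4}, \tfrac{h^2}{4}\right) R(\theta)^\top,
\]
\[
  W_2^2(\mathcal{N}_1, \mathcal{N}_2)
  = \lVert m_1 - m_2 \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_1 + \Sigma_2
  - 2\left(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\right)^{1/2}\right).
\]
```

Because the Gaussian representation is invariant to the box's angular parameterization, distances of this kind sidestep angle periodicity and edge ambiguity, which is the main appeal of this family discussed in the survey.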
Incorporating self-attention into DenseNet for multi-label chest X-ray image classification
Abstract | PDF

This paper presents DNet-nSA, a novel deep learning architecture designed to enhance multi-label classification of chest X-ray (CXR) images by integrating n self-attention blocks into the DenseNet framework. While convolutional neural networks (CNNs) are effective at identifying local patterns, they frequently face challenges in capturing long-range dependencies and global context, which are essential for detecting spatially distributed abnormalities in CXR images. By embedding self-attention mechanisms, DNet-nSA allows the network to better capture non-local interactions and highlight diagnostically relevant regions. We propose and evaluate two variants: DNet-1SA and DNet-2SA, corresponding to the number of self-attention modules used. Experiments conducted on the ChestX-ray14 dataset demonstrate that the proposed models outperform the baseline DenseNet, the contrastive learning approach MoCoR101, and the self-supervised learning model MoBYSwinT, achieving a notable AUC of 0.822, confirming the effectiveness of self-attention in improving multi-label CXR image classification.
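For readers unfamiliar with the building block, a generic 2D self-attention module of the kind that can be inserted between dense blocks is sketched below in PyTorch; this is a standard non-local-style block, not the released DNet-nSA code, and the channel sizes are examples.

```python
# Illustrative 2D self-attention block (non-local style) for CNN feature maps.
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """Self-attention over the spatial positions of a feature map."""
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 8
        self.query = nn.Conv2d(channels, reduced, 1)
        self.key = nn.Conv2d(channels, reduced, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learnable residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)      # (b, hw, reduced)
        k = self.key(x).flatten(2)                         # (b, reduced, hw)
        attn = torch.softmax(q @ k, dim=-1)                # (b, hw, hw)
        v = self.value(x).flatten(2).transpose(1, 2)       # (b, hw, c)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.gamma * out                        # residual connection

# Example: attend over an intermediate DenseNet feature map of 256 channels.
feat = torch.randn(2, 256, 14, 14)
print(SelfAttention2d(256)(feat).shape)
```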
Enabling smart campus indoor spaces through spatial modeling with the IMDF Platform
Abstract | PDF

A smart campus is developed to deliver intelligent, user-centric services by leveraging IoT and big data to optimize the management of resources, spaces, and campus-wide activities. Its architecture relies on three core technological pillars: (1) the Internet of Things (IoT) for collecting real-time data from the physical environment, (2) cloud computing for processing and storing both spatial and non-spatial data at scale, and (3) intelligent analytics that apply machine learning and data mining for automated decision-making and anomaly detection. Among these, spatial data, especially indoor spatial data, plays a vital role in enabling services such as indoor navigation, resource allocation, and environmental monitoring. However, the lack of standardization and poor interoperability with IoT systems remain key barriers to the effective use of indoor spatial data. To overcome these barriers, this paper proposes a unified approach that leverages the Indoor Mapping Data Format (IMDF) as part of the spatial data infrastructure (SDI) for smart campuses. By integrating IMDF with the OGC SensorThings API, referred to as the digital nervous system (DNS) in a smart campus architecture, the approach helps build a flexible, real-time responsive indoor mapping system. This solution aims to standardize and optimize the connection between IoT data and indoor maps, thereby improving the user experience and operational efficiency of smart campuses.
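A hedged sketch of how the two standards might be linked in practice is shown below: a SensorThings Thing carries the identifier of the IMDF unit it is located in, so a map client can filter sensors by indoor space. The endpoint URL, property names, and ids are placeholders, not from the paper.

```python
# Illustrative sketch: annotating a SensorThings Thing with an IMDF unit id.
import requests

STA = "https://example.org/FROST-Server/v1.1"   # assumed SensorThings endpoint

# A Thing (e.g., a room sensor) annotated with the IMDF "unit" it sits in.
thing = {
    "name": "Room B1-203 environment sensor",
    "description": "Temperature/CO2 node in lecture room B1-203",
    "properties": {
        "imdf_feature_type": "unit",
        "imdf_feature_id": "11111111-2222-3333-4444-555555555555",
    },
}
requests.post(f"{STA}/Things", json=thing, timeout=10)

# Later, a map client can fetch all sensors located in a given IMDF unit.
unit_id = "11111111-2222-3333-4444-555555555555"
r = requests.get(
    f"{STA}/Things",
    params={"$filter": f"properties/imdf_feature_id eq '{unit_id}'"},
    timeout=10,
)
```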
A clinically-oriented 2D image-analysis system for appearance-based aesthetic outcome evaluation of breast reconstruction surgery
Abstract | PDF

The aesthetic outcome of reconstructed breasts is currently rated subjectively by plastic surgeons, which introduces inter-rater bias and variability; thus, an image-based objective technique is desired. Developing such a technique, however, has been challenging due to the limited availability of reconstructed breast images under standardized conditions and the complexity of assessing multiple aesthetic viewpoints. In this study, we propose a clinically-oriented two-dimensional (2D) image-analysis system in which small fingerprint pairs representing the left and right breasts are extracted from conventional 2D chest images obtained in clinical settings and used as training data for a simple convolutional neural network (CNN), aiming for high effectiveness even with a limited number of cases. We extracted 16 fingerprint-type variations from 170 cases and evaluated their influence on CNN performance. The optimal fingerprint types varied depending on the aesthetic viewpoint. The overall aesthetic score, calculated by aggregating the best-performing model scores across all viewpoints, showed a strong correlation (r > 0.9) with the average rater scores. Although 2D images capture only partial breast appearances and may not fully represent intrinsic three-dimensional (3D) features, the experimental results strongly support the potential of the proposed system for developing appearance-based models for aesthetic evaluation in clinical settings.
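The kind of small CNN the abstract refers to could look like the following sketch (not the study's network); the input resolution and the choice of stacking the left and right fingerprints as two channels are assumptions made for illustration.

```python
# Illustrative small CNN that scores one left/right fingerprint pair.
import torch
import torch.nn as nn

class FingerprintScorer(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)   # regressed aesthetic score for one viewpoint

    def forward(self, pair):
        # pair: (batch, 2, H, W) - left and right breast fingerprints as channels.
        return self.head(self.features(pair).flatten(1)).squeeze(1)

score = FingerprintScorer()(torch.randn(4, 2, 64, 64))
```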
Enhancing text-to-SQL capabilities of small language models via schema context enrichment and self-correction
Abstract | PDF

Translating natural language into SQL is essential for intuitive database access, yet open-source small language models (SLMs) still lag behind larger systems when faced with complex schemas and tight context windows. This paper introduces a two-phase workflow designed to enhance the Text-to-SQL capabilities of SLMs. Phase 1 (offline) transforms the database schema into a graph, partitions it with Louvain community detection, and enriches each component in a cluster with metadata, relationships, and sample rows. Phase 2 (at runtime) selects the relevant tables, generates SQL queries, and iteratively refines the SQL through an execution-driven feedback loop until the query executes successfully. Evaluated on the Spider test set, our pipeline raises Qwen-2.5-Coder-14B to 86.2% Execution Accuracy (EX), surpassing its zero-shot baseline, outperforming all contemporary SLM + ICL approaches, and narrowing the gap to GPT-4-based systems, all while running on consumer-grade hardware. Ablation studies confirm that both schema enrichment and self-correction contribute significantly to the improvement. The study concludes that this workflow provides a practical methodology for deploying resource-efficient open-source SLMs in Text-to-SQL applications, effectively mitigating common challenges. An open-source implementation is released to support further research.
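The two key mechanisms can be sketched as follows (not the released pipeline): Louvain partitioning of a schema graph built from foreign-key links, and an execution-driven repair loop; `generate_sql` is a placeholder for the SLM call.

```python
# Illustrative sketch: schema-graph partitioning and execution-driven self-correction.
import sqlite3
import networkx as nx
from networkx.algorithms.community import louvain_communities

def schema_communities(tables, foreign_keys):
    """tables: list of table names; foreign_keys: list of (table_a, table_b) pairs."""
    g = nx.Graph()
    g.add_nodes_from(tables)
    g.add_edges_from(foreign_keys)
    return louvain_communities(g, seed=0)

def run_with_self_correction(db_path, question, generate_sql, max_rounds=3):
    """Retry SQL generation, feeding execution errors back into the prompt."""
    conn = sqlite3.connect(db_path)
    feedback = ""
    for _ in range(max_rounds):
        sql = generate_sql(question, feedback)   # placeholder for the SLM prompt
        try:
            rows = conn.execute(sql).fetchall()
            return sql, rows                      # executes successfully: stop
        except sqlite3.Error as err:
            feedback = f"Previous SQL failed with: {err}. Please fix it."
    return sql, None
```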
A hybrid deep learning approach for detecting lung abnormalities from chest X-ray images
Abstract | PDF

This paper proposes a hybrid deep learning model for lung abnormality detection using X-ray images. To improve the performance and accuracy of the model, we use transfer learning with two pre-trained models, VGG16 and DenseNet121. Moreover, to extract deeper features of lung abnormalities, frontal and lateral views of X-ray images are trained using an ensemble technique. The features extracted by these two models are combined and passed to the classification layer. The experimental results on three datasets demonstrate the effectiveness of the proposed model, which outperforms the individual performance of the two base models, achieving a higher accuracy rate of 89%. Furthermore, in comparative assessments against several alternative models and datasets from previous research, our method demonstrates its efficiency, with an impressive AUC value of 0.95. These results underscore the promise of our approach in advancing the accuracy and effectiveness of lung abnormality detection in chest X-ray images.
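A hedged sketch of the feature-fusion idea (not the paper's exact model) using torchvision backbones is shown below; pretrained weights would be loaded for transfer learning, and the classifier size simply matches the concatenated 512 + 1024 feature dimensions.

```python
# Illustrative sketch: concatenating VGG16 and DenseNet121 features before one classifier.
import torch
import torch.nn as nn
from torchvision import models

class FusionNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        # weights=None keeps the sketch offline; pretrained weights would be used
        # for transfer learning.
        self.vgg = models.vgg16(weights=None).features
        self.densenet = models.densenet121(weights=None).features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(512 + 1024, num_classes)   # fused feature vector

    def forward(self, x):
        a = self.pool(self.vgg(x)).flatten(1)        # 512-dim VGG16 features
        b = self.pool(self.densenet(x)).flatten(1)   # 1024-dim DenseNet121 features
        return self.classifier(torch.cat([a, b], dim=1))

logits = FusionNet()(torch.randn(2, 3, 224, 224))
```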
MobiTran-SE: Hybrid MobileNetV3Small-Transformer architecture with squeeze-and-excitation for tomato leaf disease classification
Abstract | PDF

Diseases affecting tomato leaves represent a major risk to worldwide agricultural output and overall food security. In this study, we propose an innovative, lightweight, and efficient deep learning (DL) approach for the classification of tomato leaf diseases. Our architecture integrates the MobileNetV3Small backbone to extract multi-level features from input images, while Squeeze-and-Excitation (SE) blocks strengthen the focus on channel-wise features. A key component of our model is the incorporation of a Transformer-based module, which is applied to the fused features to extract long-range spatial interactions and contextual relationships. This hybrid approach enables the model to better distinguish complex disease patterns across categories. The experimental findings indicate that the proposed model attains a high classification accuracy of 99.02%. The model also exhibits fast convergence and strong generalization, making it highly applicable for real-time deployment and resource-constrained agricultural environments. This work contributes a powerful and efficient solution to intelligent plant disease monitoring in the field of precision agriculture.
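The overall composition can be sketched as follows (not the published architecture): MobileNetV3Small features, an additional squeeze-and-excitation gate, and a Transformer encoder over the spatial tokens; the channel counts and the single encoder layer are assumptions made for illustration.

```python
# Illustrative MobileNetV3Small + SE + Transformer composition in PyTorch.
import torch
import torch.nn as nn
from torchvision import models

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))             # channel-wise squeeze
        return x * w[:, :, None, None]              # excitation (re-weighting)

class MobiTranSE(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.backbone = models.mobilenet_v3_small(weights=None).features  # 576 channels
        self.se = SEBlock(576)
        layer = nn.TransformerEncoderLayer(d_model=576, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(576, num_classes)

    def forward(self, x):
        f = self.se(self.backbone(x))                   # (b, 576, h, w)
        tokens = f.flatten(2).transpose(1, 2)           # (b, h*w, 576) spatial tokens
        return self.head(self.transformer(tokens).mean(dim=1))

logits = MobiTranSE()(torch.randn(2, 3, 224, 224))
```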