Nguyen Hoang Anh and Tran Thanh Dien *

* Corresponding author: Tran Thanh Dien (email: thanhdien@ctu.edu.vn)


Abstract

This study presents a comparative analysis of state-of-the-art deep learning models (EfficientNetB0, MobileNetV2, and ResNet101) for image classification and content-based retrieval in scientific publications. A dataset of 4,303 images from 11 categories was curated from the Can Tho University Journal of Science and enhanced through tailored data augmentation strategies. The models were fine-tuned using transfer learning with hyperparameters optimized via Grid Search. Features were extracted using GlobalAveragePooling2D, and cosine similarity combined with the FAISS library was employed for efficient similarity search. Experimental results demonstrate a clear performance-efficiency trade-off: ResNet101 achieved the highest classification accuracy, while EfficientNetB0 and MobileNetV2 offered significant advantages in inference speed. A user-friendly web interface was developed to support practical image retrieval applications. These findings highlight the potential of deep learning in enhancing the management and integrity of scientific image resources.
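The retrieval step summarized above (cosine similarity over globally average-pooled CNN features) can be sketched in a few lines. The snippet below is a minimal NumPy stand-in, not the study's implementation: the feature matrix and query are synthetic, and the 1280-dimension descriptor size is an assumption (it matches EfficientNetB0's pooled output). In FAISS itself, the equivalent is an `IndexFlatIP` over L2-normalized vectors, since the inner product of unit vectors equals cosine similarity.

```python
import numpy as np

def global_average_pool(feature_maps):
    # (H, W, C) convolutional output -> (C,) image descriptor
    return feature_maps.mean(axis=(0, 1))

def cosine_search(index_vecs, query_vec, k=3):
    # L2-normalize so that a dot product equals cosine similarity
    # (the same trick FAISS uses with IndexFlatIP on normalized vectors)
    idx = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    sims = idx @ q
    top = np.argsort(-sims)[:k]          # indices of the k most similar images
    return top, sims[top]

# Synthetic "database" of 100 pooled descriptors (1280-d, as for EfficientNetB0)
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 1280))
query = db[42] + 0.01 * rng.normal(size=1280)  # near-duplicate of image 42
top, sims = cosine_search(db, query, k=3)
```

Because the query is a lightly perturbed copy of database item 42, that item is returned first with a cosine similarity close to 1.0.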

Keywords: Comparative analysis, Content-based image retrieval (CBIR), Deep learning, FAISS, Grad-CAM, Image classification
