Nguyen Hoang Anh and Tran Thanh Dien *

* Corresponding author: Tran Thanh Dien (email: thanhdien@ctu.edu.vn)

Abstract

This study presents a comparative analysis of three state-of-the-art deep learning models (EfficientNetB0, MobileNetV2, and ResNet101) for image classification and content-based retrieval in scientific publications. A dataset of 4,303 images spanning 11 categories was curated from the Can Tho University Journal of Science and enhanced through tailored data augmentation strategies. The models were fine-tuned using transfer learning, with hyperparameters optimised via Grid Search. Features were extracted using a GlobalAveragePooling2D layer, and cosine similarity was combined with the FAISS library for efficient similarity search. Experimental results demonstrate a clear performance-efficiency trade-off: ResNet101 achieved the highest classification accuracy, while EfficientNetB0 and MobileNetV2 offered significant advantages in inference speed. A user-friendly web interface was developed to support practical image retrieval applications. These findings highlight the potential of deep learning in enhancing the management and integrity of scientific image resources.
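The retrieval step described in the abstract (GlobalAveragePooling2D features ranked by cosine similarity, with FAISS used for fast search) can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the authors' implementation: the random 1280-dimensional vectors merely stand in for pooled EfficientNetB0 features, and the inner product over L2-normalised vectors shown here is exactly what `faiss.IndexFlatIP` computes at scale.

```python
import numpy as np

def l2_normalize(x, eps=1e-10):
    """Scale each row to unit length so inner product equals cosine similarity."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def cosine_topk(db_feats, query_feats, k=3):
    """Return indices and scores of the k most similar database items.

    On L2-normalised vectors this inner product is the same quantity
    faiss.IndexFlatIP would return (FAISS just searches it faster).
    """
    sims = l2_normalize(query_feats) @ l2_normalize(db_feats).T
    topk = np.argsort(-sims, axis=1)[:, :k]
    return topk, np.take_along_axis(sims, topk, axis=1)

# Toy stand-in for pooled CNN features (1280-d, as EfficientNetB0 produces).
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 1280))
query = db[7:8] + 0.01 * rng.normal(size=(1, 1280))  # near-duplicate of item 7

idx, scores = cosine_topk(db, query, k=3)
# The nearest neighbour should be item 7 itself, with similarity close to 1.
```

Normalising before indexing is the standard trick for cosine search with `IndexFlatIP`; swapping in real model features only changes how `db` and `query` are produced.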

Keywords: Comparative analysis, Content-based image retrieval (CBIR), Deep learning, FAISS, Grad-CAM, Image classification

References

Ameen, Y. A., & Mohammed, D. (2023). Which data subset should be augmented for deep learning? A simulation study using urothelial cell carcinoma histopathology images. BMC Bioinformatics.
https://doi.org/10.1186/s12859-023-05199-y

Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, 281–305. https://dl.acm.org/doi/10.5555/2188385.2188395

Chang, P., Sun, H., Lee, J., & Kim, H. (2024). Extraction and evaluation of features of preterm patent ductus arteriosus in chest X-ray images using deep learning. Scientific Reports. https://doi.org/10.1038/s41598-024-79361-8

Falaschetti, L., Manoni, L., Di Leo, D., Pau, D., Tomaselli, V., & Turchetti, C. (2022). A CNN-based image detector for plant leaf diseases classification. HardwareX, 12. https://doi.org/10.1016/j.ohx.2022.e00363

Gayadhankar, K., Patil, R., Chavan, P., Channe, P., & Patil, S. (2021). Image plagiarism detection using GAN (Generative Adversarial Network). ITM Web of Conferences, 40, 03013. https://doi.org/10.1051/itmconf/20214003013

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778). https://doi.org/10.1109/CVPR.2016.90

Jha, A., Uppal, M., & Pelz, J. B. (2018). Image forensics: Detecting duplication of scientific images with manipulation-invariant image similarity. arXiv preprint arXiv:1802.06515. https://arxiv.org/abs/1802.06515

Johnson, J., Douze, M., & Jégou, H. (2019). Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3), 535–547. https://doi.org/10.1109/TBDATA.2019.2921572

Li, J. (2024). Area under the ROC Curve has the most consistent evaluation for binary classification. PLOS ONE. https://doi.org/10.1371/journal.pone.0316019

Lin, M., Chen, Q., & Yan, S. (2014). Network In Network. arXiv preprint arXiv:1312.4400. https://arxiv.org/abs/1312.4400

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94

Ogunsanya, M., & Ibekwe, J. (2023). Grid search hyperparameter tuning in additive manufacturing processes. Manufacturing Letters, 36, 1031–1042. https://doi.org/10.1016/j.mfglet.2023.08.056

Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4510–4520). https://doi.org/10.1109/CVPR.2018.00474

Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2019). Grad-CAM: Visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 128, 336–359. https://doi.org/10.1007/s11263-019-01228-7

Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. https://arxiv.org/abs/1409.1556

Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning (pp. 6105–6114). PMLR. https://arxiv.org/abs/1905.11946

Tragoudaras, A., & Siozios, P. (2022). Design space exploration of a sparse MobileNetV2 using high-level synthesis and sparse matrix techniques on FPGAs. Sensors, 22(12), 4318. https://doi.org/10.3390/s22124318

Zhou, A., & Ma, Y. (2022). Multi-head attention-based two-stream EfficientNet for action recognition. Multimedia Systems, 28, 487–498. https://doi.org/10.1007/s00530-021-00961-3