A comparative deep learning approach for image classification and retrieval in scientific publications
Abstract
This study presents a comparative analysis of state-of-the-art deep learning models (EfficientNetB0, MobileNetV2, and ResNet101) for image classification and content-based retrieval in scientific publications. A dataset of 4,303 images spanning 11 categories was curated from the Can Tho University Journal of Science and enhanced through tailored data augmentation strategies. The models were fine-tuned via transfer learning, with hyperparameters optimized by Grid Search. Features were extracted with a GlobalAveragePooling2D layer, and cosine similarity combined with the FAISS library enabled efficient similarity search. Experimental results reveal a clear performance-efficiency trade-off: ResNet101 achieved the highest classification accuracy, while EfficientNetB0 and MobileNetV2 offered markedly faster inference. A user-friendly web interface was developed to support practical image retrieval. These findings highlight the potential of deep learning for enhancing the management and integrity of scientific image resources.
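The retrieval step described in the abstract (pooled CNN features ranked by cosine similarity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 2048-dimensional features and gallery size are assumed for the example, and plain NumPy inner products stand in for FAISS, whose `IndexFlatIP` over L2-normalized vectors returns the same cosine-similarity rankings.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so the inner product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def build_index(features: np.ndarray) -> np.ndarray:
    """Normalize and store gallery features; a FAISS IndexFlatIP over these
    vectors would produce identical rankings."""
    return l2_normalize(features.astype("float32"))

def search(index: np.ndarray, query: np.ndarray, k: int = 5):
    """Return indices and cosine scores of the k most similar gallery images."""
    q = l2_normalize(query.astype("float32"))
    scores = q @ index.T                       # cosine similarities, (n_query, n_gallery)
    top = np.argsort(-scores, axis=1)[:, :k]   # best k gallery indices per query
    return top, np.take_along_axis(scores, top, axis=1)

# Toy example: 2048-D embeddings (e.g., a GlobalAveragePooling2D output) for 100 images.
rng = np.random.default_rng(0)
gallery = rng.standard_normal((100, 2048))
index = build_index(gallery)
ids, scores = search(index, gallery[:1], k=3)
# A query taken from the gallery retrieves itself first, with cosine score ~1.0.
```

Normalizing before indexing is the standard way to get cosine similarity out of an inner-product index; in FAISS the same effect is achieved with `faiss.normalize_L2` followed by `IndexFlatIP`.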
Article Details
© 2026 The authors. This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.
References
Ameen, Y. A., & Mohammed, D. (2023). Which data subset should be augmented for deep learning? A simulation study using urothelial cell carcinoma histopathology images. BMC Bioinformatics. https://doi.org/10.1186/s12859-023-05199-y
Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, 281–305.
Chang, P., Sun, H., et al. (2024). Extraction and evaluation of features of preterm patent ductus arteriosus in chest X-ray images using deep learning. Scientific Reports. https://doi.org/10.1038/s41598-024-79361-8
Dien, T. T., Linh, T. T. T., Anh, L. D., Quyen, N. T. K., Dan, N. B., Hai, N. T., & Nghe, N. T. (2025). Image similarity detection in scientific articles using image processing and deep learning ResNet50. CTU Journal of Science, 61(2), 44–53. https://doi.org/10.22144/ctujos.2025.046 (in Vietnamese)
Falaschetti, L., Manoni, L., et al. (2022). A CNN-based image detector for plant leaf diseases classification. HardwareX, 12. https://doi.org/10.1016/j.ohx.2022.e00363
Gayadhankar, K., Patil, R., Chavan, P., Channe, P., & Patil, S. (2021). Image plagiarism detection using GAN - (Generative Adversarial Network). ITM Web of Conferences, 40, 03013. https://doi.org/10.1051/itmconf/20214003013
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778). https://doi.org/10.1109/CVPR.2016.90
Jha, A., Uppal, M., & Pelz, J. B. (2018). Image forensics: Detecting duplication of scientific images with manipulation-invariant image similarity. arXiv preprint arXiv:1802.06515. https://arxiv.org/abs/1802.06515
Johnson, J., Douze, M., & Jégou, H. (2019). Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3), 535–547. https://doi.org/10.1109/TBDATA.2019.2921572
Li, J. (2024). Area under the ROC curve has the most consistent evaluation for binary classification. PLOS ONE. https://doi.org/10.1371/journal.pone.0316019
Lin, M., Chen, Q., & Yan, S. (2014). Network in network. arXiv preprint arXiv:1312.4400. https://arxiv.org/abs/1312.4400
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
Ogunsanya, M., & Ibekwe, J. (2023). Grid search hyperparameter tuning in additive manufacturing processes. Manufacturing Letters, 36, 1031–1042. https://doi.org/10.1016/j.mfglet.2023.08.056
Parveen, R., & Kumar, R. (2023). Abnormal event detection model using an improved ResNet101 in context aware surveillance system. IET Cyber-Systems and Robotics. https://doi.org/10.1049/ccs2.12084
Phung, V. H., & Rhee, E. J. (2019). A high-accuracy model average ensemble of convolutional neural networks for classification of cloud image patches on small datasets. Applied Sciences, 9(21), 4500. https://doi.org/10.3390/app9214500
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4510–4520).
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2019). Grad-CAM: Visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 128, 336–359. https://doi.org/10.1007/s11263-019-01228-7
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. https://arxiv.org/abs/1409.1556
Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning (pp. 6105–6114). PMLR.
Tragoudaras, A., & Siozios, P. (2022). Design space exploration of a sparse MobileNetV2 using high-level synthesis and sparse matrix techniques on FPGAs. Sensors, 22(12), 4318. https://doi.org/10.3390/s22124318
Zhou, A., & Ma, Y. (2022). Multi-head attention-based two-stream EfficientNet for action recognition. Multimedia Systems, 28, 487–498. https://doi.org/10.1007/s00530-022-00961-3