A comparative deep learning approach for image classification and retrieval in scientific publications
Abstract
This study presents a comparative analysis of state-of-the-art deep learning models (EfficientNetB0, MobileNetV2, and ResNet101) for image classification and content-based retrieval in scientific publications. A dataset of 4,303 images from 11 categories was curated from the Can Tho University Journal of Science and enhanced through tailored data augmentation strategies. The models were fine-tuned using transfer learning with hyperparameters optimised via Grid Search. Features were extracted using GlobalAveragePooling2D, and cosine similarity was combined with the FAISS library for efficient similarity search. Experimental results demonstrate a clear performance-efficiency trade-off: ResNet101 achieved the highest classification accuracy, while EfficientNetB0 and MobileNetV2 offered significant advantages in inference speed. A user-friendly web interface was developed to support practical image retrieval applications. These findings highlight the potential of deep learning in enhancing the management and integrity of scientific image resources.
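The retrieval pipeline described above (CNN features pooled with GlobalAveragePooling2D, then ranked by cosine similarity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: random vectors stand in for real CNN embeddings, and plain NumPy replaces FAISS. The key observation it demonstrates is that after L2 normalisation, cosine similarity reduces to an inner product, which is exactly what a FAISS `IndexFlatIP` index computes at scale.

```python
import numpy as np

# Hypothetical stand-in for CNN embeddings: one 1280-dim feature vector
# per indexed image (4,303 images, matching the dataset size in the paper).
# In practice these would come from a fine-tuned backbone whose last
# convolutional feature map is pooled with GlobalAveragePooling2D.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(4303, 1280)).astype("float32")

# A query that is a slightly perturbed copy of image 42, so the correct
# nearest neighbour is known in advance.
query = gallery[42] + 0.01 * rng.normal(size=1280).astype("float32")

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale rows to unit L2 norm so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

gallery_n = l2_normalize(gallery)
query_n = l2_normalize(query[None, :])

# Cosine similarity of the query against every gallery image.
scores = gallery_n @ query_n.T          # shape (4303, 1)
top5 = np.argsort(-scores[:, 0])[:5]    # indices of the 5 most similar images
```

With a FAISS backend, the same normalised vectors would be added to an `IndexFlatIP` index and `index.search(query_n, 5)` would return the same top-5 ranking, but with GPU-accelerated search over millions of vectors.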
Article Details
© 2026 The authors. This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.
References
Ameen, Y. A., & Mohammed, D. (2023). Which data subset should be augmented for deep learning? A simulation study using urothelial cell carcinoma histopathology images. BMC Bioinformatics.
https://doi.org/10.1186/s12859-023-05199-y
Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, 281–305. https://dl.acm.org/doi/10.5555/2188385.2188395
Chang, P., Sun, H., Lee, J., & Kim, H. (2024). Extraction and evaluation of features of preterm patent ductus arteriosus in chest X-ray images using deep learning. Scientific Reports. https://doi.org/10.1038/s41598-024-79361-8
Falaschetti, L., Manoni, L., Di Leo, D., Pau, D., Tomaselli, V., & Turchetti, C. (2022). A CNN-based image detector for plant leaf diseases classification. HardwareX, 12. https://doi.org/10.1016/j.ohx.2022.e00363
Gayadhankar, K., Patil, R., Chavan, P., Channe, P., & Patil, S. (2021). Image plagiarism detection using GAN - (Generative Adversarial Network). ITM Web of Conferences, 40, 03013. https://doi.org/10.1051/itmconf/20214003013
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778). https://doi.org/10.1109/CVPR.2016.90
Jha, A., Uppal, M., & Pelz, J. B. (2018). Image forensics: Detecting duplication of scientific images with manipulation-invariant image similarity. arXiv preprint arXiv:1802.06515. https://arxiv.org/abs/1802.06515
Johnson, J., Douze, M., & Jégou, H. (2019). Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3), 535-547. https://doi.org/10.1109/TBDATA.2019.2921572
Li, J. (2024). Area under the ROC Curve has the most consistent evaluation for binary classification. PLOS ONE. https://doi.org/10.1371/journal.pone.0316019
Lin, M., Chen, Q., & Yan, S. (2014). Network In Network. arXiv preprint arXiv:1312.4400. https://arxiv.org/abs/1312.4400
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
Ogunsanya, M., & Ibekwe, J. (2023). Grid search hyperparameter tuning in additive manufacturing processes. Manufacturing Letters, 36, 1031-1042. https://doi.org/10.1016/j.mfglet.2023.08.056
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4510-4520). https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00474
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2019). Grad-CAM: Visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision, 128, 336-359. https://doi.org/10.1007/s11263-019-01228-7
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. https://arxiv.org/abs/1409.1556
Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning (pp. 6105-6114). PMLR. https://arxiv.org/abs/1905.11946
Tragoudaras, A., & Siozios, P. (2022). Design space exploration of a sparse MobileNetV2 using high-level synthesis and sparse matrix techniques on FPGAs. Sensors, 22(12), 4318. https://doi.org/10.3390/s22124318
Zhou, A., & Ma, Y. (2022). Multi-head attention-based two-stream EfficientNet for action recognition. Multimedia Systems, 28, 487-498. https://doi.org/10.1007/s00530-2022-00961-3