BagViT: Bagged vision transformers for classifying chest X-ray images
Abstract
In this paper, we propose a novel ensemble method, Bagged Vision Transformers (BagViT), to improve the classification accuracy of chest X-ray (CXR) images. BagViT builds an ensemble of independent Vision Transformer (ViT) models, each trained on a bootstrap sample (drawn with replacement) from the original training dataset. To increase diversity among the ensemble members, we use MixUp to generate synthetic training examples, and we introduce further training randomness by varying the number of training epochs and by fine-tuning only a selected set of top layers in each model. The final prediction is obtained by majority voting over the members. Experimental results on a real-world dataset collected from Chau Doc Hospital (An Giang, Vietnam) show that BagViT significantly outperforms fine-tuned baselines, including VGG16, ResNet, DenseNet, and a single ViT. BagViT achieves a classification accuracy of 72.25%, which highlights the effectiveness of ensemble learning with transformer architectures on complex CXR images.
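The three ingredients named in the abstract (bootstrap sampling, MixUp, and majority voting) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: a soft-label nearest-centroid classifier (`fit_centroids`, `predict_centroids`) stands in for a fine-tuned ViT so the example stays small and runnable, and the function names, the MixUp parameter `alpha=0.2`, and the ensemble size `n_models=5` are all assumptions chosen for illustration. The per-member variation in training epochs and fine-tuned layers has no counterpart in this stand-in and is omitted.

```python
import numpy as np


def bootstrap_indices(n, rng):
    """Draw n indices with replacement: one bootstrap sample."""
    return rng.integers(0, n, size=n)


def mixup(x, y_onehot, alpha, rng):
    """Blend each example with a randomly chosen partner (MixUp)."""
    lam = rng.beta(alpha, alpha)      # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))    # partner index for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix


def fit_centroids(x, y_soft):
    """Hypothetical stand-in for fine-tuning one ViT: soft-label class centroids."""
    weights = y_soft.sum(axis=0)[:, None] + 1e-12  # guard against empty classes
    return (y_soft.T @ x) / weights


def predict_centroids(centroids, x):
    """Assign each row of x to its nearest centroid."""
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def majority_vote(votes, n_classes):
    """votes: (n_models, n_samples) labels -> most frequent label per sample."""
    n_samples = votes.shape[1]
    counts = np.zeros((n_classes, n_samples), dtype=int)
    for model_votes in votes:
        counts[model_votes, np.arange(n_samples)] += 1
    return counts.argmax(axis=0)      # ties go to the lowest class index


def bagged_predict(x_train, y_train, x_test, n_classes,
                   n_models=5, alpha=0.2, seed=0):
    """Train n_models members on bootstrap + MixUp data, then vote."""
    rng = np.random.default_rng(seed)
    y_onehot = np.eye(n_classes)[y_train]
    votes = []
    for _ in range(n_models):
        idx = bootstrap_indices(len(x_train), rng)
        x_mix, y_mix = mixup(x_train[idx], y_onehot[idx], alpha, rng)
        centroids = fit_centroids(x_mix, y_mix)
        votes.append(predict_centroids(centroids, x_test))
    return majority_vote(np.stack(votes), n_classes)
```

Each member sees a different bootstrap sample and a different MixUp draw, so the members disagree on some inputs; the vote then smooths out those individual errors, which is the variance-reduction argument behind bagging.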
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.