The Object detection by the combination of generic roi extractor and dynamic R-CNN with side-aware boundary localization in aerial images

Nguyen Bao Tran; Tan Tai Pham; Cao Doanh Bui; Nguyen D. Vo; Khang Nguyen

doi:10.22144/ctu.jen.2023.006

Nguyen Bao Tran*, Tan Tai Pham, Cao Doanh Bui, Nguyen D. Vo and Khang Nguyen

* Corresponding author: Nguyen Bao Tran (email: 20520142@gm.uit.edu.vn)

Received: 21 Apr 2022

Revised: 27 May 2022

Accepted: 08 Jun 2022

Published: 30 Mar 2023

How to Cite

Tran, N. B., Pham, T. T., Bui, C. D., Vo, N. D., & Nguyen, K. (2023). The Object detection by the combination of generic roi extractor and dynamic R-CNN with side-aware boundary localization in aerial images. CTU Journal of Innovation and Sustainable Development, 15(1), 49-57. https://doi.org/10.22144/ctu.jen.2023.006

Issue

Vol. 15 No. 1 (2023)

Section

Information Technology

Abstract

Unmanned Aerial Vehicles (UAVs) have recently gained popularity due to their simplicity and effectiveness in traffic monitoring and their potential for rapid delivery and rescue support. UAVs have also been employed to collect data for object detection tasks, in particular vehicle detection. Although vehicle detection is a difficult problem, many of its challenges have recently been addressed by two-stage approaches such as Faster R-CNN, one of the most successful vehicle detectors. However, critical problems remain, such as partial occlusion, object truncation, and multi-angle object rotation. In this paper, we combine the Generic RoI Extractor (GRoIE) with Dynamic R-CNN and Side-aware Boundary Localization (SABL) and evaluate the combination on the challenging XDUAV dataset. We used 4344 images from the XDUAV dataset, divided into three subsets: 3485 training images, 869 testing images, and 869 validation images. The dataset covers six object classes: 33841 “car”; 2690 “bus”; 2848 “truck”; 173 “tanker”; 6656 “motor” and 2024 “bicycle”. With the ResNet-101 backbone, our approach achieved competitive results compared with the original GRoIE method, surpassing it by 1.2% in mAP and by about 2% in AP on most classes, except for the class “tanker”.
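
The per-class counts quoted in the abstract can be turned into class shares to show how unevenly the six classes are represented. The following is a minimal Python sketch that uses only those counts; the names `class_counts` and `class_shares` are illustrative and do not come from the paper.

```python
# Object counts per class, copied from the abstract (XDUAV dataset).
class_counts = {
    "car": 33841,
    "bus": 2690,
    "truck": 2848,
    "tanker": 173,
    "motor": 6656,
    "bicycle": 2024,
}


def class_shares(counts):
    """Return each class's fraction of the total number of objects."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total_objects = sum(class_counts.values())
shares = class_shares(class_counts)

# Print classes from most to least frequent with their share of all objects.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:8s} {class_counts[name]:6d} {share:7.2%}")
```

Under these counts, “car” makes up roughly 70% of all objects and “tanker” well under 1%. This imbalance is worth keeping in mind when reading the per-class AP scores, although the abstract itself does not attribute the “tanker” result to it.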

Keywords: Vehicle detection, object detection, UAV datasets, XDUAV dataset

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

References

Cai, Z., & Vasconcelos, N. (2018). Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6154-6162).

Farahnak-Ghazani, F., & Baghshah, M. S. (2016, May). Multi-label classification with feature-aware implicit encoding and generalized cross-entropy loss. In 2016 24th Iranian conference on electrical engineering (ICEE) (pp. 1574-1579). IEEE.

Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580-587).

Girshick, R. (2015). Fast r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 1440-1448).

He, K., Zhang, X., Ren, S., & Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE transactions on pattern analysis and machine intelligence, 37(9), 1904-1916.

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).

He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

Lin, T. Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980-2988).

Liu, S., Qi, L., Qin, H., Shi, J., & Jia, J. (2018). Path aggregation network for instance segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8759-8768).

Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems, 28. https://proceedings.neurips.cc/paper/2015

Rossi, L., Karimi, A., & Prati, A. (2021, January). A novel region of interest extraction layer for instance segmentation. In 2020 25th International Conference on Pattern Recognition (ICPR) (pp. 2203-2209). IEEE.

Wang, J., Zhang, W., Cao, Y., Chen, K., Pang, J., Gong, T., ... & Lin, D. (2020, August). Side-aware boundary localization for more precise object detection. In European Conference on Computer Vision (pp. 403-419). Springer, Cham.

Wan, J., Zhang, B., Zhao, Y., Du, Y., & Tong, Z. (2021). VistrongerDet: Stronger Visual Information for Object Detection in VisDrone Images. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 2820-2829).

Xie, S., Girshick, R., Dollár, P., Tu, Z., & He, K. (2017). Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1492-1500).

Xie, X., Yang, W., Cao, G., Yang, J., & Shi, G. (2018, September). The Collected XDUAV Dataset. Available online: https://share.weiyun.com/8rAu3kqr.

Zhang, H., Chang, H., Ma, B., Wang, N., & Chen, X. (2020, August). Dynamic R-CNN: Towards high quality object detection via dynamic training. In European conference on computer vision (pp. 260-275). Springer, Cham.

Zhang, H., Wu, C., Zhang, Z., Zhu, Y., Lin, H., Zhang, Z., ... & Smola, A. (2020). Resnest: Split-attention networks. arXiv preprint arXiv:2004.08955.
