Object detection by the combination of Generic RoI Extractor and Dynamic R-CNN with Side-aware Boundary Localization in aerial images
Unmanned Aerial Vehicles (UAVs) have recently gained popularity due to their simplicity and effectiveness in traffic monitoring and their potential for rapid delivery and rescue support. Moreover, UAVs have been employed as a supporting tool for data collection in object detection tasks, in particular vehicle detection. Although vehicle detection is a difficult problem, many of its challenges have recently been overcome by two-stage approaches such as Faster R-CNN, one of the most successful vehicle detectors. However, many critical problems remain, such as partial occlusion, object truncation, and multi-angle object rotation. In this paper, we combine the Generic RoI Extractor (GRoIE) method with Dynamic R-CNN and Side-aware Boundary Localization (SABL) for both testing and evaluation on the challenging XDUAV dataset. Overall, 4344 images from the XDUAV dataset were used, divided into 3 subsets: 3485 training images, 869 testing images and 869 validating images. These contain six object classes: 33841 “car”; 2690 “bus”; 2848 “truck”; 173 “tanker”; 6656 “motor” and 2024 “bicycle”. With the ResNet-101 backbone, our approach showed competitive results compared with the original GRoIE method, surpassing it by 1.2% in mAP and by about 2% in AP on most classes, except for the class “tanker”.
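To put the per-class instance counts above in context, the following minimal sketch computes each class's share of all annotated instances. The counts are taken directly from the abstract; the names `CLASS_COUNTS` and `class_shares` are hypothetical and chosen only for illustration, and the sketch makes no claim about how the authors analysed the data.

```python
# Instance counts per class, as listed in the abstract.
CLASS_COUNTS = {
    "car": 33841,
    "bus": 2690,
    "truck": 2848,
    "tanker": 173,
    "motor": 6656,
    "bicycle": 2024,
}


def class_shares(counts):
    """Return each class's fraction of all annotated instances."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


TOTAL = sum(CLASS_COUNTS.values())
SHARES = class_shares(CLASS_COUNTS)

# Print classes from most to least frequent.
for name, share in sorted(SHARES.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:8s} {CLASS_COUNTS[name]:6d} {share:7.2%}")
```

Run as written, this shows that “car” accounts for roughly 70% of the 48232 annotated instances, while “tanker” is well under 1%.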
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.