DMA-YOLO: multi-scale object detection method with attention mechanism for aerial images

Ya-ling Li¹,
Yong Feng ORCID: orcid.org/0000-0002-8820-8388¹,
Ming-liang Zhou¹,
Xian-cai Xiong²,
Yong-heng Wang³ &
…
Bao-hua Qiang⁴

1472 Accesses
Explore all metrics

Abstract

Unmanned aerial vehicles are increasingly popular due to their ease of operation, low noise, and portability. However, existing object detection methods perform poorly in detecting small targets in densely arranged, sparsely distributed aerial images. To tackle this issue, we enhanced the general object detection method YOLOv5 and introduced a multi-scale detection method called Detach-Merge Attention YOLO (DMA-YOLO). Specifically, we proposed a Detach-Merge Convolution (DMC) module and embedded it into the backbone network to maximize feature retention. Furthermore, we embedded the Bottleneck Attention Module (BAM) into the detection head to suppress interference from complex background information without significantly increasing computational complexity. To represent and process multi-scale features more effectively, we have integrated an extra detection head and enhanced the neck network into the Bi-directional Feature Pyramid Network (BiFPN) structure. Finally, we adopted the SCYLLA-IoU (SIoU) as a loss function to expedite the convergence rate of our model and enhance the precision of detection results. A series of experiments on the VisDrone2019 and UAVDT datasets have illustrated the effectiveness of DMA-YOLO. Code is available at https://github.com/Yaling-Li/DMA-YOLO.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

LightUAV-YOLO: a lightweight object detection model for unmanned aerial vehicle image

Article 30 October 2024

BiPA-Net: An Effective Tiny-Object Detection Model in Aerial Images

Small object detection based on YOLOv8 in UAV perspective

Article 18 August 2024

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Availability of data and materials

The data that support the findings of this study are openly available at http://aiskyeye.com and https://sites.google.com/site/daviddo0323/projects/uavdt.

Code availability

The original code is available at https://github.com/Yaling-Li/DMA-YOLO.

References

Cai, E., Luo, Z., Baireddy, S., Guo, J., Yang, C., Delp, E.J.: High-resolution uav image generation for sorghum panicle detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1676–1685 (2022)
Liu, C., Szirányi, T.: Road condition detection and emergency rescue recognition using on-board UAV in the wildness. Remote Sens. 14(17), 4355 (2022)
Article Google Scholar
Abdelraouf, A., Abdel-Aty, M., Wu, Y.: Using vision transformers for spatial-context-aware rain and road surface condition detection on freeways. IEEE Trans. Intell. Transp. Syst. 23(10), 18546–18556 (2022)
Article Google Scholar
Cai, E., Luo, Z., Baireddy, S., Guo, J., Yang, C., Delp, E.J.: High-resolution UAV image generation for sorghum panicle detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1676–1685 (2022)
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: SSD: single shot multibox detector. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pp. 21–37 (2016). Springer
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
Girshick, R.: Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
Cai, Z., Vasconcelos, N.: Cascade r-cnn: delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6154–6162 (2018)
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V 13, pp. 740–755 (2014). Springer
Wang, Y., Yang, Y., Zhao, X.: Object detection using clustering algorithm adaptive searching regions in aerial images. In: European Conference on Computer Vision, pp. 651–664 (2020). Springer
Deng, S., Li, S., Xie, K., Song, W., Liao, X., Hao, A., Qin, H.: A global-local self-adaptive network for drone-view object detection. IEEE Trans. Image Process. 30, 1556–1569 (2020)
Article MathSciNet Google Scholar
Xu, J., Li, Y., Wang, S.: Adazoom: Adaptive zoom network for multi-scale object detection in large scenes. arXiv preprint arXiv:2106.10409 (2021)
Huang, Y., Chen, J., Huang, D.: Ufpmp-det: toward accurate and efficient object detection on drone imagery. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 1026–1033 (2022)
Zhu, X., Lyu, S., Wang, X., Zhao, Q.: Tph-yolov5: Improved yolov5 based on transformer prediction head for object detection on drone-captured scenarios. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2778–2788 (2021)
Benjumea, A., Teeti, I., Cuzzolin, F., Bradley, A.: Yolo-z: Improving small object detection in yolov5 for autonomous vehicles. arXiv preprint arXiv:2112.11798 (2021)
Hou, Q., Zhou, D., Feng, J.: Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13713–13722 (2021)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110 (2004)
Article Google Scholar
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 1, pp. 886–893 (2005). IEEE
Papageorgiou, C., Poggio, T.: A trainable system for object detection. Int. J. Comput. Vis. 38, 15–33 (2000)
Article Google Scholar
Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2009)
Article Google Scholar
Viola, P., Jones, M.J.: Robust real-time face detection. Int. J. Comput. Vis. 57, 137–154 (2004)
Article Google Scholar
Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28 (2015)
Yang, F., Fan, H., Chu, P., Blasch, E., Ling, H.: Clustered object detection in aerial images. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8311–8320 (2019)
Li, C., Yang, T., Zhu, S., Chen, C., Guan, S.: Density map guided object detection in aerial images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 190–191 (2020)
Yu, W., Yang, T., Chen, C.: Towards resolving the challenge of long-tail distribution in uav images for object detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3258–3267 (2021)
Liu, Z., Gao, G., Sun, L., Fang, Z.: Hrdnet: High-resolution detection network for small objects. In: 2021 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6 (2021). IEEE
Wang, J., Yu, J., He, Z.: Arfp: a novel adaptive recursive feature pyramid for object detection in aerial images. Appl. Intell. 52(11), 12844–12859 (2022)
Article Google Scholar
Chalavadi, V., Jeripothula, P., Datla, R., Ch, S.B.: Msodanet: a network for multi-scale object detection in aerial images using hierarchical dilated convolutions. Pattern Recogn. 126, 108548 (2022)
Article Google Scholar
Xi, Y., Jia, W., Miao, Q., Liu, X., Fan, X., Li, H.: Fifonet: fine-grained target focusing network for object detection in UAV images. Remote Sens. 14(16), 3919 (2022)
Article Google Scholar
Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.-Y., et al.: Segment anything. arXiv preprint arXiv:2304.02643 (2023)
Ru, L., Zhan, Y., Yu, B., Du, B.: Learning affinity from attention: end-to-end weakly-supervised semantic segmentation with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16846–16855 (2022)
Li, H., Zhang, Z., Zhao, X., Wang, Y., Shen, Y., Pu, S., Mao, H.: Enhancing multi-modal features using local self-attention for 3d object detection. In: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part X, pp. 532–549 (2022). Springer
Jeong, J.-Y., Hong, Y.-G., Kim, D., Jeong, J.-W., Jung, Y., Kim, S.-H.: Classification of facial expression in-the-wild based on ensemble of multi-head cross attention networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2353–2358 (2022)
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: Cbam: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018)
Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., Hu, Q.: Eca-net: efficient channel attention for deep convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11534–11542 (2020)
Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803 (2018)
Cao, Y., Xu, J., Lin, S., Wei, F., Hu, H.: Gcnet: non-local networks meet squeeze-excitation networks and beyond. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019)
Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., Lu, H.: Dual attention network for scene segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3146–3154 (2019)
Yuan, Y., Huang, L., Guo, J., Zhang, C., Chen, X., Wang, J.: Ocnet: object context for semantic segmentation. Int. J. Comput. Vis. 129(8), 2375–2398 (2021)
Article Google Scholar
Rolet, P., Sebag, M., Teytaud, O.: Integrated recognition, localization and detection using convolutional networks. In: Proceedings of the ECML Conference, pp. 1255–1263 (2012)
Cai, Z., Fan, Q., Feris, R.S., Vasconcelos, N.: A unified multi-scale deep convolutional neural network for fast object detection. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14, pp. 354–370 (2016). Springer
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8759–8768 (2018)
Liu, S., Huang, D., Wang, Y.: Learning spatial fusion for single-shot object detection. arXiv preprint arXiv:1911.09516 (2019)
Ghiasi, G., Lin, T.-Y., Le, Q.V.: Nas-fpn: Learning scalable feature pyramid architecture for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7036–7045 (2019)
Tan, M., Pang, R., Le, Q.V.: Efficientdet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790 (2020)
Qiao, S., Chen, L.-C., Yuille, A.: Detectors: detecting objects with recursive feature pyramid and switchable atrous convolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10213–10224 (2021)
Zhao, Q., Sheng, T., Wang, Y., Tang, Z., Chen, Y., Cai, L., Ling, H.: M2det: a single-shot object detector based on multi-level feature pyramid network. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 9259–9266 (2019)
Zhang, Y.-M., Hsieh, J.-W., Lee, C.-C., Fan, K.-C.: Sfpn: synthetic fpn for object detection. In: 2022 IEEE International Conference on Image Processing (ICIP), pp. 1316–1320 (2022). IEEE
Sunkara, R., Luo, T.: No more strided convolutions or pooling: a new cnn building block for low-resolution images and small objects. In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2022, Grenoble, France, September 19–23, 2022, Proceedings, Part III, pp. 443–459 (2023). Springer
Redmon, J., Farhadi, A.: Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Gevorgyan, Z.: Siou loss: more powerful learning for bounding box regression. arXiv preprint arXiv:2205.12740 (2022)
Zhu, P., Wen, L., Du, D., Bian, X., Fan, H., Hu, Q., Ling, H.: Detection and tracking meet drones challenge. IEEE Trans. Pattern Anal. Mach. Intell. 44(11), 7380–7399 (2022)
Article Google Scholar
Du, D., Qi, Y., Yu, H., Yang, Y., Duan, K., Li, G., Zhang, W., Huang, Q., Tian, Q.: The unmanned aerial vehicle benchmark: object detection and tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 370–386 (2018)

Download references

Acknowledgements

This research was supported by National Nature Science Foundation of China (No. 62262006), State Key Laboratory of Geo-Information Engineering and Key Laboratory of Surveying and Mapping Science and Geospatial Information Technology of MNR, CASM (No.2023-04-03), Technology Innovation and Application Development Key Project of Chongqing (No. cstc2021jscx-gksbX0058), Guangxi Key Laboratory of Trusted Software (No. kx202006), and Zhejiang Lab (No. 2021KE0AB01).

Author information

Authors and Affiliations

College of Computer Science, Chongqing University, Chongqing, 400044, China
Ya-ling Li, Yong Feng & Ming-liang Zhou
Chongqing Institute of Planning and Natural Resources Investigation and Monitoring, Chongqing, 401121, China
Xian-cai Xiong
8# of Zhejiang Lab, Yuhang District, Hangzhou, 311121, China
Yong-heng Wang
Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, 541004, China
Bao-hua Qiang

Authors

Ya-ling Li
View author publications
You can also search for this author in PubMed Google Scholar
Yong Feng
View author publications
You can also search for this author in PubMed Google Scholar
Ming-liang Zhou
View author publications
You can also search for this author in PubMed Google Scholar
Xian-cai Xiong
View author publications
You can also search for this author in PubMed Google Scholar
Yong-heng Wang
View author publications
You can also search for this author in PubMed Google Scholar
Bao-hua Qiang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yong Feng.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Li, Yl., Feng, Y., Zhou, Ml. et al. DMA-YOLO: multi-scale object detection method with attention mechanism for aerial images. Vis Comput 40, 4505–4518 (2024). https://doi.org/10.1007/s00371-023-03095-3

Download citation

Accepted: 09 September 2023
Published: 28 September 2023
Issue Date: June 2024
DOI: https://doi.org/10.1007/s00371-023-03095-3

DMA-YOLO: multi-scale object detection method with attention mechanism for aerial images

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

LightUAV-YOLO: a lightweight object detection model for unmanned aerial vehicle image

BiPA-Net: An Effective Tiny-Object Detection Model in Aerial Images

Small object detection based on YOLOv8 in UAV perspective

Availability of data and materials

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

DMA-YOLO: multi-scale object detection method with attention mechanism for aerial images

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

LightUAV-YOLO: a lightweight object detection model for unmanned aerial vehicle image

BiPA-Net: An Effective Tiny-Object Detection Model in Aerial Images

Small object detection based on YOLOv8 in UAV perspective

Explore related subjects

Availability of data and materials

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation