Abstract
High-resolution remote sensing images are characterized by complex backgrounds and clustered objects. A complex background introduces many irrelevant ground objects that closely resemble or overlap the objects of interest, which blurs their edges and textures. As a result, recognition accuracy is low for ground objects such as airports, dams, and golf fields, even though these objects are large. To address this problem, this paper proposes a remote sensing image object detection method based on the YOLOv5 network. The backbone feature extraction network is improved and deepened to capture more information about large objects, and an attention mechanism and an additional output layer are added to enhance feature extraction and feature fusion, improving the detection effect. Pre-trained weights obtained by transfer learning are used as the initial training weights of the improved YOLOv5 to speed up network convergence. Experiments on the DIOR dataset show that the improved YOLOv5 network significantly improves the accuracy of large-object recognition compared with the YOLO series networks and the EfficientDet model. The mAP of the improved YOLOv5 network is 80.5%, which is 2% higher than that of the original YOLOv5 network.
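The abstract reports its results as mAP (mean average precision). As a rough illustration of how such a figure is typically obtained, the following is a minimal sketch that assumes the common protocol of matching detections to ground truth at IoU >= 0.5 and integrating the all-point interpolated precision-recall curve. The abstract does not state the exact evaluation protocol, so the threshold, the interpolation scheme, and every function name here (`iou`, `average_precision`, `mean_average_precision`) are assumptions for illustration, not the authors' evaluation code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_thr=0.5):
    """AP for a single class (assumed protocol, see lead-in).

    detections:   list of (image_id, score, box)
    ground_truth: dict mapping image_id -> list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    # Each ground-truth box may be matched by at most one detection.
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp_flags = []
    # Visit detections from highest to lowest confidence.
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)
    # Cumulative precision and recall after each detection.
    precisions, recalls, tp = [], [], 0
    for k, is_tp in enumerate(tp_flags, start=1):
        tp += is_tp
        precisions.append(tp / k)
        recalls.append(tp / n_gt)
    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class: dict mapping class name -> (detections, ground_truth)."""
    aps = [average_precision(dets, gts) for dets, gts in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

Under this scheme, a reported mAP is the unweighted mean of the per-class AP values, so a gain on a few large-object classes such as airports or dams raises the overall figure only in proportion to their share of the class list.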
Data availability
The RSOD dataset generated and/or analyzed during the current study is available on GitHub at RSIA-LIESMARS-WHU/RSOD-Dataset, an open dataset for object detection in remote sensing images. The DIOR and NWPU VHR-10 datasets generated and/or analyzed during the current study are not publicly available because their links have failed, but they are available from the corresponding author on reasonable request.
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by JX. The first draft of the manuscript was written by JX, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Communicated by Shah Nazir.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Xue, J., Zheng, Y., Dong-Ye, C. et al. Improved YOLOv5 network method for remote sensing image-based ground objects recognition. Soft Comput 26, 10879–10889 (2022). https://doi.org/10.1007/s00500-022-07106-8
DOI: https://doi.org/10.1007/s00500-022-07106-8