An Improved Instance Segmentation Method for Complex Elements of Farm UAV Aerial Survey Images
Figure 1. Farm scene mask image.
Figure 2. Data processing flowchart.
Figure 3. SparseInst network architecture. The architecture comprises three main components: the backbone, the encoder, and the IAM-based decoder. The backbone extracts multi-scale features {stage2, stage3, stage4} from the input image. The encoder uses a pyramid pooling module (PPM) [30] to enlarge the receptive field and integrate the multi-scale features; '4×' and '2×' denote upsampling by factors of 4 and 2, respectively. The IAM-based decoder is divided into two branches: the instance branch uses the IAM module to predict instance activation maps (shown in the right column), from which instance features are extracted for recognition and mask generation; the mask branch provides mask features M, which are combined with the predicted kernels to produce segmentation masks.
Figure 4. Improved SparseInst neck network. PPM refers to the pyramid pooling module, MSA to the multi-scale attention module; 2× and 4× denote upsampling by factors of 2 and 4, respectively, 3 × 3 denotes a convolution with a kernel size of 3, + denotes element-wise summation, and CA refers to the coordinate attention module.
Figure 5. Channel attention mechanism. GAP stands for global average pooling, ReLU is the rectified linear unit activation function, σ represents the Sigmoid activation function, C denotes the number of channels, and × denotes element-wise multiplication.
Figure 6. Dense connection diagram. padding refers to the dilation rate of the convolution kernel, and C denotes feature concatenation.
Figure 7. Multi-scale attention module (MSA). GAP stands for global average pooling, ReLU is the rectified linear unit activation function, σ represents the Sigmoid activation function, padding refers to the dilation rate coefficient, c denotes concatenation, + denotes element-wise addition, and × denotes matrix multiplication.
Figure 8. PAFPN network architecture.
Figure 9. Coordinate attention block. X Avg Pool and Y Avg Pool denote global pooling along the h and w directions, BatchNorm refers to batch normalization, Non-linear represents the non-linear activation function, Split denotes splitting along the channel dimension, and Sigmoid represents the activation function.
Figure 10. Visualization results.
Figure 11. High-resolution image visualization results.
Figure 12. HRSID visualization results.
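The channel attention step described in the Figure 5 caption (global average pooling, then ReLU and Sigmoid, then channel-wise rescaling) can be sketched as follows. This is a minimal NumPy illustration of the caption's data flow only; it omits any intermediate convolution or fully connected layers the full module may contain, and the function name `channel_attention` is ours, not the paper's:

```python
import numpy as np

def channel_attention(x):
    """Sketch of the channel attention flow from the Figure 5 caption:
    GAP -> ReLU -> Sigmoid -> channel-wise multiplication.
    x: feature map of shape (C, H, W)."""
    c = x.shape[0]
    # Global average pooling over the spatial dimensions -> one value per channel
    w = x.mean(axis=(1, 2))
    # ReLU followed by Sigmoid yields per-channel weights in (0, 1)
    w = np.maximum(w, 0.0)
    w = 1.0 / (1.0 + np.exp(-w))
    # Broadcast the weights over H and W (element-wise multiplication)
    return x * w.reshape(c, 1, 1)
```

Each channel is rescaled by a single learnable-free weight here; in the actual network the weighting is produced by trained layers.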
Abstract
1. Introduction
2. Related Works
3. Instance Dataset
4. Methodology
4.1. SparseInst Instance Segmentation
4.2. Multi-Scale Attention Mechanism
4.3. Feature Fusion Network
4.4. CA Attention Mechanism
4.5. Loss Function
5. Experimental Results and Analysis
5.1. Evaluation Indicators
- AP50: average precision at an IoU threshold of 0.5.
- AP75: average precision at an IoU threshold of 0.75.
- APs: average precision for small objects (area < 32²).
- APm: average precision for medium objects (32² ≤ area < 96²).
- APl: average precision for large objects (area ≥ 96²).
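The size buckets and the IoU matching underlying these metrics follow the standard COCO convention, which can be sketched as below. The helper names `size_bucket` and `mask_iou` are ours for illustration; the paper's evaluation presumably uses the standard COCO toolkit rather than this code:

```python
import numpy as np

def size_bucket(area):
    """COCO-style object-size buckets used by APs/APm/APl (areas in pixels)."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"

def mask_iou(a, b):
    """IoU between two boolean masks; AP50/AP75 count a prediction as
    correct when this value exceeds 0.5 or 0.75, respectively."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0
```

For example, two 8-pixel masks that overlap in 4 pixels have IoU 4/12 ≈ 0.33, so they would not match at either threshold.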
5.2. Experimental Environment
5.3. Ablation Study
5.3.1. Comparison with Other Baselines
5.3.2. Ablation Study Results and Analysis
5.3.3. Visual Analysis of Experimental Outcomes
5.3.4. HRSID Validation
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Urvina, R.P.; Guevara, C.L.; Vásconez, J.P.; Prado, A.J. An Integrated Route and Path Planning Strategy for Skid–Steer Mobile Robots in Assisted Harvesting Tasks with Terrain Traversability Constraints. Agriculture 2024, 14, 1206.
2. Wei, S.; Lu, Y. Adoption mode of agricultural machinery and food productivity: Evidence from China. Front. Sustain. Food Syst. 2024, 7, 1257918.
3. Hinton, G.E.; Salakhutdinov, R.R. Reducing the Dimensionality of Data with Neural Networks. Science 2006, 313, 504–507.
4. Du, X.; Huang, D.; Dai, L.; Du, X. Recognition of Plastic Film in Terrain-Fragmented Areas Based on Drone Visible Light Images. Agriculture 2024, 14, 736.
5. Zhang, Z.; Zhu, L. A review on unmanned aerial vehicle remote sensing: Platforms, sensors, data processing methods, and applications. Drones 2023, 7, 398.
6. Su, H.; Wei, S.; Liu, S.; Liang, J.; Wang, C.; Shi, J.; Zhang, X. HQ-ISNet: High-quality instance segmentation for remote sensing imagery. Remote Sens. 2020, 12, 989.
7. Yin, C.; Tang, J.; Yuan, T.; Xu, Z.; Wang, Y. Bridging the gap between semantic segmentation and instance segmentation. IEEE Trans. Multimed. 2021, 24, 4183–4196.
8. Bouguettaya, A.; Zarzour, H.; Kechida, A.; Taberkit, A.M. Deep learning techniques to classify agricultural crops through UAV imagery: A review. Neural Comput. Appl. 2022, 34, 9511–9536.
9. Sang, Y.; Han, J. Improved DeepLabV3+ algorithm for scene segmentation. Electro-Opt. Control 2022, 29, 47–52.
10. Xu, B.; Fan, J.; Chao, J.; Arsenijevic, N.; Werle, R.; Zhang, Z. Instance segmentation method for weed detection using UAV imagery in soybean fields. Comput. Electron. Agric. 2023, 211, 107994.
11. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
12. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015.
13. Bolya, D.; Zhou, C.; Xiao, F.; Lee, Y.J. YOLACT: Real-time instance segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 20–26 October 2019.
14. Wang, K.; Liew, J.H.; Zou, Y.; Zhou, D.; Feng, J. PANet: Few-shot image semantic segmentation with prototype alignment. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 20–26 October 2019.
15. Tian, Z.; Zhang, B.; Chen, H.; Shen, C. Instance and panoptic segmentation using conditional convolutions. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 669–680.
16. Wang, X.; Kong, T.; Shen, C.; Jiang, Y.; Li, L. SOLO: Segmenting objects by locations. In Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XVIII; Springer International Publishing: Berlin/Heidelberg, Germany, 2020.
17. Cheng, T.; Wang, X.; Chen, S.; Zhang, W.; Zhang, Q. Sparse instance activation for real-time instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022.
18. Zhou, X.; Wang, H.; Chen, C.; Nagy, G.; Jancso, T.; Huang, H. Detection of Growth Change of Young Forest Based on UAV RGB Images at Single-Tree Level. Forests 2023, 14, 141.
19. Braga, J.R.G.; Peripato, V.; Dalagnol, R.; Ferreira, M.P.; Tarabalka, Y.; Aragão, L.E.O.C.; Velho, H.F.d.C.; Shiguemori, E.H.; Wagner, F.H. Tree Crown Delineation Algorithm Based on a Convolutional Neural Network. Remote Sens. 2020, 12, 1288.
20. Jian, L.; Pu, Z.; Zhu, L.; Yao, T.; Liang, X. SS R-CNN: Self-Supervised Learning Improving Mask R-CNN for Ship Detection in Remote Sensing Images. Remote Sens. 2022, 14, 4383.
21. Wang, Y.; Chao, W.; Hong, Z. Integrating H/A/α with fully convolutional networks for fully PolSAR classification. In Proceedings of the 2017 International Workshop on Remote Sensing with Intelligent Processing (RSIP), Shanghai, China, 18–21 May 2017; IEEE: Piscataway, NJ, USA, 2017.
22. Wu, Y.; Meng, F.; Qin, Y.; Qian, Y.; Xu, F.; Jia, L. UAV imagery based potential safety hazard evaluation for high-speed railroad using real-time instance segmentation. Adv. Eng. Inform. 2023, 55, 101819.
23. Zou, R.; Liu, J.; Pan, H.; Tang, D.; Zhou, R. An Improved Instance Segmentation Method for Fast Assessment of Damaged Buildings Based on Post-Earthquake UAV Images. Sensors 2024, 24, 4371.
24. Guan, Z.; Miao, X.; Mu, Y.; Sun, Q.; Ye, Q.; Gao, D. Forest fire segmentation from aerial imagery data using an improved instance segmentation model. Remote Sens. 2022, 14, 3159.
25. Liu, H.; Li, W.; Jia, W.; Sun, H.; Zhang, M.; Song, L.; Gui, Y. ClusterFormer for pine tree disease identification based on UAV remote sensing image segmentation. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5609215.
26. Torralba, A.; Russell, B.C.; Yuen, J. LabelMe: Online image annotation and applications. Proc. IEEE 2010, 98, 1467–1484.
27. Talebi, H.; Milanfar, P. Learning to resize images for computer vision tasks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021.
28. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Computer Vision—ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V; Springer International Publishing: Berlin/Heidelberg, Germany, 2014.
29. Waqas Zamir, S.; Arora, A.; Gupta, A.; Khan, S.; Sun, G.; Shahbaz Khan, F.; Zhu, F.; Shao, L.; Xia, G.-S.; Bai, X. iSAID: A large-scale dataset for instance segmentation in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019.
30. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
31. Cheng, D.; Meng, G.; Cheng, G.; Pan, C. SeNet: Structured edge network for sea–land segmentation. IEEE Geosci. Remote Sens. Lett. 2016, 14, 247–251.
32. Zhang, H.; Zu, K.; Lu, J.; Zou, Y.; Meng, D. EPSANet: An efficient pyramid squeeze attention block on convolutional neural network. In Proceedings of the Asian Conference on Computer Vision, Macao, China, 4–8 December 2022.
33. Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. SimAM: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; PMLR, 2021.
34. Luo, W.; Li, Y.; Urtasun, R.; Zemel, R. Understanding the effective receptive field in deep convolutional neural networks. arXiv 2016, arXiv:1701.04128.
35. Huang, G.; Liu, S.; Van der Maaten, L.; Weinberger, K.Q. CondenseNet: An efficient DenseNet using learned group convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
36. Liu, J.; Li, C.; Liang, F.; Lin, C.; Sun, M.; Yan, J.; Ouyang, W.; Xu, D. Inception convolution with efficient dilation search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021.
37. Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021.
38. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
39. Padilla, R.; Netto, S.L.; Da Silva, E.A. A survey on performance metrics for object-detection algorithms. In Proceedings of the 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), Niteroi, Brazil, 1–3 July 2020; IEEE: Piscataway, NJ, USA, 2020.
40. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
41. Ishii, M.; Sato, A. Layer-wise weight decay for deep neural networks. In Image and Video Technology: 8th Pacific-Rim Symposium, PSIVT 2017, Wuhan, China, 20–24 November 2017; Revised Selected Papers; Springer International Publishing: Berlin/Heidelberg, Germany, 2018.
42. Wang, G.; Xiong, Z.; Liu, D.; Luo, C. Cascade mask generation framework for fast small object detection. In Proceedings of the 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, 23–27 July 2018; IEEE: Piscataway, NJ, USA, 2018.
43. Wang, X.; Zhang, R.; Kong, T.; Li, L.; Shen, C. SOLOv2: Dynamic and fast instance segmentation. Adv. Neural Inf. Process. Syst. 2020, 33, 17721–17732.
| Plotland | Forest_Belt | Outletpile | Arable Road | Wild_Grassland | River | Highway | Watercourses | Total |
|---|---|---|---|---|---|---|---|---|
| 9465 | 2852 | 1689 | 2244 | 5493 | 491 | 978 | 968 | 24,180 |
| Model | mAP/% | AP50/% | AP75/% | APs/% | APm/% | APl/% | GFLOPs | CV (mAP) | FPS |
|---|---|---|---|---|---|---|---|---|---|
| Mask R-CNN | 31.1 | 19.6 | 9.4 | 0 | 0 | 12.5 | 226 | 29.0 | 23.1 |
| Cascade–Mask | 39.0 | 54.0 | 39.4 | 0.3 | 2.0 | 45.3 | 1749 | 33.2 | 12.7 |
| SOLOv2 | 51.0 | 66.7 | 52.5 | 0.7 | 9.8 | 59.5 | 224 | 43.8 | 27.5 |
| CondInst | 49.0 | 62.8 | 50.7 | 2.7 | 5.3 | 57.2 | 178 | 24.3 | 21.6 |
| Improved SparseInst | 61.8 | 80.9 | 62.6 | 3.2 | 6.0 | 70.7 | 157 | 57.3 | 38.8 |
| MSA | PAFPN | CA | mAP/% | AP50/% | AP75/% | APs/% | APm/% | APl/% |
|---|---|---|---|---|---|---|---|---|
| × | × | × | 59.751 | 80.227 | 59.465 | 3.585 | 4.340 | 68.057 |
| √ | × | × | 61.071 | 79.991 | 61.565 | 3.028 | 8.105 | 69.984 |
| √ | √ | × | 61.050 | 79.443 | 62.421 | 3.340 | 4.735 | 69.882 |
| √ | √ | √ | 61.854 | 80.935 | 62.632 | 3.264 | 6.008 | 70.747 |
Category | Original SparseInst | +MSA | +MSA + PAFPN | +MSA + PAFPN + CA |
---|---|---|---|---|
Plotland | 93.464 | 94.205 | 94.382 | 94.587 |
Forest_Belt | 62.705 | 65.743 | 64.831 | 66.861 |
Outletpile | 16.541 | 14.089 | 15.690 | 15.741 |
Highway | 82.038 | 82.097 | 85.235 | 82.872 |
Arable Road | 41.045 | 45.709 | 48.430 | 47.844 |
Wild_Grassland | 30.697 | 34.119 | 32.912 | 36.338 |
Watercourse | 52.532 | 55.038 | 50.105 | 53.692 |
River | 93.987 | 97.568 | 96.816 | 96.900 |
| Model | mAP/% | AP50/% | AP75/% | APs/% | APm/% | APl/% | GFLOPs | FPS |
|---|---|---|---|---|---|---|---|---|
| SOLOv2 | 27.2 | 57.0 | 22.2 | 26.5 | 34.5 | 2.5 | 178 | 19.8 |
| CondInst | 31.9 | 65.8 | 28.9 | 31.5 | 38.9 | 6.7 | 210 | 11.2 |
| SparseInst | 31.4 | 55.8 | 35.1 | 31.1 | 37.3 | 7.6 | 118 | 26.0 |
| Improved SparseInst | 34.5 | 61.0 | 38.7 | 34.2 | 40.4 | 6.3 | 139 | 22.7 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lv, F.; Zhang, T.; Zhao, Y.; Yao, Z.; Cao, X. An Improved Instance Segmentation Method for Complex Elements of Farm UAV Aerial Survey Images. Sensors 2024, 24, 5990. https://doi.org/10.3390/s24185990