An Adaptive Attention Fusion Mechanism Convolutional Network for Object Detection in Remote Sensing Images
Figure 1. Proposed network architecture.
Figure 2. Flowchart of image stitching.
Figure 3. Architecture of EfficientDet-D0.
Figure 4. Basic block of EfficientNet.
Figure 5. AAFM architecture.
Figure 6. IoU change (black and green denote the predicted box and the ground truth, respectively).
Figure 7. Data samples in DIOR.
Figure 8. Comparison of the number of instance objects.
Figure 9. Typical object detection results: (left) ground truth; (right) predicted bounding boxes.
Figure 10. Visualization of feature maps: (a) original image; (b) model B; (c) model A; (d) model A + model B in parallel; (e) model A + model B with adaptive factors.
Figure 11. mAP comparison of the different loss functions.
Figure 12. FLOPs comparison of different models.
Figure 13. Visual comparison of detection results under different loss functions.
Abstract
1. Introduction
- (1) We propose an adaptive attention fusion mechanism (AAFM). AAFM combines channel attention with spatial attention in a parallel manner, and learnable fusion factors weight the features adaptively both within each attention module and between the two modules. AAFM can be readily incorporated into existing detectors to boost their representational power (see the sketch after this list).
- (2) We design an AAFM-enhanced EfficientDet network for object detection that combines several techniques: the stitcher augmentation scheme, the AAFM-integrated architecture, and the CIoU loss (a sketch of which also follows this list). Applied together, these techniques improve the accuracy and robustness of the network.
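To make contribution (1) concrete, here is a minimal PyTorch sketch of an AAFM-style block, assuming CBAM-style max/average pooling inside each module; the softmax normalization of the fusion factors and all names (ChannelAttention, SpatialAttention, w, p, m) are our illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Module A: channel attention with learnable intra-module factors W1, W2."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.w = nn.Parameter(torch.ones(2))  # learnable fusion factors W1, W2

    def forward(self, x):
        w = torch.softmax(self.w, dim=0)  # keep factors positive and normalized (assumption)
        max_feat = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        avg_feat = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(w[0] * max_feat + w[1] * avg_feat) * x

class SpatialAttention(nn.Module):
    """Module B: spatial attention fusing P1*MaxPool + P2*AvgPool + P3*conv1x1 maps."""
    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        self.squeeze = nn.Conv2d(channels, 1, kernel_size=1, bias=False)
        self.conv = nn.Conv2d(3, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.p = nn.Parameter(torch.ones(3))  # learnable fusion factors P1, P2, P3

    def forward(self, x):
        p = torch.softmax(self.p, dim=0)
        max_map = torch.amax(x, dim=1, keepdim=True)   # channel-wise max pooling
        avg_map = torch.mean(x, dim=1, keepdim=True)   # channel-wise average pooling
        conv_map = self.squeeze(x)                     # 1x1 conv channel compression
        maps = torch.cat([p[0] * max_map, p[1] * avg_map, p[2] * conv_map], dim=1)
        return torch.sigmoid(self.conv(maps)) * x

class AAFM(nn.Module):
    """Parallel fusion of modules A and B with learnable inter-module factors m1, m2."""
    def __init__(self, channels: int):
        super().__init__()
        self.channel_att = ChannelAttention(channels)
        self.spatial_att = SpatialAttention(channels)
        self.m = nn.Parameter(torch.ones(2))

    def forward(self, x):
        m = torch.softmax(self.m, dim=0)
        return m[0] * self.channel_att(x) + m[1] * self.spatial_att(x)
```

The last three rows of the fusion ablation table in Section 5 compare exactly these variants; AAFM.forward above corresponds to the best-performing "m1·module A + m2·module B (in parallel)" configuration.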
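The CIoU loss adopted in contribution (2) follows the standard formulation of Zheng et al.: L_CIoU = 1 − IoU + ρ²(b, b^gt)/c² + αv, where ρ is the center distance, c the diagonal of the smallest enclosing box, v = (4/π²)(arctan(w^gt/h^gt) − arctan(w/h))², and α = v/((1 − IoU) + v). A self-contained sketch of that formula follows; the corner box format and eps handling are our choices, not tied to the paper's code:

```python
import math
import torch

def ciou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """CIoU loss for axis-aligned boxes given as (..., 4) tensors of [x1, y1, x2, y2]."""
    # intersection area
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    # union area and IoU
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)

    # squared center distance rho^2 over squared diagonal c^2 of the enclosing box
    cx_p = (pred[..., 0] + pred[..., 2]) / 2
    cy_p = (pred[..., 1] + pred[..., 3]) / 2
    cx_t = (target[..., 0] + target[..., 2]) / 2
    cy_t = (target[..., 1] + target[..., 3]) / 2
    rho2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # aspect-ratio consistency term v and trade-off weight alpha
    w_p = pred[..., 2] - pred[..., 0]
    h_p = pred[..., 3] - pred[..., 1]
    w_t = target[..., 2] - target[..., 0]
    h_t = target[..., 3] - target[..., 1]
    v = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps)) - torch.atan(w_p / (h_p + eps))) ** 2
    with torch.no_grad():  # alpha is treated as a constant weight, as in the DIoU/CIoU paper
        alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```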
2. Related Work
2.1. One-Stage Object Detection Network of Remote Sensing Imagery
2.2. Attention Mechanism
3. Materials and Methods
3.1. Image Stitching Method
3.2. AAFM-Enhanced EfficientDet
3.2.1. EfficientDet
3.2.2. AAFM
3.3. CIoU Loss
4. Dataset
5. Results and Analysis
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Scale distribution of object instances in the original dataset:

| Object Scales | Small | Medium | Large |
|---|---|---|---|
| proportion of instances (%) | 68.3 | 12.0 | 19.7 |
| proportion of images containing the scale (%) | 46.3 | 29.3 | 79.6 |
Scale distribution after image stitching:

| Object Scales | Small | Medium | Large |
|---|---|---|---|
| proportion of instances in stitched images (%) | 11.5 | 9.6 | 78.9 |
| proportion of stitched images containing the scale (%) | 96.5 | 78.5 | 97.2 |
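The two tables above summarize how stitching changes the scale distribution. As a purely illustrative sketch, here is one way a 2×2 stitching step can be implemented: four images are resized into the quadrants of a single canvas and their bounding boxes are remapped. The grid size, interpolation, and corner box format are assumptions rather than the paper's exact scheme:

```python
import numpy as np
import cv2

def stitch_2x2(images, boxes_list, out_size: int):
    """Tile four images into one out_size x out_size mosaic and remap their boxes.

    images: list of 4 HxWx3 uint8 arrays (sizes may differ).
    boxes_list: list of 4 (N_i, 4) float arrays of [x1, y1, x2, y2] in pixels.
    Returns the stitched image and the remapped boxes as one (N, 4) array.
    """
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    offsets = [(0, 0), (half, 0), (0, half), (half, half)]  # (x, y) of each quadrant
    stitched_boxes = []
    for img, boxes, (ox, oy) in zip(images, boxes_list, offsets):
        h, w = img.shape[:2]
        # resize the source image into its quadrant
        canvas[oy:oy + half, ox:ox + half] = cv2.resize(img, (half, half))
        if len(boxes):
            # scale box coordinates by the per-axis resize factor, then shift
            scale = np.array([half / w, half / h, half / w, half / h])
            offset = np.array([ox, oy, ox, oy])
            stitched_boxes.append(boxes * scale + offset)
    boxes_out = np.concatenate(stitched_boxes) if stitched_boxes else np.empty((0, 4))
    return canvas, boxes_out
```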
| Class | SSD | YOLOv4 | EfficientDet | Our Model |
|---|---|---|---|---|
| airplane | 0.668 | 0.682 | 0.688 | 0.716 |
| airport | 0.687 | 0.702 | 0.742 | 0.751 |
| baseball field | 0.704 | 0.759 | 0.803 | 0.826 |
| basketball court | 0.763 | 0.806 | 0.778 | 0.810 |
| bridge | 0.334 | 0.414 | 0.403 | 0.459 |
| chimney | 0.668 | 0.713 | 0.683 | 0.704 |
| dam | 0.565 | 0.603 | 0.643 | 0.690 |
| expressway service area | 0.648 | 0.776 | 0.816 | 0.832 |
| expressway toll station | 0.574 | 0.663 | 0.671 | 0.682 |
| golf field | 0.662 | 0.755 | 0.775 | 0.784 |
| ground track field | 0.675 | 0.755 | 0.795 | 0.808 |
| harbor | 0.395 | 0.472 | 0.468 | 0.483 |
| overpass | 0.495 | 0.560 | 0.576 | 0.598 |
| ship | 0.697 | 0.734 | 0.746 | 0.768 |
| stadium | 0.660 | 0.696 | 0.807 | 0.810 |
| storage tank | 0.496 | 0.561 | 0.532 | 0.566 |
| tennis court | 0.771 | 0.833 | 0.840 | 0.856 |
| train station | 0.538 | 0.583 | 0.579 | 0.605 |
| vehicle | 0.375 | 0.443 | 0.430 | 0.456 |
| windmill | 0.674 | 0.757 | 0.759 | 0.765 |
| mAP | 0.602 | 0.673 | 0.677 | 0.698 |
| Algorithm | Original Dataset (mAP) | Augmented Dataset (mAP) | Improvement (percentage points) |
|---|---|---|---|
| SSD | 0.580 | 0.602 | +2.2 |
| YOLOv4 | 0.667 | 0.673 | +0.6 |
| EfficientDet | 0.663 | 0.677 | +1.4 |
| Our method | 0.687 | 0.698 | +1.1 |
| Scales | Small (mAP) | Medium (mAP) | Large (mAP) |
|---|---|---|---|
| original dataset | 0.478 | 0.707 | 0.824 |
| augmented dataset | 0.483 | 0.716 | 0.841 |
| improvement (percentage points) | +0.5 | +0.9 | +1.7 |
| Module | Fusion Method | Accuracy (mAP, %) |
|---|---|---|
| module A (channel attention) | MaxPool + AvgPool | 68.34 |
| module A (channel attention) | W1·MaxPool + W2·AvgPool | 68.97 |
| module B (spatial attention) | MaxPool + AvgPool | 68.10 |
| module B (spatial attention) | MaxPool + AvgPool + conv1×1 | 68.30 |
| module B (spatial attention) | P1·MaxPool + P2·AvgPool + P3·conv1×1 (k = 3) | 68.45 |
| module B (spatial attention) | P1·MaxPool + P2·AvgPool + P3·conv1×1 (k = 7) | 68.62 |
| module A + module B | module A + module B (in parallel) | 69.25 |
| module A + module B | m1·module A + m2·module B (in sequential) | 69.67 |
| module A + module B | m1·module A + m2·module B (in parallel) | 69.83 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).