Small Object Detection in Remote Sensing Images Based on Super-Resolution with Auxiliary Generative Adversarial Networks
"> Figure 1
<p>Illustration of EDSR architecture with four residual blocks, with layers for convolution (blue), normalization (green), ReLU activation (red), and pixel rearrangement (orange). <math display="inline"><semantics> <msub> <mi>x</mi> <mrow> <mi>L</mi> <mi>R</mi> </mrow> </msub> </semantics></math> is the input low-resolution image, and <math display="inline"><semantics> <msub> <mi>x</mi> <mrow> <mi>S</mi> <mi>R</mi> </mrow> </msub> </semantics></math> is the output super-resolved image.</p> "> Figure 2
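As a rough illustration of the blocks in Figure 1, the following PyTorch sketch assembles an EDSR-style network from residual blocks and a sub-pixel (PixelShuffle) upsampling stage. The channel counts, the use of batch normalization for the "normalization" layers, and the block count are illustrative assumptions, not the exact configuration used in the paper.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv -> norm -> ReLU -> conv, with an identity skip connection."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),   # "normalization" layer; batch norm is an assumption
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)  # identity skip connection

class EDSRLike(nn.Module):
    def __init__(self, channels=64, n_blocks=4, scale=4):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.body = nn.Sequential(*[ResidualBlock(channels) for _ in range(n_blocks)])
        # sub-pixel convolution: a conv expands the channels, then PixelShuffle
        # rearranges them into a (scale x scale) larger spatial grid
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x_lr):
        feats = self.head(x_lr)
        feats = feats + self.body(feats)  # long skip around the residual blocks
        return self.upsample(feats)       # x_sr
```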
Figure 2. Illustration of the results provided by EDSR: high-resolution (HR) image at 12.5 cm/pixel and its artificially reduced low-resolution (LR) version at 50 cm/pixel (enlarged for better visualization); SR results (12.5 cm/pixel) obtained by bicubic interpolation and by the EDSR technique with a factor of 4 (EDSR-4).
Figure 3. Top: super-resolution by a factor of 8: HR image at 12.5 cm/pixel and its artificially reduced low-resolution (LR) version at 1 m/pixel (enlarged for better visualization); SR results (12.5 cm/pixel) provided by bicubic interpolation and by EDSR-8. Bottom: zoom on a vehicle of size 40 × 20 pixels.
Figure 4. Comparison of super-resolved images generated by EDSR-8 with 16 residual blocks of size 64 × 64 (left) and the improved version including 32 blocks of size 96 × 96 (right).
Figure 5. SR-WGAN architecture with the generator (super-resolution network) at the top and the discriminator at the bottom. The layers are: convolution (blue), 1 × 1 reduction (light blue), ReLU activation (red), normalization (green), and pixel rearrangement (orange).
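The discriminator in Figure 5 acts as a Wasserstein critic. Below is a minimal sketch of a critic update with gradient penalty (WGAN-GP), assuming `critic` is an nn.Module returning one score per image; the variable names and the penalty weight are assumptions, not the exact settings used in the paper.

```python
import torch

def critic_loss(critic, x_hr, x_sr, gp_weight=10.0):
    x_sr = x_sr.detach()  # the critic update does not back-propagate into the generator
    # Wasserstein estimate: generated images should score lower than real ones
    wloss = critic(x_sr).mean() - critic(x_hr).mean()
    # gradient penalty on random interpolations between real and generated images
    eps = torch.rand(x_hr.size(0), 1, 1, 1, device=x_hr.device)
    x_hat = (eps * x_hr + (1 - eps) * x_sr).requires_grad_(True)
    grads = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)[0]
    penalty = ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
    return wloss + gp_weight * penalty
```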
Figure 6. Architecture of a cycle network: super-resolution generator and its discriminator (top), and low-resolution generator and its discriminator (bottom).
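A minimal sketch of the cycle-consistency term implied by Figure 6, assuming two generators `g_sr` (LR to SR) and `g_lr` (SR to LR) implemented as nn.Modules. The L1 reconstruction penalties shown here complement the adversarial terms provided by the two discriminators (not shown), and the weight `lambda_cyc` is an assumption.

```python
import torch.nn.functional as F

def cycle_consistency_loss(x_lr, x_hr, g_sr, g_lr, lambda_cyc=10.0):
    x_sr = g_sr(x_lr)             # LR -> SR
    x_lr_rec = g_lr(x_sr)         # SR -> back to LR
    x_hr_rec = g_sr(g_lr(x_hr))   # HR -> LR -> back to HR
    return lambda_cyc * (F.l1_loss(x_lr_rec, x_lr) + F.l1_loss(x_hr_rec, x_hr))
```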
Figure 7. SR-WGAN architecture with the addition of YOLOv3 as an auxiliary network (i.e., SR-WGAN-Yolo).
Figure 8. Visualization of the super-resolution results (factor of 8) provided by EDSR, SR-WGAN, SR-CWGAN, and SR-CWGAN-Yolo, compared to the LR version (1 m/pixel) and the HR version (12.5 cm/pixel).
Figure 9. Examples of detection results.
Figure 10. Comparison of detection performance (with YOLOv3 as detector and IoU = 0.05) provided by the different SR networks.
Figure 11. Illustration of a residual channel attention block in RCAN. Color code: convolution (blue), global pooling (gray), ReLU activation (red).
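A minimal PyTorch sketch of a residual channel attention block in the spirit of Figure 11: global average pooling produces per-channel statistics, a small 1 × 1 bottleneck turns them into attention weights, and the weights rescale the residual features before the skip connection. The channel count and reduction ratio are assumptions.

```python
import torch.nn as nn

class ResidualChannelAttentionBlock(nn.Module):
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # global pooling
            nn.Conv2d(channels, channels // reduction, 1),  # channel bottleneck
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                   # attention weights in [0, 1]
        )

    def forward(self, x):
        res = self.body(x)
        return x + res * self.attention(res)  # channel-wise rescaling + skip connection
```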
Figure 12. Example of object reconstruction with super-resolution (factor of 4) on an xView satellite image. Left: LR 30 cm/pixel xView image zoomed by a factor of 4; right: image super-resolved by SR-WGAN trained on the HR aerial images of the Potsdam dataset.
Figure 13. Precision–recall curves of detection performance using the Faster R-CNN detector on both original and super-resolved images obtained with SR-WGAN (xView dataset), at different IoU levels.
Figure 14. Detection results on the original xView images (top) and on the images super-resolved by SR-WGAN (bottom). Color code: green: true positive; red: false positive; blue: false negative.
Abstract
1. Introduction
2. Residual Block-Based Super-Resolution
3. Network Improvements
3.1. Adversarial Network
3.2. Cycle Network
3.3. Integration of an Auxiliary Network
- loss of the generator;
- loss of the discriminator (Wasserstein GAN), as in Equation (1);
- loss of the YOLOv3 detector, which seeks to minimize the difference between the detected bounding boxes and the ground-truth bounding boxes (see the sketch after this list).
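The following PyTorch snippet is a minimal sketch of how these three terms can enter a single generator objective: a pixel reconstruction term, the Wasserstein adversarial term from the critic, and a detection loss computed on the super-resolved image by the auxiliary YOLOv3 network. `generator`, `critic`, `yolo_detection_loss`, and the weighting factors are placeholders, not the exact components or weights used in the paper.

```python
import torch.nn.functional as F

def generator_step(x_lr, x_hr, targets, generator, critic, yolo_detection_loss,
                   lambda_adv=1e-3, lambda_det=1e-2):
    x_sr = generator(x_lr)
    pixel_loss = F.l1_loss(x_sr, x_hr)                    # content / reconstruction term
    adversarial_loss = -critic(x_sr).mean()               # Wasserstein generator term
    detection_loss = yolo_detection_loss(x_sr, targets)   # auxiliary YOLOv3 term
    return pixel_loss + lambda_adv * adversarial_loss + lambda_det * detection_loss
```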
4. Results and Discussion
4.1. Evaluation of the Improvements in Detection Performance
4.2. Performance with Another Baseline Super-Resolution Network
4.3. Evaluation with Various Object Detectors
4.4. Transfer Learning to Satellite Images
5. Conclusions and Perspectives
Author Contributions
Funding
Conflicts of Interest
References
| Method | TP | FP | F1-Score | mAP |
|---|---|---|---|---|
| HR | 707 | 32 | 0.90 | 93.57 |
| Bicubic | 268 | 13 | 0.48 | 71.82 |
| EDSR-4 | 648 | 27 | 0.86 | 90.73 |

| Method | TP | FP | F1-Score | mAP |
|---|---|---|---|---|
| HR | 707 | 32 | 0.90 | 93.57 |
| Bicubic | 34 | 9 | 0.02 | 17.40 |
| EDSR-8 | 14 | 1 | 0.03 | 18.10 |

| Version | TP | FP | F1-Score | mAP |
|---|---|---|---|---|
| HR | 707 | 32 | 0.90 | 93.57 |
| EDSR-8 (16 blocks) | 14 | 1 | 0.03 | 18.10 |
| EDSR-8 (32 blocks) | 116 | 8 | 0.25 | 42.32 |

| Cycle | TP | FP | F1-Score |
|---|---|---|---|
| 1 | 116 | 8 | 0.25 |
| 2 | 53 | 12 | 0.12 |
| 3 | 85 | 15 | 0.18 |
| 4 | 105 | 8 | 0.22 |
| 5 | 46 | 10 | 0.10 |
| Mean | 81 | 10.6 | 0.17 |
| Standard deviation | ±27 | ±2.65 | ±0.06 |

| Method | mAP (IoU = 0.05) | mAP (IoU = 0.25) | mAP (IoU = 0.5) |
|---|---|---|---|
| HR | 96.36 | 93.57 | 82.14 |
| Bicubic | 22.80 | 17.40 | 9.53 |
| EDSR | 47.85 | 42.32 | 34.43 |
| SR-WGAN | 63.76 | 59.54 | 44.67 |
| SR-CWGAN | 66.72 | 62.82 | 47.18 |
| SR-CWGAN-Yolo | 76.74 | 71.31 | 55.05 |

| Method | mAP (IoU = 0.05) | mAP (IoU = 0.25) | mAP (IoU = 0.5) |
|---|---|---|---|
| HR | 96.36 | 93.57 | 82.14 |
| RCAN | 44.58 | 40.21 | 33.67 |
| RCAN-WGAN | 64.67 | 60.76 | 47.73 |
| RCAN-CWGAN | 67.42 | 63.28 | 47.80 |
| RCAN-CWGAN-Yolo | 76.89 | 72.01 | 55.76 |

| Detector | HR | Bicubic | EDSR-8 | SR-CWGAN-Yolo |
|---|---|---|---|---|
| YOLOv3 | 96.36 | 22.80 | 47.85 | 76.74 |
| Faster R-CNN | 96.55 | 38.34 | 68.95 | 86.04 |
| RetinaNet-50 | 91.37 | 15.64 | 27.32 | 63.03 |
| EfficientDet (D0) | 96.90 | 45.08 | 67.77 | 85.70 |

| Method | IoU = 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | 0.60 | 0.70 | 0.80 | 0.90 |
|---|---|---|---|---|---|---|---|---|---|
| Faster R-CNN | 62.05 | 60.82 | 55.64 | 48.90 | 32.78 | 21.84 | 7.54 | 1.29 | 0.31 |
| SR + Faster R-CNN | 83.36 | 83.14 | 82.85 | 78.41 | 75.26 | 54.55 | 27.44 | 10.39 | 0.65 |