VaryBlock: A Novel Approach for Object Detection in Remote Sensed Images
Figure 1. <p>The overall flowchart of the proposed algorithm. The input image is divided into an <math display="inline"><semantics> <mrow> <mi>s</mi> <mo>×</mo> <mi>s</mi> </mrow> </semantics></math> grid of equal-sized cells (<b>a</b>); each grid cell includes three different scales, and each scale owns three bounding boxes of different sizes (<b>b</b>,<b>c</b>) to predict the object (<b>d</b>).</p>
Figure 2. <p>Framework of the proposed method.</p>
Figure 3. <p>Architecture of VaryBlock, consisting of five layers exclusive of layer zero.</p>
Figure 4. <p>The route operation: (<b>a</b>) A = [[(1,2), (3,4)], [(5,6), (7,8)]]; (<b>b</b>) B = [[(9,10), (11,12)], [(13,14), (15,16)]]; (<b>c</b>) C = [[(1,2,9,10), (3,4,11,12)], [(5,6,13,14), (7,8,15,16)]]. C is the route operation of A and B.</p>
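The route operation in Figure 4 is simply a channel-wise concatenation: the spatial size of the feature maps is unchanged, while the channel counts are summed. A minimal NumPy sketch (illustrative only, not the paper's implementation) reproducing the values from the caption:

```python
import numpy as np

# Two 2x2 feature maps with 2 channels each, matching Figure 4.
A = np.array([[(1, 2), (3, 4)], [(5, 6), (7, 8)]])       # shape (2, 2, 2)
B = np.array([[(9, 10), (11, 12)], [(13, 14), (15, 16)]])  # shape (2, 2, 2)

# Route = concatenation along the channel axis: spatial size stays (2, 2),
# channel count becomes 2 + 2 = 4.
C = np.concatenate([A, B], axis=-1)                        # shape (2, 2, 4)
```

The top-left cell of `C` is `[1, 2, 9, 10]`, exactly as listed in the caption.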
Figure 5. <p>The structure of the feature extraction model, a mixture of 1 × 1 and 3 × 3 convolution layers, VaryBlocks, and residual layers.</p>
Figure 6. <p>The overall structure of the proposed method.</p>
Figure 7. <p>Negative samples generated with CycleGAN. Given content and style images, fake negative samples are generated; output-1 and output-2 show two different fake samples. (<b>a</b>) Mountain; (<b>b</b>) Town; (<b>c</b>) Swamp.</p>
Figure 8. <p>Positive samples generated with CycleGAN. Given content and style images, fake positive samples are generated; output-1 and output-2 show two different fake samples. (<b>a</b>) Airport; (<b>b</b>) Baseball field; (<b>c</b>) Sports park.</p>
Figure 9. <p>Example of data augmentation.</p>
Figure 10. <p>Some examples of the NWPU VHR-10 dataset.</p>
Figure 11. <p>Histogram of the number of instances of each class in the NWPU VHR-10 dataset.</p>
Figure 12. <p>Detection results using our method. The first column shows the results of our method; the second column shows the corresponding original images. (<b>a</b>) Harbors detected from scene (<b>b</b>); (<b>c</b>) Ground track field and baseball diamond detected from scene (<b>d</b>); (<b>e</b>) Baseball diamond detected from scene (<b>f</b>); (<b>g</b>) Airplane detected from scene (<b>h</b>).</p>
Figure 13. <p>The precision-recall curves of YOLOv3 and the proposed method on the NWPU VHR-10 dataset.</p>
Figure 14. <p>The training loss curves of the proposed method and YOLOv3.</p>
Abstract
1. Introduction
1. A block named VaryBlock is proposed to mitigate the information loss and damage caused by downsampling operations. When an image is downsampled, part of its information is discarded; VaryBlock effectively alleviates this problem.
2. A modified YOLO framework is designed that makes better use of the semantic information of the original image without losing large amounts of it. This raises the probability of detecting not only small objects but also objects at other scales.
3. Several training tricks that improve performance are presented, and advanced data augmentation methods are used.
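To illustrate the information loss mentioned in the first contribution (this toy example is not VaryBlock itself): a stride-2 max-pool keeps one value per 2 × 2 window, so three quarters of the activations are discarded at every downsampling step.

```python
import numpy as np

# A 4x4 toy feature map.
x = np.arange(16, dtype=float).reshape(4, 4)

# Stride-2, 2x2 max-pooling: split into 2x2 blocks, keep the max of each.
# Element [i, j, k, l] of the reshaped array is x[2*i + j, 2*k + l].
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))

# Only a quarter of the values survive one downsampling step.
retained = pooled.size / x.size  # 0.25
```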
2. Related Works
2.1. CNN-Based Object Detection Algorithm
2.2. Shortcut Connection
3. Details of Network Structure and Proposed Method
3.1. Overall Architecture
3.2. VaryBlock
3.3. Structure of the Proposed Method
3.3.1. Feature Extraction Model
3.3.2. Detection Model
3.3.3. Overall Network Structure
Algorithm 1 Object detection algorithm based on modified YOLO
Input: image and trained model. Output: prediction probability P of each object and the corresponding anchor positions C.
3.4. Data Augmentation
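Besides the CycleGAN-generated samples shown in Figures 7 and 8, detection datasets are commonly augmented with geometric transforms, which must be applied to the bounding boxes as well as the pixels. A minimal sketch of a horizontal flip under that assumption; `hflip_with_boxes` is a hypothetical helper for illustration, not the paper's code:

```python
import numpy as np

def hflip_with_boxes(image, boxes):
    """Horizontally flip an image and its [x_min, y_min, x_max, y_max] boxes.

    Hypothetical helper: x-coordinates are mirrored about the image width,
    then swapped so that x_min <= x_max still holds after the flip.
    """
    h, w = image.shape[:2]
    flipped = image[:, ::-1]
    boxes = np.asarray(boxes, dtype=float)
    out = boxes.copy()
    out[:, 0] = w - boxes[:, 2]  # new x_min mirrors old x_max
    out[:, 2] = w - boxes[:, 0]  # new x_max mirrors old x_min
    return flipped, out
```

Box widths and y-coordinates are unchanged, so class labels and box shapes stay valid after augmentation.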
3.5. Generate Priori Anchors
Algorithm 2 Generate priori anchors
Input: anchor size set X of the objects in the dataset. Output: set Y of 9 anchor sizes at different scales.
4. Experiments and Datasets
4.1. Datasets
4.2. Experimental Metric
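The per-class scores reported below are average precision (AP), the area under the precision-recall curves of Figure 13. A generic sketch of the all-point interpolated AP used by PASCAL VOC since 2010, assumed here as the evaluation convention (the paper's exact evaluation code is not shown):

```python
import numpy as np

def average_precision(recall, precision):
    """All-point interpolated AP from sorted recall/precision arrays."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Monotone envelope: precision at recall r is the maximum precision
    # achieved at any recall >= r.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Integrate the envelope at the points where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

The mean AP (mAP) in the results table is then the arithmetic mean of the ten per-class AP values.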
4.3. Implementation Details
5. Results
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Lu, H.; Li, Y.; Chen, M.; Kim, H.; Serikawa, S. Brain Intelligence: Go beyond Artificial Intelligence. Mobile Netw. Appl. 2018, 23, 368–375.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 23–28 June 2014; pp. 580–587.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-Based Convolutional Networks for Accurate Object Detection and Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 142–158.
- Hu, F.; Xia, G.; Hu, J.; Zhang, L. Transferring Deep Convolutional Neural Networks for the Scene Classification of High-Resolution Remote Sensing Imagery. Remote Sens. 2015, 7, 14680–14707.
- Cheng, G.; Zhou, P.; Han, J. Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415.
- Radovic, M.; Adarkwa, O.; Wang, Q. Object Recognition in Aerial Images Using Convolutional Neural Networks. J. Imaging 2017, 3, 21.
- Chen, F.; Ren, R.; Van de Voorde, T.; Xu, W.; Zhou, G.; Zhou, Y. Fast Automatic Airport Detection in Remote Sensing Images Using Convolutional Neural Networks. Remote Sens. 2018, 10, 443.
- Sakai, Y.; Lu, H.; Tan, J.K.; Kim, H. Environment Recognition for Electric Wheelchair Based on YOLOv2. In Proceedings of the 3rd International Conference on Biomedical Signal and Image Processing, New York, NY, USA, 22–24 August 2018; pp. 112–117.
- Li, Y.; Lu, H.; Nakayama, Y.; Kim, H.; Serikawa, S. Automatic road detection system for an air-land amphibious car drone. Future Gener. Comput. Syst. 2018, 85, 51–59.
- Long, Y.; Gong, Y.; Xiao, Z.; Liu, Q. Accurate Object Localization in Remote Sensing Images Based on Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2486–2498.
- Yang, X.; Sun, H.; Fu, K.; Yang, J.; Sun, X.; Yan, M.; Guo, Z. Automatic Ship Detection in Remote Sensing Images from Google Earth of Complex Scenes Based on Multiscale Rotation Dense Feature Pyramid Networks. Remote Sens. 2018, 10, 132.
- Chen, Z.; Zhang, T.; Ouyang, C. End-to-End Airplane Detection Using Transfer Learning in Remote Sensing Images. Remote Sens. 2018, 10, 139.
- Xu, Y.; Zhu, M.; Li, S.; Feng, H.; Ma, S.; Che, J. End-to-End Airport Detection in Remote Sensing Images Combining Cascade Region Proposal Networks and Multi-Threshold Detection Networks. Remote Sens. 2018, 10, 1516.
- Shen, Z.; Liu, Z.; Li, J.; Jiang, Y.; Chen, Y.; Xue, X. DSOD: Learning Deeply Supervised Object Detectors from Scratch. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1937–1945.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Single-Shot Refinement Neural Network for Object Detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4203–4212.
- Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. Int. J. Comput. Vis. 2019.
- Everingham, M.; Zisserman, A.; Williams, C.; Gool, L.V. The PASCAL Visual Object Classes Challenge 2006 Results. Technical Report. September 2006. Available online: http://www.pascal-network.org/challenges/VOC/voc2006/results.pdf (accessed on 28 November 2019).
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Fu, C.Y.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. DSSD: Deconvolutional Single Shot Detector. arXiv 2017, arXiv:1701.06659.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. In Proceedings of the 13th European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 346–361.
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Tang, C.; Ling, Y.; Yang, X.; Jin, W.; Zheng, C. Multi-View Object Detection Based on Deep Learning. Appl. Sci. 2018, 8, 1423.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Yang, G.; Yang, J.; Sheng, W.; Junior, F.E.F.; Li, S. Convolutional Neural Network-Based Embarrassing Situation Detection under Camera for Social Robot in Smart Homes. Sensors 2018, 18, 1530.
- Lin, M.; Chen, Q.; Yan, S. Network In Network. arXiv 2013, arXiv:1312.4400.
- Zhu, J.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2242–2251.
- Arthur, D.; Vassilvitskii, S. K-means++: The Advantages of Careful Seeding. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, Philadelphia, PA, USA, 7–9 January 2007; pp. 1027–1035.
- Cheng, G.; Han, J. A survey on object detection in optical remote sensing images. ISPRS J. Photogramm. Remote Sens. 2016, 117, 11–28.
- Wen, X.; Shao, L.; Fang, W.; Xue, Y. Efficient Feature Selection and Classification for Vehicle Detection. IEEE Trans. Circuits Syst. Video Technol. 2015, 25, 508–517.
- Li, K.; Cheng, G.; Bu, S.; You, X. Rotation-Insensitive and Context-Augmented Object Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2337–2348.
- Han, X.; Zhong, Y.; Zhang, L. An Efficient and Robust Integrated Geospatial Object Detection Framework for High Spatial Resolution Remote Sensing Imagery. Remote Sens. 2017, 9, 666.
width | 10 | 16 | 33 | 30 | 62 | 59 | 116 | 156 | 373
---|---|---|---|---|---|---|---|---|---
height | 13 | 30 | 23 | 61 | 45 | 119 | 90 | 198 | 326

width | 18 | 25 | 38 | 41 | 87 | 55 | 95 | 165 | 122
---|---|---|---|---|---|---|---|---|---
height | 27 | 49 | 34 | 64 | 59 | 107 | 203 | 140 | 260
Class | Transferred CNN | YOLOv2 | RICNN with Fine-Tuning | SSD | R-P-Faster R-CNN (Single VGG16) | Faster R-CNN | YOLOv3 | Ours
---|---|---|---|---|---|---|---|---
Airplane | 0.661 | 0.733 | 0.884 | 0.957 | 0.904 | 0.946 | 0.909 | 0.995
Ship | 0.569 | 0.749 | 0.773 | 0.829 | 0.750 | 0.823 | 0.817 | 0.816
Storage tank | 0.843 | 0.344 | 0.853 | 0.856 | 0.444 | 0.653 | 0.908 | 0.911
Baseball diamond | 0.816 | 0.889 | 0.881 | 0.966 | 0.899 | 0.955 | 0.909 | 0.998
Tennis court | 0.350 | 0.291 | 0.408 | 0.821 | 0.790 | 0.819 | 0.908 | 0.913
Basketball court | 0.459 | 0.276 | 0.585 | 0.860 | 0.776 | 0.897 | 0.995 | 0.935
Ground track field | 0.800 | 0.988 | 0.867 | 0.582 | 0.877 | 0.924 | 0.996 | 0.998
Harbor | 0.620 | 0.754 | 0.686 | 0.548 | 0.791 | 0.724 | 0.899 | 0.912
Bridge | 0.423 | 0.518 | 0.615 | 0.419 | 0.682 | 0.575 | 0.903 | 0.927
Vehicle | 0.429 | 0.513 | 0.711 | 0.756 | 0.732 | 0.778 | 0.724 | 0.799
Mean AP | 0.597 | 0.605 | 0.726 | 0.759 | 0.765 | 0.809 | 0.897 | 0.920
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, H.; Wu, J.; Liu, Y.; Yu, J. VaryBlock: A Novel Approach for Object Detection in Remote Sensed Images. Sensors 2019, 19, 5284. https://doi.org/10.3390/s19235284