research-article

Segmentation based rotated bounding boxes prediction and image synthesizing for object detection of high resolution aerial images

Published: 07 May 2020

Abstract

Object detection in aerial images is becoming an active topic in computer vision, with many real-world applications. It is a very challenging task owing to factors such as highly complex backgrounds, arbitrary object orientations, and high input resolution. In this paper, we develop a new training and inference mechanism that is shown to significantly improve detection accuracy for high resolution aerial images. Instead of estimating object orientations by direct regression, as in previous methods, we propose to predict rotated bounding boxes by leveraging a segmentation task, which is easier to train and yields more accurate detection results. In addition, an image synthesizing based data augmentation strategy is presented to address the data imbalance issues in aerial object detection. Extensive experiments have been conducted to verify our contributions. The proposed method sets new state-of-the-art performance on the challenging DOTA dataset. The source code will be available at http://ice.dlut.edu.cn/lu/publications.html.
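
The abstract does not spell out how a rotated box is read off a predicted segmentation mask. One common way to do it, offered here only as a minimal sketch and not as the authors' implementation, is to take the convex hull of the foreground pixels and keep the smallest enclosing rectangle that has one side along a hull edge. The function names `convex_hull` and `min_area_rotated_box`, and the representation of a mask as a list of 0/1 rows, are assumptions made for illustration.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain. Returns the hull vertices without collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rotated_box(mask):
    """Smallest rotated rectangle around the foreground pixel centres of a mask.

    `mask` is a list of rows of 0/1 values (an assumed, illustrative format).
    Returns (area, corners) with four (x, y) corners, or None for an empty mask.
    """
    pts = [(float(x), float(y))
           for y, row in enumerate(mask)
           for x, value in enumerate(row) if value]
    if not pts:
        return None
    hull = convex_hull(pts)
    if len(hull) == 1:  # a single foreground pixel: degenerate box
        return 0.0, [hull[0]] * 4

    best = None
    n = len(hull)
    for i in range(n):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % n]
        length = math.hypot(x1 - x0, y1 - y0)
        ux, uy = (x1 - x0) / length, (y1 - y0) / length  # unit vector along the edge
        vx, vy = -uy, ux                                  # unit vector normal to it
        # Project every hull vertex onto the edge-aligned axes.
        us = [px * ux + py * uy for px, py in hull]
        vs = [px * vx + py * vy for px, py in hull]
        u_min, u_max, v_min, v_max = min(us), max(us), min(vs), max(vs)
        area = (u_max - u_min) * (v_max - v_min)
        if best is None or area < best[0]:
            # Map the rectangle corners back to image coordinates.
            corners = [(u * ux + v * vx, u * uy + v * vy)
                       for u, v in ((u_min, v_min), (u_max, v_min),
                                    (u_max, v_max), (u_min, v_max))]
            best = (area, corners)
    return best


# A diamond-shaped object: its axis-aligned box would cover 4 x 4 = 16 units.
diamond = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
diamond_area, diamond_corners = min_area_rotated_box(diamond)
```

For the diamond mask, the rotated rectangle has half the area of the axis-aligned box around the same pixel centres, which is the kind of tighter fit that makes rotated boxes attractive for arbitrarily oriented objects. The sketch checks every hull edge, so it runs in quadratic time in the number of hull vertices; a rotating-calipers variant would reduce this to linear time after the hull is built.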

Cited By

  • (2022) Multi-view damage inspection using single-view damage projection, Machine Vision and Applications 33:3, doi:10.1007/s00138-022-01295-w. Online publication date: 1-May-2022
  • (2021) Spine-Transformers: Vertebra Detection and Localization in Arbitrary Field-of-View Spine CT with Transformers, Medical Image Computing and Computer Assisted Intervention – MICCAI 2021, pp. 93-103, doi:10.1007/978-3-030-87199-4_9. Online publication date: 27-Sep-2021


Published In

Neurocomputing, Volume 388, Issue C
May 2020
347 pages

Publisher

Elsevier Science Publishers B. V.
Netherlands

Author Tags

1. Object detection
2. Arbitrary orientations
3. Aerial images
4. High resolution images
