research-article

Segmentation based rotated bounding boxes prediction and image synthesizing for object detection of high resolution aerial images

Authors:

You He

Volume 388, Issue C

Pages 202 - 211

https://doi.org/10.1016/j.neucom.2020.01.039

Published: 07 May 2020

Abstract

Object detection in aerial images is an increasingly active topic in computer vision with many real-world applications. The task is challenging because of factors such as highly complex backgrounds, arbitrary object orientations, and high input resolution. In this paper, we develop a new training and inference mechanism that is shown to significantly improve detection accuracy on high-resolution aerial images. Instead of estimating object orientations by direct regression, as in previous methods, we propose to predict rotated bounding boxes by leveraging a segmentation task, which is easier to train and yields more accurate detection results. In addition, an image-synthesis-based data augmentation strategy is presented to address the data imbalance issue in aerial object detection. Extensive experiments have been conducted to verify our contributions. The proposed method sets a new state of the art on the challenging DOTA dataset. The source code will be available at http://ice.dlut.edu.cn/lu/publications.html.
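To illustrate the idea of deriving a rotated box from a segmentation output rather than regressing an angle, the sketch below fits a minimum-area enclosing rectangle to the foreground pixels of a binary mask. This is only one plausible post-processing step and an assumption on our part: the abstract does not state how the box is obtained from the mask. The function names (`convex_hull`, `min_area_rect`, `mask_to_rotated_box`) are hypothetical, and pixel centers are treated as points, so the box encloses pixel centers rather than full pixel areas.

```python
from math import hypot


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices without collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect(points):
    """Return (area, corners) of the minimum-area enclosing rectangle.

    Uses the fact that one side of the optimal rectangle lies along a hull
    edge, and simply tries every edge (a quadratic-time variant of the
    rotating-calipers idea, kept simple for clarity).
    """
    hull = convex_hull(points)
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear points")
    best = None
    n = len(hull)
    for i in range(n):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % n]
        length = hypot(x1 - x0, y1 - y0)
        ux, uy = (x1 - x0) / length, (y1 - y0) / length  # edge direction
        vx, vy = -uy, ux                                 # edge normal
        us = [x * ux + y * uy for x, y in hull]          # extent along the edge
        vs = [x * vx + y * vy for x, y in hull]          # extent across the edge
        umin, umax, vmin, vmax = min(us), max(us), min(vs), max(vs)
        area = (umax - umin) * (vmax - vmin)
        if best is None or area < best[0]:
            corners = [
                (u * ux + v * vx, u * uy + v * vy)
                for u, v in ((umin, vmin), (umax, vmin), (umax, vmax), (umin, vmax))
            ]
            best = (area, corners)
    return best


def mask_to_rotated_box(mask):
    """mask: 2D list of 0/1 values; foreground pixel centers become points."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    return min_area_rect(points)


# Toy example: a diamond-shaped object, i.e. a square rotated by 45 degrees.
diamond = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
area, corners = mask_to_rotated_box(diamond)
# The rotated box has area of about 8, versus 16 for the axis-aligned box.
print(area, corners)
```

In this toy case the rotated box is half the size of the axis-aligned one, which shows why a tight oriented box is attractive for elongated or arbitrarily oriented objects. Library routines such as OpenCV's `cv2.minAreaRect` offer the same kind of fit for real masks.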


Cited By

R. van Ruitenbeek, S. Bhulai (2022), Multi-view damage inspection using single-view damage projection, Machine Vision and Applications 33:3, doi:10.1007/s00138-022-01295-w. Online publication date: 1-May-2022
https://dl.acm.org/doi/10.1007/s00138-022-01295-w
R. Tao, G. Zheng (2021), Spine-Transformers: Vertebra Detection and Localization in Arbitrary Field-of-View Spine CT with Transformers, Medical Image Computing and Computer Assisted Intervention – MICCAI 2021, pp. 93-103, doi:10.1007/978-3-030-87199-4_9. Online publication date: 27-Sep-2021
https://dl.acm.org/doi/10.1007/978-3-030-87199-4_9

Index Terms

Segmentation based rotated bounding boxes prediction and image synthesizing for object detection of high resolution aerial images
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Image segmentation
        Interest point and salient region detections
        Object recognition
      2. Computer vision tasks
        Scene understanding
  2. Machine learning

Index terms have been assigned to the content through auto-classification.

Information

Published In

Neurocomputing Volume 388, Issue C

May 2020

347 pages

ISSN:0925-2312

Copyright © 2020.

Publisher

Elsevier Science Publishers B. V.

Netherlands
