Target Recognition in Infrared Circumferential Scanning System via Deep Convolutional Neural Networks
<p>Figure 1. The overall architecture of the proposed method for target recognition in the infrared circumferential scanning system (IRCSS). We perform overlapping segmentation on the single-frame image of the IRCSS, and each resulting sub-frame image is sent to the recognition network. The backbone structure is detailed in <a href="#sec4-sensors-20-01922" class="html-sec">Section 4</a>. The feature fusion follows the design of the feature pyramid network (FPN) [<a href="#B15-sensors-20-01922" class="html-bibr">15</a>]. The region proposal network (RPN) and the region convolutional neural network (RCNN) subnet follow the design of Faster RCNN [<a href="#B16-sensors-20-01922" class="html-bibr">16</a>]. The region proposals are processed by RoI align [<a href="#B17-sensors-20-01922" class="html-bibr">17</a>]. In the bounding box regression for target localization, the loss function is the smoother L1.</p>
<p>Figure 2. The single-frame image obtained by the IRCSS. In this paper, its size is <math display="inline"><semantics> <mrow> <mn>768</mn> <mo>×</mo> <mn>40,000</mn> </mrow> </semantics></math>. For a clearer display, we zoom in on a helicopter target.</p>
<p>Figure 3. The indirect acquisition of sub-frame images by overlapping segmentation. Each target appears complete in at least one sub-frame image.</p>
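For every target to appear complete in at least one sub-frame, the overlap between neighbouring windows must be at least as wide as the widest target. A minimal sketch of the window computation (the window width of 1024 and overlap of 128 columns are illustrative assumptions, not the paper's parameters):

```python
def subframe_windows(total_width, win_width, overlap):
    """Compute (start, end) column ranges that cover a panoramic image of
    `total_width` columns, with `overlap` columns shared between neighbours,
    so any target narrower than `overlap` is complete in at least one window."""
    stride = win_width - overlap
    windows = []
    start = 0
    while start + win_width < total_width:
        windows.append((start, start + win_width))
        start += stride
    # the last window is anchored to the right edge so no column is dropped
    windows.append((total_width - win_width, total_width))
    return windows

# e.g. a 40,000-column panorama cut into 1024-wide sub-frames with 128 overlap
wins = subframe_windows(40_000, 1024, 128)
```

Duplicate detections in the overlapping regions correspond to the "repetitive recognition" case handled during evaluation.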
<p>Figure 4. The process of building the infrared target recognition dataset. After obtaining the sub-frame images, we separated the targets and embedded them into the scene images.</p>
<p>Figure 5. Samples from the infrared target recognition dataset, showing variations in target type, aspect orientation (front and side), size, contrast, and scene: road (<b>a</b>,<b>h</b>), tree (<b>b</b>), desert (<b>c</b>), grassland (<b>d</b>), mountain (<b>e</b>), building (<b>f</b>), and car (<b>g</b>).</p>
<p>Figure 6. The curves of the smoother L1 with different <math display="inline"><semantics> <mi>α</mi> </semantics></math> and <math display="inline"><semantics> <mi>β</mi> </semantics></math>. We set <math display="inline"><semantics> <mi>α</mi> </semantics></math> to control the gradient of outliers, and <math display="inline"><semantics> <mi>β</mi> </semantics></math> to control the changing trend of the gradient of inliers. (<b>a</b>) The gradient curve of the smoother L1. (<b>b</b>) The loss curve of the smoother L1.</p>
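The exact form of the smoother L1 is given in Section 3.3 and is not reproduced here. For reference, the standard smooth L1 that it refines can be sketched as follows (the transition point `beta = 1.0` is the common default, an assumption here):

```python
def smooth_l1(x, beta=1.0):
    """Standard smooth L1 loss: quadratic for inliers (|x| < beta) and
    linear for outliers, so the gradient magnitude of outliers is bounded."""
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta
```

The smoother L1 introduces the parameters α and β shown in the figure to reshape these two regimes.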
<p>Figure 7. The data augmentation for each image, including horizontal flipping, Gaussian noise, and rotation.</p>
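A minimal NumPy sketch of the three augmentations named in the caption (the noise standard deviation of 5 grey levels and the 90-degree rotation angle are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def augment(img, rng):
    """Produce the three augmentations for one image: horizontal flip,
    additive Gaussian noise, and rotation."""
    flipped = img[:, ::-1]                                            # horizontal flipping
    noisy = np.clip(img + rng.normal(0.0, 5.0, img.shape), 0, 255)    # Gaussian noise
    rotated = np.rot90(img)                                           # rotation (90 degrees here)
    return flipped, noisy, rotated
```

In practice the noise would be clipped to the sensor's grey-level range, as done above for 8-bit data.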
<p>Figure 8. The visualization of training: (<b>a</b>) classification loss in RPN; (<b>b</b>) localization loss in RPN; (<b>c</b>) classification loss; (<b>d</b>) localization loss in bounding box regression; (<b>e</b>) total loss; (<b>f</b>) accuracy.</p>
<p>Figure 9. The visualization of training with the smoother L1 (<math display="inline"><semantics> <mrow> <mi>α</mi> <mo>=</mo> <mn>2</mn> <mo>,</mo> <mo> </mo> <mi>β</mi> <mo>=</mo> <mn>2</mn> </mrow> </semantics></math>) and the smooth L1: (<b>a</b>) classification loss; (<b>b</b>) localization loss in bounding box regression.</p>
<p>Figure 10. Some of the recognition results. The target is in the grassland (<b>a</b>), road (<b>b</b>,<b>c</b>), desert (<b>d</b>,<b>e</b>), trees (<b>f</b>), and mountains (<b>g</b>,<b>h</b>). In (<b>d</b>,<b>e</b>,<b>g</b>,<b>h</b>), the armored vehicles and aircraft originally present in the scenes were also recognized.</p>
Abstract
1. Introduction
- We realize end-to-end target recognition on high-resolution imaging results of the IRCSS via the DCNN.
- We build an infrared target recognition dataset to both overcome the shortage of data and enhance the adaptability of the algorithm in various scenes, covering two target types, seven scene types, two aspect orientations, four sizes, and twelve contrast levels.
- We design a loss function called the smoother L1 in the bounding box regression for better localization performance.
2. Related Work
2.1. Target Recognition and Tracking in Infrared Images
2.2. DCNN-Based Target Recognition
3. Methodology
3.1. Sub-Frame Images of the IRCSS
3.2. Infrared Target Recognition Dataset
3.3. Smoother L1
4. Experiments
4.1. Implementation Details
4.2. Comparison of Methods
4.3. Exploiting the Optimal Cross-Domain Transfer Learning Strategy
4.3.1. Weight Initialization
4.3.2. Frozen Stages
4.4. Ablation Studies on the Smoother L1
4.5. Scene Adaptability of the DCNN-Based Method
5. Conclusions and Prospect
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Vollmer, M.; Möllmann, K.-P. Infrared Thermal Imaging: Fundamentals, Research and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2017. [Google Scholar]
- De Visser, M.; Schwering, P.B.; De Groot, J.F.; Hendriks, E.A. Passive ranging using an infrared search and track sensor. Opt. Eng. 2006, 45. [Google Scholar] [CrossRef] [Green Version]
- Fan, H. A high performance IRST system based on 1152 × 6 LWIR detectors. Infrared Technol. 2010, 32, 20–24. [Google Scholar]
- Fan, Q.; Fan, H.; Lin, Y.; Zhang, J.; Lin, D.; Yang, C.; Zhu, L.; Li, W. Multi-object extraction methods based on long-line column scanning for infrared panorama imaging. Infrared Technol. 2019, 41, 118–126. [Google Scholar]
- Weihua, W.; Zhijun, L.; Jing, L.; Yan, H.; Zengping, C. A Real-time Target Detection Algorithm for Panorama Infrared Search and Track System. Procedia Eng. 2012, 29, 1201–1207. [Google Scholar] [CrossRef] [Green Version]
- Hu, M. Research on Detection Technology of Dim and Small Targets in Large Field of View and Complicated Background; National University of Defense Technology: Changsha, China, 2008. [Google Scholar]
- Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66. [Google Scholar] [CrossRef] [Green Version]
- Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; pp. 886–893. [Google Scholar]
- Suykens, J.A.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9, 293–300. [Google Scholar] [CrossRef]
- Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D. Cascade Object Detection with Deformable Part Models. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 2241–2248. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Li, F.F. Imagenet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft Coco: Common Objects in Context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar]
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef] [Green Version]
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Zhang, L.; Gonzalez-Garcia, A.; van de Weijer, J.; Danelljan, M.; Khan, F.S. Synthetic data generation for end-to-end thermal infrared tracking. IEEE Trans. Image Process. 2018, 28, 1837–1850. [Google Scholar] [CrossRef] [Green Version]
- Maji, S.; Malik, J. Object Detection Using a Max-Margin Hough Transform. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 1038–1045. [Google Scholar]
- Uijlings, J.R.; Van De Sande, K.E.; Gevers, T.; Smeulders, A.W. Selective search for object recognition. Int. J. Comput. Vis. 2013, 104, 154–171. [Google Scholar] [CrossRef] [Green Version]
- Felzenszwalb, P.F.; Huttenlocher, D.P. Efficient graph-based image segmentation. Int. J. Comput. Vis. 2004, 59, 167–181. [Google Scholar] [CrossRef]
- Patel, V.M.; Nasrabadi, N.M.; Chellappa, R. Sparsity-motivated automatic target recognition. Appl. Opt. 2011, 50, 1425–1433. [Google Scholar] [CrossRef] [PubMed]
- Khan, M.N.A.; Fan, G.; Heisterkamp, D.R.; Yu, L. Automatic Target Recognition in Infrared Imagery Using Dense Hog Features and Relevance Grouping of Vocabulary. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 23–28 June 2014; pp. 293–298. [Google Scholar]
- Blackman, S.; Blackman, S.S.; Popoli, R. Design and Analysis of Modern Tracking Systems; Artech House Books: London, UK, 1999. [Google Scholar]
- Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 583–596. [Google Scholar] [CrossRef] [Green Version]
- Kristan, M.; Leonardis, A.; Matas, J.; Felsberg, M.; Pflugfelder, R.; Cehovin Zajc, L.; Vojir, T.; Hager, G.; Lukezic, A.; Eldesokey, A. The Visual Object Tracking Vot2017 Challenge Results. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 1949–1972. [Google Scholar]
- Yu, X.; Yu, Q.; Shang, Y.; Zhang, H. Dense structural learning for infrared object tracking at 200+ Frames per Second. Pattern Recognit. Lett. 2017, 100, 152–159. [Google Scholar] [CrossRef]
- Hare, S.; Golodetz, S.; Saffari, A.; Vineet, V.; Cheng, M.-M.; Hicks, S.L.; Torr, P.H. Struck: Structured output tracking with kernels. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 2096–2109. [Google Scholar] [CrossRef] [Green Version]
- Danelljan, M.; Hager, G.; Shahbaz Khan, F.; Felsberg, M. Learning Spatially Regularized Correlation Filters for Visual Tracking. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4310–4318. [Google Scholar]
- Kristan, M.; Matas, J.; Leonardis, A.; Felsberg, M.; Pflugfelder, R.; Kamarainen, J.-K.; Cehovin Zajc, L.; Drbohlav, O.; Lukezic, A.; Berg, A. The Seventh Visual Object Tracking Vot2019 Challenge Results. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Seoul, Korea, 27–28 October 2019. [Google Scholar]
- Danelljan, M.; Bhat, G.; Shahbaz Khan, F.; Felsberg, M. ECO: Efficient Convolution Operators for Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6638–6646. [Google Scholar]
- Kang, M.; Ji, K.; Leng, X.; Xing, X.; Zou, H. Synthetic aperture radar target recognition with feature fusion based on a stacked autoencoder. Sensors 2017, 17, 192. [Google Scholar] [CrossRef]
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into High Quality Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162. [Google Scholar]
- Chen, Y.; Yang, T.; Zhang, X.; Meng, G.; Xiao, X.; Sun, J. DetNAS: Backbone Search for Object Detection. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 6638–6648. [Google Scholar]
- Zoph, B.; Le, Q.V. Neural architecture search with reinforcement learning. arXiv 2016, arXiv:1611.01578. [Google Scholar]
- Liu, Y.; Wang, Y.; Wang, S.; Liang, T.; Zhao, Q.; Tang, Z.; Ling, H. Cbnet: A novel composite backbone network architecture for object detection. arXiv 2019, arXiv:1909.03625. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot Multibox Detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37. [Google Scholar]
- Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. arXiv 2019, arXiv:1911.09070. [Google Scholar]
- Tan, M.; Le, Q.V. Efficientnet: Rethinking model scaling for convolutional neural networks. arXiv 2019, arXiv:1905.11946. [Google Scholar]
- Zhang, Y.; Zhang, Y.; Shi, Z.; Zhang, J.; Wei, M. Design and Training of Deep CNN-Based Fast Detector in Infrared SUAV Surveillance System. IEEE Access 2019, 7, 137365–137377. [Google Scholar] [CrossRef]
- Yardimci, O.; Ayyıldız, B.Ç. Comparison of SVM and CNN Classification Methods for Infrared Target Recognition. In Proceedings of the Automatic Target Recognition XXVIII, Orlando, FL, USA, 30 April 2018; p. 1064804. [Google Scholar]
- Tanner, I.L.; Mahalanobis, A. Fundamentals of Target Classification Using Deep Learning. In Proceedings of the Automatic Target Recognition XXIX, Baltimore, MD, USA, 14 May 2019; p. 1098809. [Google Scholar]
- d’Acremont, A.; Fablet, R.; Baussard, A.; Quin, G. CNN-based target recognition and identification for infrared imaging in defense systems. Sensors 2019, 19, 2040. [Google Scholar] [CrossRef] [Green Version]
- Li, C.; Liang, X.; Lu, Y.; Zhao, N.; Tang, J. RGB-T object tracking: Benchmark and baseline. Pattern Recognit. 2019, 96, 106977. [Google Scholar] [CrossRef] [Green Version]
- Science Data Bank: A Dataset for Dim-Small Target Detection and Tracking of Aircraft in Infrared Image Sequences. 2019. Available online: www.csdata.org/p/387/ (accessed on 27 March 2020).
- Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? Adv. Neural Inf. Process. Syst. 2014, 27, 3320–3328. [Google Scholar]
- Singh, B.; Davis, L.S. An Analysis of Scale Invariance in Object Detection Snip. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3578–3587. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Glorot, X.; Bengio, Y. Understanding the Difficulty of Training Deep Feedforward Neural Networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; pp. 249–256. [Google Scholar]
| IoU > | TP | FN |
|---|---|---|
| IoU < or repetitive recognition * | FP | TN |
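The TP/FP assignment above hinges on the IoU between a predicted box and the ground-truth box; a minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) form:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```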
Layer Name | Stage0 | Stage1 | Stage2 | Stage3 | Stage4 |
---|---|---|---|---|---|
Operation* | maxpool () |
Results on the test set:

| Method | mAP | AP50 | AP75 |
|---|---|---|---|
| SSD (VGG) | 72.5 | 90.5 | 85.8 |
RetinaNet | 78.3 | 97.2 | 90.5 |
Faster RCNN | 79.7 | 97.9 | 91.6 |
Faster RCNN+FPN | 81.5 | 98.0 | 93.7 |
Ours | 82.7 | 98.1 | 95.2 |
Results on the validation set for different weight initializations:

| Weight Initialization | mAP | AP50 | AP75 |
|---|---|---|---|
| Xavier | — | — | — |
ImageNet | 80.1 | 95.9 | 94.0 |
COCO | 83.7 | 97.0 | 97.0 |
Validation results and training time consumption for different frozen stages:

| Frozen Stages | Time Consumption | mAP | AP50 | AP75 |
|---|---|---|---|---|
None | 1 h 55 min | 83.6 | 97.5 | 96.3 |
1 | 1 h 37 min | 83.7 | 97.0 | 97.0 |
1, 2 | 1 h 30 min | 82.2 | 97.0 | 95.9 |
1, 2, 3 | 1 h 18 min | 80.7 | 96.5 | 95.2 |
1, 2, 3, 4 | 1 h 11 min | 80.1 | 96.5 | 95.4 |
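Freezing a stage means disabling gradient updates for its parameters so they keep their pretrained weights. A framework-free sketch of the bookkeeping (parameter dicts with a `requires_grad` field stand in for a real framework's tensors):

```python
def freeze_stages(stage_params, frozen):
    """Mark the parameters of the named backbone stages as non-trainable,
    so those stages keep their pretrained weights during fine-tuning."""
    for name, params in stage_params.items():
        trainable = name not in frozen
        for p in params:
            p["requires_grad"] = trainable
    return stage_params

# hypothetical 4-stage backbone, one stand-in parameter per stage
net = {f"stage{i}": [{"requires_grad": True}] for i in range(1, 5)}
freeze_stages(net, frozen={"stage1"})
```

Freezing more stages shortens training, as in the table above, at the cost of less adaptation to the infrared domain.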
mAP on the validation set for the smoother L1 with different α (rows) and β (columns); the smooth L1 baseline reaches 82.7:

| α \ β | 1 | 1.5 | 2 | 2.5 | 3 |
|---|---|---|---|---|---|
| 2 | 83.2 | 83.2 | 83.7 | 83.6 | 82.9 |
| 3 | 82.4 | 83.4 | 83.4 | 83.4 | 83.4 |
| 4 | 82.9 | 83.2 | 83.1 | 82.8 | 82.9 |
Test Scene | mAP | AP50 | AP75 |
---|---|---|---|
Grassland | 67.2 | 97.3 | 77.8 |
Mountain | 79.5 | 97.6 | 95.4 |
Road | 80.9 | 96.8 | 92.6 |
Trees | 74.5 | 93.6 | 89.8 |
Desert | 76.0 | 93.5 | 92.4 |
Buildings | 78.3 | 92.1 | 91.3 |
Cars | 76.5 | 94.3 | 92.7 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Chen, G.; Wang, W. Target Recognition in Infrared Circumferential Scanning System via Deep Convolutional Neural Networks. Sensors 2020, 20, 1922. https://doi.org/10.3390/s20071922