Multi-View Object Detection Based on Deep Learning
Figure 1. Imaging principle of the human eye.
Figure 2. Multi-view segmentation.
Figure 3. Model for multi-view object detection.
Figure 4. Evaluation of adjacent boundary lines in multiple views.
Figure 5. Comparisons of small object detection results between the classical You Only Look Once (YOLO) method (the first and third columns) and the Multi-view YOLO method (the second and fourth columns).
Figure 6. Comparisons of small object detection results between the classical YOLOv2 method (the first and third columns) and the Multi-view YOLOv2 method (the second and fourth columns).
Figure 7. Comparisons of small object detection results between the classical SSD method (the first and third columns) and the Multi-view SSD method (the second and fourth columns).
Figure 8. Comparison of the object retrieval abilities of the Multi-view YOLO and classical YOLO methods.
Figure 9. Comparison of the object retrieval abilities of the Multi-view YOLOv2 and classical YOLOv2 methods.
Figure 10. Comparison of the object retrieval abilities of the Multi-view SSD and classical SSD methods.
Abstract
1. Introduction
2. Disadvantage of Classical Object Detection Methods Based on Regression
3. Multi-View Modeling
3.1. Establishment of a Multi-View Model
3.2. Model Implementation
Algorithm 1 RM (Region Merging)
1: Input: the candidate bounding box set B from stage one
2: Output: the region-merged object box set D
3: Initialize the region-merged object box set D ← ∅, the index set I of B, the overlap threshold t, and the overlap maximum o_max ← 0
4: Obtain the index set I according to the order of the candidate bounding boxes sorted by the y coordinate
5: While I is not null do
6:   Obtain the last index m of I
7:   Set the suppressing bounding box set S ← ∅
8:   Append the last index m to S and to D
9:   Foreach index n in I do
10:    Calculate the overlap o of B_m and B_n using the theory of IoU
11:    If o > t do
12:      Append the index n to the set S
13:      If o > o_max do
14:        o_max ← o
15:        Calculate the area of B_m and the area of B_n
16:        If area(B_n) > area(B_m) do
17:          Remove the last index in D
18:          Append the index n to D
19:   Remove the suppressing bounding box set S from I
20: Foreach index in D do
21:   Extract the object bounding boxes
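The region-merging stage can be sketched in Python as follows. The box representation as `(x1, y1, x2, y2)` tuples, the helper names, and sorting by the bottom y coordinate are illustrative assumptions rather than the authors' exact implementation:

```python
def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def region_merge(boxes, t=0.5):
    # Sort candidate indexes by the bottom y coordinate, as in Algorithm 1
    idx = sorted(range(len(boxes)), key=lambda i: boxes[i][3])
    kept = []                       # D: indexes of region-merged boxes
    while idx:
        m = idx[-1]                 # last index of I
        suppress = [m]              # S: indexes to drop from I this round
        kept.append(m)
        o_max = 0.0
        for n in idx[:-1]:
            o = iou(boxes[m], boxes[n])
            if o > t:
                suppress.append(n)
                if o > o_max:
                    o_max = o
                    # Merge step: keep whichever overlapping box is larger
                    if area(boxes[n]) > area(boxes[m]):
                        kept[-1] = n
        idx = [i for i in idx if i not in suppress]
    return [boxes[i] for i in kept]
```

Unlike plain non-maximum suppression, which keeps the highest-scoring box of an overlapping group, this variant retains the largest box, which suits merging partial detections from adjacent views.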
4. Experiments and Analysis
4.1. Comparison of Small Object Test Results
4.2. Comparison of Object Retrieval Ability
4.3. Comparison of Object Detection Accuracy
4.4. Comparisons of Detection Performance between Multi-View Methods and State-of-the-Art Methods Based on Region Proposals
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Erhan, D.; Szegedy, C.; Toshev, A.; Anguelov, D. Scalable object detection using deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 2147–2154. [Google Scholar]
- Borji, A.; Cheng, M.M.; Jiang, H.; Li, J. Salient object detection: A benchmark. IEEE Trans. Image Process. 2015, 24, 5706–5722. [Google Scholar] [CrossRef] [PubMed]
- Jeong, Y.N.; Son, S.R.; Jeong, E.H.; Lee, B.K. An Integrated Self-Diagnosis System for an Autonomous Vehicle Based on an IoT Gateway and Deep Learning. Appl. Sci. 2018, 8, 1164. [Google Scholar] [CrossRef]
- Wu, X.; Huang, G.; Sun, L. Fast visual identification and location algorithm for industrial sorting robots based on deep learning. Robot 2016, 38, 711–719. [Google Scholar]
- Merlin, P.M.; Farber, D.J. A parallel mechanism for detecting curves in pictures. IEEE Trans. Comput. 1975, 100, 96–98. [Google Scholar] [CrossRef]
- Singla, N. Motion detection based on frame difference method. Int. J. Inf. Comput. Technol. 2014, 4, 1559–1565. [Google Scholar]
- Lee, D.S. Effective Gaussian mixture learning for video background subtraction. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 827–832. [Google Scholar] [PubMed]
- Horn, B.K.P.; Schunck, B.G. Determining optical flow. Artif. Intell. 1981, 17, 185–203. [Google Scholar] [CrossRef]
- Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Kauai, HI, USA, 8–14 December 2001; pp. 511–518. [Google Scholar]
- Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.; Ramanan, D. Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1627–1645. [Google Scholar] [CrossRef] [PubMed]
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–26 June 2005; pp. 886–893. [Google Scholar]
- Panning, A.; Al-Hamadi, A.K.; Niese, R.; Michaelis, B. Facial expression recognition based on Haar-like feature detection. Pattern Recognit. Image Anal. 2008, 18, 447–452. [Google Scholar] [CrossRef]
- Burges, C.J.C. A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 1998, 2, 121–167. [Google Scholar] [CrossRef]
- Zhu, J.; Zou, H.; Rosset, S.; Hastie, T. Multi-class adaboost. Stat. Interface 2009, 2, 349–360. [Google Scholar]
- Kong, T.; Yao, A.; Chen, Y.; Sun, F. HyperNet: Towards accurate region proposal generation and joint object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 845–853. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 580–587. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 346–361. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 91–99. [Google Scholar]
- Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object detection via region-based fully convolutional networks. In Proceedings of the International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 379–387. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37. [Google Scholar]
- Eggert, C.; Zecha, D.; Brehm, S.; Lienhart, R. Improving Small Object Proposals for Company Logo Detection. In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, Bucharest, Romania, 6–9 June 2017; pp. 167–174. [Google Scholar]
- Fu, C.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. DSSD: Deconvolutional Single Shot Detector. arXiv 2017, arXiv:1701.06659. Available online: https://arxiv.org/abs/1701.06659 (accessed on 23 July 2018).
- Cai, Z.; Fan, Q.; Feris, R.S.; Vasconcelos, N. A unified multi-scale deep convolutional neural network for fast object detection. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 354–370. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. arXiv 2016, arXiv:1612.08242. Available online: https://arxiv.org/abs/1612.08242 (accessed on 23 July 2018).
| Layer | Convolution Receptive Field | Output Scale | Default Boxes Ratio | Mapping Region Scale |
| --- | --- | --- | --- | --- |
| Conv4_3 | 92 × 92 | 38 × 38 | 0.1 | 30 × 30 |
| Conv7 | 260 × 260 | 19 × 19 | 0.2 | 60 × 60 |
| Conv8_2 | 292 × 292 | 10 × 10 | 0.38 | 114 × 114 |
| Conv9_2 | 356 × 356 | 5 × 5 | 0.56 | 168 × 168 |
| Conv10_2 | 485 × 485 | 3 × 3 | 0.74 | 222 × 222 |
| Conv11_2 | 612 × 612 | 1 × 1 | 0.92 | 276 × 276 |
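The mapping region scale in the table follows directly from the default-box ratio and the input resolution: each ratio multiplied by the 300 × 300 SSD input gives the square region side (e.g., 0.38 × 300 = 114). A minimal sketch, assuming the SSD300 input size:

```python
INPUT_SIZE = 300  # SSD300 input resolution (assumed)

# Default-box ratios per feature layer, taken from the table above
ratios = {
    "Conv4_3": 0.1, "Conv7": 0.2, "Conv8_2": 0.38,
    "Conv9_2": 0.56, "Conv10_2": 0.74, "Conv11_2": 0.92,
}

# Mapping region scale = default-box ratio x input size (square regions)
mapping = {layer: round(r * INPUT_SIZE) for layer, r in ratios.items()}
print(mapping["Conv8_2"])  # 114
```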
Per-class columns give AP (%); "/" marks classes with no reported result.

| Method | mAP (%) | Aero | Bird | Boat | Bottle | Car | Chair | Cow | Dog | Horse | Person | Plant | Sheep |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLO | 24.3 | 20.2 | 19.9 | 7.4 | 3.9 | 29.8 | 19.4 | 44.2 | 37.2 | / | 24.3 | 20.3 | 40.6 |
| Multi-view YOLO | 38.6 | 46.9 | 32.0 | 34.6 | 20.5 | 52.4 | 39.7 | 39.7 | 43.1 | / | 26.6 | 30.2 | 59.3 |
| YOLO* | 17.9 | 18.9 | 12.6 | 3.5 | 3.3 | 22.1 | 14.7 | 28.9 | 30.2 | / | 14.9 | 16.4 | 31.1 |
| Multi-view YOLO* | 30.0 | 44.1 | 20.3 | 16.5 | 17.2 | 38.9 | 30.2 | 41.5 | 35.0 | / | 16.3 | 24.4 | 45.4 |
| YOLOv2 | 48.8 | 40.0 | 40.0 | 38.2 | 31.5 | 65.4 | 52.0 | 75.4 | 51.9 | 28.6 | 55.8 | 39.1 | 68.1 |
| Multi-view YOLOv2 | 56.2 | 68.1 | 40.7 | 50.1 | 49.4 | 80.1 | 57.9 | 71.4 | 71.0 | 31.4 | 56.5 | 32.6 | 65.3 |
| YOLOv2* | 34.7 | 37.5 | 25.4 | 18.2 | 26.6 | 48.5 | 39.5 | 49.3 | 42.2 | 11.1 | 34.3 | 31.6 | 52.2 |
| Multi-view YOLOv2* | 40.6 | 64.1 | 25.9 | 23.9 | 41.7 | 59.4 | 44.0 | 46.7 | 57.7 | 12.2 | 34.7 | 26.3 | 50.0 |
| SSD | 51.3 | 65.6 | 61.2 | 62.0 | 18.2 | 72.1 | 44.7 | 59.8 | 55.2 | 28.6 | 53.6 | 28.8 | 66.1 |
| Multi-view SSD | 64.4 | 72.9 | 66.2 | 72.9 | 27.0 | 87.5 | 66.1 | 80.5 | 66.2 | 54.3 | 59.2 | 45.2 | 75.2 |
| SSD* | 36.3 | 61.8 | 38.9 | 29.6 | 15.6 | 53.4 | 34.0 | 39.1 | 44.8 | 11.1 | 32.9 | 23.3 | 50.6 |
| Multi-view SSD* | 45.3 | 68.6 | 42.1 | 34.8 | 22.8 | 65.0 | 50.2 | 52.7 | 53.8 | 21.1 | 36.3 | 36.5 | 59.9 |
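In these tables, mAP is the mean of the reported per-class AP values (classes marked "/" excluded); for example, the Multi-view SSD row averages to 64.4. A quick check in Python, with the values transcribed from the table above:

```python
# Per-class AP (%) for the Multi-view SSD row of the table above
ap = {"aero": 72.9, "bird": 66.2, "boat": 72.9, "bottle": 27.0,
      "car": 87.5, "chair": 66.1, "cow": 80.5, "dog": 66.2,
      "horse": 54.3, "person": 59.2, "plant": 45.2, "sheep": 75.2}

mean_ap = sum(ap.values()) / len(ap)
print(round(mean_ap, 1))  # 64.4, matching the mAP column
```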
| Method | mAP (%) | Aero | Bird | Boat | Bottle | Car | Chair | Cow | Dog | Horse | Person | Plant | Sheep |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Faster RCNN (VGG) | 66.1 | 74.2 | 59.2 | 67.7 | 73.7 | 85.2 | 48.7 | 76.0 | 69.1 | 23.8 | 73.9 | 56.2 | 85.6 |
| Faster RCNN (VGG)* | 47.8 | 69.8 | 37.6 | 32.3 | 62.1 | 63.2 | 37.0 | 49.7 | 56.2 | 9.3 | 45.4 | 45.4 | 65.5 |
| RFCN | 60.9 | 67.3 | 66.1 | 51.8 | 46.1 | 68.7 | 47.6 | 77.5 | 83.4 | 37.1 | 72.8 | 30.7 | 81.5 |
| RFCN* | 43.5 | 63.4 | 42.0 | 24.7 | 38.9 | 51.0 | 36.2 | 50.7 | 67.8 | 14.4 | 44.7 | 24.9 | 63.4 |
| SPP-net | 29.8 | 42.6 | 32.6 | 27.5 | 11.9 | 45.1 | 26.2 | 54.7 | 10.3 | 14.3 | 34.5 | 15.3 | 42.7 |
| SPP-net* | 21.3 | 40.1 | 20.7 | 13.1 | 10.0 | 33.5 | 19.9 | 35.8 | 10.4 | 5.6 | 21.2 | 12.3 | 32.7 |
| Method | Multi-View YOLOv2 | Multi-View SSD | Faster RCNN (VGG) | RFCN | SPP-net |
| --- | --- | --- | --- | --- | --- |
| mAP (%) | 56.2 | 64.4 | 66.1 | 60.9 | 29.8 |
| Run time (sec/img) | 0.09 | 0.10 | 0.23 | 0.20 | 0.42 |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Tang, C.; Ling, Y.; Yang, X.; Jin, W.; Zheng, C. Multi-View Object Detection Based on Deep Learning. Appl. Sci. 2018, 8, 1423. https://doi.org/10.3390/app8091423