Pedestrian Detection with Semantic Regions of Interest
Figure 1. Many detection algorithms produce candidate boxes that contain no pedestrian. Green boxes are true positives; red boxes are false positives.
Figure 2. Key idea of our approach for pedestrian detection. (a) The original image is transferred to the heat map (b) with the deep neural network. The regions of interest are extracted from the heat map and then zoomed to the same scale. The HOG + SVM algorithm is used for pedestrian detection in the zoomed regions of interest (c). Finally, the detection results are mapped back to the original image (d).
Figure 3. Labels of the dataset (left) and the transformed labels (right). In the right picture, values in the red area are set to one and values in the blue area to zero.
Figure 4. The feature maps of VGG19 in our approach. Blue maps are obtained from convolutional layers and red maps from max-pooling layers. All convolution kernels are 3 × 3. The output is 15 × 20, 1/32 the size of the original image. The feature maps obtained from pooling3 and pooling4 are also passed to the next stage.
Figure 5. The feature maps of the heat map network in our approach. Red maps are obtained from the last stage, blue maps from convolutional layers, purple maps from dropout layers, yellow maps from deconvolutional layers and green maps from concat layers.
Figure 6. The process from the heat map to regions of interest. (a) The heat map obtained from the last stage. It is transferred into a binary image (b). Morphological opening and closing remove the noise and yield a series of connected regions (c). Finally, in (d), the bounding boxes of the connected regions are mapped to the original image to obtain the semantic regions of interest (SROI).
Figure 7. Results without non-maximum suppression (left) and with non-maximum suppression (right).
Figure 8. Results on the Caltech Pedestrian Detection Benchmark compared with HOG + SVM.
Figure 9. Results of HOG + SVM and our approach.
Figure 10. ROC curves of algorithms with and without our SROI.
Figure 11. PR curves of algorithms with and without our SROI.
Figure 12. Results on the TUD-Brussels Pedestrian Dataset.
Figure 13. Results on the TUD-Brussels Pedestrian Dataset.
Figure 14. Results on the ETH Pedestrian Dataset.
Figure 15. Results on the ETH Pedestrian Dataset.
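The SROI extraction described in the captions above (heat map → binary image → morphological opening and closing → connected regions → bounding boxes mapped back to the original image) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the 0.5 threshold, the 3 × 3 structuring element and the 32× scale factor are assumed values, and the morphology and connected-component routines are hand-rolled so the example is self-contained (a real pipeline would typically use OpenCV's `cv2.morphologyEx` and `cv2.connectedComponentsWithStats`).

```python
import numpy as np

def binarize(heat, thresh=0.5):
    """Threshold a heat map into a binary mask (1 = likely pedestrian)."""
    return (heat >= thresh).astype(np.uint8)

def _neighbourhood_shifts(mask):
    """Yield the nine 3x3-neighbourhood shifts of a zero-padded mask."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            yield p[dy:dy + h, dx:dx + w]

def erode(mask):
    """3x3 erosion: a pixel survives only if its whole neighbourhood is set."""
    out = np.ones_like(mask)
    for s in _neighbourhood_shifts(mask):
        out &= s
    return out

def dilate(mask):
    """3x3 dilation: a pixel is set if any pixel in its neighbourhood is set."""
    out = np.zeros_like(mask)
    for s in _neighbourhood_shifts(mask):
        out |= s
    return out

def connected_boxes(mask):
    """4-connected components as (xmin, ymin, xmax, ymax) pixel boxes."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not seen[y, x]:
                stack, xs, ys = [(y, x)], [x], [y]
                seen[y, x] = True
                while stack:  # flood fill one component
                    cy, cx = stack.pop()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                            xs.append(nx)
                            ys.append(ny)
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

def sroi_boxes(heat, thresh=0.5, scale=32):
    """Heat map -> binary image -> opening/closing -> connected regions -> SROI.

    Boxes are returned in original-image coordinates as (left, top, right,
    bottom) with exclusive right/bottom edges.
    """
    mask = binarize(heat, thresh)
    mask = dilate(erode(mask))   # opening: removes isolated noise pixels
    mask = erode(dilate(mask))   # closing: fills small holes and gaps
    return [(x0 * scale, y0 * scale, (x1 + 1) * scale, (y1 + 1) * scale)
            for x0, y0, x1, y1 in connected_boxes(mask)]
```

With `scale=32`, each heat-map cell maps back to a 32 × 32 patch of the input image; the resulting SROI crops can then be zoomed to a common scale and handed to the HOG + SVM detector.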
Abstract
1. Introduction
2. Related Work
3. Our Approach
3.1. Label Processing
3.2. Visual Geometry Group
3.3. Heat Map
3.4. Details of Training End-To-End
3.5. Semantic Regions of Interest
3.6. HOG
4. Experiments
4.1. Results on Caltech Pedestrian Dataset
4.1.1. Results Compared with HOG
4.1.2. Results of Other Algorithms with Our SROI
4.2. Results on the TUD-Brussels Pedestrian Dataset
4.3. Results on the ETH Pedestrian Dataset
5. Conclusions
Author Contributions
Conflicts of Interest
References
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
He, M.; Luo, H.; Chang, Z.; Hui, B. Pedestrian Detection with Semantic Regions of Interest. Sensors 2017, 17, 2699. https://doi.org/10.3390/s17112699