Abstract
Mobile robots equipped with camera sensors must perceive surrounding humans and their actions for safe autonomous navigation. These tasks are known as human detection and action recognition. In this paper, moving humans are the target objects. Compared to general computer vision, real-time performance is more critical for robot vision. To address this challenge, we propose a robot vision system that takes images described by optical flow as input. For classifying the humans and actions in the input images, we use a convolutional neural network (CNN) rather than hand-coded invariant features. Moreover, we present a novel detector, the local search window, for clipping partial images around target objects. Through experiments, we finally show that the robot vision system can detect moving humans and recognize their actions in real time.
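As context for the flow-based input, below is a minimal sketch of how two consecutive grayscale frames can be turned into a dense optical-flow image with the Farnebäck method [10] and encoded as a three-channel image suitable for a CNN. The HSV encoding (flow direction as hue, magnitude as value) is a common convention assumed here, not necessarily the paper's exact representation, and `flow_image` is a hypothetical helper name.

```python
import cv2
import numpy as np

def flow_image(prev_gray, next_gray):
    """Encode dense optical flow between two grayscale frames as a
    3-channel image (hypothetical sketch; the paper's exact flow
    encoding may differ)."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    # Convert (dx, dy) vectors to magnitude and angle per pixel.
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*prev_gray.shape, 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2   # direction -> hue (OpenCV range 0-180)
    hsv[..., 1] = 255                     # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # speed -> value
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```

The resulting image makes motion, rather than appearance, the signal the CNN classifies, which is consistent with targeting moving humans.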
Notes
- 1. Note that only the right image was used in this experiment. In future work, we will use the 3D information obtained from both images.
- 2. The optical flow was used as the input to the CNN classifier.
- 3. Mean shift clustering [18] was used for integrating the windows (see the sketch after these notes).
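As a rough illustration of note 3, the sketch below merges overlapping detection windows by running mean shift [18] on their center coordinates and averaging each cluster into one window. The `(x, y, w, h)` box representation, the bandwidth value, and the use of scikit-learn's `MeanShift` are assumptions for illustration; the paper's integration step may operate on a different feature space.

```python
import numpy as np
from sklearn.cluster import MeanShift

def integrate_windows(windows, bandwidth=30.0):
    """Merge overlapping detection windows via mean shift clustering
    of their centers (simplified stand-in for the integration step)."""
    boxes = np.asarray(windows, dtype=float)      # rows: (x, y, w, h)
    centers = boxes[:, :2] + boxes[:, 2:] / 2.0   # window centers
    labels = MeanShift(bandwidth=bandwidth).fit_predict(centers)
    # Average the windows within each cluster into a single box.
    return [boxes[labels == k].mean(axis=0) for k in np.unique(labels)]

# e.g. integrate_windows([(10, 10, 50, 100), (14, 12, 50, 100),
#                         (200, 40, 50, 100)]) -> two integrated windows
```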
References
Ojala, T., et al.: Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. In: International Conference on Pattern Recognition, vol. 1, pp. 582–585 (1994)
Lowe, D.G.: Object recognition from local scale-invariant features. In: International Conference on Computer Vision, pp. 1150–1157 (1999)
Csurka, G., et al.: Visual categorization with bags of keypoints. In: International Workshop on Statistical Learning in Computer Vision, pp. 59–74 (2004)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 886–893 (2005)
Dollar, P., et al.: Behavior recognition via sparse spatio-temporal features. In: International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 65–72 (2005)
van de Sande, K.E.A., et al.: Segmentation as selective search for object recognition. In: IEEE International Conference on Computer Vision, pp. 1879–1886 (2011)
Uijlings, J.R.R., et al.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013)
LeCun, Y., et al.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Ren, S., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)
Farnebäck, G.: Two-frame motion estimation based on polynomial expansion. In: Scandinavian Conference on Image Analysis. LNCS, vol. 2749, pp. 363–370 (2003)
Fathi, A., Mori, G.: Action recognition by learning mid-level motion features. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2008)
Jain, M., et al.: Better exploiting motion for better action recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2555–2562 (2013)
Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: International Conference for Learning Representations (2015)
Srivastava, N., et al.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
LeCun, Y., et al.: Deep learning. Nature 521(7553), 436–444 (2015)
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
Goudail, F., et al.: Bhattacharyya distance as a contrast parameter for statistical processing of noisy optical images. J. Opt. Soc. Am. A 21(7), 1231–1240 (2004)
Comaniciu, D., et al.: Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)
Oliveira, L., et al.: On exploration of classifier ensemble synergism in pedestrian detection. IEEE Trans. Intell. Transp. Syst. 11(1), 16–27 (2010)
Wang, H., Schmid, C.: LEAR-INRIA submission for the THUMOS workshop. In: ICCV Workshop on Action Recognition with a Large Number of Classes, vol. 2, no. 7 (2013)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Hoshino, S., Niimura, K. (2019). Robot Vision System for Real-Time Human Detection and Action Recognition. In: Strand, M., Dillmann, R., Menegatti, E., Ghidoni, S. (eds) Intelligent Autonomous Systems 15. IAS 2018. Advances in Intelligent Systems and Computing, vol 867. Springer, Cham. https://doi.org/10.1007/978-3-030-01370-7_40
DOI: https://doi.org/10.1007/978-3-030-01370-7_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01369-1
Online ISBN: 978-3-030-01370-7
eBook Packages: Intelligent Technologies and Robotics (R0)