Abstract
In recent years, voice recognition devices such as smart speakers have attracted attention, but intuitive communication, such as indicating a target with an ambiguous expression, is difficult with voice alone. To realize such communication, it is effective to combine voice with gestures such as pointing. We therefore propose a method that detects the pointing position from the image of an omnidirectional camera using a convolutional neural network. Although many methods have been proposed for detecting the pointing position with a normal camera, they restrict where the person giving instructions can stand because the observation area is small. We solve this problem by using an omnidirectional camera. First, the proposed method converts a hemispherical image taken by the omnidirectional camera into a panoramic image. Next, an object detection network detects the bounding box surrounding the person making a pointing gesture in the panoramic image. Finally, a pointing position estimation network estimates the pointing position in the panoramic image from the image inside the bounding box and the location of the box. Since it is difficult to prepare a large number of pointing gesture images, CG images created with Unity are used for pre-training. Experiments using real images of pointing gestures show that the proposed method is effective for pointing position detection.
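The first step, converting the hemispherical image into a panoramic image, can be illustrated with a minimal sketch. The abstract does not state which projection is used, so the code below assumes a plain polar unwrapping: each panorama column corresponds to an azimuth angle, and each panorama row corresponds to a radius in the circular image, with the top row on the outer rim. The function names, the `center`, `r_max`, and `r_min` parameters, and the nearest-neighbour sampling are all hypothetical choices made for illustration, not details taken from the paper.

```python
import math


def panorama_to_hemisphere(u, v, pano_w, pano_h, center, r_max, r_min=0.0):
    """Map panorama pixel (u, v) to hemisphere-image coordinates (x, y).

    Assumption (not from the paper): column u is proportional to azimuth,
    and row v is linear in radius, with row 0 on the outer rim (r_max)
    and the last row at r_min. Requires pano_h >= 2.
    """
    cx, cy = center
    theta = 2.0 * math.pi * u / pano_w
    radius = r_max - (r_max - r_min) * v / (pano_h - 1)
    return cx + radius * math.cos(theta), cy + radius * math.sin(theta)


def unwrap_hemisphere(img, center, r_max, pano_w, pano_h, r_min=0.0):
    """Build a panorama (list of rows) by nearest-neighbour sampling of img.

    img is a list of rows, indexed as img[y][x].
    """
    src_h, src_w = len(img), len(img[0])
    pano = []
    for v in range(pano_h):
        row = []
        for u in range(pano_w):
            x, y = panorama_to_hemisphere(
                u, v, pano_w, pano_h, center, r_max, r_min
            )
            # Clamp to the source image so rim pixels never index out of range.
            xi = min(max(int(round(x)), 0), src_w - 1)
            yi = min(max(int(round(y)), 0), src_h - 1)
            row.append(img[yi][xi])
        pano.append(row)
    return pano


if __name__ == "__main__":
    # Synthetic 101x101 image whose pixel value equals its x coordinate.
    image = [[x for x in range(101)] for _ in range(101)]
    panorama = unwrap_hemisphere(image, center=(50, 50), r_max=50,
                                 pano_w=8, pano_h=6)
    for line in panorama:
        print(line)
```

Because the same mapping converts any panorama coordinate back to the hemispherical image, a pointing position estimated in the panoramic image could also be located in the original camera image under this assumed projection. A real pipeline would more likely use an image library with interpolation and a calibrated lens model.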
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Shiratori, Y., Onoguchi, K. (2021). Detection of Pointing Position by Omnidirectional Camera. In: Huang, DS., Jo, KH., Li, J., Gribova, V., Bevilacqua, V. (eds) Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol 12836. Springer, Cham. https://doi.org/10.1007/978-3-030-84522-3_63
DOI: https://doi.org/10.1007/978-3-030-84522-3_63
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-84521-6
Online ISBN: 978-3-030-84522-3
eBook Packages: Computer Science, Computer Science (R0)