Abstract
In implicit human-computer interaction, computers must understand users' actions and intentions in order to provide proactive services. Visual processing has to detect and interpret human actions and then translate them into implicit input. This paper presents an adaptive vision system that addresses visual processing tasks in a dynamic meeting context. Visual modules and dynamic context analysis tasks are organized in a bidirectional scheme. First, human objects are detected and tracked to generate global features. Second, the current meeting scenario is inferred from these global features; in certain scenarios, face and hand blob-level visual processing tasks are then carried out to extract visual information for the analysis of individual and interactive events, which can in turn serve as implicit input to the computer system. Experiments in our smart meeting room demonstrate the effectiveness of the proposed framework.
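The bidirectional scheme described in the abstract — coarse person tracking feeds scenario inference, and the inferred scenario in turn gates the finer face/hand analysis — can be sketched as follows. This is a minimal illustrative skeleton, not the paper's implementation: all class and function names (`detect_and_track`, `infer_scenario`, `analyze_face_hand`) and the scenario labels are hypothetical stand-ins for the actual visual modules.

```python
# Hypothetical sketch of the two-level adaptive pipeline from the abstract.
# Real detection/tracking is replaced by stubs; a "frame" here is just a
# list of (x, y) person positions, standing in for a camera image.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GlobalFeatures:
    """Global features produced by the person detection/tracking stage."""
    num_people: int
    positions: List[Tuple[float, float]]


def detect_and_track(frame: List[Tuple[float, float]]) -> GlobalFeatures:
    # Stage 1: detect and track human objects, yielding global features.
    return GlobalFeatures(num_people=len(frame), positions=list(frame))


def infer_scenario(features: GlobalFeatures) -> str:
    # Stage 2: infer the current meeting scenario from global features.
    # The labels and thresholds here are illustrative only.
    if features.num_people == 0:
        return "empty"
    if features.num_people == 1:
        return "presentation"
    return "discussion"


def analyze_face_hand(features: GlobalFeatures) -> List[str]:
    # Stub for blob-level face/hand processing that would yield
    # individual and interactive events as implicit input.
    return [f"person_{i}_active" for i in range(features.num_people)]


def process_frame(frame: List[Tuple[float, float]]) -> dict:
    feats = detect_and_track(frame)
    scenario = infer_scenario(feats)
    result = {"scenario": scenario, "events": []}
    # Context gates the fine-grained modules: blob-level analysis
    # runs only in scenarios where it is needed.
    if scenario == "discussion":
        result["events"] = analyze_face_hand(feats)
    return result


if __name__ == "__main__":
    print(process_frame([(1.0, 2.0), (3.0, 4.0)]))
```

The key design point mirrored here is that the fine-grained (and expensive) blob-level modules are invoked only when the inferred context calls for them, which is what makes the system adaptive.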
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Dai, P., Tao, L., Zhang, X., Dong, L., Xu, G. (2007). An Adaptive Vision System Toward Implicit Human Computer Interaction. In: Stephanidis, C. (eds) Universal Access in Human-Computer Interaction. Ambient Interaction. UAHCI 2007. Lecture Notes in Computer Science, vol 4555. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73281-5_87
DOI: https://doi.org/10.1007/978-3-540-73281-5_87
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73280-8
Online ISBN: 978-3-540-73281-5