DOI: 10.1145/1891903.1891969

Conversation scene analysis based on dynamic Bayesian network and image-based gaze detection

Published: 08 November 2010

Abstract

This paper presents a probabilistic framework, incorporating automatic image-based gaze detection, for inferring the structure of multiparty face-to-face conversations. The framework aims to infer conversation regimes and gaze patterns from the nonverbal behaviors of meeting participants, captured as image and audio streams with cameras and microphones. A conversation regime corresponds to a global conversational pattern such as monologue or dialogue, and a gaze pattern indicates "who is looking at whom". The input nonverbal behaviors are the presence/absence of utterances, head directions, and discrete head-centered eye-gaze directions. In contrast to conventional meeting analysis methods that rely only on a participant's head pose as a surrogate for visual focus of attention, this paper incorporates vision-based gaze detection, combined with head-pose tracking, into a probabilistic conversation model based on a dynamic Bayesian network. The gaze detector can distinguish three to five eye-gaze directions, e.g. left, straight, and right. Experiments on four-person conversations confirm the power of the proposed framework in identifying conversation structure and in estimating gaze patterns with higher accuracy than previous models.
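
To make the modeling idea concrete, below is a minimal illustrative sketch, not the authors' implementation, of forward filtering in a regime-switching model of a four-person conversation: hidden regimes (monologue by one participant, or open discussion) evolve with sticky transitions, and per-frame observations of who is speaking and each participant's noisy gaze target (as would come from combined head-pose and eye-direction detection) update a posterior over regimes. The regime set, the probabilities, and the names `forward_filter`, `p_speak`, and `p_gaze` are assumptions made for illustration; the paper's actual model is a richer dynamic Bayesian network with MCMC/Gibbs-sampling inference.

```python
# Illustrative sketch only (assumed model, not the paper's DBN): forward
# filtering over conversation regimes in a four-person meeting.
import numpy as np

N = 4                                    # number of participants
REGIMES = [f"monologue_{i}" for i in range(N)] + ["discussion"]
R = len(REGIMES)

# Sticky regime transitions: regimes tend to persist across frames.
A = np.full((R, R), 0.02)
np.fill_diagonal(A, 1.0 - 0.02 * (R - 1))

def p_speak(regime, person):
    """P(person utters | regime): the monologue holder speaks most often."""
    if regime < N:                       # monologue by participant `regime`
        return 0.9 if person == regime else 0.1
    return 0.4                           # discussion: everyone speaks some

def p_gaze(regime, person, target):
    """P(person's gaze target | regime); target == N means 'averted'."""
    if target == person:
        return 0.0                       # cannot look at oneself
    if regime < N and person != regime:  # listeners tend to watch the speaker
        return 0.6 if target == regime else 0.4 / (N - 1)
    return 1.0 / N                       # speaker / discussion: roughly uniform

def forward_filter(utterances, gazes):
    """utterances: T x N binary; gazes: T x N targets in 0..N (N = averted).
    Returns the per-frame posterior over regimes (T x R)."""
    T = len(utterances)
    alpha = np.full(R, 1.0 / R)
    posts = []
    for t in range(T):
        alpha = alpha @ A                # prediction step
        for r in range(R):               # update with factorized emissions
            lik = 1.0
            for p in range(N):
                ps = p_speak(r, p)
                lik *= ps if utterances[t][p] else (1.0 - ps)
                lik *= p_gaze(r, p, gazes[t][p])
            alpha[r] *= lik
        alpha /= alpha.sum()
        posts.append(alpha.copy())
    return np.array(posts)

# Example: participant 0 talks while the others mostly look at them.
utt = [[1, 0, 0, 0]] * 10
gz = [[4, 0, 0, 0]] * 10                 # person 0 averts; others watch person 0
print(forward_filter(utt, gz)[-1])       # posterior concentrates on monologue_0
```

In the example, participant 0 speaks while the other three look toward them, so the posterior mass concentrates on the "monologue by participant 0" regime; this is the kind of joint regime and gaze-pattern inference the abstract describes, reduced to a single-level hidden Markov model for brevity.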




Published In

ICMI-MLMI '10: International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction
November 2010
311 pages
ISBN:9781450304146
DOI:10.1145/1891903
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 November 2010


Author Tags

  1. Gaussian mixture
  2. Gibbs sampler
  3. Markov Chain Monte Carlo
  4. dynamic Bayesian network
  5. eye-gaze
  6. face-to-face multiparty conversation
  7. image-based gaze detection
  8. visual focus of attention

Qualifiers

  • Research-article

Conference

ICMI-MLMI '10

Acceptance Rates

ICMI-MLMI '10 paper acceptance rate: 41 of 100 submissions (41%)
Overall acceptance rate: 453 of 1,080 submissions (42%)
