Abstract
Embodied Conversational Agents (ECAs) with realistic faces are becoming an intrinsic part of many graphics systems used in HCI applications. A fundamental issue is how people visually perceive the affect of a speaking agent. In this paper we present the first study evaluating the relation between objective and subjective visual perception of emotion displayed on a speaking human face, using both full video and sparse point-rendered representations of the face. We found that objective machine-learning analysis of facial marker motion data correlates with evaluations made by experimental subjects and, in particular, that the lower face region provides informative cues for visual emotion perception. We also found that affect is preserved in the abstract point-rendered representation.
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Deng, Z., Bailenson, J., Lewis, J.P., Neumann, U. (2006). Perceiving Visual Emotions with Speech. In: Gratch, J., Young, M., Aylett, R., Ballin, D., Olivier, P. (eds) Intelligent Virtual Agents. IVA 2006. Lecture Notes in Computer Science, vol 4133. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11821830_9
Print ISBN: 978-3-540-37593-7
Online ISBN: 978-3-540-37594-4