Towards Learning Nonverbal Identities from the Web: Automatically Identifying Visually Accentuated Words

AmirAli B. Zadeh, Kenji Sagae & Louis Philippe Morency

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8637)

Included in the following conference series:

International Conference on Intelligent Virtual Agents

Abstract

This paper presents a novel long-term idea: automatically learning, from online multimedia content such as videos from YouTube channels, a portfolio of nonverbal identities in the form of computational representations of a speaker's prototypical gestures. As a first step towards this vision, the paper presents proof-of-concept experiments that automatically identify visually accentuated words in a collection of online videos of the same person. The experimental results are promising: many accentuated words were identified automatically, and specific head motion patterns were associated with these words.

Author information

Authors and Affiliations

Institute for Creative Technologies, University of Southern California, 12015 E Waterfront Drive, Playa Vista, CA, 90094-2546, USA
AmirAli B. Zadeh, Kenji Sagae & Louis Philippe Morency

Editor information

Editors and Affiliations

College of Computer and Information Science, Northeastern University, 202 West Village H, 360 Huntington Avenue, 02115, Boston, MA, USA
Timothy Bickmore & Stacy Marsella
Department of Computer Science, Worcester Polytechnic Institute, 01609, Worcester, MA, USA
Candace Sidner

Cite this paper

Zadeh, A.B., Sagae, K., Morency, L.P. (2014). Towards Learning Nonverbal Identities from the Web: Automatically Identifying Visually Accentuated Words. In: Bickmore, T., Marsella, S., Sidner, C. (eds) Intelligent Virtual Agents. IVA 2014. Lecture Notes in Computer Science, vol 8637. Springer, Cham. https://doi.org/10.1007/978-3-319-09767-1_60

DOI: https://doi.org/10.1007/978-3-319-09767-1_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09766-4
Online ISBN: 978-3-319-09767-1
eBook Packages: Computer Science, Computer Science (R0)
