Abstract
Embodied agents hold great potential for the field of education, promising to maximize learners' learning gains and enjoyment. In many education applications, the multimodal representation of embodied agents is a powerful approach to realizing these benefits, but it requires accurate synchronization of gesture and speech. To this end, we investigate the key issues in synchronization through a preliminary case study, which serves as a practical guideline for our algorithm design, and propose a two-step synchronization method. The case study reveals that two issues, duration and timing, play an important role in synchronizing gesture with speech. Treating synchronization as a motion synthesis problem rather than as the behavior scheduling problem addressed by conventional methods, we employ a motion graph technique with constraints on gesture structure for coarse synchronization in the first step, and then refine the result by shifting and scaling the gesture in the second step. A subjective evaluation demonstrates that the proposed method achieves more accurate synchronization, with respect to both duration and timing, and higher motion quality than state-of-the-art methods.
Furthermore, we have implemented the proposed synchronization method in an authoring tool for education applications. Experiments conducted at a university demonstrate that our system makes creating attractive animations easier and faster than manual creation of equal quality, requiring only about 10% of the operation time, and that embodied agents are effective in education applications.
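To make the two-step idea concrete, the minimal Python sketch below mimics both steps under stated assumptions: GestureClip, the candidate list, and the interval values are hypothetical illustrations, not the implementation described in this chapter. Step 1 coarsely selects the motion-graph candidate whose stroke duration best matches the speech segment; step 2 computes the shift and scale that map the gesture stroke onto the stressed interval of speech.

```python
from dataclasses import dataclass

@dataclass
class GestureClip:
    """A candidate gesture from the motion graph (hypothetical structure)."""
    name: str
    start: float   # stroke start, in seconds from the clip origin
    end: float     # stroke end, in seconds from the clip origin

def coarse_select(candidates, speech_duration):
    """Step 1 (coarse): pick the candidate whose stroke duration
    is closest to the duration of the speech segment."""
    return min(candidates,
               key=lambda c: abs((c.end - c.start) - speech_duration))

def refine(clip, stress_start, stress_end):
    """Step 2 (refine): compute the shift and scale that map the
    gesture stroke onto the stressed interval of speech.
    A clip time t is then played back at t * scale + shift."""
    scale = (stress_end - stress_start) / (clip.end - clip.start)
    shift = stress_start - clip.start * scale
    return shift, scale

# Usage: align a beat gesture with a stressed word spoken at 0.8-1.1 s.
candidates = [GestureClip("beat_a", 0.2, 0.6), GestureClip("beat_b", 0.1, 0.9)]
clip = coarse_select(candidates, speech_duration=0.3)
shift, scale = refine(clip, stress_start=0.8, stress_end=1.1)
print(clip.name, shift, scale)  # beat_a 0.65 0.75
```

With these numbers, the stroke originally at 0.2-0.6 s is remapped to exactly 0.8-1.1 s, illustrating how shifting and scaling achieve the fine timing alignment described above.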
Notes
- 1. A growth point is assumed to be a minimal psychological unit with a special focus on speech-gesture synchrony and co-expressivity.
- 2. A t-test is commonly applied to determine whether two sets of data differ significantly from each other when the samples follow a normal distribution (see the sketch after these notes).
- 3. A lower p-value indicates stronger evidence against the null hypothesis, i.e., stronger evidence that a significant difference exists.
- 4. In this paper, the term MikuMikuDance has two meanings: it may refer either to the authoring tool used in Sect. 3.1 or to the specifications for mesh models, motion, and other animation data.
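As a minimal illustration of notes 2 and 3, the sketch below runs a two-sample t-test with SciPy; the use of SciPy and the synthetic score data are assumptions for illustration, as the paper does not state how its tests were computed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores_a = rng.normal(loc=70.0, scale=5.0, size=30)  # e.g. ratings of method A
scores_b = rng.normal(loc=75.0, scale=5.0, size=30)  # e.g. ratings of method B

# Two-sample t-test: assumes both samples are normally distributed (note 2).
t_stat, p_value = stats.ttest_ind(scores_a, scores_b)

# A lower p-value is stronger evidence against the null hypothesis of
# equal means, i.e. stronger evidence of a significant difference (note 3).
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```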
Acknowledgments
We greatly appreciate all the participants in our experiments, especially Prof. Shirotomo Aizawa and his students at Nagoya University of Arts and Sciences, Japan.
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Xu, J., Nagai, Y., Takayama, S., Sakazawa, S. (2015). Developing Embodied Agents for Education Applications with Accurate Synchronization of Gesture and Speech. In: Nguyen, N., Kowalczyk, R., Duval, B., van den Herik, J., Loiseau, S., Filipe, J. (eds) Transactions on Computational Collective Intelligence XX. Lecture Notes in Computer Science, vol 9420. Springer, Cham. https://doi.org/10.1007/978-3-319-27543-7_1
DOI: https://doi.org/10.1007/978-3-319-27543-7_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27542-0
Online ISBN: 978-3-319-27543-7