Abstract
This article investigates multimodal feedback in two Danish multimodal corpora: a collection of map-task dialogues and a corpus of free conversations between pairs of subjects meeting for the first time. Machine learning techniques are applied to both datasets to investigate the relation between non-verbal behaviour, specifically head movements and facial expressions, and speech in the expression of feedback. In the map-task data, we study the extent to which the dialogue act type of linguistic feedback expressions can be classified automatically from the non-verbal features. In the conversational data, on the other hand, non-verbal and speech features are used together to distinguish feedback from other multimodal behaviours. The results of the two sets of experiments indicate that head movements, and to a lesser extent facial expressions, are important indicators of feedback, and that gestures and speech disambiguate each other in the machine learning process.
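To make the classification setup concrete, the following is a minimal sketch in Python with scikit-learn of the kind of experiment described: predicting the dialogue act type of a feedback expression from annotated head-movement and facial-expression features. The feature values, dialogue act labels and choice of classifier are illustrative assumptions, not the article's actual annotation scheme or tool chain.

```python
# Sketch: classify the dialogue act type of a feedback expression
# from categorical non-verbal annotations (hypothetical values).
from sklearn.preprocessing import OneHotEncoder
from sklearn.naive_bayes import MultinomialNB

# Each row pairs a head-movement type with a facial expression
# observed for one linguistic feedback expression.
X_raw = [
    ["nod", "smile"],
    ["nod", "none"],
    ["shake", "frown"],
    ["tilt", "none"],
    ["nod", "smile"],
    ["shake", "none"],
]
# Dialogue act type of each feedback expression (hypothetical tag set).
y = ["Accept", "Accept", "Reject", "Other", "Accept", "Reject"]

# One-hot encode the categorical gesture features and train a naive
# Bayes classifier, a standard baseline for small annotated datasets.
enc = OneHotEncoder().fit(X_raw)
clf = MultinomialNB().fit(enc.transform(X_raw), y)

# Classify a new multimodal observation.
print(clf.predict(enc.transform([["nod", "none"]])))  # e.g. ['Accept']
```

Naive Bayes is used here only as a generic baseline; in practice, speech features (e.g., the words of the feedback expression or prosodic cues) would be added to the feature vectors when combining modalities, as the article's second set of experiments does.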
Acknowledgments
This work has been carried out in the context of the VKK and NOMCO projects. VKK is funded by the Danish Research Council for the Humanities. NOMCO is funded by the NORDCORP programme under the Nordic Research Councils for the Humanities and the Social Sciences (NOS-HS). We would like to thank our partners in the NOMCO project, Elisabeth Ahlsén, Jens Allwood and Kristiina Jokinen; the annotators Sara Andersen, Josephine B. Arrild, Anette Studsgård and Bjørn Wesseltolvig; and the two anonymous reviewers.
Cite this article
Paggio, P., Navarretta, C. Head movements, facial expressions and feedback in conversations: empirical evidence from Danish multimodal data. J Multimodal User Interfaces 7, 29–37 (2013). https://doi.org/10.1007/s12193-012-0105-9