People Interpret Robotic Non-linguistic Utterances Categorically

International Journal of Social Robotics 8, 31–50 (2016)

Abstract

We present the results of an experiment probing whether adults exhibit categorical perception when affectively rating robot-like sounds (Non-linguistic Utterances). The experimental design followed the traditional methodology from psychology for measuring categorical perception: stimulus continua of robot sounds were presented to subjects, who were asked to complete a discrimination task and an identification task. In the former, subjects rated whether stimulus pairs were affectively different; in the latter, they rated single stimuli affectively. The experiment confirms that Non-linguistic Utterances can convey affect and that their interpretation is drawn towards prototypical emotions, showing that people exhibit categorical perception at the level of inferred affective meaning when hearing robot-like sounds. We speculate on how these insights can be used to automatically design and generate affect-laden robot-like utterances.
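
To make the identification/discrimination logic concrete, the sketch below shows how categorical perception is conventionally quantified from such data: a category boundary is located where the identification curve crosses 50 %, and discrimination accuracy for the pair straddling that boundary is compared with discrimination within a category. This is an illustrative sketch only, not code or data from the paper; the response proportions are invented and all names are hypothetical.

    import numpy as np

    # Identification task: proportion of "category A" responses at each step of a
    # six-step stimulus continuum (toy numbers for illustration only).
    ident = np.array([0.95, 0.92, 0.85, 0.55, 0.15, 0.08])

    # Discrimination task: proportion of "different" responses for each adjacent
    # pair; pair i compares steps i and i+1 (toy numbers again).
    discrim = np.array([0.20, 0.25, 0.30, 0.80, 0.22])

    # The category boundary lies between the last step identified mostly as "A"
    # and the first step identified mostly as "not A".
    first_not_a = int(np.argmax(ident < 0.5))   # index 4 in this toy example
    boundary_pair = first_not_a - 1             # pair straddling the boundary

    # Categorical perception predicts a discrimination peak at the boundary pair
    # and near-chance discrimination within categories.
    across = discrim[boundary_pair]
    within = np.delete(discrim, boundary_pair).mean()

    print(f"boundary between steps {boundary_pair} and {first_not_a}")
    print(f"discrimination across boundary: {across:.2f} vs within categories: {within:.2f}")

In the affective variant studied here the identification judgement is an affective rating of a single stimulus rather than a forced-choice label, but the prediction being tested is the same: discrimination peaks at the category boundary and falls towards chance within categories.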


Notes

  1. Hockett [28] proposes in total 13 properties that are universal to language. However, the remaining properties (the vocal-auditory channel, broadcast transmission and directional reception, rapid fading, specialisation, and total feedback, where the listener can reproduce what they hear) relate specifically to language expressed through the vocal/acoustic channel, and in the light of non-vocal languages such as sign language, or artificial languages such as programming languages, their value with respect to the broader concept of language is deemed limited.

  2. We argue that NLUs do not contain linguistic semantic content. They do, however, contain semantic content in the same way that the audible sounds made by computers, smartphones, etc., contain semantic content.

  3. Given the close resemblance between Gibberish Speech and Natural Language, it may be argued that Gibberish Speech could be perceived as a foreign language rather than meaningless nonsense to the naive observer.

  4. Such settings tend to be in dynamic and unpredictable real world environments that are far from the protected and controlled, “safe” laboratory environments.

  5. To listen to the utterances, please refer to the Online Resources. Resources 1–6 are the utterances in Set 1, and Resources 7–12 are the utterances in Set 2.

  6. Python and Java source code for the AffectButton can be downloaded at http://www.joostbroekens.com/

  7. Broekens et al. [9] provide a detailed description of the AffectButton functionality and so this will not be described here.

  8. The basic emotion theory as proposed by Ekman and Friesen [19] states that there are certain facial behaviours which are universally associated with particular emotions, namely anger, happiness, sadness, surprise, fear and disgust.

  9. By “minimal situational context” we refer to the fact that the robot did not engage in vocal interaction, nor did the robot and subject engage in a complex interaction (e.g. a game of chess). Subjects were simply asked to rate sounds made by the robot, with the knowledge that the sounds were pre-recorded and that touching the robot on the head would play the next sound. In this scenario there were no other cues that subjects could turn to in order to aid their interpretation of the sounds.

References

  1. Banse R, Scherer K (1996) Acoustic profiles in vocal emotion expression. J Pers Soc Psychol 70(3):614–636

  2. Bänziger T, Scherer K (2005) The role of intonation in emotional expressions. Speech Commun 46(3–4):252–267

  3. Beck A, Stevens B, Bard KA, Cañamero L (2012) Emotional body language displayed by artificial agents. Trans Interact Intell Syst 2(1):1–29

  4. Bimler D, Kirkland J (2001) Categorical perception of facial expressions of emotion: evidence from multidimensional scaling. Cogn Emot 15(5):633–658

  5. Blattner M, Sumikawa D, Greenberg R (1989) Earcons and icons: their structure and common design principles. Hum Comput Interact 4:11–44

  6. Bornstein MH, Kessen W, Weiskopf S (1976) Color vision and hue categorization in young human infants. J Exp Psychol Hum Percept Perform 2(1):115–129

  7. Breazeal C (2002) Designing sociable robots. The MIT Press, Cambridge

  8. Breazeal C (2003) Emotion and sociable humanoid robots. Int J Hum Comput Stud 59(1–2):119–155

  9. Broekens J, Brinkman WP (2013) Affectbutton: a method for reliable and valid affective self-report. Int J Hum Comput Stud 71(6):641–667

  10. Broekens J, Pronker A, Neuteboom M (2010) Real time labelling of affect in music using the affect button. In: Proceedings of the 3rd international workshop on affective interaction in natural environments (AFFINE 2010) at ACM multimedia 2010. ACM, Firenze, pp 21–26

  11. Cassell J (1998) A framework for gesture generation and interpretation. In: Cipolla R, Pentland A (eds) Computer vision for human–machine interaction. Cambridge University Press, Cambridge, pp 191–216

  12. Cheal JL, Rutherford MD (2011) Categorical perception of emotional facial expressions in preschoolers. J Exp Child Psychol 110(3):434–443

  13. Cowie R, Cornelius R (2003) Describing the emotional states that are expressed in speech. Speech Commun 40(1–2):5–32

  14. Cowie R, Douglas-Cowie E, Savvidou S, McMahon E, Sawey M, Schröder M (2000) ’FEELTRACE’: An instrument for recording perceived emotion in real time. In: Proceedings of the ISCA tutorial and research workshop (ITRW) on speech and emotion. Newcastle, pp 19–24

  15. Delaunay F, de Greeff J, Belpaeme T (2009) Towards retro-projected robot faces: An alternative to mechatronic and android faces. In: Proceedings of the 18th international symposium on robot and human interactive communication (ROMAN 2009). Toyama, pp 306–311

  16. Delaunay F, de Greeff J, Belpaeme T (2010) A study of a retro-projected robotic face and its effectiveness for gaze reading by humans. In: Proceedings of the 5th international conference on human–robot interaction (HRI’10). ACM/IEEE, Osaka, pp 39–44

  17. Duffy BR (2003) Anthropomorphism and the social robot. Robot Autonom Syst 42(3–4):177–190

  18. Ekman P (2005) Basic emotions. In: Dalgleish T, Power M (eds) Handbook of cognition and emotion. Wiley, Chichester, pp 45–60

  19. Ekman P, Friesen W (1971) Constants across cultures in the face and emotion. J Pers Soc Psychol 17(2):124–129

  20. Embgen S, Luber M, Becker-Asano C, Ragni M, Evers V, Arras K (2012) Robot-specific social cues in emotional body language. In: Proceedings of the 21st international symposium on robot and human interactive communication (RO-MAN 2012). IEEE, Paris, pp 1019–1025

  21. Etcoff N, Magee J (1992) Categorical perception of facial expressions. Cognition 44:227–240

  22. Eyssel F, Hegel F (2012) (S)he’s got the look: gender stereotyping of robots. J Appl Soc Psychol 42(9):2213–2230

  23. Franklin A, Davies IR (2004) New evidence for infant colour categories. Br J Dev Psychol 22(3):349–377

  24. Funakoshi K, Kobayashi K, Nakano M, Yamada S, Kitamura Y, Tsujino H (2008) Smoothing human-robot speech interactions by using a blinking-light as subtle expression. In: Proceedings of the 10th international conference on multimodal interfaces (ICMI’08). ACM, Chania, pp 293–296

  25. Gaver W (1986) Auditory icons: using sound in computer interfaces. Hum Comput Interact 2(2):167–177

  26. Gerrits E, Schouten M (2004) Categorical perception depends on the discrimination task. Percept Psychophys 66(3):363–376

  27. Goldstone RL, Hendrickson AT (2009) Categorical perception. Wiley Interdiscip Rev 1(1):69–78

  28. Hockett CF (1960) The origin of speech. Sci Am 203:88–96

  29. Harnad S (ed) (1987) Categorical perception: the groundwork of cognition. Cambridge University Press, Cambridge

  30. Heider F, Simmel M (1944) An experimental study of apparent behavior. Am J Psychol 57:243–259

  31. Jee E, Jeong Y, Kim C, Kobayashi H (2010) Sound design for emotion and intention expression of socially interactive robots. Intel Serv Robot 3:199–206

  32. Jee ES, Kim CH, Park SY, Lee KW (2007) Composition of musical sound expressing an emotion of robot based on musical factors. In: Proceedings of the 16th international symposium on robot and human interactive communication (RO-MAN 2007). IEEE, Jeju Island, pp 637–641

  33. Johannsen G (2004) Auditory displays in human–machine interfaces. Proc IEEE 92(4):742–758

  34. Karg M, Samadani Aa, Gorbet R, Kuhnlenz K (2013) Body movements for affective expression: a survey of automatic recognition and generation. Trans Affect Comput 4(4):341–359

  35. Komatsu T, Kobayashi K (2012) Can users live with overconfident or unconfident systems?: A comparison of artificial subtle expressions with human-like expression. In: Proceedings of conference on human factors in computing systems (CHI 2012). Austin, pp 1595–1600

  36. Komatsu T, Yamada S (2007) How appearance of robotic agents affects how people interpret the agents’ attitudes. In: Proceedings of the international conference on Advances in computer entertainment technology: ACE ’07

  37. Komatsu T, Yamada S (2011) How does the agents’ appearance affect users’ interpretation of the agents’ attitudes: experimental investigation on expressing the same artificial sounds from agents with different appearances. Int J Hum Comput Interact 27(3):260–279

  38. Komatsu T, Yamada S, Kobayashi K, Funakoshi K, Nakano M (2010) Artificial subtle expressions: intuitive notification methodology of artifacts. In: Proceedings of the 28th international conference on human factors in computing systems (CHI’10). ACM, New York, pp 1941–1944

  39. Kuhl PK (1991) Human adults and human infants show a “perceptual magnet effect” for the prototypes of speech categories, monkeys do not. Percept Psychophys 50(2):93–107

  40. Kuratate T, Matsusaka Y, Pierce B, Cheng G (2011) “Mask-bot”: A life-size robot head using talking head animation for human–robot communication. In: Proceedings of the 11th IEEE-RAS international conference on humanoid robots (Humanoids 2011). IEEE, Bled, pp 99–104

  41. Lang P, Bradley M (1994) Measuring emotion: the self-assessment manikin and the semantic differential. J Behav Ther Exp Psychiatry 25(1):49–59

  42. Laukka P (2005) Categorical perception of vocal emotion expressions. Emotion 5(3):277–295

  43. Levitin DJ, Rogers SE (2005) Absolute pitch: perception, coding, and controversies. Trends Cognit Sci 9(1):26–33

  44. Liberman A, Harris K, Hoffman H (1957) The discrimination of speech sounds within and across phoneme boundaries. J Exp Psychol 54(5):358–368

  45. Moore RK (2012) A Bayesian explanation of the ’Uncanny Valley’ effect and related psychological phenomena. Sci Rep 2:864

  46. Moore RK (2013) Spoken language processing: where do we go from here? In: Trappl R (ed) Your virtual butler. Springer, Berlin, pp 119–133

  47. Mori M (1970) The Uncanny Valley. Energy 7:33–35

  48. Mubin O, Bartneck C, Leijs L, Hooft van Huysduynen H, Hu J, Muelver J (2012) Improving speech recognition with the robot interaction language. Disrupt Sci Technol 1(2):79–88

  49. Mumm J, Mutlu B (2011) Human–robot proxemics: physical and psychological distancing in human–robot interaction. In: Proceedings of the 6th international conference on human–robot interaction (HRI’11), Lausanne

  50. Oudeyer PY (2003) The production and recognition of emotions in speech: features and algorithms. Int J Hum Comput Stud 59(1–2):157–183

  51. Paepcke S, Takayama L (2010) Judging a bot by its cover: an experiment on expectation setting for personal robots. In: Proceedings of the 5th international conference on human–robot interaction (HRI’10). ACM/IEEE, Osaka, pp 45–52

  52. Picard RW (1997) Affective computing. MIT Press, Cambridge

  53. Plutchik R (1994) The psychology and biology of emotion. HarperCollins College Publishers, New York

  54. Rae I, Takayama L, Mutlu B (2013) The influence of height in robot-mediated communication. In: Proceedings of the 8th international conference on human–robot interaction (HRI’13). IEEE, Tokyo, pp 1–8

  55. Read R, Belpaeme T (2010) Interpreting non-linguistic utterances by robots : studying the influence of physical appearance. In: Proceedings of the 3rd international workshop on affective interaction in natural environments (AFFINE 2010) at ACM multimedia 2010. ACM, Firenze, pp 65–70

  56. Read R, Belpaeme T (2012) How to use non-linguistic utterances to convey emotion in child–robot interaction. In: Proceedings of the 7th international conference on human–robot interaction (HRI’12). ACM/IEEE, Boston, pp 219–220

  57. Read R, Belpaeme T (2014) Situational context directs how people affectively interpret robotic non-linguistic utterances. In: Proceedings of the 9th international conference on human–robot interaction (HRI’14). ACM/IEEE, Bielefeld

  58. Reeves B, Nass C (1996) The media equation: how people treat computers, television, and new media like real people and places. CSLI Publications, Stanford

  59. Repp B (1984) Categorical perception: issues, methods, findings. Speech Lang 10:243–335

  60. Ros Espinoza R, Nalin M, Wood R, Baxter P, Looije R, Demiris Y, Belpaeme T (2011) Child-robot interaction in the wild: Advice to the aspiring experimenter. In: Proceedings of the 13th international conference on multimodal interfaces (ICMI’11). ACM, Valencia, pp 335–342

  61. Saerbeck M, Bartneck C (2010) Perception of affect elicited by robot motion. In: Proceedings of the 5th international conference on human–robot interaction (HRI’10). ACM/IEEE, Osaka, pp 53–60

  62. Scherer K (2003) Vocal communication of emotion: a review of research paradigms. Speech Commun 40(1–2):227–256

  63. Schouten B, Gerrits E, van Hessen A (2003) The end of categorical perception as we know it. Speech Commun 41(1):71–80

  64. Schröder M, Burkhardt F, Krstulovic S (2010) Synthesis of emotional speech. In: Scherer KR, Bänziger T, Roesch E (eds) Blueprint for affective computing. Oxford University Press, Oxford, pp 222–231

  65. Schwenk M, Arras K (2014) R2-D2 reloaded: a flexible sound synthesis system for sonic human–robot interaction design. In: Proceedings of the 23rd international symposium on robot and human interactive communication (RO-MAN 2014), Edinburgh

  66. Siegel J, Siegel W (1977) Categorical perception of tonal intervals: musicians can’t tell sharp from flat. Percept Psychophys 21(5):399–407

  67. Siegel M, Breazeal C, Norton M (2009) Persuasive robotics: the influence of robot gender on human behavior. In: International conference on intelligent robots and systems (IROS 2009). IEEE, St. Louis, pp 2563–2568

  68. Singh A, Young J (2012) Animal-inspired human–robot interaction: a robotic tail for communicating state. In: Proceedings of the 7th international conference on human–robot interaction (HRI’12), Boston, pp 237–238

  69. Stedeman A, Sutherland D, Bartneck C (2011) Learning ROILA. CreateSpace, Charleston

  70. Tay B, Jung Y, Park T (2014) When stereotypes meet robots: the double-edge sword of robot gender and personality in human–robot interaction. Comput Hum Behav 38:75–84

  71. Terada K, Yamauchi A, Ito A (2012) Artificial emotion expression for a robot by dynamic colour change. In: Proceedings of the 21st international symposium on robot and human interactive communication (RO-MAN 2012). IEEE, Paris, pp 314–321

  72. Walters ML, Syrdal DS, Dautenhahn K, te Boekhorst R, Koay KL (2007) Avoiding the uncanny valley: robot appearance, personality and consistency of behaviour in an attention-seeking home scenario for a robot companion. Auton Robots 24(2):159–178

  73. Yilmazyildiz S, Athanasopoulos G, Patsis G, Wang W, Oveneke MC, Latacz L, Verhelst W, Sahli H, Henderickx D, Vanderborght B, Soetens E, Lefeber D (2013) Voice modification for wizard-of-OZ experiments in robot–child interaction. In: Proceedings of the workshop on affective social speech signals, Grenoble

  74. Yilmazyildiz S, Henderickx D, Vanderborght B, Verhelst W, Soetens E, Lefeber D (2011) EMOGIB: emotional gibberish speech database for affective human–robot interaction. In: Proceedings of the international conference on affective computing and intelligent interaction (ACII’11). Springer, Memphis, pp 163–172

  75. Yilmazyildiz S, Henderickx D, Vanderborght B, Verhelst W, Soetens E, Lefeber D (2013) Multi-modal emotion expression for affective human–robot interaction. In: Proceedings of the workshop on affective social speech signals (WASSS 2013), Grenoble

  76. Yilmazyildiz S, Latacz L, Mattheyses W, Verhelst W (2010) Expressive Gibberish speech synthesis for affective human–computer interaction. In: Proceedings of the 13th international conference on text, speech and dialogue (TSD’10). Springer, Brno, pp 584–590

  77. Zhou K, Mo L, Kay P, Kwok VPY, Ip TNM, Tan LH (2010) Newly trained lexical categories produce lateralized categorical perception of color. Proc Natl Acad Sci USA 107(22):9974–9978

Acknowledgments

This work was (partially) funded by the EU FP7 ALIZ-E project (Grant 248116).

Author information

Corresponding author

Correspondence to Robin Read.

Cite this article

Read, R., Belpaeme, T. People Interpret Robotic Non-linguistic Utterances Categorically. Int J of Soc Robotics 8, 31–50 (2016). https://doi.org/10.1007/s12369-015-0304-0
