DOI: 10.1145/3527188.3561921
research-article

Mixed-Cultural Speech for Intelligent Virtual Agents - the Impact of Different Non-Native Accents Using Natural or Synthetic Speech in the English Language

Published: 05 December 2022

Abstract

This paper presents an exploratory study investigating the impact of non-native accented speech on the perception of Intelligent Virtual Agents (IVAs). In an online study, native English speakers watched a video of an IVA holding a monologue whilst speaking English with a Spanish, Hindi, or Mandarin accent that was either recorded by native speakers of the respective language (natural speech) or synthetically generated (synthetic speech). The results showed a significant impact of naturalness of speech on the IVA's perceived warmth and a significant interaction of accent and naturalness of speech on its perceived competence. The naturalness of speech also affected the participants' perception of the IVA as a non-native speaker of English, and the correctness of the attributed mother tongue in the Spanish and Mandarin accent conditions. These results are a valuable contribution to research on mixed-cultural IVAs in general and on non-native speech as a cultural cue more specifically.

Cited By

  • (2025) Gender and accent stereotypes in communication with an intelligent virtual assistant. International Journal of Human-Computer Studies 195 (103407). DOI: 10.1016/j.ijhcs.2024.103407. Online publication date: Jan-2025
  • (2023) A System for Building Wizard-of-Oz-based Interactive Scenarios with Mixed-Cultural Intelligent Virtual Agents. Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents (1-4). DOI: 10.1145/3570945.3607288. Online publication date: 19-Sep-2023

    Published In

    HAI '22: Proceedings of the 10th International Conference on Human-Agent Interaction
    December 2022
    352 pages
    ISBN: 9781450393232
    DOI: 10.1145/3527188
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. intelligent virtual agents
    2. mixed-cultural
    3. non-native accent

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    HAI '22
    HAI '22: International Conference on Human-Agent Interaction
    December 5 - 8, 2022
    Christchurch, New Zealand

    Acceptance Rates

    Overall Acceptance Rate 121 of 404 submissions, 30%

    Article Metrics

    • Downloads (Last 12 months): 43
    • Downloads (Last 6 weeks): 3
    Reflects downloads up to 23 Feb 2025
