Open Access Published by De Gruyter Mouton February 27, 2024

“You are Apple, why are you speaking to me in Turkish?”: the role of English in voice assistant interactions

Didem Leblebici

From the journal Multilingua

https://doi.org/10.1515/multi-2023-0072

Abstract

This paper investigates the role of English in voice assistant (Siri, Alexa, Google Assistant) use from the perspective of language ideology. Major commercial companies in the voice assistant market use English as a training language for their speech technologies and offer the most optimised support for standardised varieties of English. This affects the experiences with voice assistants of speakers of non-European languages, i.e., one of the non-target audiences. Drawing on qualitative interview data from Turkish-speaking users who migrated to Germany, the present study reveals that the participants iconize English as the “standard” language in digital contexts, constructing it as the “original” language of speaking computers. By conducting an inductive analysis, the article demonstrates that not only the lack of technological support, but also specific discourses about Artificial Intelligence, impact perceptions of English. These developments have implications for our understandings of prestige and digital literacy in human-machine interactions.

Özet

Bu makale, İngilizcenin sesli asistan (Siri, Alexa, Google Assistant) kullanımındaki rolünü dil ideolojisi perspektifinden incelemektedir. Sesli asistan pazarındaki büyük ticari şirketler, konuşma teknolojileri için İngilizceyi bir tür eğitim dili olarak kullanmakta ve standartlaştırılmış İngilizce varyasyonları için en uygun desteği sunmaktadır. Bu, Avrupa dışı dilleri konuşanların, yani hedef kitle dışındaki kitlelerin sesli asistanlarla ilgili deneyimlerini etkilemektedir. Almanya’ya göç etmiş Türkçe konuşan kullanıcılardan elde edilen nitel görüşme verilerine dayanan bu çalışma, katılımcıların İngilizceyi dijital bağlamlarda “standart” dil olarak ikonlaştırdıklarını ve onu bilgisayarların “orijinal” konuşma dili olarak kurguladıklarını ortaya koymaktadır. Tümevarımsal (indüktif) bir analiz yürüten makale, yalnızca teknolojik destek eksikliğinin değil, aynı zamanda yapay zeka hakkındaki belirli söylemlerin (discourse) de İngilizce algılarını etkilediğini göstermektedir. Bu gelişmelerin, insan-makine etkileşimlerinde prestij ve dijital okuryazarlık anlayışlarımız üzerinde yansımaları bulunmaktadır.

Keywords: human-machine interaction; language ideology; voice assistants; multilingualism; English

Anahtar kelimeler: İnsan-Makine Etkileşimi; Dil İdeolojisi; Sesli Asistanlar; Çokdillilik; İngilizce

1 Introduction

Things that are designed and made by people can never be neutral, as people themselves are not neutral, and the society we live in is certainly not neutral. Designers, researchers, and engineers all have their own beliefs and values, and live in environments steeped in cultural and societal norms and assumptions. These biases, values, and assumptions seep into the digital products they work on, whether consciously or unconsciously. (Cirucci and Pruchniewska 2021: 14)

Over the last several years, feminist scholars have been emphasising many facets of “algorithmic bias” by pointing out how technologies encode and reproduce gendered and racial discrimination (Benjamin 2019; Costanza-Chock 2020; Noble 2018). Markl (2022) argues that these biases are increasingly found in speech technologies. People with “non-standard” language practices are faced with algorithms that cannot process their speech because they speak African American Vernacular English (Koenecke et al. 2020), English as a second language (Beneteau et al. 2019; Markl 2022; Wu et al. 2020), or simply happen to be women (Costa-Jussà et al. 2020).

These critical insights on bias in speech technologies motivate this investigation of the perceptions of multilingual users who may be prone to experience these biases. As there is little to no sociolinguistic research on multilingual speakers’ experiences of emerging voice assistant technologies (like Alexa, Siri, and Google Assistant), this paper aims to begin addressing this gap by sampling Turkish speakers who reside in Germany. The participants have migrated to Germany within the last 10 years and have diverse linguistic repertoires, as they engage in Turkish, English, and German in their daily lives. However, in order to use voice assistants, they usually (have to) choose English or German, as Turkish is rarely if ever offered as an option. The big technology companies mentioned above also offer the most optimised support for standardised varieties of English.

While recognizing the significance of algorithmic biases and English-dominated technology design, it is crucial to consider how discourses of English gain new meanings in specific local and social contexts. As evidenced by a substantial body of research, the globalisation of English influences local identity construction processes (see Lee and Jenks 2021; Park and Wee 2012). Individuals in diverse sociocultural environments demonstrate innovative uses of English through local reappropriation and perform new identities through English in both digital and non-digital contexts (e.g., Li 2020; Pennycook 2003; Spilioti 2020). Against this background, this paper asks: (1) how can we explain the role of English in multilingual speakers’ use of voice assistants, and what are some of the structural and historical reasons for the prominence of English?; and (2) what language ideological work do multilingual speakers produce to account for their use of English in voice assistants?

I aim to answer these questions from a language-ideological perspective. Language ideologies are “morally and politically loaded representations of the nature, structure, and use of languages in a social world” (Irvine 1989; Woolard 2020: 1). They “implicitly or explicitly represent not only how language is, but how it ought to be” (Woolard 2020: 2). The paper investigates ideologies among multilingual people as well as the ones integrated into voice assistants. I first briefly explore the history of voice assistants, followed by a discussion of their marketed image versus the complex “sociotechnical ensemble” (Gillespie 2016: 22) that remains hidden from the users. This is followed by a discussion of the role of English in relation to ‘other’ languages in speech design.

2 Voice assistants: brief history and contemporary applications

From a technical perspective, voice assistants as we know them today are considered to have emerged in 2010 with Apple’s Siri (Hoy 2018). It could be argued, though, that cultural experiences with such technologies have a much longer pedigree. Humphry and Chesher (2020) argue that the devices that made it possible to distort the connection between human sound and bodily presence mark the beginning of “cultural familiarization” with voice assistants. These devices include the phonograph introduced in the 19th century and other everyday gadgets such as the telephone and radio. Many researchers argue that not only these historical developments but also popular science fiction culture have acted as a motive and source of information for the design of contemporary voice assistants (e.g., Hoy 2018; Humphry and Chesher 2020; Natale 2021).

A year after Siri’s initial launch, iPhone owners could already interact with it as an embedded feature on their devices without downloading any app. Four years later, in 2014, Amazon launched Alexa and the Amazon Echo, external devices for home control that require further purchase (for a brief history of the contemporary VAs, see Hoy 2018; Humphry and Chesher 2020; Natale 2021). Looking at the Voice Assistant Timeline (Voicebot.ai 2021), we can observe that since 2016 multiple companies have introduced their own voice assistant technologies. These may come embedded in smartphones, gaming consoles, cars, TV etc.; or in the form of home devices such as Amazon’s Alexa, Apple’s HomePod or Deutsche Telekom’s Magenta. An essential marketing promise is to control “smart” gadgets like lights, alarms, locks, cameras, or coffee machines through speech commands.

Apart from controlling smart devices, answering informational questions, and completing tasks such as texting, third-party developers for Alexa also offer “skills” that can be downloaded separately, similar to smartphone apps. These skills allow users to, for instance, place an order for drinks at a café or use a dictionary. Although these activities can be completed by logging into the apps haptically (by touch), voice assistants serve as a “bridge that allows users to issue commands verbally rather than via an app” (Hoy 2018: 84).

Taking a closer look at the literature, we come across two main conceptualisations of voice assistants. On the one hand, they are defined in technical terms as a technologically developed voice-enabled Artificial Intelligence^[1] (e.g. Poushneh 2021). In other words, voice assistants are a type of “conversational agent”, i.e., “software that accepts natural language as input and generates natural language as output, engaging in a conversation with the user” (Griol et al. 2013: 760). On the other hand, there is an anthropomorphised understanding of these technologies, depicting them as humanlike entities capable of human interaction. As Hoy (2018: 82) puts it, voice assistants “are the realization of the science fiction dream of interacting with our computers by talking to them”. There is a fascination with the idea of holding a conversation with non-human entities, which follows in the footsteps of science fiction narratives. This fascination extends from users to technology professionals, as seen in a 2022 statement by a former Google engineer who declared the chatbot LaMDA as sentient based on an “interview” (Lemoine 2022). His statement sparked public debates about AI and generated conflicting responses. While some argue that contemporary AI technologies are not advanced enough yet, implying an expectation for intelligent machines in the future, others warn about the danger of deception and misinterpretation of unreliable outputs (Coeckelbergh and Gunkel 2023). Both positions derive their arguments from the human-technology binary, treating texts generated by AI as indicative of cognitive capabilities or asserting that these technologies are mere tools separate from human intellectual capacity (ibid.). The categorization of voice assistants as either simple input-output systems or humanlike conversational interlocutors contributes to the construction of these dualistic perspectives.

Rather than reproducing these binaries, it is important to consider these devices by investigating their completeness. From a posthumanist perspective, they can be described as assemblages that are curated by human and nonhuman entities (Pennycook 2018). This assemblage or “sociotechnical ensemble” (Gillespie 2016: 22) not only consists of the objects in their materiality (e.g., smartphone, Alexa unit) but also of the cultural discourses that reshape these technologies, and the vast amount of data and numerous actors involved – programmers, users, and multinational corporations in pursuit of profits. Awareness of these discourses and technological infrastructures is necessary to avoid fetishization (Crawford 2016: 89) or a simplistic approach to algorithms as technological processes (Seaver 2019).

Natale (2021) argues that the way in which voice assistants are marketed hides from the general public a complex infrastructure which requires intensive human labour. On the consumer side, voice assistants are portrayed as “anthropomorphized virtual assistants” incorporating humanlike features such as “trust, friendliness, credibility, and empathy” that are integral parts of these technologies (Sweeney 2016: 221). These abstract qualities are achieved through the use of a fixed gendered voice that is typically female and the implementation of specific conversational scripts (Sweeney 2016; West et al. 2019). These design choices reproduce gender stereotypes: they depict the technology as a personal assistant or a servant, an occupation mainly performed by women (Phan 2017: 30), and they project the female digital assistant as a passive subject who obeys the dominant users’ requests (see also Curry et al. 2020; Sweeney 2016; West et al. 2019). On the technical side, voice assistants consist of various software and hardware systems which require intense human labour and great computing power (Crawford and Joler 2018). Three main software processes of voice assistants that remain hidden from the users are (1) speech processing, which recognizes users’ speech by transcribing it based on the training data and generates synthetic voice in response; (2) natural language processing, which analyses the transcribed inputs and produces advanced outputs in response; and (3) algorithms that extract information from the web (Natale 2021: 110).
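For readers less familiar with the technical side, the three hidden software processes listed above can be pictured as a simple chain. The following is a minimal, purely illustrative sketch and not a description of how any commercial assistant is implemented: all function names, the toy knowledge base and the keyword rules are hypothetical, and the "audio" input is stood in for by plain text.

```python
from dataclasses import dataclass

# Hypothetical stand-in for information extracted from the web (process 3).
TOY_KNOWLEDGE = {"what is the capital of germany": "Berlin"}


@dataclass
class AssistantTurn:
    transcript: str
    intent: str
    answer: str
    spoken_reply: str


def recognise_speech(audio: str) -> str:
    """Process 1 (input side): transcribe the user's speech.
    Assumption: the 'audio' is already text, so we only normalise it."""
    return audio.strip().lower()


def understand(transcript: str) -> str:
    """Process 2: analyse the transcript. Here a toy keyword rule."""
    if transcript.startswith("what is"):
        return "lookup"
    if "turn on" in transcript:
        return "device_control"
    return "unknown"


def retrieve(intent: str, transcript: str) -> str:
    """Process 3: fetch an answer. Here a dictionary lookup, not the web."""
    if intent == "lookup":
        return TOY_KNOWLEDGE.get(transcript, "No result found.")
    if intent == "device_control":
        return "OK."
    return "Sorry, I did not understand."


def synthesise(text: str) -> str:
    """Process 1 (output side): generate a synthetic voice, mocked as a tag."""
    return f"<synthetic voice> {text}"


def handle(audio: str) -> AssistantTurn:
    transcript = recognise_speech(audio)
    intent = understand(transcript)
    answer = retrieve(intent, transcript)
    return AssistantTurn(transcript, intent, answer, synthesise(answer))


turn = handle("What is the capital of Germany")
print(turn.spoken_reply)
```

The point of the sketch is only that each stage depends on the output of the previous one, so a transcription that fails for a given language or variety propagates through everything that follows.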

Against this background, it is highly significant that ordering voice assistants to complete tasks or make inquiries is only possible in specific language variants provided by the tech companies. The next section will address these issues by comparing the language options provided, and problematizing the dominance of English in popular voice assistants.

3 English in voice assistant design

Kelly-Holmes (2019) examines multilingualism in online spaces beginning with the birth of the internet (a monolingual era dominated by English) and ending with contemporary practices. She argues that we currently find ourselves in a “hyperlingual” era, where users can experience linguistically diverse digital spaces through the increasing participation of diverse audiences and increased technological affordances (p. 31). The growing body of sociolinguistic research informs us about online translanguaging practices which go beyond the standardised and normed languages (e.g., Li 2020; Spilioti 2020; Taibi and Badwan 2022). At the same time, however, we are experiencing the internet in “linguistic filter bubbles” (Kelly-Holmes 2019: 34; Pariser 2011) with increasing “algorithmic mass individualization” (p. 26). In other words, the languages offered by digital platforms are based on algorithms that, for instance, calculate users’ geographical location among their other digital footprints. Voice assistants reflect these latest developments through the language options made available to specific audiences.

The popular voice assistant companies (Amazon, Google and Apple) offer a set of language options that are usually marketed by region (see Table 1). As with other language technologies, voice assistants are first introduced in English and optimized for it, as English has the largest amount of data available to train their AI (Schneider 2022). The language options are offered to their users as countable, nameable and bounded entities, tied to a particular country/region, which mirror standard language ideologies (Schneider 2022). These ideologies reproduce a view of the relationship between nameable languages and their geopolitical territories as natural rather than socially constructed (cf. Stevenson 2008: 485). They presume communication “with an ideal speaker-listener, in a completely homogenous speech-community, who knows its language perfectly” (Chomsky 1965: 3). For instance, in the UK, users of Siri would automatically hear a British English vernacular instead of Indian English and vice versa. These design choices are meant to appeal to local language norms, so as not to “distract” users with voice assistant responses that do not match these norms (Phan 2017: 29). Because these technologies are constructed for the standard varieties of languages, they give significantly worse feedback to speakers of non-standard varieties of English, e.g., speakers of African American Vernacular English (Koenecke et al. 2020), Mandarin speakers of English (Wu et al. 2020), Spanish speakers of English (Beneteau et al. 2019) and second-language speakers of English (Markl 2022).

Table 1: Overview of language options in popular voice assistants (Updated: 2.12.23).

Voice assistants	Language options^a
Alexa (Amazon smart speaker)	Arabic (SA), English (AU, CA, IN, IRL, UK, US), French (CA, FR), German (DE), Hindi (IN), Italian (IT), Japanese (JP), Portuguese (BR), Spanish (ES, MX, US)
Google Assistant (smartphone version)	Danish (DK), German (BE, DE, AUT, CH), English (AU, CA, ID, IN, IRL, PH, SG, TH, UK, US), Spanish (AR, CL, CO, ES, US, MX, PE), French (BE, CA, FR, CH), Indonesian (ID), Italian (IT, CH), Dutch (BE, NL), Norwegian (NO), Polish (PL), Portuguese (BR, PT), Swedish (SE), Vietnamese (VN), Russian (RUS), Arabic (World, SA, EGY), Hindi (IN), Thai (TH), Korean (KR), Chinese (CN, TW, HK), Japanese (JP). Other supported languages that are not mentioned in the app settings: Bengali, Gujarati, Kannada, Malayalam, Marathi, Tamil, Telugu, Turkish and Urdu
Siri (Apple iPhone)	Arabic, Cantonese (CN, HK), Danish, Dutch (NL, BEL), English (AU, CA, IN, IRL, NZ, SG, ZA, UK, US), Finnish, French (BEL, CA, FR), German (DE, AUT), Hebrew, Italian, Japanese, Korean, Malay, Mandarin (CN, TW), Norwegian Bokmål, Portuguese (BR), Russian, Spanish (CL, MX, ES, US), Swedish, Thai, Turkish

^aThe abbreviations refer to the countries associated with the language option. Languages and countries listed here are based on the options provided in the Alexa app and Siri settings on iPhone. Google Assistant’s language options are taken from the information on their support website (below), because not all language options can be selected in the app. For instance, iPhone users cannot choose the Turkish option in the Google Assistant app and have to change the language settings of their phone to Turkish instead. This is, however, not explicitly mentioned in the app. Only the support website mentioned below indicates that the assistant will automatically adopt the language of the phone settings. (https://support.google.com/assistant/answer/7394513?co=GENIE.Platform%3DiOS&oco=0#zippy=%2Cphone-or-tablet) Access: 2.12.23. (https://support.google.com/assistant/answer/7172657?hl=en&co=GENIE.Platform%3DiOS&oco=1) Access: 2.12.23.

As Table 1 shows, Google Assistant’s smartphone version currently has the largest selection of language variants compared to the other popular assistants. The most striking observation here is that most language choices are the standard variants of European/Western languages, albeit with some exceptions. These exceptional cases, however, are not integrated for every digital activity. For instance, Siri only supports English and its selected varieties for the “suggested reminders” function.^[2] Similarly, Alexa and Google Assistant’s home devices do not currently offer any support for Turkish. Note, too, there are no African language options available from these leading voice technology companies except for Arabic.

At the time of empirical data generation for this paper, the most popular voice assistants did not allow for bilingual practices. This has lately started to change, however, as profit motives catch up with market demands. Both home and mobile varieties of Google Assistant now allow users to add one additional language, making the device bilingual at first glance. Nevertheless, this is not made available for all language options, and the feature does not allow users to “code-switch” in the same sentence. In other words, it is not possible to start a command in English and end it in Spanish. Similarly, Amazon made it possible to use Alexa bilingually, but this option is significantly more limited: nearly all available pairs combine a variety of English with another language option, the only exceptions being German-French and German-Italian (see Figure 1 below). The function of English “as a gatekeeper to positions of prestige in a society” (Pennycook 2017: 14) extends here to buffer access to the speech technologies. In the context of Alexa’s bilingual options, it is also interesting to note that only US and UK English language varieties are compatible with almost every other language option. While US English can be paired with any language option except Canadian French, UK English is largely linked with European languages and their associated countries such as Spain, Italy, Germany, and France. Australian, Canadian, Indian and Irish Englishes cannot be combined with languages divergent from their respective regions. Due to space constraints, a comprehensive analysis of these language pairings is beyond the scope of this paper. Future research could explore how companies reappropriate regional and national language categories in technology design.

Figure 1: Language pairs of Alexa for the bilingual option, illustrated by the author (Updated: 2.12.23).

4 Methodology

4.1 Sampling and data generation

Previous studies interested in voice assistant interactions employ different approaches for generating data. Some scholars collect automatically saved interaction logs on the Alexa app (Habscheid et al. 2021) and others implement an external voice recorder specifically designed for research purposes (Porcheron et al. 2018). However, the data generated with these methods only provide access to spoken interaction for a few seconds after the wake word that activates the device. Important discursive and contextual elements thus remain invisible to the analysts (Habscheid et al. 2021). Moreover, planting a voice recorder in private homes raises ethical concerns (Hector et al. 2022). Hence, in contrast to these methods, I opted for “techno-biographic interviews” (Lee 2014: 94), allowing participants to share their life stories related to voice assistant technologies. The techno-biographic interview method is rooted in traditional narrative inquiry research (Ching and Vigdor 2005; Lee 2014). Informants are encouraged to construct a narrative about their experiences with technology, much like traditional narrative interviews where individuals recount significant life events (Linde 1987; Rosenwald and Ochberg 1992). Scholars engage participants in open-ended discussions, encouraging them to reflect on their early encounters with the technology, current use cases and its impact on their lives. The approach provides insights into how people organise and interpret their subjective experiences (Linde 1987; Rosenwald and Ochberg 1992).

The participants in this study are Turkish-speaking voice assistant users who migrated from Turkey to Germany within the last 10 years. Given the monolingual speech design (see previous section), I argue that the participants are not considered to be the “ideal addressee” in user experience design (Piller 2001; Portmann 2022: 2). Although they may have multilingual repertoires (see Table 2), they must select only one language to use speech technologies. Whereas the smartphone assistants Siri, Google Assistant and Bixby (Samsung) offer Turkish, it is not offered as an option for the smart speaker Alexa. We thus understand that the “end-user” (Cirucci and Pruchniewska 2021) of Alexa is clearly not imagined to be someone who speaks Turkish. Looking at the one-country one-language speech design (Table 1), the user is expected to speak the national language of the country of residence, which would be Standard German in the context of Germany. These design choices, I argue, are informed by standard language ideologies and not by the multilingual realities of migrant populations.

Table 2: Overview of the interviewees.

Interviewees	Dilan	Yavuz	Berk	Talha	Sena	Emir
Age	20	26	31	33	35	39
Gender	F	M	M	M	F	M
Occupation	Student (Engineering)	Architect	Engineer	Engineer	Marketing	IT
Language repertoires^a	Turkish, English, German	Turkish, English, German, Italian	Turkish, English, (Italian, Polish, German)	Turkish, German, English	Turkish, English, French, (German)	Turkish, English, (German)
Language practices with voice assistants^b	German, (English, Turkish)	English, Italian	Turkish, English, (Polish)	German, (English)	English, (French)	English
Voice assistants^c	Alexa, (Siri)	Siri, (Alexa)	Google Assistant (Bixby, Siri)	Siri, (Alexa)	Alexa, (Siri)	Alexa
Smart devices	–	–	Lights, TV	Lights	Lights, speaker	Lights, switches
Duration of using voice assistants	4 months	4 years	10 years	3 years	4 months	1.5 years

^aThe order of named languages is based on how individuals articulated their language repertoires during the interviews, referring to the ease and proficiency with which they are employed in daily life. Languages illustrated in brackets refer to the languages they are currently learning. ^bLanguages in brackets refer to their language practices in the past. The first mentioned language is the primary language setting of their gadgets. ^cVoice assistants in brackets refer to the gadgets they have used previously.

Since I share a similar cultural, linguistic and migration background with the participants, I consider myself an “insider”, someone who exists “within the same imagined community” (Ganga and Scott 2006). This insider status has been beneficial in the recruiting process and in “making sense of the complex intricacies of situated everyday activity” (Rampton et al. 2015: 16). Nevertheless, finding Turkish-speaking voice assistant users is less straightforward than finding Turkish-speaking members of chat groups, classrooms and the like. I recruited through WhatsApp chat groups of which I am a member and, in total, got in touch with six participants who were willing to cooperate. These groups primarily consist of Turkish-speaking newcomers who assist each other concerning a wide variety of topics (e.g., finding housing or visa issues).

The “new wave” migrants, a term self-adopted by these digital communities, differ from the existing Turkish guest worker diaspora in Germany in terms of socioeconomic status, age, education levels, lifestyle practices and political stance (cf. Savas 2019; Türkmen 2019). The formation of social media communities by these groups serves as a means to maintain a group identity as well-educated newcomers by differentiating themselves from the existing guest worker diaspora in Germany (cf. Oldac and Fancourt 2021). Presenting as an in-group member, I announced my research topic by asking if anyone uses voice assistants. While it was expected that the members would have higher educational levels due to the nature of these online communities, the people who contacted me also work in or adjacent to the tech industry (see Table 2). This leads me to believe that at least some of my informants have a rather positive view of voice assistant technology or might not be very critically predisposed to it.

The interviews were conducted through video calls in April 2021 because of the Covid-19 pandemic. I opted for a semi-structured interview with open questions (Roulston and Choi 2018), as the present study is interested in people’s beliefs and perspectives about language. In this context, it is essential to “listen to the valuations of the speakers” and orient oneself towards the interviewees by taking their individual biographies into account to avoid producing a “view from nowhere” or assuming that “linguist knows best” (Gal 2016: 131). Hence, I included questions about participants’ language repertoires and their personal history with voice assistants (see Table 2). Some of the more specific questions addressed their motivation for purchasing and using an AI assistant, their language preferences, communication issues, the gender of their voice assistants, and their personal connection to the devices. The participants’ names are withheld to protect their privacy. Table 2 assigns them fictional names, to be used when quoting interviewees during the analysis.

4.2 Coding and analytical procedure

The recorded interviews were transcribed in standard written Turkish. Paralinguistic elements such as short pauses “(.)”, laughter “@” and elongated vowels “::” are also transcribed. As this study is interested in individuals’ language ideologies, the reflexive statements of speakers – i.e., “metacommunication, participants’ talk about talk, or their reflections, signals and presuppositions about linguistic forms and their use” (Gal 2016: 116) – were coded inductively.

I applied the Grounded Theory Method for coding and categorizing to generate theory working closely with the data (Glaser and Strauss 2009). I created codes by “naming segments of data with a label that simultaneously categorizes, summarizes, and accounts for each piece of data” (Charmaz 2006: 43). These are found as subsections in the upcoming Sections 5 and 6. Whereas codes are closer to the data, describing a topic or quoting the data directly, categories are more complex (Berg and Milmeister 2011: 308). They link the codes and display connections to meta-level concepts. In what follows, I connect the instances where participants report communication issues with languages other than English (code 1) and declare English as “the original” (code 2). I show how these codes are related to technological dependencies on English in human-machine interactions (category).

Ideological constructions within these categories are examined through three semiotic processes: iconization, fractal recursivity and erasure (as discussed in Gal and Irvine 2019; Irvine and Gal 2000; Woolard 2020), which are reflected throughout the analysis. These ideological processes are realized by differentiating and comparing signs which are organized along an “axis of differentiation” (Gal and Irvine 2019: 19). In the iconization/rhematization processes, distinctions perceived in the signs are interpreted as if they mirror specific qualities and the fundamental character of speakers (Gal and Irvine 2019: 19; Woolard 2020: 10). For instance, when it comes to English-speaking voice assistants that mimic human speakers, participants associated their English with qualities such as digital, international, and natural. In contrast, Turkish-speaking assistants were attributed traits such as unnatural, robotic, perverse, or being a ‘follow-up’ version, and these traits were in turn linked with Turkish. Fractal recursivity is defined as the ideological work of reproducing or repeating the contrasts created by the axis of differentiation within one of the contrasted signs (Gal and Irvine 2019: 20). Erasure, thirdly, refers to the process whereby speakers “eliminate” or “overlook” signs that refute/dispute their attempts at iconization or fractal recursivity (Woolard 2020: 10).

Considering that interviews are interactional contexts created by the researcher and the informants, it is necessary to recognize that this gives rise to certain expectations regarding what interviewees should say or account for. Thus, it seems prudent to also direct attention towards the function of the language ideological work and particular associations made by participants. The data are therefore also examined with attention to how these are communicated to the interviewer, whom informants hope will understand what they are suggesting.

When it comes to the role of English in multilingual speakers’ use of voice assistants and what language ideological work speakers produce to account for their use of English in this context, we will see in the next sections that English comes with two sets of qualities. Some of these are linked to technological dependencies such as “original”, “digital” and “world language”; whereas others such as “sexy” and “natural” are associated with popular media discourses on Artificial Intelligence and standard language ideologies. In other words, one aspect is to be interpreted considering the perceived technological affordances, while the other is to be analysed in terms of its connection to science fiction narratives and the anthropomorphised design of voice assistants. As English contributes significantly to shaping local identities in the context of globalization (Lee and Jenks 2021), the analysis also explores how interviewees express their local identities through their metalinguistic reflections.

5 Technological dependencies on English

Many informants make sense of their experiences and language preferences by asserting that technology design inadequately accommodates non-English language options. One narrative revolves around the challenges arising from speech design, where assigning a single language to a specific country leads to communication issues when speaking a language other than the national language of the country of residence. Simultaneously, there exists a perception of English as the “original” language of computers and the wider digital realm, irrespective of the newly introduced voice technologies. This section illustrates how participants intertwine technological discourses associated with English with their experiences and worldviews.

5.1 Communication issues with other languages

The participants in the study use either English or German to engage with the voice assistants (see Table 2), except for Berk, who prefers Turkish. This exceptional case reveals interesting insights, as the participant emphasises that his preference for Turkish causes communication problems. In the excerpt below, he tells me about the issues that he encounters.

(1)

Berk:

O da işte bazı adreslerde ciddi sıkıntı yaratıyor işte mesela eve gidiyorum dediğin zaman kayıtlıysa o adresler çok kolay. Ama işte mesela ben Almanya’da yaşıyorum, asistanı Türkçe kullanıyorum işte benim işyerim Radolfzell’de. Radolfzell’e gitmek istiyorum dediğim zaman mesela bana atıyorum Türkiye’den Uzungöl’ü açıyor diyeyim. […] Türkçe dinlediği için Türkiye’deki yerleri maplemeye çalışıyor, muhtemelen algoritması böyle çalışıyor.

It^[3] also causes serious trouble at some addresses, for example, if you say you are going home, those addresses are very easy if they are saved like that on the app. But, you know, I live in Germany, I use the assistant in Turkish, my workplace is in Radolfzell. When I say I want to go to Radolfzell, for example, it opens Uzungöl from Turkey. […] Because it listens in Turkish, it tries to map locations in Turkey. Its algorithm probably works like this.

With his background in engineering, Berk presents himself in his account as someone who understands how the technology works or why it may cause issues with different language options, asserting that algorithms are involved in processing user inputs. As Jones (2021: 2) notes, algorithms and “our ideas about algorithms come to dominate the way we think about interacting with digital technologies”. Berk employs his understanding of algorithms to interpret the “troubles” he experiences while using voice assistants in Turkish in Germany. He evokes one language/one nation language ideologies inscribed in voice assistants by contrasting two distinct locations – a lake in Turkey “Uzungöl” and a town in Germany “Radolfzell”. These examples not only demonstrate considerable phonetic distinctions but also differences in the types of locations, highlighting the challenges posed by the technology in handling diverse linguistic and geographical contexts. What is fascinating in this case is that the “multilingual language practice” in question consists of mentioning the name of a geographical location in the territorial borders of Germany while speaking Turkish. According to his account, these circumstances led him to develop strategies to navigate the device. As Berk notes, it is easier for him to navigate Google Maps if he manually saves particular addresses, such as his workplace or home, and refers to these locations as “home” or “work”.

Five of the six interviewees (including Berk) noted difficulties communicating with their devices because they use language options other than English or speak “non-standard” English. Non-standard English in this context not only refers to deviation from “correct” pronunciation but also to using Turkish or German words. These include geographical locations, as in the above example, but also names of friends and family, or song names that the users would like to listen to (also noted in Beneteau et al. 2019).

Informants address these challenges by not only acknowledging them but also seeking ways to work around them rather than openly contesting them. Berk, for instance, exonerates the voice assistant and its algorithm, choosing to navigate the difficulties by sticking to the Turkish language option and employing specific strategies for coping. In contrast, other informants lean towards English, justifying their preference as a rational choice. All participants have at least tried using their voice control devices in English, and four of them prefer it as their primary language option for voice assistants (see Table 2). In the next subsection, I explain the motivations behind their choices.

5.2 English as the “original” language

Technological innovations in voice-controlled gadgets are first introduced for English and, more specifically, for British and American English vernaculars. Although users are usually presented with standardized normative English variants, the voice assistants integrated into smartphones, such as Siri and Google Assistant, also offer Turkish language options. Nevertheless, the Turkish option is usually introduced after the initial release with English. For instance, Turkish-speaking Siri was released five years after the initial introduction.^[4] This fact makes ‘English as the original and standard version for voice control’ available as a resource for motivating one’s language choice. In the excerpt below, Yavuz explains his preference for English in Siri by comparing it with other language options that would come into question for him:

(2)

Yavuz:	Bir de zannediyorum ki (.) orijinal (.) İngilizce olması bana kalırsa daha rahat çünkü e:: dünya dili ve böyle bir update geldiği zaman en iyi e:: imkanları bana kalırsa İngilizce ilk başta veriyor, sonra Almanca geliyor, en son Türkçeye filan geliyordur gibi geliyor. Çünkü Türkçe Siri sesi dinlediğim zaman şey olmuştum, abi bu ne yani çok (.) hiç doğal değil.
	I also think that (.) original (.) English is more comfortable for me, because uhm it is a world language and when an update comes, English gives the best opportunities and then German. The last one would be Turkish, it feels like. Because when I first listened to Siri in Turkish, I was like “what is this?” I mean, it is not (.) natural at all.

His reflections show that Yavuz is aware of companies’ tendency to offer more features for English, followed by other European languages like German, before providing any updates for non-Western languages such as Turkish. At the beginning of his sentence, he mentions the term “original” and then pauses briefly to specify that he is referring to English. English is notably iconized (Irvine and Gal 2000) as the original language in which users experience technologies in the most optimized way.

At the end of his statement, Yavuz explicitly differentiates Turkish from both German and English, assessing it as sounding “not natural”. As Woolard (2020: 11) notes, “iconization always trades at least implicitly in contrasts in order to generate meaning” between dichotomies that represent higher and lower scales. By iconizing English as “the original” and “world language”, Yavuz explicitly constructs Turkish as “not natural” and implicitly as merely a local language with limited widespread use. One might argue that Yavuz is not so much iconizing Turkish as unnatural but is merely alluding to the technological, robotic quality of the Turkish voice output compared to the ‘human’ voice effect that the English synthetic voice exudes. From a technical standpoint, however, all synthetic voices can be found (un)natural or (non)humanlike, depending on those who perceive them (e.g., Wagner et al. 2019). It seems more convincing, therefore, to argue that Yavuz’s perception is not so much caused by technological constraints but by his ideological stance.

Given my presence in the interview as a person who shares a similar educational background, linguistic repertoire, and migration biography with Yavuz, he also presents himself as an educated, mobile, and cosmopolitan person by highlighting that English is more “comfortable” for him as a “world” language (see also Park and Wee 2021). This presentation not only serves to explain his English language preference to a researcher who, he thinks, could question why he does not speak Turkish, their shared language, but it also serves as a means of indicating worldviews and values that he believes the researcher will hold, which helps to explain why he chose English.

While Yavuz explains why he prefers the language by connecting it to technological developments and his cosmopolitan identity, Talha is not sure why he favors English as the language setting of his smartphone (3). Although he prefers interacting with Siri in German, he has his phone set to English. He is used to navigating his phone visually and haptically through typed English, but not to interacting orally in English. Let us look at some of his statements from the beginning and the end of the interview:

(3)

Author:	Peki Siri’yi sadece mutfakta mı kullanıyorsun?
Author:	So do you use Siri only in the kitchen?
Talha:	Mutfakta ve ışıkları açmak için başka bir yerde kullanmıyorum. Bir kere söyle bir şey denedim, söyle bir sorun oldu. Yine dil sorunuyla ilgili. Benim telefon İngilizce. Yani telefonun Almanca olmasını nedense hiç şu ana kadar istemedim. Hep İngilizce. İlk aldığım 3310 da İngilizceydi. Şu anda hala bütün telefonlarım İngilizce. Appler o yüzden otomatik İngilizce iniyor ve sanırım telefonun dili değişmeden app’in öyle bir özelliği de varsa değiştiremiyorsun. Yani bende Google Maps İngilizce, navigasyona girip sokaklardan geçerken o bütün Almanca sokak isimlerini İngilizce söylemeye çalışıyor ve çok komik bir durum ortaya çıkıyor.
	In the kitchen and to turn on the lights. I don’t use it anywhere else. Once I tried something, an issue occurred. Language issue again. My phone is in English. I mean I never wanted my phone to be in German yet, for some reason. It is always in English. The first one I purchased, 3310, was also in English. At the moment, all my phones are in English. That’s why the apps are downloaded automatically in English, and I think, you cannot change the language of the app without changing the language of the phone. I mean my Google Maps is in English, when I go to navigation and pass through the streets, it tries to say all the German street names in English, and a very funny situation occurs.

(4)

Talha:	Ben şimdi gün içinde Almanca konuşuyorum, Türkçe konuşuyorum. Bir de durup durmadık bir yerde hiç olmadık bir yerde telefonda İngilizce bir şey söylemek garip. […] he diyeceksin ki telefon niye İngilizce o zaman? Bu alışkanlık. O hep öyleydi.
	Now, I speak German and Turkish throughout the day. It is strange to say something in English to the phone suddenly at an inappropriate time and place. […] You may ask why my phone is in English then. It is a habit. It was always like this.

These two excerpts from the beginning (3) and the end (4) of the interview show Talha’s thought process during our conversation. In the beginning, he reflected on his preference for English in his phone’s language settings. However, he could not give a specific reason for his choice. He reflects on the past, when he was not using a smartphone – “3310” in this context refers to Nokia’s popular phone model from the 2000s. He highlights a clear distinction between visual and oral digital engagement.

As he interacts with humans in spoken language in German and Turkish throughout the day, he finds it out of place to manage his device orally in English. Hence, he prefers interacting with Siri in German. Engaging with English-speaking voice assistants, despite not using English in daily conversations, results in a bilingual or multilingual experience that he presents as undesirable. He is alluding to discourses that emphasize that each language has its place and is to be spoken separately in appropriate contexts. A similar observation related to these ideologies was made by Heller (2002: 48) in the Canadian context, which she calls “double monolingualism”: “while bilingualism is valued, it is only valued as long as it takes the shape of ‘double-monolingualism’. One is expected to speak each ‘language’ as though it is a homogeneous monolingual variety […]. Mixed varieties, which of course are common in bilingual settings, are frowned upon”. While Talha does not embody these ideological discourses in his personal life, he feels accountable to them and provides an explanation for his language choices. In this case, he prefers speaking one language at a time and deems it rather “strange” to “switch” his spoken language practices to English only to speak to his phone.

On the other hand, navigating the phone by reading an English layout (app names, settings etc.) is not something he finds odd, although it appears to deviate from the ideological schema of “double-monolingualism” (Heller 2002). Talha’s experience reflects the insights of Gal and Irvine (2019: 21) who argue that ideologies, as “totalizing visions”, lead individuals to transform arguments that do not align with the ideological framework. Throughout the interview, Talha emphasises many “multilingual issues” or “language issues” he faces, such as the voice assistant Siri coupled with GPS navigation never seeming to work or that he gave up on using Alexa because the device could not process Turkish names, referring to one-language/one-nation technology design (as seen in Excerpt 1 by another participant). Given that he frequently mentioned these issues, at times without direct inquiry (see Excerpt 3), the informant seems to assume that it is important to address these to an interviewer who might find these insights noteworthy or relevant given her background as a Turkish-speaking university researcher in the field of linguistics. Following this narrative, he justifies the English language settings on his phone by asserting that technological affordances are to blame, as the apps are downloaded automatically in English without his interference. Later in the interview (4), he explains that his English language settings became a “habit” symbolizing his technobiography intertwined with English. Talha’s case illustrates that the standardization of English goes beyond voice control technologies. English is “naturalized” and treated as “common sense” (Fairclough 2013: 30) – a written layout that does not have to be questioned further.

Another critical aspect is which language Talha excludes in his narration. He comments on German, stating that he “never wanted it [his phone] to be German for some reason” (3). While iconizing English as the default language option, he recognizes German as an alternative option. However, he “overlooks” or “erases” (Woolard 2020) Turkish as a language option for his phone, although he explains that Turkish is a part of his language repertoire. Erasure in language ideological processes occurs when linguistic forms do not conform to the “iconic image” which speakers wish to convey (Woolard 2020: 11). By omitting Turkish and naturalizing English in this context, Talha may be presenting himself as someone closely following technological developments, often introduced in English. Thus, mentioning Turkish as an option here would not align with the kind of self-presentation he seeks. The next section addresses similar language ideologies surrounding English, showing the contribution not only of technological circumstances but also of cultural discourses about AI.

6 Cultural constructions of Artificial Intelligence

“I <3 Star Trek, bad puns, and platypuses. Tweets and opinions are my own.”

(Amazon Alexa’s Twitter Bio, @alexaa99, Access: 30.09.22)

In this section, I aim to illustrate how English is associated with cultural constructions of AI and how these associations are deployed to legitimize the prominence of English in informants’ use of voice assistants. The data show that imaginations of AI as humanlike entities inspired by science fiction are very strongly present in users’ perceptions of their interactions. By stating this, I do not suggest that all users treat their AI assistants like humans, but that they are aware of these discourses. It would be difficult for them to ignore this aspect, as anthropomorphism and science fiction narratives are also used as a marketing strategy for these devices. For instance, social media profiles of the voice assistant Alexa on Twitter and Instagram depict the AI assistant as a humanlike persona. In the quote above, taken from the bio information on Twitter, we observe that Alexa is presented as articulating an affinity towards science fiction productions like Star Trek and as expressing “opinions”, which indicates that Alexa is capable of raising their voice like a human.

Similar to the social media profile, the outputs of voice assistants are based on pre-written scripts aiming to achieve “humanness” (Sweeney 2016: 221). Given that these technologies depend on voice-based feedback, the assistants’ gendered voices that are female by default are especially crucial in anthropomorphizing processes (e.g., Sweeney 2016; West et al. 2019).

Furthermore, virtual assistants directly address users by “expressing opinions” or “making jokes”, as indicated on the social media profile. This “simulation of private, face-to-face, person-to-person discourse” by companies through voice assistant design can be considered a type of “synthetic personalization” (Fairclough 2013: 65; Thurlow 2018). Fairclough (2013: 65) argues that synthetic personalization practices are part of blurring the lines between public and private in the process of “expand[ing] into private domains”. The fascination with speaking to humanlike robots is instrumentalized in voice assistant design that facilitates companies in accessing private spaces. In the following sub-sections, I introduce instances where speakers reflect on gendered voices and science fiction to discuss the connection between AI’s cultural assignations and the role of English.

6.1 Gendered assistants

Sound in human-machine interactions seems to play an important role in the construction of computers as anthropomorphic entities. As discussed earlier in Section 2, the popular voice assistants are marketed and released with the female voice option. If the users are informed about the options and would like to change the sound, they must change the default female voice to male in the smartphone settings. However, these preferences are limited because the gender choices are only available in specific language options.^[5]

By keeping these circumstances in mind, it is not surprising that only two of the six interviewees were interested in changing the default female voice of Siri to male. The other participants were surprised by my question regarding the gendered sound for their voice assistants. The two participants, Yavuz and Sena, explain that they prefer the gendered male voice, because it appeals to them more. However, it is not only the gendered voice that leads to anthropomorphizing. They particularly find the male British English voice appealing.

(5)

Yavuz:	Ya kadın sesi bana çok şey geliyor, klişe geliyor hiç böyle personalize değil.
	Female voice sounds very cliché for me, it is not really personalized.
Author:	A sen o değiştirdin o dili, onu ne zaman yaptın?
	Oh, so you changed it. When did you do that?
Yavuz:	Galiba başladıktan çok kısa bir süre sonra yaptım. Hepsini denedim hepsine baktım, hangisini seçsem bu bana çekici geliyor diye. Yani İngiliz İngilizcesini sevdiğimden ve biraz da hani e:: (.) gökkuşağı rengimiz belli olsun diye.
	I think I did this after a short while using it. I tried and looked at all of them to pick out and decide which one sounds more attractive to me. I mean because I love British English and to show, you know, (.) our rainbow colors.

(6)

Sena:

Alexa bende sanırım Amerikalı (.) çünkü birkaç tane denedim e:: farklı çok da geyik oluyor. Bu arada yani alırsan kesinlikle dene. Siri’de denemişsindir belki. Böyle bir Hintli İngilizcesi, Avustralya İngilizcesi mesela ben hiç iletişim kuramıyorum anlamıyorum filan @ ne diyor acaba. şey e:: (.) Alexa benim için nötr aslında. Kadın sesi var bende. Mesela Siri’de erkek sesi vardı, İngiliz erkeği sesi koymuştum çok seksi ve hoş geldiği için kulağa.

Alexa is, I think, American for me (.) because I tried a few and uhm it is very funny actually. By the way, if you get one for yourself, you should definitely try it. Maybe you tried it with Siri already. For example, Indian English, Australian English, I cannot communicate at all, I don’t understand @ I wonder what it is saying. Alexa is actually neutral for me. I have the female voice. For example, Siri had male voice. I picked out British male voice, since it sounds very sexy and nice.

Above, Yavuz and Sena tell me that they changed the voice of their assistants from female to male after trying out different language options. Multiple aspects are interesting in these statements. Firstly, the fact that they decided to change the voice option according to their attraction indicates that they attribute some human characteristics to their devices. Describing the computer voice as “sexy and nice” not only indicates that the AI assistant is depicted as a person but also as an attractive one. Yavuz also notes that he wanted to show his “rainbow colors” while picking out this sound to hint at his queer identity and indicate that he finds male voices appealing. In doing so, he presents himself as a person who engages with the technology in a non-stereotypical way, opting for an unconventional gender option for Siri, and who wishes to emphasise his sexuality. For Yavuz, Siri functions as a means to portray himself in a particular light: cosmopolitan (Excerpt 2) and queer. Similarly, Sena associates a sense of sexiness with Siri’s synthetic male voice, deliberately chosen and differentiated from Alexa’s female voice, which she categorizes as “neutral”, indexing the default setting of the technology.

Simultaneously, we see that axes other than Turkish/English are active in ideological processes when evaluating synthetic voices. British English is iconised as “sexy”, “attractive” and “nice”, differentiated from all other varieties of English, which are associated with “fun” options to experiment with. British and American English variants as Standard English, ‘Received Pronunciation’, typically evoke indexical meanings that span from prestige, sophistication and (linguistic) authority to seriousness and intelligence (e.g., Bhatt 2002; Park and Wee 2021; Preisler 1995). The ascribed enregistered qualities may thus enhance the overall perception of attractiveness. Sena further notes that Indian and Australian Englishes are unintelligible to her, a detail that may index her educational background. Her narrative implies that her linguistic proficiency is limited to understanding ‘Standard English’, which is typically acquired through formal education in academic settings as the ‘correct’ way of speaking. The contrast drawn between British as “sexy” and Indian/Australian Englishes as “fun” thus reiterates the hegemony of Standard English within this specific cultural and technological context. The metalinguistic reflections about the choice of British English for Siri and American English for Alexa function as a way to present oneself as educated through linguistic proficiency in (Standard) English. The following section discusses the entanglement of British English, Turkish, and gender options while exploring the influence of science fiction narratives.

6.2 Science fiction narratives

The interviews show that popular media, especially science fiction narratives from the United States, influence what an AI assistant is expected to sound like. Below, Yavuz mentions Star Trek to explain why he thinks English is the appropriate language for a voice assistant:

(7)

Siri:	Ben Siri, senin sanal asistanın [in Turkish, male voice]
	I am Siri, your virtual assistant [in Turkish, male voice]
Siri:	Ben Siri, senin sanal asistanın [in Turkish, female voice]
	I am Siri, your virtual assistant [in Turkish, female voice]
Yavuz:	Çok çok çok (.) sapkınca geliyor gerçekten Türkçe @ hiç doğal gelmiyor ses tonu. Ya bir de belki de benim ana dilim olduğu için öyle o yüzden daha böyle kritize edebiliyorum bunu ama-
	it sounds very very very (.) perverted, really Turkish @ it doesn’t sound natural at all. Or maybe it’s because it’s my mother tongue, so that I can criticize it like this, but-
Siri:	I am Siri, your virtual assistant [in British English, male voice]
	I am Siri, your virtual assistant [in British English, male voice]
Yavuz:	Bu bana çok daha şey geliyor hani tamam, bambaşka bir dünyadayız hani o Star Trek’teyiz, ben bununla konuşabiliyorum ama Türkçe konuştuğum zaman şey diyorum sen Apple’sın benimle niye Türkçe konuşuyorsun? Bir de böyle çok değişik bir tonlamayla.
	This sounds a lot more to me, okay, we are in a different world, you know, we are in Star Trek, I can talk with it, but when I speak Turkish, I say, you are Apple, why are you speaking to me in Turkish? And with such a different intonation.

In this excerpt, Yavuz tries out Siri’s male and female voices in Turkish to evaluate the Turkish and English language options. He was not aware that there was a male voice option for Turkish and hears it for the first time in this instance. By clicking on each option, Yavuz plays each sound separately and we hear different synthetic voices.

To understand his comments in this excerpt, it is necessary to remember that he prefers to use Siri with a British English male voice and evaluates it as sounding more attractive to him (as seen in Excerpt 5). After trying out Siri’s male and female voices in Turkish, Yavuz makes assessments about the Turkish and English language options. The Turkish/English axis is reiterated, in which British English is regarded as fitting and “sexy” (Excerpt 5) and Turkish as “perverted”. He argues that because his mother tongue is Turkish, he can point out sound design and intonation issues. This statement may be linked to the limited technological advance that has been made in Turkish voice design. Nevertheless, since there are no grammatical or intonational issues in Siri’s Turkish output, it is unlikely that the participant is referring to these formal aspects. Rather, he deems it culturally inappropriate for the AI assistant to produce synthetic voice in Turkish. Being familiar with the concept of human-machine interactions from American science fiction movies, he finds it disconcerting and strange to listen to Siri in Turkish. Through the construction of a dichotomy between these languages, qualities are assessed based on “preexisting cultural images” related to popular media culture (Gal 2016: 121). The participant picks up the qualities of a British English voice produced by an AI technology and relates them to science fiction movies like Star Trek that feature human-machine conversations in English. The synthetic voice of Siri in English is taken to index science fiction narratives, considered appropriate in the context of voice assistant interactions. Within these cultural discourses, Turkish has no place and sounds “perverted”.

Interviews with other participants also show numerous references to science fiction movies, TV programs and cartoons from the United States. These include the movie “Her” and the animated science fiction cartoon “The Jetsons”. Furthermore, some interviewees talked about science fiction narratives in general during our dialogues about voice control technologies. Although I did not ask about anything related to science fiction or mention any issue regarding this aspect, the speakers brought up some examples from movies and cartoons to illustrate their interactions. Based on these observations, a particular cultural image seems to be created on the basis of English-speaking science fiction movies, which contributes to iconization processes of English as the standard, prestigious, natural, and culturally appropriate language of digital assistants. These circumstances would also suggest that oral interactions with other languages such as Turkish may be interpreted as strange and unnatural.

Another issue that Yavuz mentions is that it seems odd for him to interact with an Apple gadget in Turkish (7), which may result from the company’s origin in the United States. Hence, the voice assistant is not only perceived as a conversational AI associated with science fiction narratives, but it also represents the company that markets the technology. When the US-based company speaks through the AI assistant in Turkish, the company’s authenticity is questioned, leading to iconization of Turkish as culturally inappropriate.

Additionally, Yavuz grapples with inconsistencies in the iconic image associated with English. While he favours the British English voice option, he provides examples tied to American culture. This incongruity prompts the reflection that he may associate British English with the broader anglophone culture, perceiving it as the standard form – akin to Received Pronunciation.

While certain dependencies on English in technologies are acknowledged due to the inherent technology design (discussed in the previous section), the participants leverage English-speaking voice technologies entangled with science fiction narratives and standard language ideologies to shape a particular image of themselves – a portrayal marked by a cosmopolitan, mobile, educated, and technologically savvy lifestyle. This phenomenon can be attributed to the specific data sampling and participant recruitment methods employed. As outlined in the methodology, I engaged with these individuals through digital chat groups known as the “New Wave” of migrants in Germany. These communities identify themselves as comprising newcomers with a high level of education. The announcement of my project concerning voice assistants likely resonated with those who consider themselves actively engaged with contemporary AI voice technologies and enthusiastic about technological advancements. Furthermore, while interacting with a researcher whose linguistic and migration background is similar to their own, informants account for the prominent or continued role of English in their voice assistant use by means of ideological associations that they expect the interviewer to understand or find relevant.

7 Conclusions

This study aimed to explore the role of English in multilingual speakers’ use of voice assistants, identify some of the structural and historical factors that have contributed to the prominence of English in these contexts and investigate language ideological work produced by multilingual speakers to account for their use of English. To answer these research questions, I closely explored monolingual speech design and drew on qualitative interviews with six users. To address the gap in previous research on voice technologies, I particularly sampled people who are multilingual and have recently migrated from Turkey to Germany. Examining individuals’ language choices for their digital technologies and how they interpret these preferences in relation to technology design offers a unique perspective for understanding semiotic assemblages from a language ideological perspective. “Rather than viewing language in segregational terms as linguistic choices made by people in various contexts, this [an understanding of semiotic assemblages] allows for an appreciation of a much wider range of linguistic, artefactual, historical and spatial resources brought together in particular assemblages in particular moments of time and space” (Pennycook 2018: 54). Moving beyond individuals’ language choices, an exploration of technological affordances, cultural discourses about AI, and broader language ideologies facilitates an understanding of digital technologies as sociotechnical assemblages appropriated within specific social and cultural contexts.

The categorization of languages as countable and nameable entities tied to political borders (cf. Gal 2006: 14; Stevenson 2008: 484f.) does not consider the dynamicity and complexity of communication (Pennycook 2018). Contemporary voice-enabled technologies appear to further cultivate and secure these ideologies. The language options they offer establish strict borders between standardized languages and do not allow room for “non-standard” language practices, although we know that “multilinguals do not think unilingually in a politically named unity” (Li 2018: 18). These top-down language ideologies are brought up by the participants with references to communication issues in the algorithmic processing of place names and personal names that diverge from the selected national and regional language. Within the context of voice assistants, the concept of “multilingual language practice” takes on a distinct meaning, as it involves mentioning the name of a geographical location within the territorial borders of one country while speaking another national language. At the same time, a “double-monolingualism” (Heller 2002) ideology emerges in these contexts, where some users find it inappropriate to “fully switch” to a language they do not use in their everyday communication solely for the purpose of utilizing voice technology.

The depiction of English as the original language of voice assistants and the wider digital realm corresponding to technological developments is a way of naturalizing it, or presenting it as “common sense” (Fairclough 2013: 30). Previous studies note that the ability to use digital technologies, i.e., digital literacy (Friedrich and Figueiredo 2016: 67), is higher for people who can speak and read in English, as they have access to the most optimized versions of these technologies. Digital literacy is “generally acquired by those in more developed countries and higher socioeconomic classes and […] the same is true for linguistic capital of learning English” (Friedrich and Figueiredo 2016: 68). The participants in this study are also from higher socioeconomic classes and have either attained or are in the process of attaining a university degree. Hence, they have better access to English and to new and emerging speech technologies, although with their multilingual repertoires they may not have been imagined as the target audience of these technologies. Furthermore, people with a knowledge of technologies are likely to better understand that the communication issues they experience may derive from the algorithms rather than from their “mistakes” or “mispronunciation”. As the study only investigated the perspectives of English speakers, we cannot say much about people who do not have the resources to learn English. However, we can assume that those without access to such linguistic capital are more likely to be hindered by the speech design.

Despite the communication issues faced with using English-speaking voice assistants in Germany, the participants still predominantly prefer using their devices in English to sketch a particular version of themselves and of the world around them. As asserted by Lee and Jenks (2021), language ideologies play a pivotal role in shaping how individuals express their linguistic identities. In the context of globalization and the widespread use of English, the discourses surrounding the language acquire new meanings in various local social and cultural contexts (e.g., Park and Wee 2021; Pennycook 2003). To understand the meaning of English in the context of the study, it is thus crucial to consider the background of informants who identify as well-educated newcomers and distinguish themselves from the existing guest worker diaspora in Germany.

Central to the narratives are the contrasting ideologies regarding English and Turkish that can be described as set along “axes of differentiation”, in which languages are assigned qualities depending on their place in “pre-existing cultural images” (Gal 2016: 121). The iconization processes framing English as a “world” language, coupled with the evocation of typical indexical meanings of prestige and sophistication associated with Standard English, contribute to the construction of cosmopolitan, educated, and upwardly mobile identities. Furthermore, the conceptualization of English as the “original” language of the digital realm acts as a mechanism for participants to align themselves with technological advancements predominantly introduced in English. In the context of “anthropomorphized virtual agents”, characterized by humanlike qualities such as gendered voices and personality traits (Sweeney 2016), the role of English is further nuanced by the cultural discourses of AI. These discourses, intertwined with American science fiction films and anglophone popular culture associated with British English, contribute to the iconization of Standard English as appropriate and “natural” in the voice assistant context, enhancing the portrayal of oneself as “in tune with technology”. In crafting this “iconic image” (Woolard 2020) for AI assistants, Turkish is marked as culturally inappropriate.

The findings support previous research on the function of English “as a gatekeeper to positions of prestige in a society” (Pennycook 2017: 14). There is a risk that non-English language options could be seen as non-prestigious compared to the “original” and “natural” sounding variant. If speakers of Turkish or other marginalized languages generally prefer English or other European language options, this could have consequences for the development of AI-driven voice technologies. Because these language models need more data to perform better, fewer users would mean less improvement. If the users are not willing to interact with the voice technologies in Turkish, there would not be sufficient material to develop them, creating a potential “feedback loop” (Schneider 2022: 380). At the same time, this means that the language models could alternatively be fed with “non-standard” English practices to potentially diversify the training data.

The findings in this paper remain indicative, as they are based on a qualitative study with a limited number of participants. In terms of further research, a complementary quantitative survey examining the preferences of speakers with diverse linguistic repertoires could reveal interesting insights regarding the potential of a “feedback loop”. Further qualitative research on voice assistants that investigates how (multilingual) people actually use voice assistants and attune to them could also yield valuable findings (e.g. Alač et al. 2020; Beneteau et al. 2019). It would be especially valuable to conduct research with people who do not have access to English and speak marginalized languages. For instance, future studies could investigate the experiences of speakers of Kurdish, a language that is not offered as an option by any of the popular voice technologies. Such an approach would open up another perspective on digital literacy, social inequalities and the role of English.

Corresponding author: Didem Leblebici, The Faculty of Social and Cultural Sciences, European University Viadrina, Große Scharrnstraße 59, 15230 Frankfurt (Oder), Germany, E-mail: leblebici@europa-uni.de

Acknowledgments

I am grateful to my PhD supervisor, Britta Schneider, for the inspiring talks, helpful suggestions, and constructive criticism during the preparation of this paper. Special thanks to all the participants of this study, without whom I could not have written this work. I would like to thank them for their willingness to provide personal insights and their exciting contributions and answers to my questions.

Note: The data presented in this article is based on my unpublished MA thesis. Leblebici, Didem (2021). Language ideologies in human-machine interaction. A qualitative study with voice assistant users. MA thesis, Frankfurt (Oder): Europa-Universität Viadrina.
Competing interests: The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References

Alač, Morana, Yelena Gluzman, Tiffany Aflatoun, Adil Bari, Buhang Jing & German Mozqueda. 2020. Talking to a toaster: How everyday interactions with digital voice assistants resist a return to the individual. Evental Aesthetics 9(1). 3–53.

Beneteau, Erin, Olivia K. Richards, Mingrui Zhang, Julie A. Kientz, Jason Yip & Alexis Hiniker. 2019. Communication breakdowns between families and Alexa. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–13. New York, NY, USA: ACM. https://doi.org/10.1145/3290605.3300473.

Benjamin, Ruha. 2019. Race after technology: Abolitionist tools for the New Jim Code. Medford, MA: Polity. https://doi.org/10.1093/sf/soz162.

Berg, Charles & Marianne Milmeister. 2011. Im Dialog mit den Daten das eigene Erzählen der Geschichte finden: Über die Kodierverfahren der Grounded-Theory-Methodologie. In Günter Mey & Katja Mruck (eds.), Grounded Theory Reader, 2nd edn., 303–332. Wiesbaden: VS Verlag. https://doi.org/10.1007/978-3-531-93318-4_14.

Bhatt, Rakesh Mohan. 2002. Experts, dialects, and discourse. International Journal of Applied Linguistics 12(1). 74–109. https://doi.org/10.1111/1473-4192.00025.

Charmaz, Kathy. 2006. Constructing grounded theory. London; Thousand Oaks, Calif: Sage Publications.

Ching, Cynthia Carter & Linda Vigdor. 2005. Technobiographies: Perspectives from education and the arts. Champaign, IL: First International Congress of Qualitative Inquiry.

Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, Massachusetts: The MIT Press. https://doi.org/10.21236/AD0616323.

Cirucci, Angela M. & Urszula M. Pruchniewska. 2021. UX research methods for media and communication studies: An introduction to contemporary qualitative methods, 1st edn. New York: Routledge. https://doi.org/10.4324/9781003181750.

Coeckelbergh, Mark & David J. Gunkel. 2023. ChatGPT: Deconstructing the debate and moving it forward. London: Springer Verlag. https://doi.org/10.1007/s00146-023-01710-4.

Costa-Jussà, Marta R., Christine Basta & Gerard I. Gállego. 2020. Evaluating gender bias in speech translation. arXiv:2010.14465 [cs]. http://arxiv.org/abs/2010.14465 (accessed 28 June 2021).

Costanza-Chock, Sasha. 2020. Design justice: Community-led practices to build the worlds we need. Cambridge, Massachusetts: The MIT Press. https://doi.org/10.7551/mitpress/12255.001.0001.

Crawford, Kate. 2016. Can an algorithm be agonistic? Ten scenes from life in calculated publics. Science, Technology, & Human Values 41(1). 77–92. https://doi.org/10.1177/0162243915589635.

Crawford, Kate & Vladan Joler. 2018. Anatomy of an AI system. http://www.anatomyof.ai (accessed 11 July 2022).

Curry, Amanda Cercas, Judy Robertson & Verena Rieser. 2020. Conversational assistants and gender stereotypes: Public perceptions and desiderata for voice personas, 72–78. https://www.aclweb.org/anthology/2020.gebnlp-1.7 (accessed 28 June 2021).

Fairclough, Norman. 2013. Critical discourse analysis: The critical study of language, 2nd edn. London: Routledge. https://doi.org/10.4324/9781315834368.

Friedrich, Patricia & Eduardo H. Diniz de Figueiredo. 2016. The sociolinguistics of digital Englishes. London; New York: Routledge, Taylor & Francis Group. https://doi.org/10.4324/9781315681184.

Gal, Susan. 2006. Migration, minorities and multilingualism: Language ideologies in Europe. In Clare Mar-Molinero & Patrick Stevenson (eds.), Language Ideologies, Policies and Practices, 13–27. London: Palgrave Macmillan UK. https://doi.org/10.1057/9780230523883_2.

Gal, Susan. 2016. Sociolinguistic differentiation. In Nikolas Coupland (ed.), Sociolinguistics, 113–136. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781107449787.006.

Gal, Susan & Judith T. Irvine. 2019. Signs of difference: Language and ideology in social life, 1st edn. New York, NY, Cambridge, UK: Cambridge University Press. https://doi.org/10.1017/9781108649209.

Ganga, Deianira & Sam Scott. 2006. Cultural “insiders” and the issue of positionality in qualitative migration research: Moving “across” and moving “along” researcher-participant divides. Forum Qualitative Sozialforschung 7(3).

Gillespie, Tarleton. 2016. Algorithm. In Benjamin Peters (ed.), Digital Keywords: A Vocabulary of Information Society and Culture, 18–30. Princeton: Princeton University Press. https://doi.org/10.2307/j.ctvct0023.6.

Glaser, Barney Galland & Anselm Leonard Strauss. 2009. The discovery of grounded theory: Strategies for qualitative research, vol. 4. Paperback Printing. New Brunswick: Aldine.

Griol, David, Javier Carbó & José M. Molina. 2013. An automatic dialog simulation technique to develop and evaluate interactive conversational agents. Applied Artificial Intelligence 27(9). 759–780. https://doi.org/10.1080/08839514.2013.835230.

Habscheid, Stephan, Tim Moritz Hector, Christine Hrncal & David Waldecker. 2021. Intelligente Persönliche Assistenten (IPA) mit Voice User Interfaces (VUI) als ‚Beteiligte‘ in häuslicher Alltagsinteraktion. Welchen Aufschluss geben die Protokolldaten der Assistenzsysteme? Journal für Medienlinguistik 4(1). 16–53. https://doi.org/10.21248/jfml.2021.44.

Hector, Tim, Franziska Niersberger-Gueye, Franziska Petri & Christine Hrncal. 2022. The ‘conditional voice recorder’: Data practices in the co-operative advancement and implementation of data-collection technology. Siegen: Universitätsbibliothek Siegen.

Heller, Monica. 2002. Commodification of bilingualism in Canada. In David Block & Deborah Cameron (eds.), Globalization and Language Teaching, 47–63. London; New York: Routledge.

Hoy, Matthew B. 2018. Alexa, Siri, Cortana, and more: An introduction to voice assistants. Medical Reference Services Quarterly 37(1). 81–88. https://doi.org/10.1080/02763869.2018.1404391.

Humphry, Justine & Chris Chesher. 2020. Preparing for smart voice assistants: Cultural histories and media innovations. New Media & Society 23(7). 1–18. https://doi.org/10.1177/1461444820923679.

Irvine, Judith T. 1989. When talk isn’t cheap: Language and political economy. American Ethnologist 16(2). 248–267. https://doi.org/10.1525/ae.1989.16.2.02a00040.

Irvine, Judith T. & Susan Gal. 2000. Language ideology and sociolinguistic differentiation. In Paul V. Kroskrity (ed.), Regimes of language: Ideologies, polities, and identities, 35–84. Santa Fe: School of American Research Press.

Jones, Rodney H. 2021. The text is reading you: Teaching language in the age of the algorithm. Linguistics and Education 62. 1–7. https://doi.org/10.1016/j.linged.2019.100750.

Kelly-Holmes, Helen. 2019. Multilingualism and technology: A review of developments in digital communication from monolingualism to idiolingualism. Annual Review of Applied Linguistics 39. 24–39. https://doi.org/10.1017/S0267190519000102.

Koenecke, Allison, Andrew Nam, Emily Lake, Joe Nudell, Minnie Quartey, Zion Mengesha, Connor Toups, John R. Rickford, Dan Jurafsky & Sharad Goel. 2020. Racial disparities in automated speech recognition. Proceedings of the National Academy of Sciences 117(14). 7684–7689. https://doi.org/10.1073/pnas.1915768117.

Lee, Carmen. 2014. Language choice and self-presentation in social media: The case of university students in Hong Kong. In Philip Seargeant & Caroline Tagg (eds.), The language of Social Media, 91–111. London: Palgrave Macmillan UK. https://doi.org/10.1057/9781137029317_5.

Lee, Jerry Won & Christopher Jenks. 2021. Ideology, identity and world Englishes. In Ruanni Tupas, Rani Rubdy & Mario Saraceni (eds.), Bloomsbury World Englishes: Ideologies, vol. 2, 114–126. London; New York: Bloomsbury Academic. https://doi.org/10.5040/9781350065871.0014.

Lemoine, Blake. 2022. Is LaMDA sentient? – An interview. Medium. https://cajundiscordian.medium.com/is-lamda-sentient-an-interview-ea64d916d917 (accessed 8 July 2022).

Li, Wei. 2018. Translanguaging as a practical theory of language. Applied Linguistics 39(1). 9–30. https://doi.org/10.1093/applin/amx039.

Li, Wei. 2020. Multilingual English users’ linguistic innovation. World Englishes 39(2). 236–248. https://doi.org/10.1111/weng.12457.

Linde, Charlotte. 1987. Explanatory systems in oral life stories. In Dorothy Holland & Naomi Quinn (eds.), Cultural Models in Language and Thought, 343–366. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511607660.015.

Markl, Nina. 2022. Language variation and algorithmic bias: Understanding algorithmic bias in British English automatic speech recognition. In 2022 ACM Conference on Fairness, Accountability, and Transparency, 521–534. Seoul, Republic of Korea: ACM. https://doi.org/10.1145/3531146.3533117.

Natale, Simone. 2021. Deceitful media: Artificial Intelligence and social life after the Turing test. New York: Oxford University Press. https://doi.org/10.1093/oso/9780190080365.001.0001.

Noble, Safiya Umoja. 2018. Algorithms of oppression: How search engines reinforce racism. New York: New York University Press. https://doi.org/10.2307/j.ctt1pwt9w5.

Oldac, Yusuf Ikbal & Nigel Fancourt. 2021. ‘New wave Turks’: Turkish graduates of German universities and the Turkish diaspora in Germany. British Journal of Educational Studies 69(5). 621–640. https://doi.org/10.1080/00071005.2021.1935708.

Pariser, Eli. 2011. The filter bubble: What the internet is hiding from you. London: Penguin. https://doi.org/10.3139/9783446431164.

Park, Joseph Sung-Yul & Lionel Wee. 2012. Markets of English: Linguistic capital and language policy in a globalizing world (Routledge Studies in Sociolinguistics 5). New York: Routledge.

Park, Joseph Sung-Yul & Lionel Wee. 2021. World Englishes and the commodification of language. In Ruanni Tupas, Rani Rubdy & Mario Saraceni (eds.), Bloomsbury World Englishes, 64–80. London; New York: Bloomsbury Academic.

Pennycook, Alastair. 2003. Global Englishes, Rip Slyme, and performativity. Journal of Sociolinguistics 7(4). 513–533. https://doi.org/10.1111/j.1467-9841.2003.00240.x.

Pennycook, Alastair. 2017. The cultural politics of English as an international language (Routledge Linguistics Classics). London; New York: Routledge.

Pennycook, Alastair. 2018. Posthumanist applied linguistics. London; New York: Routledge/Taylor & Francis Group. https://doi.org/10.4324/9781315457574.

Phan, Thao. 2017. The materiality of the digital and the gendered voice of Siri. Transformations 29. 23–33.

Piller, Ingrid. 2001. Identity constructions in multilingual advertising. Language in Society 30(2). 153–186. https://doi.org/10.1017/S0047404501002019.

Porcheron, Martin, Joel E. Fischer, Stuart Reeves & Sarah Sharples. 2018. Voice interfaces in everyday life. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 1–12. Montreal, QC, Canada: ACM. https://doi.org/10.1145/3173574.3174214.

Portmann, Lara. 2022. Crafting an audience: UX writing, user stylization, and the symbolic violence of little texts. Discourse, Context & Media 48. 100622. https://doi.org/10.1016/j.dcm.2022.100622.

Poushneh, Atieh. 2021. Humanizing voice assistant: The impact of voice assistant personality on consumers’ attitudes and behaviors. Journal of Retailing and Consumer Services 58. 102283. https://doi.org/10.1016/j.jretconser.2020.102283.

Preisler, Bent. 1995. Standard English in the world. Multilingua – Journal of Cross-Cultural and Interlanguage Communication 14(4). 341–362. https://doi.org/10.1515/mult.1995.14.4.341.

Rampton, Ben, Janet Maybin & Celia Roberts. 2015. Theory and method in linguistic ethnography. In Julia Snell, Sara Shaw & Fiona Copland (eds.), Linguistic Ethnography, 14–50. London: Palgrave Macmillan UK. https://doi.org/10.1057/9781137035035_2.

Rosenwald, George C. & Richard L. Ochberg (eds.). 1992. Storied lives: The cultural politics of self-understanding. New Haven: Yale University Press.

Roulston, Kathryn & Myungweon Choi. 2018. Qualitative interviews. In The SAGE handbook of qualitative data collection, 233–249. London: SAGE Publications Ltd. https://doi.org/10.4135/9781526416070.n15.

Savas, Özlem. 2019. Affective digital media of new migration from Turkey: Feelings, affinities, and politics. International Journal of Communication 13. 5405–5426.

Sayers, Dave, Rui Sousa-Silva, Sviatlana Höhn, Lule Ahmedi, Kais Allkivi-Metsoja, Dimitra Anastasiou, Štefan Beňuš, Lynne Bowker, Eliot Bytyçi, Alejandro Catala, Anila Çepani, Rubén Chacón-Beltrán, Sami Dadi, Fisnik Dalipi, Vladimir Despotovic, Agnieszka Doczekalska, Sebastian Drude, Karën Fort, Robert Fuchs, Christian Galinski, Federico Gobbo, Tunga Gungor, Siwen Guo, Klaus Höckner, Petralea Láncos, Tomer Libal, Tommi Jantunen, Dewi Jones, Blanka Klimova, Eminerkan Korkmaz, Sepesy Maučec Mirjam, Miguel Melo, Fanny Meunier, Bettina Migge, Barbu Mititelu Verginica, Aurélie Névéol, Arianna Rossi, Antonio Pareja-Lora, Christina Sanchez-Stockhammer, Aysel Şahin, Angela Soltan, Claudia Soria, Sarang Shaikh, Marco Turchi & Sule Yildirim Yayilgan. 2021. The dawn of the human-machine era: A forecast of new and emerging language technologies. Jyväskylä: COST: European Cooperation in Science & Technology, University of Jyväskylä. Report for EU COST Action CA19102 ‘Language in the Human-Machine Era’. https://doi.org/10.17011/jyx/reports/20210518/1.

Schneider, Britta. 2022. Multilingualism and AI: The regimentation of language in the age of digital capitalism. Signs and Society 10(3). 362–387. https://doi.org/10.1086/721757.

Seaver, Nick. 2019. Knowing algorithms. In Janet Vertesi & David Ribes (eds.), A Field Guide for Science & Technology Studies, 412–422. London: Princeton University Press. https://doi.org/10.2307/j.ctvc77mp9.30.

Spilioti, Tereza. 2020. The weirding of English, trans-scripting, and humour in digital communication. World Englishes 39(1). 106–118. https://doi.org/10.1111/weng.12450.

Stevenson, Patrick. 2008. The German language and the future of Europe: Towards a research agenda on the politics of language. German Life and Letters 61(4). 483–496. https://doi.org/10.1111/j.1468-0483.2008.00438.x.

Sweeney, Miriam. 2016. The intersectional interface. In Safiya Umoja Noble & Brendesha M. Tynes (eds.), The intersectional Internet: Race, Sex, Class and Culture Online (Digital Formations vol. 105), 215–228. New York: Peter Lang Publishing, Inc.

Taibi, Hadjer & Khawla Badwan. 2022. Chronotopic translanguaging and the mobile languaging subject: Insights from an Algerian academic sojourner in the UK. Multilingua 41(3). 281–298. https://doi.org/10.1515/multi-2021-0122.

Thurlow, Crispin. 2018. Digital discourse: Locating language in new/social media. In The SAGE Handbook of Social Media, 135–145. London: SAGE Publications Ltd. https://doi.org/10.4135/9781473984066.n8.

Türkmen, Gülay. 2019. “But you don’t look Turkish!”: The changing face of Turkish immigration to Germany. ResetDoc. Available at: https://www.resetdoc.org/story/dont-look-turkish-changing-face-turkish-immigration-germany/.

Voicebot.ai. 2021. Voice assistant timeline. http://voicebot.ai/voice-assistant-history-timeline/ (accessed 22 June 2021).

Wagner, Petra, Jonas Beskow, Simon Betz, Jens Edlund, Joakim Gustafson, Gustav Eje Henter, Sébastien Le Maguer, Zofia Malisz, Éva Székely, Christina Tånnander & Jana Voße. 2019. Speech synthesis evaluation – state-of-the-art assessment and suggestion for a novel research program. In 10th ISCA Workshop on Speech Synthesis (SSW 10), 105–110. Vienna: ISCA. https://doi.org/10.21437/SSW.2019-19.

West, Mark, Rebecca Kraut & Han Ei Chew. 2019. I’d blush if I could – closing gender divides in digital skills through education. UNESCO. Technical Report. Available at: https://unesdoc.unesco.org/ark:/48223/pf0000367416.page=1.

Woolard, Kathryn A. 2020. Language ideology. In James Stanlaw (ed.), The International Encyclopedia of Linguistic Anthropology, 1–21. Hoboken, NJ: Wiley. https://doi.org/10.1002/9781118786093.iela0217.

Wu, Yunhan, Daniel Rough, Anna Bleakley, Justin Edwards, Orla Cooney, Philip R. Doyle, Leigh Clark & Benjamin R. Cowan. 2020. See what I’m saying? Comparing intelligent personal assistant use for native and non-native language speakers. In 22nd International Conference on Human-Computer Interaction with Mobile Devices and Services, 1–9. Oldenburg, Germany: ACM. https://doi.org/10.1145/3379503.3403563.

Received: 2023-05-09

Accepted: 2024-02-06

Published Online: 2024-02-27

Published in Print: 2024-07-26

This work is licensed under the Creative Commons Attribution 4.0 International License.