DOI: 10.1145/3313831.3376322

WithYou: Automated Adaptive Speech Tutoring With Context-Dependent Speech Recognition

Published: 23 April 2020


Learning to speak a foreign language is hard. Speech shadowing, in which a learner listens to a native speech template and repeats it as nearly simultaneously as possible, has been gaining popularity as a proven way to practice speaking. However, shadowing is difficult in practice: learners frequently fail to keep up with the speech and unintentionally interrupt the practice session. Worse, because no technical method for evaluating shadowing performance in real time has been established, no automated solutions are available to help. In this paper, we propose a technical framework that uses context-dependent speech recognition to evaluate shadowing in real time. We also present a shadowing tutor system called WithYou, which automatically adjusts the playback and the difficulty of a speech template when learners fail, so that shadowing becomes smooth and tailored to the learner. Results from a user study show that WithYou yields greater speech improvement (14%) than the conventional method (2.7%), with a lower cognitive load.
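
The abstract does not spell out how playback is adjusted, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a word-level view in which recognized words are already available as a list (the context-dependent recognizer itself is not modeled), and every name and threshold here (`PlaybackState`, `count_followed`, `adapt`, `max_lag`, `slow_step`, `min_rate`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlaybackState:
    position: int = 0   # index of the next template word to be played
    rate: float = 1.0   # playback speed multiplier (1.0 = original speed)


def count_followed(template, recognized):
    """Count how many template words the learner reproduced, in order.

    Greedy matching: recognized words that do not match the next expected
    template word (fillers, misrecognitions) are simply skipped.
    """
    matched = 0
    for word in recognized:
        if matched < len(template) and word.lower() == template[matched].lower():
            matched += 1
    return matched


def adapt(state, template, recognized, max_lag=3, slow_step=0.1, min_rate=0.6):
    """Assumed policy: if the learner trails playback by more than max_lag
    words, rewind to the last word they followed and slow playback down."""
    followed = count_followed(template[:state.position], recognized)
    lag = state.position - followed
    if lag > max_lag:
        new_rate = max(min_rate, round(state.rate - slow_step, 2))
        return PlaybackState(position=followed, rate=new_rate)
    return state


template = "learning to speak in foreign languages is hard".split()
state = PlaybackState(position=6, rate=1.0)

# Learner has only reproduced two of the six words played so far.
behind = adapt(state, template, ["learning", "uh", "to"])

# Learner is two words behind, which is within the tolerated lag.
keeping_up = adapt(state, template, ["learning", "to", "speak", "in"])
```

In this sketch a lagging learner causes a rewind plus a slowdown, while a learner within the tolerated lag leaves playback untouched; a real system would need timing information and a recognizer constrained to the template text.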





    Published In

    CHI '20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems
    April 2020
    10688 pages
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].



    Association for Computing Machinery

    New York, NY, United States



    • Honorable Mention

    Author Tags

    1. computer assisted language learning (call)
    2. intelligent tutoring system, language learning
    3. shadowing
    4. speaking
    5. speech recognition


    • Research-article


    CHI '20

    Acceptance Rates

    Overall Acceptance Rate 6,199 of 26,314 submissions, 24%



    Article Metrics

    • Downloads (last 12 months): 61
    • Downloads (last 6 weeks): 7
    Reflects downloads up to 27 Feb 2025

    Cited By

    • (2024) Shadowed Speech: An Approach for Slowing Speech Rate Using Adaptive Delayed Auditory Feedback. Journal of Information Processing 32 (938-947). DOI: 10.2197/ipsjjip.32.938. Online publication date: 2024
    • (2023) Automatic Speech Recognition in L2 Learning: A Review Based on PRISMA Methodology. Languages 8:4 (242). DOI: 10.3390/languages8040242. Online publication date: 20-Oct-2023
    • (2023) CriTrainer: An Adaptive Training Tool for Critical Paper Reading. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (1-17). DOI: 10.1145/3586183.3606816. Online publication date: 29-Oct-2023
    • (2023) AIx Speed: Playback Speed Optimization Using Listening Comprehension of Speech Recognition Models. Proceedings of the Augmented Humans International Conference 2023 (200-208). DOI: 10.1145/3582700.3582722. Online publication date: 12-Mar-2023
    • (2022) AIx speed: Playback Speed Optimization using Listening Comprehension of Speech Recognition Models. Adjunct Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology (1-3). DOI: 10.1145/3526114.3558727. Online publication date: 29-Oct-2022
    • (2022) BeParrot: Efficient Interface for Transcribing Unclear Speech via Respeaking. Proceedings of the 27th International Conference on Intelligent User Interfaces (832-840). DOI: 10.1145/3490099.3511164. Online publication date: 22-Mar-2022
    • (2021) JustSpeak: Automated, User-Configurable, Interactive Agents for Speech Tutoring. Proceedings of the ACM on Human-Computer Interaction 5:EICS (1-24). DOI: 10.1145/3459744. Online publication date: 29-May-2021
    • (2021) Designing CAST: A Computer-Assisted Shadowing Trainer for Self-Regulated Foreign Language Listening Practice. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (1-13). DOI: 10.1145/3411764.3445190. Online publication date: 6-May-2021
    • (2021) Communication with Non-native Speakers Through the Service of Speech-To-Speech Interpreting Systems: Testing Technology Capacity and Exploring Specialists’ Views. Services – SERVICES 2021 (1-17). DOI: 10.1007/978-3-030-96585-3_1. Online publication date: 10-Dec-2021
