research-article

WithYou: Automated Adaptive Speech Tutoring With Context-Dependent Speech Recognition

Authors:

Takashi Miyaki,

Jun Rekimoto

CHI '20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems

Pages 1 - 12

https://doi.org/10.1145/3313831.3376322

Published: 23 April 2020

Abstract

Learning to speak a foreign language is hard. Speech shadowing has become increasingly popular as a proven way to practice speaking. In shadowing, a learner listens to a native speech template and repeats it as simultaneously as possible. However, shadowing is hard in practice: learners frequently fail to keep up with the speech, which unintentionally interrupts the practice session. Worse, no technical method for evaluating shadowing performance in real time has been established, so no automated solutions are available to help. In this paper, we propose a technical framework that uses context-dependent speech recognition to evaluate shadowing in real time. Building on it, we present WithYou, a shadowing tutor system that automatically adjusts the playback and the difficulty of the speech template when a learner fails. This keeps shadowing smooth and tailored to the learner. Results from a user study show that WithYou yields greater speech improvement than the conventional method (14% vs. 2.7%) with a lower cognitive load.
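The abstract describes WithYou's real-time evaluation and adaptation only at a high level. The sketch below illustrates how such a loop could work. It is minimal and hypothetical, not the authors' implementation, and it rests on four assumptions of our own:

(a) A streaming recognizer delivers newly recognized words at each update.
(b) "Context-dependent" recognition is approximated by matching those words only against a short window of the known template just ahead of the learner's current position.
(c) A "failure" means the learner has fallen more than a fixed number of words behind playback.
(d) Difficulty is lowered simply by slowing the playback rate.

The class `ShadowingTutor`, its parameters (`window`, `lag_threshold`, `speeds`), and all numeric values are illustrative inventions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ShadowingTutor:
    """Hypothetical adaptive shadowing controller (illustrative sketch only)."""

    template: list[str]                  # words of the native speech template
    window: int = 5                      # assumed search window ahead of the learner
    lag_threshold: int = 4               # assumed tolerated lag, in words
    speeds: tuple[float, ...] = (1.0, 0.9, 0.8, 0.7)  # assumed playback rates, easiest last
    speed_level: int = 0                 # index into `speeds`
    learner_pos: int = 0                 # next template word the learner should say

    def __post_init__(self) -> None:
        # Normalize case once so matching is case-insensitive.
        self.template = [w.lower() for w in self.template]

    def align(self, recognized: list[str]) -> int:
        """Advance learner_pos by matching each recognized word against the
        template, searching only a short window ahead of the current position.
        Restricting the search to this local 'context' means a word that only
        occurs far ahead in the template is ignored instead of making the
        learner's position jump forward."""
        for word in recognized:
            end = min(self.learner_pos + self.window, len(self.template))
            for i in range(self.learner_pos, end):
                if self.template[i] == word.lower():
                    self.learner_pos = i + 1
                    break
        return self.learner_pos

    def step(self, playback_pos: int, recognized: list[str]) -> tuple[int, float]:
        """One update: align new words, then return (playback position, rate)."""
        self.align(recognized)
        lag = playback_pos - self.learner_pos
        if lag > self.lag_threshold:
            # Learner fell behind: rewind playback to their position and slow down.
            self.speed_level = min(self.speed_level + 1, len(self.speeds) - 1)
            return self.learner_pos, self.speeds[self.speed_level]
        # Learner is keeping up: continue at the current rate.
        return playback_pos, self.speeds[self.speed_level]


if __name__ == "__main__":
    tutor = ShadowingTutor("the quick brown fox jumps over the lazy dog".split())
    print(tutor.step(playback_pos=3, recognized=["the", "quick"]))  # (3, 1.0)
    print(tutor.step(playback_pos=8, recognized=["brown"]))         # (3, 0.9)
```

In the example run, the learner is one word behind after the first update, so the tutor does not intervene. After the second update the learner is five words behind, so playback rewinds to word 3 and slows to 0.9x. A fuller system would also need to restore speed once the learner recovers, which this sketch omits.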

Supplemental Material

MP4 file (42.06 MB)

Preview video: MP4 file (7.81 MB)

Supplemental video: MP4 file (19.46 MB)



Index Terms

1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction techniques
      1. Auditory feedback


Information

Published In


CHI '20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems

April 2020

10688 pages

ISBN:9781450367080

DOI:10.1145/3313831

General Chairs:
Regina Bernhaupt, Eindhoven University of Technology, Netherlands
Florian 'Floyd' Mueller, Monash University, Australia
David Verweij, Newcastle University, UK
Josh Andres, RMIT, Australia

Program Chairs:
Joanna McGrenere, University of British Columbia, Canada
Andy Cockburn, University of Canterbury, New Zealand
Ignacio Avellino, University of Maryland Baltimore County, USA
Alix Goguey, Grenoble Alpes University, France
Pernille Bjørn, University of Copenhagen, Denmark
Shengdong (Shen) Zhao, National University of Singapore, Singapore
Briane Paul Samson, Future University Hakodate, Japan & De La Salle University, Philippines
Rafal Kocielnik, University of Washington, USA

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGCHI: ACM Special Interest Group on Computer-Human Interaction

Publisher

Association for Computing Machinery

New York, NY, United States


Badges

Honorable Mention


Conference

CHI '20

Sponsor:

SIGCHI

CHI '20: CHI Conference on Human Factors in Computing Systems

April 25 - 30, 2020

Honolulu, HI, USA

Acceptance Rates

Overall Acceptance Rate 6,199 of 26,314 submissions, 24%



Bibliometrics & Citations

Article Metrics

Total citations: 9

Total downloads: 872

Downloads (last 12 months): 61

Downloads (last 6 weeks): 7

Reflects downloads up to 27 Feb 2025.


Cited By

Chiba M, Yamada W, Ochiai K (2024). Shadowed Speech: An Approach for Slowing Speech Rate Using Adaptive Delayed Auditory Feedback. Journal of Information Processing 32, 938-947. Online publication date: 2024. https://doi.org/10.2197/ipsjjip.32.938

Farrús M (2023). Automatic Speech Recognition in L2 Learning: A Review Based on PRISMA Methodology. Languages 8, 4, 242. Online publication date: 20 Oct 2023. https://doi.org/10.3390/languages8040242

Yuan K, Lin H, Cao S, Peng Z, Guo Q, Ma X (2023). CriTrainer: An Adaptive Training Tool for Critical Paper Reading. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 1-17. Online publication date: 29 Oct 2023. https://dl.acm.org/doi/10.1145/3586183.3606816

Kawamura K, Rekimoto J, Ward J, McGill M, Marky K (2023). AIx Speed: Playback Speed Optimization Using Listening Comprehension of Speech Recognition Models. Proceedings of the Augmented Humans International Conference 2023, 200-208. Online publication date: 12 Mar 2023. https://dl.acm.org/doi/10.1145/3582700.3582722

Kawamura K, Rekimoto J (2022). AIx speed: Playback Speed Optimization using Listening Comprehension of Speech Recognition Models. Adjunct Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, 1-3. Online publication date: 29 Oct 2022. https://dl.acm.org/doi/10.1145/3526114.3558727

Arakawa R, Yakura H, Goto M (2022). BeParrot: Efficient Interface for Transcribing Unclear Speech via Respeaking. Proceedings of the 27th International Conference on Intelligent User Interfaces, 832-840. Online publication date: 22 Mar 2022. https://dl.acm.org/doi/10.1145/3490099.3511164

Zhang X, Miyaki T, Rekimoto J (2021). JustSpeak: Automated, User-Configurable, Interactive Agents for Speech Tutoring. Proceedings of the ACM on Human-Computer Interaction 5, EICS, 1-24. Online publication date: 29 May 2021. https://dl.acm.org/doi/10.1145/3459744

Reza M, Yoon D, Kitamura Y, Quigley A, Isbister K, Igarashi T, Bjørn P, Drucker S (2021). Designing CAST: A Computer-Assisted Shadowing Trainer for Self-Regulated Foreign Language Listening Practice. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 1-13. Online publication date: 6 May 2021. https://dl.acm.org/doi/10.1145/3411764.3445190

Atabekova A (2021). Communication with Non-native Speakers Through the Service of Speech-To-Speech Interpreting Systems: Testing Technology Capacity and Exploring Specialists' Views. Services – SERVICES 2021, 1-17. Online publication date: 10 Dec 2021. https://dl.acm.org/doi/10.1007/978-3-030-96585-3_1
