DOI: 10.1145/3544548.3581210

Performative Vocal Synthesis for Foreign Language Intonation Practice

Published: 19 April 2023

Abstract

Typical foreign language (L2) pronunciation training focuses mainly on individual sounds. Intonation, the pattern of pitch change across words or phrases, is often neglected, despite its key role in word-level intelligibility and in the expression of attitudes and affect. This paper examines hand-controlled real-time vocal synthesis, known as Performative Vocal Synthesis (PVS), as an interaction technique for practicing L2 intonation in computer-aided pronunciation training (CAPT).
We evaluate a tablet-based interface on which users gesturally control the pitch of a pre-recorded utterance by drawing curves on the touchscreen. Twenty-four subjects (12 French learners, 12 British controls) imitated English phrases with their voice and with the interface. An acoustic analysis and an expert perceptual evaluation showed that learners’ gestural imitations of the fall-rise intonation pattern, which is typically difficult for francophones, were more accurate than their vocal imitations, suggesting that PVS can help learners produce intonation patterns beyond the capabilities of their natural voice.
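
To make the gesture-to-pitch idea concrete, the sketch below shows one way a drawn curve's vertical position could be mapped to a fundamental-frequency contour and auditioned. This is a minimal illustration, not the interface evaluated in the paper: the normalized touch height, the two-octave range around a 150 Hz reference, and the sine-tone rendering (standing in for resynthesis of a pre-recorded utterance) are all illustrative assumptions.

```python
# Minimal sketch of a gesture-to-pitch mapping (illustrative only; not the
# system evaluated in the paper). Assumptions: touch height is normalized to
# [0, 1], the screen spans two octaves around a 150 Hz reference, and the
# contour is auditioned as a plain sine tone rather than a resynthesized voice.
import numpy as np
from scipy.io import wavfile

SR = 16000                 # audio sample rate (Hz)
F0_REF = 150.0             # reference pitch at mid-screen height (Hz)
RANGE_SEMITONES = 24.0     # assumed total pitch range of the screen

def curve_to_f0(y_norm: np.ndarray, n_samples: int) -> np.ndarray:
    """Convert normalized touch heights (0..1) to a per-sample f0 contour."""
    semitones = (y_norm - 0.5) * RANGE_SEMITONES      # offset from reference
    f0 = F0_REF * 2.0 ** (semitones / 12.0)           # semitones -> Hz
    # Resample the sparse gesture points to one f0 value per audio sample.
    t_gesture = np.linspace(0.0, 1.0, len(y_norm))
    t_audio = np.linspace(0.0, 1.0, n_samples)
    return np.interp(t_audio, t_gesture, f0)

def synthesize(f0: np.ndarray) -> np.ndarray:
    """Render the contour as a sine tone by integrating instantaneous frequency."""
    phase = 2.0 * np.pi * np.cumsum(f0) / SR
    return 0.3 * np.sin(phase)

if __name__ == "__main__":
    # A fall-rise gesture: the finger moves down, then back up.
    gesture_y = np.concatenate([np.linspace(0.7, 0.3, 30),
                                np.linspace(0.3, 0.8, 20)])
    f0 = curve_to_f0(gesture_y, n_samples=SR)          # one second of audio
    wavfile.write("fall_rise.wav", SR, synthesize(f0).astype(np.float32))
```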

Supplementary Material

  • MP4 File (3544548.3581210-talk-video.mp4): Pre-recorded Video Presentation
  • MP4 File (3544548.3581210-video-preview.mp4): Video Preview
  • MP4 File (3544548.3581210-video-figure.mp4): Video Figure

Published In

CHI '23: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems
April 2023
14911 pages
ISBN: 9781450394215
DOI: 10.1145/3544548
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 April 2023

Author Tags

  1. CAPT
  2. gesture
  3. intonation
  4. language learning
  5. performative vocal synthesis
  6. prosody
  7. vocal synthesis

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CHI '23

Acceptance Rates

Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

Article Metrics

  • Downloads (last 12 months): 136
  • Downloads (last 6 weeks): 18
Reflects downloads up to 16 Feb 2025.

Cited By

  • (2024) Tuning In to Intangibility: Reflections from My First 3 Years of Theremin Learning. Proceedings of the 2024 ACM Designing Interactive Systems Conference, 2649–2659. https://doi.org/10.1145/3643834.3661584. Online publication date: 1 July 2024.
