research-article

Large-scale characterization of non-native Mandarin Chinese spoken by speakers of European origin

Published: 01 November 2016

Abstract

In this work, we analyze phonetic and prosodic pronunciation patterns in iCALL, a speech corpus designed to evaluate Mandarin mispronunciations by non-native speakers of European origin and to address the lack of large-scale, non-native corpora with comprehensive annotations for computer-assisted pronunciation training (CAPT). iCALL consists of 90,841 utterances from 305 speakers, with a total duration of 142 hours. The speakers come from diverse linguistic backgrounds, spanning Germanic, Romance, and Slavic native languages. The read utterances are phonetically balanced and carry phonetic, tonal, and fluency annotations. Our findings on iCALL reveal that lexical tone errors are over six times more prevalent than phonetic errors; that French speakers are twice as likely as English speakers to mispronounce Tones 2, 3, and 4; that native speakers of Romance languages are more likely to make de-aspiration and aspiration mistakes; and that fluency scores correlate inversely with tone and phone error rates.

      Published In

      Speech Communication, Volume 84, Issue C
      November 2016
      96 pages

      Publisher

      Elsevier Science Publishers B. V.

      Netherlands

      Author Tags

      1. Computer-assisted language learning (CALL)
      2. Database
      3. First language (L1)
      4. Linguistic resources
      5. Second language (L2)
      6. Tonal languages

      Cited By

      • (2022) Leveraging audible and inaudible signals for pronunciation training by sensing articulation through a smartphone. Speech Communication, 144:C (42-56). DOI: 10.1016/j.specom.2022.08.002. Online publication date: 1-Oct-2022
      • (2022) RETRACTED ARTICLE: Automatic speech recognition systems: A survey of discriminative techniques. Multimedia Tools and Applications, 82:9 (13307-13339). DOI: 10.1007/s11042-022-13645-x. Online publication date: 9-Sep-2022
      • (2020) Speech-Driven End-to-End Language Discrimination toward Chinese Dialects. ACM Transactions on Asian and Low-Resource Language Information Processing, 19:5 (1-24). DOI: 10.1145/3389021. Online publication date: 1-Jun-2020
      • (2018) Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features Using Extended Recognition Networks. Journal of Signal Processing Systems, 90:7 (1077-1087). DOI: 10.5555/3231391.3231463. Online publication date: 1-Jul-2018
      • (2018) Interaction Challenges in AI Equipped Environments Built to Teach Foreign Languages Through Dialogue and Task-Completion. Proceedings of the 2018 Designing Interactive Systems Conference (597-609). DOI: 10.1145/3196709.3196717. Online publication date: 8-Jun-2018
