DOI: 10.1145/2993148.2993155
short-paper

Language proficiency assessment of English L2 speakers based on joint analysis of prosody and native language

Published: 31 October 2016

Abstract

In this work, we present an in-depth analysis of the interdependency between non-native prosody and the native language (L1) of English L2 speakers, two aspects that were investigated separately in the Degree of Nativeness Task and the Native Language Task of the INTERSPEECH 2015 and 2016 Computational Paralinguistics ChallengE (ComParE). To this end, we propose a multi-task learning scheme based on auxiliary attributes for jointly learning the tasks of L1 classification and prosody score regression. The effectiveness of this scheme is demonstrated in extensive experimental runs that compare various standardised feature sets of prosodic, cepstral, spectral, and voice quality descriptors, as well as automatic feature selection. The results show that the prediction of both prosody score and L1 can be improved by considering the two tasks jointly. In particular, we achieve an 11% relative gain in regression performance (Spearman's correlation coefficient) on prosody scores when comparing the best multi-task and single-task learning results.
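The abstract reports regression performance as Spearman's correlation coefficient and summarises the improvement as a relative gain. The following is a minimal sketch of how such figures could be computed, not the authors' evaluation code. The function names, the example scores, and the baseline value of 0.50 are all hypothetical and chosen only for illustration; the relative gain is assumed here to mean (improved - baseline) / baseline.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline` (assumed definition)."""
    return (improved - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical annotated prosody scores and model predictions.
    gold = [1.0, 2.5, 3.0, 4.0, 5.0]
    predicted = [1.2, 2.0, 3.5, 3.1, 4.8]
    print("Spearman's rho:", round(spearman(gold, predicted), 3))

    # Hypothetical single-task and multi-task correlations, not paper results.
    print("Relative gain:", round(relative_gain(0.50, 0.555), 3))
```

Under this definition, a single-task correlation of 0.50 and a multi-task correlation of 0.555 would correspond to an 11% relative gain; the actual correlations behind the reported figure are not given in the abstract.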

    Information & Contributors

    Information

    Published In

    ICMI '16: Proceedings of the 18th ACM International Conference on Multimodal Interaction
    October 2016
    605 pages
    ISBN:9781450345569
    DOI:10.1145/2993148
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Cross-Task Labelling
    2. Feature Evaluation
    3. Native Language Identification
    4. Non-Native Prosody

    Qualifiers

    • Short-paper

    Funding Sources

    • EU FP HORIZON2020

    Conference

    ICMI '16

    Acceptance Rates

    Overall Acceptance Rate 453 of 1,080 submissions, 42%

    Cited By

    • (2022) Holistic Affect Recognition Using PaNDA: Paralinguistic Non-Metric Dimensional Analysis. IEEE Transactions on Affective Computing 13:2 (769-780). DOI: 10.1109/TAFFC.2019.2961881. Online publication date: 1-Apr-2022
    • (2021) Predicting Group Work Performance from Physical Handwriting Features in a Smart English Classroom. Proceedings of the 2021 5th International Conference on Digital Signal Processing (140-145). DOI: 10.1145/3458380.3458404. Online publication date: 26-Feb-2021
    • (2020) A Generic Human–Machine Annotation Framework Based on Dynamic Cooperative Learning. IEEE Transactions on Cybernetics 50:3 (1230-1239). DOI: 10.1109/TCYB.2019.2901499. Online publication date: Mar-2020
    • (2018) Multimodal Negative-Attitude Recognition Toward Automatic Conflict-Scene Detection in Negotiation Dialog. Social Computing and Social Media. Technologies and Analytics (268-278). DOI: 10.1007/978-3-319-91485-5_21. Online publication date: 31-May-2018
