Abstract
Conventional automatic assessment of pathological speech usually follows two main steps: (1) extraction of pathology-specific features; (2) classification or regression on the extracted features. Given the great variety of speech and language disorders, feature design is never straightforward, yet it is crucial to assessment performance. This paper presents an end-to-end approach to automatic speech assessment for Cantonese-speaking People With Aphasia (PWA). The assessment is formulated as a binary classification task that discriminates PWA with high subjective assessment scores from those with low scores. A 2-layer Gated Recurrent Unit (GRU) model and a Convolutional Neural Network (CNN) model are applied to realize the end-to-end mapping from basic speech features to the classification outcome, so that the pathology-specific features used for assessment are learned implicitly by the neural network. The Class Activation Mapping (CAM) method is utilized to visualize how the learned features contribute to the assessment result. Experimental results show that the end-to-end approach achieves performance comparable to the conventional two-step approach on the classification task, and that the CNN model learns impairment-related features similar to the hand-crafted ones. The results also indicate that the CNN model outperforms the 2-layer GRU model on this specific task.
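As a rough illustration of the CAM visualization mentioned above: in a CNN whose final convolutional layer feeds a global average pooling layer and a linear classifier, the class activation map is the weighted sum of the last-layer feature maps, using that class's classifier weights. The sketch below is a minimal, dependency-free illustration of this weighted sum; the function name, toy shapes, and values are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of Class Activation Mapping (CAM), assuming a CNN whose
# final conv layer produces C feature maps that are globally average-pooled
# and fed to a linear classifier. The map for a target class is
#   CAM[i][j] = sum_c w_c * F_c[i][j],
# where w_c is the classifier weight for channel c.

def class_activation_map(feature_maps, class_weights):
    """feature_maps: list of C maps, each an H x W nested list.
    class_weights: the C linear-layer weights of the target class.
    Returns the H x W class activation map."""
    H, W = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * W for _ in range(H)]
    for w_c, fmap in zip(class_weights, feature_maps):
        for i in range(H):
            for j in range(W):
                cam[i][j] += w_c * fmap[i][j]
    return cam

# Toy example: 2 feature maps of size 2x2.
fmaps = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.0, 2.0], [2.0, 0.0]]]
weights = [1.0, 0.5]
print(class_activation_map(fmaps, weights))  # [[1.0, 1.0], [1.0, 1.0]]
```

In practice the resulting map is upsampled to the input's time-frequency resolution, which is what lets one inspect which spectro-temporal regions drive the impairment classification.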
Acknowledgements
This research was partially supported by a GRF project grant (Ref: CUHK14227216) from the Hong Kong Research Grants Council, a Direct Grant from the CUHK Research Committee, the CUHK Research Sustainability Fund, and the CUHK Shenzhen Research Institute. The Cantonese AphasiaBank project was supported by a fund from the National Institutes of Health (project number: NIH-R01-DC010398).
Cite this article
Qin, Y., Wu, Y., Lee, T. et al. An End-to-End Approach to Automatic Speech Assessment for Cantonese-speaking People with Aphasia. J Sign Process Syst 92, 819–830 (2020). https://doi.org/10.1007/s11265-019-01511-3