An End-to-End Approach to Automatic Speech Assessment for Cantonese-speaking People with Aphasia

Published in: Journal of Signal Processing Systems

Abstract

Conventional automatic assessment of pathological speech usually follows two main steps: (1) extraction of pathology-specific features; (2) classification or regression on the extracted features. Given the great variety of speech and language disorders, feature design is never straightforward, yet it is crucial to assessment performance. This paper presents an end-to-end approach to automatic speech assessment for Cantonese-speaking People With Aphasia (PWA). The assessment is formulated as a binary classification task that discriminates PWA with high scores on subjective assessment from those with low scores. Two-layer Gated Recurrent Unit (GRU) and Convolutional Neural Network (CNN) models are applied to realize the end-to-end mapping from basic speech features to the classification outcome, so the pathology-specific features used for assessment are learned implicitly by the neural network model. The Class Activation Mapping (CAM) method is utilized to visualize how the learned features contribute to the assessment result. Experimental results show that the end-to-end approach achieves performance comparable to the conventional two-step approach on the classification task, and that the CNN model is able to learn impairment-related features similar to the hand-crafted ones. The results also indicate that the CNN model performs better than the 2-layer GRU model in this specific task.
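To make the modeling described above concrete, the following is a minimal PyTorch sketch of the two end-to-end classifier families and of CAM. It is not the authors' exact architecture: the 80-bin filterbank input, layer sizes, and helper names are illustrative assumptions. The CNN ends in global average pooling followed by a single linear layer, which is the structure that CAM relies on.

```python
# Minimal sketch (not the paper's exact configuration) of end-to-end
# binary classification of aphasic speech, plus CAM visualization.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GRUClassifier(nn.Module):
    """2-layer GRU mapping frame-level features to a binary decision."""
    def __init__(self, feat_dim=80, hidden=128, n_classes=2):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):                   # x: (batch, time, feat_dim)
        _, h = self.gru(x)                  # h: (num_layers, batch, hidden)
        return self.out(h[-1])              # classify from final hidden state

class CNNClassifier(nn.Module):
    """CNN over spectrogram-like input, ending in global average pooling
    plus one linear layer -- the structure that makes CAM applicable."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, x):                   # x: (batch, 1, freq, time)
        fmap = self.features(x)             # (batch, 128, freq', time')
        logits = self.classifier(fmap.mean(dim=(2, 3)))  # global avg pool
        return logits, fmap

def class_activation_map(model, x, target_class):
    """CAM: weight each feature map by the target class's linear weight,
    highlighting the time-frequency regions that drive the decision."""
    _, fmap = model(x)
    w = model.classifier.weight[target_class]           # (128,)
    cam = F.relu(torch.einsum('c,bcft->bft', w, fmap))  # positive evidence
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)

# Example: 4 utterances, 80 filterbank bins x 300 frames.
x = torch.randn(4, 1, 80, 300)
model = CNNClassifier()
cam = class_activation_map(model, x, target_class=1)    # e.g. "high-score"
```

In this sketch the CAM heat map has the same time-frequency layout as the pooled feature maps, so it can be upsampled and overlaid on the input spectrogram to inspect which regions the classifier treats as impairment-related.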

Acknowledgements

This research was partially supported by a GRF project grant (Ref: CUHK14227216) from the Hong Kong Research Grants Council, a Direct Grant from the CUHK Research Committee, the CUHK Research Sustainability Fund, and the CUHK Shenzhen Research Institute. The Cantonese AphasiaBank project was supported by a fund from the National Institutes of Health (project number: NIH-R01-DC010398).

Author information

Corresponding author

Correspondence to Ying Qin.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Qin, Y., Wu, Y., Lee, T. et al. An End-to-End Approach to Automatic Speech Assessment for Cantonese-speaking People with Aphasia. J Sign Process Syst 92, 819–830 (2020). https://doi.org/10.1007/s11265-019-01511-3
