Abstract
There is an urgent need for interfaces that directly employ the natural communication and manipulation skills of humans. Vision-based systems that can identify small actions and are suited to communication applications would enable machine control by people with restricted limb movements, such as neuro-trauma patients. Because of the limited abilities of these users, it is also important that such systems have inbuilt intelligence, learn about the user, and reconfigure themselves appropriately. Patients who have suffered neuro-trauma often have restricted body and limb movements; hand, arm and body movements may be impossible, so head activity and facial expression become important when designing human-computer interface (HCI) systems for machine control. Silent-speech assistive technologies (AT) are important for users who have difficulty vocalizing, because they give the user the flexibility to control computers without making a sound. This chapter evaluates the feasibility of using facial muscle activity signals and mouth video to identify speech commands in the absence of a voice signal. It investigates the power of mouth video to classify English vowels and consonants, and it examines the use of non-invasive facial surface electromyography (SEMG) to identify unvoiced English and German vowels from muscle activity and to provide feedback to the visual system. The results suggest that both video-based and facial-muscle-activity-based systems work reliably for simple speech-based AT commands.
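To make the SEMG-based idea concrete, the following is a minimal sketch of how unvoiced-vowel recognition from facial muscle activity could look: each utterance is reduced to one root-mean-square (RMS) energy value per SEMG channel and matched against per-vowel templates. The RMS feature, the channel count, and the nearest-neighbour classifier are illustrative assumptions for this sketch, not the chapter's reported pipeline.

    # Sketch: label an unvoiced vowel from a window of facial SEMG.
    # Feature choice (RMS per channel) and classifier (nearest template)
    # are assumptions for illustration only.
    import numpy as np

    def rms_features(semg_window: np.ndarray) -> np.ndarray:
        """One RMS value per channel for a (samples x channels) window."""
        return np.sqrt(np.mean(np.square(semg_window), axis=0))

    def classify_nearest(feature: np.ndarray,
                         templates: np.ndarray,
                         labels: list) -> str:
        """Label a feature vector by its Euclidean-nearest training template."""
        distances = np.linalg.norm(templates - feature, axis=1)
        return labels[int(np.argmin(distances))]

    # Toy usage: 4 SEMG channels, 3 vowel templates (hypothetical values).
    rng = np.random.default_rng(0)
    templates = rng.random((3, 4))        # one template per vowel class
    labels = ["/a/", "/e/", "/i/"]
    window = rng.random((500, 4))         # 500 samples x 4 channels
    print(classify_nearest(rms_features(window), templates, labels))

A comparable sketch would apply on the video side, with motion features extracted from the mouth region replacing the per-channel RMS values.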
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this chapter
Arjunan, S.P., Yau, W.C., Kumar, D.K. (2014). Evaluating Video and Facial Muscle Activity for a Better Assistive Technology: A Silent Speech Based HCI. In: Mago, V., Dabbaghian, V. (eds) Computational Models of Complex Systems. Intelligent Systems Reference Library, vol 53. Springer, Cham. https://doi.org/10.1007/978-3-319-01285-8_7
DOI: https://doi.org/10.1007/978-3-319-01285-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01284-1
Online ISBN: 978-3-319-01285-8
eBook Packages: Engineering, Engineering (R0)