Abstract
There is an urgent need for interfaces that directly employ the natural communication and manipulation skills of humans. Vision-based systems that can identify small actions and are suited to communication applications would enable machine control by people with restricted limb movements, such as neuro-trauma patients. Because of the limited abilities of these users, it is also important that such systems have inbuilt intelligence, learn about the user, and reconfigure themselves appropriately. Patients who have suffered neuro-trauma often have restricted body and limb movements; hand, arm and body movements may be impossible, so head activity and facial expression become important when designing human-computer interface (HCI) systems for machine control. Silent-speech assistive technologies (AT) are important for users who have difficulty vocalizing, because they give the user the flexibility to control computers without making a sound. This chapter evaluates the feasibility of using facial muscle activity signals and mouth video to identify speech commands in the absence of a voice signal. It investigates the power of mouth video to classify English vowels and consonants, and it examines the use of non-invasive facial surface electromyography (SEMG) to identify unvoiced English and German vowels from muscle activity and to provide feedback to the visual system. The results suggest that both video-based and facial-muscle-activity-based systems work reliably for simple speech-based AT commands.
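To make the SEMG-based idea concrete, the following is a minimal sketch of how unvoiced-vowel recognition from facial muscle activity could look: each utterance is reduced to one root-mean-square (RMS) energy value per SEMG channel and matched against per-vowel templates. The RMS feature, the channel count, and the nearest-neighbour classifier are illustrative assumptions for this sketch, not the chapter's reported pipeline.

    # Sketch: label an unvoiced vowel from a window of facial SEMG.
    # Feature choice (RMS per channel) and classifier (nearest template)
    # are assumptions for illustration only.
    import numpy as np

    def rms_features(semg_window: np.ndarray) -> np.ndarray:
        """One RMS value per channel for a (samples x channels) window."""
        return np.sqrt(np.mean(np.square(semg_window), axis=0))

    def classify_nearest(feature: np.ndarray,
                         templates: np.ndarray,
                         labels: list) -> str:
        """Label a feature vector by its Euclidean-nearest training template."""
        distances = np.linalg.norm(templates - feature, axis=1)
        return labels[int(np.argmin(distances))]

    # Toy usage: 4 SEMG channels, 3 vowel templates (hypothetical values).
    rng = np.random.default_rng(0)
    templates = rng.random((3, 4))        # one template per vowel class
    labels = ["/a/", "/e/", "/i/"]
    window = rng.random((500, 4))         # 500 samples x 4 channels
    print(classify_nearest(rms_features(window), templates, labels))

A comparable sketch would apply on the video side, with motion features extracted from the mouth region replacing the per-channel RMS values.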
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this chapter
Arjunan, S.P., Yau, W.C., Kumar, D.K. (2014). Evaluating Video and Facial Muscle Activity for a Better Assistive Technology: A Silent Speech Based HCI. In: Mago, V., Dabbaghian, V. (eds) Computational Models of Complex Systems. Intelligent Systems Reference Library, vol 53. Springer, Cham. https://doi.org/10.1007/978-3-319-01285-8_7
DOI: https://doi.org/10.1007/978-3-319-01285-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01284-1
Online ISBN: 978-3-319-01285-8
eBook Packages: Engineering, Engineering (R0)