The Handbook of Multimodal-Multisensor Interfaces provides the first authoritative resource on what has become the dominant paradigm for new computer interfaces: user input involving new media (speech, multi-touch, gestures, writing) embedded in multimodal-multisensor interfaces. These interfaces support smartphone, wearable, in-vehicle, robotic, and many other applications that are now highly competitive commercially.
This edited collection is written by international experts and pioneers in the field. It serves as a textbook for students and as a reference and technology roadmap for professionals working in this rapidly emerging area.
Volume 1 of the handbook presents relevant theory and neuroscience foundations for guiding the development of high-performance systems. Additional chapters discuss approaches to user modeling, interface design that supports user choice, synergistic combination of modalities with sensors, and blending of multimodal input and output. They also provide an in-depth look at the most common multimodal-multisensor combinations, for example, touch and pen input, haptic and non-speech audio output, and speech co-processed with visible lip movements, gaze, gestures, or pen input. A common theme throughout is support for mobility and individual differences among users, including the world's rapidly growing population of seniors.
These handbook chapters provide walk-through examples and video illustrations of different system designs and their interactive use. Common terms are defined, and information on practical resources (e.g., software tools, data resources) is provided for hands-on project work to develop and evaluate multimodal-multisensor systems. In the final chapter, experts exchange views on a timely and controversial challenge topic: how they believe multimodal-multisensor interfaces should be designed in the future to most effectively advance human performance.
Preface
The content of this handbook would be most appropriate for graduate students, and of primary interest to students studying computer science and information technology, human-computer interfaces, mobile and ubiquitous interfaces, and related ...
Theoretical foundations of multimodal interfaces and systems
This chapter discusses the theoretical foundations of multisensory perception and multimodal communication. It provides a basis for understanding the performance advantages of multimodal interfaces, as well as how to design them to reap these ...
The impact of multimodal-multisensory learning on human performance and brain activation patterns
The human brain is inherently a multimodal-multisensory dynamic learning system. All information that is processed by the brain must first be encoded through sensory systems and this sensory input can only be attained through motor movement. Although ...
Multisensory haptic interactions: understanding the sense and designing for it
Our haptic sense comprises both taction, or cutaneous information obtained through receptors in the skin, and kinesthetic awareness of body forces and motions. Broadly speaking, haptic interfaces to computing systems are anything a user touches or is ...
A background perspective on touch as a multimodal (and multisensor) construct
This chapter will illustrate, through a series of examples, seven different perspectives on how touch input can be re-framed and re-conceived as a multimodal, multisensor construct. These perspectives can often particularly benefit from considering the ...
Understanding and supporting modality choices
One of the characteristic benefits of multimodal-multisensor processing is that it gives users more freedom of choice than they would otherwise have. The most central type of choice concerns the use of input modalities: When performing a particular task ...
Using cognitive models to understand multimodal processes: the case for speech and gesture production
Multimodal behavior has been studied for a long time and in many fields, e.g., in psychology, linguistics, communication studies, education, and ergonomics. One of the main motivations has been to allow humans to use technical systems intuitively, in a ...
Multimodal feedback in HCI: haptics, non-speech audio, and their applications
Computer interfaces traditionally depend on visual feedback to provide information to users, with large, high-resolution screens the norm. Other sensory modalities, such as haptics and audio, have great potential to enrich the interaction between user ...
Multimodal technologies for seniors: challenges and opportunities
This chapter discusses interactive technologies in the service of seniors. Adults over 65 form one of the largest and most rapidly growing user groups in industrialized societies. Interactive technologies have been steadily improving in their ability ...
Gaze-informed multimodal interaction
Observe a person pointing out and describing something. Where is that person looking? Chances are good that this person also looks at what she is talking about and pointing at. Gaze is naturally coordinated with our speech and hand movements. By ...
Multimodal speech and pen interfaces
This chapter describes interfaces that enable users to combine digital pen and speech input for interacting with computing systems. Such interfaces promise natural and efficient interaction, taking advantage of skills that users have developed over many ...
Multimodal gesture recognition
Starting from the famous "Put That There!" demonstration prototype, developed by the Architecture Machine Group at MIT in the late 1970s, the growing potential of multimodal gesture interfaces in natural human-machine communication setups has stimulated ...
Audio and visual modality combination in speech processing applications
- Gerasimos Potamianos,
- Etienne Marcheret,
- Youssef Mroueh,
- Vaibhava Goel,
- Alexandros Koumbaroulis,
- Argyrios Vartholomaios,
- Spyridon Thermos
Chances are that most of us have experienced difficulty listening to our interlocutor during face-to-face conversation in highly noisy environments, such as next to heavy traffic or against a background of high-intensity speech babble or loud ...
Perspectives on learning with multimodal technology
To set the stage for this multidisciplinary discussion among experts on the challenging topic of learning with multimodal technology, we ask some basic questions:
• What have neuroscience, cognitive and learning sciences, and human-computer interaction ...