Abstract
This study presents a user interface designed to support multimodal interaction by compensating for the weaknesses of speech relative to pen input and vice versa. The test application was email on a web pad with pen and speech input. For pen input, information was represented as easily accessible visual objects, and graphical metaphors were used to enable faster and easier manipulation of data. Speech input was supported by displaying the system's speech vocabulary to the user: all commands and accessible fields with text labels could be spoken by name, and the commands and objects currently accessible via speech were shown dynamically in a window. Multimodal interaction was further enhanced by a flexible object-action order, such that the user could utter or select a command with the pen followed by the object to be acted upon, or the other way round (e.g., “New Message” or “Message New”). This flexible action-object design, combined with voice and pen input, yielded eight possible action-object-modality combinations. The complexity of the multimodal interface was further reduced by making generic commands such as “New” applicable across corresponding objects. Generic commands simplified the menu structures by reducing the number of instances in which actions appeared, so that more content information could be made visible and consistently accessible via pen and speech input. Results of a controlled experiment indicated that, across the eight input conditions, the shortest task completion times occurred when speech alone was used to refer to an object followed by the action to be performed. Speech-only input in action-object order was also relatively fast. For pen-only input, the shortest task completion times were found when an object was selected first, followed by the action to be performed. In multimodal trials in which both pen and speech were used, no significant effect of object-action order was found, suggesting that a flexible action-object interaction style benefits multimodal and speech-only systems.
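To make the flexible ordering concrete, the sketch below shows one way such two-token input could be dispatched. It is not the paper's implementation; the names (Modality, Token, dispatch) and the example vocabularies are hypothetical. The point is that a command resolves to the same action whether the action or the object arrives first, and regardless of which modality produced each token.

from dataclasses import dataclass
from enum import Enum, auto

class Modality(Enum):
    PEN = auto()
    SPEECH = auto()

@dataclass
class Token:
    kind: str        # "action" or "object"
    value: str       # e.g. "New" or "Message"
    modality: Modality

# Generic actions apply across object types, as described in the abstract.
GENERIC_ACTIONS = {"New", "Open", "Delete"}
OBJECT_TYPES = {"Message", "Contact", "Folder"}

def dispatch(first: Token, second: Token) -> str:
    """Resolve a two-token command independent of order and input modality."""
    tokens = {first.kind: first, second.kind: second}
    if set(tokens) != {"action", "object"}:
        raise ValueError("expected exactly one action token and one object token")
    action = tokens["action"].value
    obj = tokens["object"].value
    if action not in GENERIC_ACTIONS or obj not in OBJECT_TYPES:
        raise ValueError(f"unknown command: {action} {obj}")
    return f"{action} {obj}"

# Both orders, in any mix of modalities, resolve to the same command:
assert dispatch(Token("action", "New", Modality.SPEECH),
                Token("object", "Message", Modality.PEN)) == "New Message"
assert dispatch(Token("object", "Message", Modality.PEN),
                Token("action", "New", Modality.SPEECH)) == "New Message"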
Cite this article
Keyson, D., de Hoogh, M. & Aasman, J. Designing for pen and speech input in an object-action framework: the case of email. UAIS 2, 134–142 (2003). https://doi.org/10.1007/s10209-003-0046-x