DOI: 10.1145/1088463.1088494
Article

Audio-visual cues distinguishing self- from system-directed speech in younger and older adults

Published: 04 October 2005

Abstract

Despite interest in developing robust open-microphone engagement techniques for mobile use and natural field contexts, no reliable techniques are currently available. One problem is the lack of empirically grounded models to guide the distinction of how users' audio-visual activity actually differs systematically when they address a computer versus a human partner. In particular, existing techniques have not been designed to handle high levels of user self-talk as a source of "noise," and they typically assume that a user is addressing the system only when facing it while speaking. In the present research, data were collected during two related studies in which adults aged 18-89 interacted multimodally using speech and pen with a simulated map system. Results revealed that people engaged in self-talk before addressing the system over 30% of the time, with no decrease in younger adults' rate of self-talk compared with that of older adults. Speakers' amplitude was lower during 96% of their self-talk, with a substantial 26 dBr amplitude separation observed between self- and system-directed speech. The magnitude of speakers' amplitude separation ranged from approximately 10 to 60 dBr and diminished with age, with 79% of the variance predictable simply from a person's age. In contrast to the clear differentiation of intended addressee revealed by amplitude separation, gaze at the system was not a reliable indicator of speech directed to the system: users looked at the system over 98% of the time during both self- and system-directed speech. These results have implications for the design of more effective open-microphone engagement for mobile and pervasive systems.
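
The amplitude findings suggest a simple engagement heuristic: compare each utterance's level against the speaker's typical system-directed level and treat markedly quieter speech as self-talk. Below is a minimal Python sketch of that idea; it is not the authors' implementation, and the function names, the per-user baseline, and the 13 dB margin (half of the reported mean 26 dBr separation) are illustrative assumptions.

    import numpy as np

    def utterance_db(samples: np.ndarray, ref: float = 1.0) -> float:
        """RMS amplitude of an utterance, in dB relative to `ref` (hence dBr)."""
        rms = np.sqrt(np.mean(np.square(samples.astype(np.float64))))
        return 20.0 * np.log10(max(rms / ref, 1e-12))  # floor avoids log(0)

    def classify_addressee(level_db: float, system_baseline_db: float,
                           margin_db: float = 13.0) -> str:
        """Label speech as self- or system-directed from amplitude alone.

        system_baseline_db: the speaker's typical level when addressing the
        system (e.g., estimated from a few enrollment utterances; hypothetical).
        margin_db: 13.0 is an assumed midpoint of the ~26 dBr mean separation
        reported in the abstract, not a value taken from the paper.
        """
        if level_db >= system_baseline_db - margin_db:
            return "system-directed"
        return "self-directed"

Because the reported separation narrows with age (79% of its variance is predictable from age alone, implying a correlation of roughly -0.89 between age and separation), a deployed system would presumably adapt margin_db per user rather than fix it globally.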





Published In

ICMI '05: Proceedings of the 7th international conference on Multimodal interfaces
October 2005
344 pages
ISBN: 1595930280
DOI: 10.1145/1088463
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. gaze
  2. individual differences
  3. intended addressee
  4. multimodal interaction
  5. open-microphone engagement
  6. spoken amplitude
  7. system adaptation
  8. universal access
  9. user modeling


Conference

ICMI '05

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)


Cited By

  • (2016) OPAC Usability Problems of Archives. International Journal of Systems and Service-Oriented Engineering, 6(1), 54-70. DOI: 10.4018/IJSSOE.2016010104. Online publication date: 1-Jan-2016.
  • (2013) Self-talk Discrimination in Human–Robot Interaction Situations for Supporting Social Awareness. International Journal of Social Robotics, 5(2), 277-289. DOI: 10.1007/s12369-013-0179-x. Online publication date: 26-Jan-2013.
  • (2012) Multimodal Interfaces. Human–Computer Interaction Handbook, 405-430. DOI: 10.1201/b11963-22. Online publication date: 14-May-2012.
  • (2008) Implicit user-adaptive system engagement in speech and pen interfaces. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 969-978. DOI: 10.1145/1357054.1357204. Online publication date: 6-Apr-2008.
  • (2008) Multimodal Interfaces. HCI Beyond the GUI, 391-444. DOI: 10.1016/B978-0-12-374017-5.00012-2. Online publication date: 2008.
  • (2006) Toward open-microphone engagement for multiparty interactions. Proceedings of the 8th International Conference on Multimodal Interfaces, 273-280. DOI: 10.1145/1180995.1181049. Online publication date: 2-Nov-2006.
  • (2006) GSI demo. Proceedings of the 8th International Conference on Multimodal Interfaces, 76-83. DOI: 10.1145/1180995.1181012. Online publication date: 2-Nov-2006.
  • (2006) Human perception of intended addressee during computer-assisted meetings. Proceedings of the 8th International Conference on Multimodal Interfaces, 20-27. DOI: 10.1145/1180995.1181002. Online publication date: 2-Nov-2006.
