1 Introduction
Sound is an important interaction modality, and a large part of human interaction happens in the aural domain. While research in Human-Robot Interaction (HRI) has long explored spoken language for interacting with humans, sound as a broader—and, to a significant degree, non-lexical (i.e., without words)—medium has received comparatively little attention. Yet the range of sounds that robots can produce is vast, encompassing, among others, mechanical noise, music, and utterances that mimic human and animal vocalizations with varying degrees of realism. The sound of a robot’s machinery can shape our perceptions and expectations [11, 17], music serves as a medium for robots to engage and communicate [18], and shared musical experiences can strengthen the bond between humans and robots [5]. Sonifications may enhance the legibility of movement and gestures [4, 7, 15], and beep sounds may be used to communicate emotions [2, 14]. Moving closer to the margins of language, robots may take inspiration from non-lexical fillers such as “uh” [13, 16] and backchannels such as “mhmm” [8, 12]. More generally, pitch, intensity, and other human prosodic variations may be drawn on in robot sound design [3, 10]. The information that can be extracted from sound in a robot’s environment is equally rich. Beyond recognizing semantic content, robots use, for example, sound source localization to gain a better understanding of their environment [9], or analyze the timbre and tone of a human’s voice to distinguish speakers [6] and detect emotion [1].
The disciplines involved in these research pursuits are diverse, ranging from engineering, music, sound, and sonic interaction design to psychology, linguistics, and conversation analysis. However, there has so far been little knowledge exchange between these disciplines, even though they often tackle similar challenges around robot sound. This Special Issue aims to take a step towards unifying the Sound in HRI community, celebrating and showcasing the diversity of the different approaches while highlighting points of convergence.
While there are existing works that take a broader perspective on certain sound-related subfields, such as semantic-free utterances [19] or robotic musicianship, a central resource that provides a comprehensive overview of these diverse disciplines, and of how they interact with each other, has yet to be created. This Special Issue is, to our knowledge, the first to address this gap by providing researchers with a broad picture of how the medium of sound can be utilized to enrich and refine interactions between robots and humans. The issue is meant for social robotics researchers and practitioners across academia and industry, both as an introduction for readers interested in entering the field and as a summary of findings and best practices for readers with prior experience. By showcasing the various ways in which sound informs, influences, and engages people, we aim to provide readers with new ways of using this modality to create richer, more nuanced HRIs.
2 Objective
When considering the use of sound in HRI, it is important to note that (i) some form of sound is always present during interactions (owing to their material presence, robots are never silent), and (ii) listeners continuously interpret and make sense of sound in some way. Sound can provide important insights both for robots and for their operators: humans draw on auditory cues when making assumptions about a robot’s characteristics, intentions, and capabilities. These cues do not occur in isolation but are tightly coupled to a robot’s body and movement, both of which are equally perceptually relevant. Designers and researchers should therefore carefully consider the soundscapes accompanying the HRIs they are designing and ensure that any information conveyed through this channel is transmitted deliberately.
In light of these considerations, this Special Issue aims to support HRI researchers in adopting a holistic perspective on sound in HRI and in making thoughtful, deliberate choices about how the medium of sound is used in their work. We argue that a well-considered utilization of sound will result in more successful designs than an ill-considered one that ignores the wide array of challenges and opportunities this modality offers. The Special Issue pursues three objectives: First, to provide the reader with a comprehensive overview of how sound is, and could be, applied in HRI research. Second, to present a collection of studies exploring the medium of sound in a multitude of application contexts, from the sonification of humanoid robots and drones, across the timing and timbre of utterances, to the extraction of relevant information from background noise. Third, to showcase a cross-section of the field’s interdisciplinary research pursuits, providing a collective resource of research directions, methodologies, and use cases that will inspire future work in the field.
3 The Articles
This Special Issue contains 11 articles investigating sound in HRI. Two of them provide the reader with an overview of the field and present comprehensive conceptual frameworks that can be used to gain a holistic perspective of the modality. Another two articles showcase how a robot can extract relevant information from its sonic environment through the detection of social presence and the use of spatial audio in teleoperation contexts. Four articles present robot sonification strategies for drones, robot swarms, robotic arms, and humanoid robots, with goals including the conveyance of emotion and the creation of a general sonic presence. The final three articles refine robot communication by embedding additional information within it, specifically exploring the effects of timing and timbre of utterances. The articles can be summarized as follows.
Brian J. Zhang and Naomi T. Fitter’s “Nonverbal Sound in HRI: A Systematic Review” presents an overview of the field through a systematic review of 148 peer-reviewed articles to identify and compare different use cases and approaches for incorporating nonverbal sound in HRI. This work results in two taxonomies and recommendations for the design, generation, and validation of nonverbal robot sound.
Focusing on sound perception, two articles look at refining a robot’s and a robot operator’s awareness of its sonic environment. Nicholas Georgiou and colleagues in their article “Is Someone There Or Is That The TV? Detecting Social Presence Using Sound” investigate the use of sound-based classification to improve the interaction capabilities of social robots in a home environment, distinguishing between natural conversation from users and speech content from media playback. Utilizing a unique dataset of different acoustic environments, the authors assessed various machine learning classifiers and found the C-Support Vector Classification (SVC) algorithm to be most effective, ultimately proposing a sound classification pipeline for home robots to better engage with users (a minimal sketch of such a pipeline follows below). Adam K. Coyne and colleagues in their article ““Who said that?” Applying the Situation Awareness Global Assessment Technique to Social Telepresence” introduce an objective measurement, based on the Situation Awareness Global Assessment Technique (SAGAT), to evaluate operator situation awareness during the teleoperation of social robots. They apply this technique to evaluate the impact of different types of audio feedback on operator situation awareness. While the initial study showed no significant differences between mono- and binaural audio feedback, correlations among measures were noted, indicating the potential for developing specialized assessment techniques for social situation awareness in teleoperated robots and for guiding future robot design decisions.
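To make the classification step concrete, the following minimal sketch trains scikit-learn’s C-SVC on summary audio features. It is our own illustration rather than Georgiou and colleagues’ implementation: the feature choice (mean MFCCs computed with librosa), file names, and labels are assumptions for demonstration purposes.

```python
# Minimal sketch: classify audio clips as live conversation vs. media playback.
# Feature choice (mean MFCCs), file names, and labels are illustrative only.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC  # scikit-learn's C-Support Vector Classification

def clip_features(path: str) -> np.ndarray:
    """Summarize a clip as the mean of its MFCC frames."""
    audio, sr = librosa.load(path, sr=16000, mono=True)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

# Hypothetical corpus: (file path, label) pairs collected beforehand.
corpus = [
    ("clips/kitchen_chat_01.wav", "conversation"),  # people talking in the room
    ("clips/tv_news_01.wav", "media"),              # speech coming from a TV
    # ... extend with many more labelled clips per acoustic environment ...
]

X = np.stack([clip_features(path) for path, _ in corpus])
y = [label for _, label in corpus]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf"))
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```

In a deployed home robot, the same classifier would run on short rolling windows of microphone input rather than on pre-cut clips, with the predicted label gating whether the robot treats detected speech as addressable conversation.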
Focusing on sound generation, Bastian Orthmann and colleagues’ “Sounding Robots: Design and Evaluation of Auditory Displays for Unintentional HRI” works towards a holistic framework of implicit robot communication, presenting a multi-layer sound-based classification system designed to communicate robot states and intentions, such as urgency, availability, and directionality, to humans sharing a space with the robot. Through a series of online studies, the authors found that the created sounds were generally recognized as intended by participants, suggesting that sound can be an effective tool for the intuitive and implicit communication of robot states and intended actions.
Four articles present sonification strategies for a range of robotic agents. Ziming Wang and colleagues’ “The Effects of Natural Sounds and Proxemic Distances on the Perception of a Noisy Domestic Flying Robot” investigates the impact of adding natural sounds, specifically birdsong and rain, to the consequential noise of domestic drones during close-range interactions with humans; a mixed-methods study shows that these sounds and proxemic distances significantly influence user perceptions. The findings further show that perceptions are strongly shaped by users’ past experiences, leading to six concrete design recommendations for sound implementation in domestic drones. Elias Naphausen and colleagues’ “New Design Potentials of Non-mimetic Sonification in HRI” details a project exploring the potential of non-mimetic sonification (using sound properties like pitch, volume, and timbre) to improve HRI, leveraging data from a 7-axis manipulator to create an augmented audible presence and enable new forms of interaction (a toy example of such a mapping follows this paragraph). The article presents research parameters, an empirical study setup, and potential implications of integrating these sonification findings into a unified HRI process, focusing particularly on the interplay between machinic and auditory dimensionality. Adrian B. Latupeirissa and colleagues’ “Probing Aesthetics Strategies for Robot Sound: Complexity and Materiality in Movement Sonification” explores the aesthetic impact of sonifying a Pepper robot’s movements across three studies using two sets of sound models. Findings suggest that participants preferred more complex sound models and subtle sounds that blend well with ambient noise, with sound preferences influenced by the context in which the robot-generated sounds were experienced. Maria Mannone and colleagues’ “The Sound of Swarm. Auditory Description of Swarm Robotic Movements” presents a theoretical framework linking musical parameters (pitch, timbre, loudness, and articulation) with robotic parameters (position, identity, motor status, and sensor status) to facilitate communication within a robotic swarm through sound. Utilizing Hilbert spaces, the framework enables quantum representations of musical states, and its potential applications are presented through case studies involving simulated scenarios with robo-caterpillars, robo-ants, and robo-fish.
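As a toy illustration of such a non-mimetic mapping (our own sketch under assumed parameter ranges, not Naphausen and colleagues’ system), the following snippet renders a manipulator joint’s velocity trace as a tone whose pitch and volume track the motion:

```python
# Toy non-mimetic sonification: map |joint velocity| to the pitch and volume
# of a sine tone. The velocity range and pitch range are assumptions.
import numpy as np

SR = 44100  # audio sample rate (Hz)

def sonify_velocity(vel: np.ndarray, ctrl_rate: float = 100.0) -> np.ndarray:
    """Render a joint-velocity trace (rad/s, sampled at ctrl_rate Hz) as audio."""
    # Upsample the control signal from the robot's control rate to audio rate.
    t_ctrl = np.arange(len(vel)) / ctrl_rate
    t_audio = np.arange(int(t_ctrl[-1] * SR)) / SR
    v = np.interp(t_audio, t_ctrl, np.abs(vel))

    # Map |velocity| (assumed 0..2 rad/s) to pitch 220..880 Hz and gain 0..0.8.
    v_norm = np.clip(v / 2.0, 0.0, 1.0)
    freq = 220.0 * 2.0 ** (2.0 * v_norm)  # two octaves of pitch range
    gain = 0.8 * v_norm                   # faster motion sounds louder

    # Integrate frequency to phase so that pitch glides are click-free.
    phase = 2.0 * np.pi * np.cumsum(freq) / SR
    return (gain * np.sin(phase)).astype(np.float32)

# Example: one joint accelerating, cruising, then decelerating to rest.
vel = np.concatenate([np.linspace(0, 1.5, 200), np.full(300, 1.5), np.linspace(1.5, 0, 200)])
audio = sonify_velocity(vel)  # e.g., soundfile.write("sonification.wav", audio, SR)
```

The exponential pitch mapping keeps equal velocity increments sounding like equal musical intervals; a linear mapping would compress the audible differences at the upper end of the range.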
Finally, two articles take inspiration from how humans use vocal sounds and explore the design of robot utterances. Xiaozhen Liu and colleagues’ “Robots’ ‘Woohoo’ and ‘Argh’ Can Enhance Users’ Emotional and Social Perceptions: An Exploratory Study on Non-Lexical Vocalizations and Non-Linguistic Sounds” investigates how sounds can convey basic emotions in a humanoid robot, Pepper, and how this influences user perception. The article explores the interplay between different vocalizations, non-linguistic utterances, and robot gestures. Findings indicate that vocalizations produced with a natural voice resulted in significantly higher emotion recognition accuracy and induced higher trust, naturalness, and preference, while musical sounds received lower perception ratings. Kerstin Fischer and Oliver Niebuhr’s “Which Voice for which Robot? Designing Robot Voices that Indicate Robot Size” investigates the correlation between acoustic parameters of the human voice and body size, with the aim of designing suitable voices for robots of varying sizes. The article identifies acoustic features significantly linked with body height and weight which, when applied to robot speech, can reliably cue listeners to the perceived size of the robot.
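To give a flavor of how such size cues might be prototyped, the sketch below retunes a recorded voice prompt toward a size-appropriate fundamental frequency. The height-to-pitch mapping is a placeholder of our own (anchored arbitrarily, with pitch inversely proportional to height), not the acoustic model reported by Fischer and Niebuhr:

```python
# Toy sketch: derive a target voice pitch from robot height and retune a
# recorded prompt. The mapping constants are placeholders, not the article's.
import numpy as np
import librosa
import soundfile as sf

def target_f0(height_cm: float) -> float:
    """Placeholder: taller robot, lower voice. Anchored at 120 Hz for 170 cm;
    pitch is inversely proportional to height (one octave per doubling)."""
    return 120.0 * (170.0 / height_cm)

def retune_voice(wav_in: str, wav_out: str, height_cm: float, source_f0: float = 120.0) -> None:
    y, sr = librosa.load(wav_in, sr=None, mono=True)
    # Semitone offset needed to move the source pitch onto the target pitch.
    n_steps = 12.0 * np.log2(target_f0(height_cm) / source_f0)
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)
    sf.write(wav_out, shifted, sr)

# A 60 cm tabletop robot receives a markedly higher voice than a 170 cm one.
retune_voice("prompt.wav", "small_robot.wav", height_cm=60.0)
```

Note that librosa’s pitch_shift moves the spectral envelope (and thus the formants) along with the fundamental, which in this toy setting reinforces the impression of a smaller or larger sound source.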
We are delighted to highlight the breadth and diversity of disciplines, methodologies, and application contexts represented in this Special Issue on Sound in HRI. The included articles reflect the rich tapestry of this multidisciplinary field and showcase examples of the work currently being done on robot sound perception and production for different platforms and contexts. We would like to express our sincere appreciation to the reviewers who have dedicated their time and expertise to ensuring the quality of this publication. Their rigorous evaluation and insightful feedback have significantly contributed to this Special Issue. It is our hope that this collection of work offers an insightful and inspiring overview of this exciting area of study that can serve both as an introduction to the topic of Sound in HRI and as a resource for identifying new research directions and fostering interdisciplinary collaborations.