1 Introduction
The robotics industry is in the middle of a technological renaissance as it transitions from the traditional application domain of highly repetitive, structured manufacturing to the realm of adaptive and agile utility in sectors such as medicine, service, defense, and entertainment. Great leaps in robot capabilities—driven by advances in machine learning, sensing, computational power, and material designs—promise to usher in a technological age in which robots are ubiquitous and integrated into the fabric of society. However, despite significant advancements in robotic capabilities, technologies like artificial intelligence, and the fundamental understanding of human factors and usability, the adoption of robots in society is still largely relegated to applications in which there is virtually no interaction between humans and machines. While the “cutting edge” of robotics research is advancing rapidly, the state of practice lags far behind. The adoption of these advanced technologies is slowed by the uncertainty inherent in anything new and unproven. Early adopters bear the burden of discovering and exposing the systems’ strengths and weaknesses, while also assuming significant financial risk. Few are willing to assume the role of test subject.
Although robotics research necessitates testing to evaluate the performance of novel algorithms and designs, the test methods and metrics used in the literature are often selected specifically to highlight performance along a narrow set of criteria that presents the research in the best possible light. As such, the performance reported by researchers should often be considered a “best case” example and not necessarily indicative of the system’s average performance. In the case of human-robot interaction (HRI) research, the technologies are often tailored to address specific challenges, for specific demographics, in specific applications, and under specific circumstances. Moreover, the metrics selected are tailored to these unique combinations of factors. The result is that neither the technologies developed nor the resulting lessons learned readily extrapolate to other domains or scenarios.
Ultimately, this inability to broadly apply systems, test methods, metrics, and measurement results has a negative impact on the maturation of the HRI field of research. Incompatibilities between HRI research in the various application domains (education, child care, industry, medicine, service, etc.) and subcategories of research (mechanical design, interfaces, safety, human factors, etc.) limit collaborations and “big picture” revelations. As a result, HRI as a field of research is disjointed, with the siloed impacts of groundbreaking work extending no further than a niche collective of like-minded scientists. The importance of the impetuses, designs, and measurement results is lost in the noise of a broad research community talking to the ether.
The purpose of this special issue of the ACM Transactions on Human-Robot Interaction (THRI) is to address this lack of cohesion by drawing specific attention to the means by which different fields approach the measurement of HRI performance. With researchers leveraging custom-built, and largely incompatible, test methods and metrics, comparing technologies and scientific results from Domain A with those from Domain B is impossible. This collection of research articles is intended to highlight the test methods and metrics used within the HRI field as a whole, and specifically to draw attention to the factors that are deemed significant and the mechanisms by which those factors are assessed. The topic of HRI teaming encompasses a broad spectrum of application domains, including medical, field, service, personal care, and manufacturing applications, and special attention is paid to those test methods that are broadly applicable across multiple domains. This curated collection of articles highlights the metrics used to address HRI metrology and identifies the underlying issues of traceability, objective repeatability and reproducibility, benchmarking, and transparency in HRI.
2 A Trend in Anti-trends
The impact of technology in collaborative human-robot teams is both driven and limited by its performance and ease of use. However, the means by which performance can be measured have not kept pace with the rapid evolution of HRI technologies. The resulting situation is one in which people expect more from robots but have relatively few mechanisms by which they can assess the market when making purchasing decisions or integrating the systems already acquired. As such, robots specifically intended to interact with people are frequently met with enthusiasm but ultimately fall short of expectations. Thus, robots designed to work with people, and not merely around them, are often relegated to being novelties with little lasting legacy [Eisenmann 2021]. Ultimately, the reason behind this is a fundamental lack of trust in the capabilities of the robotic systems.
HRI research is focused on developing new and better theories, algorithms, and hardware specifically intended to push innovation. Yet, determining whether these advances are actually driving technology forward is a particular challenge. Few repeatability studies are ever performed, and the test methods and metrics used to demonstrate effectiveness and efficiency are often based on subjective, qualitative measures that may not account for all external factors; or, worse, they may be based on measures that are specifically chosen to highlight the strengths of new approaches without also exposing the limitations [Zimmerman et al. 2022]. As such, despite the rapid progression of HRI technology in the research realm, advances in applied robotics lag behind. Without verification and validation, the gap between the cutting edge and the state of practice is destined to expand.
The necessity for validated test methods and metrics for HRI is driven by the desire for repeatable, consistent, and informative evaluations of HRI methodologies that demonstrably prove functionality. Such evaluations are critical for advancing the underlying models of HRI and for providing guidance to developers and consumers of HRI technologies to temper expectations while promoting adoption. Being able to succinctly describe the capabilities of collaborative robot systems, to benchmark their functionality and performance, and to provide a basis for would-be consumers to compare the myriad of products and select solutions that meet their needs is ultimately integral to the real-world success of HRI. With metrics continually being defined, redefined, and abandoned altogether, verifying and validating HRI solutions is a constantly moving target.
3 Moving Beyond the Laboratory
The global use of robotics is rapidly increasing and is expected to continue growing exponentially through the coming years [International Federation of Robotics (IFR) 2021a, 2021b]. In the manufacturing domain, for example, new “collaborative” robots are introduced to the market every year, new startups are created to address niche market issues, and new definitions of what makes a robot “collaborative,” “user friendly,” and “interactive” are introduced. This leads to confusion in the marketplace and makes it difficult for end-users to make informed decisions about which robotic technologies they are purchasing.
Similarly, in the consumer market, the world has witnessed the introduction of social robots specifically intended to entertain, inform, and be integrated into the family home. These technologies are both innovative and enthralling, and they garner considerable attention from the media and consumer base. Once the novelty factor wears off, however, the questions remain: How effective are these HRI technologies at achieving their stated goals, and are these goals the ones that actually need to be met?
Often, the consumers of HRI technologies do not actually know what capabilities these robots have, what their own needs are relative to the stated purpose of the robots, or how to assess whether the robots in which they are investing meet those needs. In addition, these technologies may be expensive and/or require additional training before use, an investment of time that may be wasted if the technology is found to be unsuited to the consumer’s needs. Few are able to take the risk of investing in unproven robotic technologies, largely due to factors such as cost, unverified performance, and the real potential for negatively impacting production, customer service, and customer well-being. As such, the utility of HRI solutions is largely relegated to markets that are already actively engaged, while new markets are hesitant to invest.
Given the diversity of robot designs, software, applications, and users, establishing a common and comprehensive test methodology is challenging. However, commonalities and analogues exist between application domains, and these intersections are useful for establishing a preliminary standardized test methodology for assessing and assuring HRI performance. Such test methods help guide developers of HRI technologies by providing mechanisms by which they can evaluate their robots, report the robots’ capabilities and performance, and establish benchmark performance to drive improvements and innovation. Likewise, these metrics can be used by consumers to better understand the capabilities of the robots that they are buying and to establish means by which they can evaluate whether said robots are sufficient for their needs.
Given the propensity of experimental systems to fail in unexpected ways once deployed in the real world, it is little wonder that broad investment in, and success of, HRI technologies have yet to manifest. Only when armed with verified and validated tools for proving functional performance, safety, and positive return on investment can the field of HRI begin to demonstrate significant growth outside of the laboratory environment. Until then, the gap between research and practice will only continue to grow.
4 Articles in This Special Issue
This special issue of ACM THRI presents a curated collection of articles that highlight different aspects of metrics and metrology for human-robot interaction research. This collection of manuscripts covers a broad scope of application domains, robot designs, and bases for assessment.
The first two articles, for example, focus on the efficacy of interfaces. The manuscript by Lin, Krishnan, and Li presents a hierarchical framework to evaluate robot interfaces in nursing applications. The article by Páez and Gonzalez, in contrast, focuses on the utility of robot interfacing to assess and optimize learning in educational environments.
The second pair of articles focuses more on frameworks for simulating and evaluating conditions in HRI studies. The article by Fontaine and Nikolaidis presents an algorithm for generating failure scenarios for shared autonomy, while the article by Biswas et al. presents a simulation-based framework for simulating pedestrian movements for social interactions with mobile robots.
The third collection, consisting of four manuscripts, presents assessments of the impacts of robot actions and embodiment in human-robot teams. The article by Ferrier-Barbut et al. focuses on the effectiveness of training methodologies in co-manipulated laparoscopic surgery. Sharma and Vemuri evaluate the perception and acceptance of robots, largely based on uncanny valley theory. In Norton et al., a collection of intentional and directed test methods and metrics is presented to evaluate and communicate robot proficiency in HRI teaming applications. Ma et al. present a set of metrics for characterizing ecological and cognitive aspects of teamwork for spacecraft maintenance applications.
This then leads into the fourth set of articles, which focuses on technologies and frameworks for supporting the design of experiments in HRI research. Chan et al. present a series of experiments assessing the effectiveness of leveraging augmented reality interfaces in simulated manufacturing tasks. The work by Masumoto, Washburn, and Riek documents a framework that supports the design of experiments for HRI research, particularly focusing on assisting researchers who may not necessarily be specialized in HRI studies. Krausman et al. present a framework of metrology methods to support the measurement, development, and maintenance of trust in shared autonomy human-robot teams.
Finally, the article by Wang et al. presents an alternative to traditional HRI test methods by developing an evaluation framework centered on competitions. In this effort, the authors compare the operators’ perceptions of the robots with the robots’ actual performance.