Cognitive Systems Research 43 (2017) 190–207
Enabling robotic social intelligence by engineering human
social-cognitive mechanisms
Travis J. Wiltshire a,b, Samantha F. Warta a, Daniel Barber a, Stephen M. Fiore a,*
a University of Central Florida, Orlando, FL, United States
b University of Southern Denmark, Odense, Denmark
Received 28 March 2016; received in revised form 12 September 2016; accepted 20 September 2016
Available online 24 September 2016
Abstract
For effective human-robot interaction, we argue that robots must gain social-cognitive mechanisms that allow them to function naturally and intuitively during social interactions with humans. However, a lack of consensus on social-cognitive processes poses a challenge for how to design such mechanisms for artificial cognitive systems. We discuss a recent integrative perspective of social cognition to provide a systematic theoretical underpinning for computational instantiations of these mechanisms. We highlight several commitments of our approach, which we refer to as Engineering Human Social Cognition. We then provide a series of recommendations to facilitate the development of the perceptual, motor, and cognitive architecture for this proposed artificial cognitive system in future work. For each recommendation, we highlight its relation to the discussed social-cognitive mechanisms, provide the rationale and potential benefits, and detail examples of associated computational formalisms that could be leveraged to instantiate our recommendations. Overall, the goal of this paper is to outline an interdisciplinary and multi-theoretic approach to facilitate the design of robots that will one day function, and be perceived, as socially interactive and effective teammates.
© 2016 Published by Elsevier B.V.
Keywords: Human-robot interaction; Social cognition; Artificial cognitive systems; Robotics; Interaction dynamics
1. Introduction
There is an increasing need to advance the state of the
art in Human-Robot Interaction (HRI) by transitioning
the human perceptions of robots such that they are viewed
as teammates, collaborators, or partners (e.g., Fiore, Elias,
Gallagher, & Jentsch, 2008; Hoffman & Breazeal, 2004;
Lackey, Barber, Reinerman, Badler, & Hudson, 2011;
Phillips, Ososky, Grove, & Jentsch, 2011; Warta, Kapalo,
Best, & Fiore, 2016). Greater consideration for the social
capabilities of robots is warranted to advance HRI. Support for this claim comes in part from the 2013 Robotics Roadmap, which specifically highlighted the need to advance social aspects of robotics, including social cognition and modeling (see Christensen et al., 2013).
* Corresponding author at: 3100 Technology Parkway, Orlando, FL 32816, United States. E-mail address: sfiore@ist.ucf.edu (S.M. Fiore). http://dx.doi.org/10.1016/j.cogsys.2016.09.005 1389-0417/© 2016 Published by Elsevier B.V.
While the study of robot social-cognitive mechanisms has received less emphasis in HRI than issues like autonomy, trust, and reliability (e.g., Hancock et al., 2011; Lindblom & Andreasson, 2016; Sheridan, 2016; Tsarouchi, Makris, & Chryssolouris, 2016), support for pursuing such efforts comes from the essential role social cognition plays in HRI and human-robot teaming (Christensen et al., 2013; Fiore et al., 2013; Warta et al.,
2016). In particular, such mechanisms will allow robots
to interact with humans in a more natural and intuitive
way (Breazeal, 2004). More pertinently, social-cognitive
mechanisms are necessary for effective human-robot coordination and cooperation, as they would afford a robot
the capacity to interpret the mental states (i.e., intentions,
emotions, beliefs, desires) of humans during socially interactive contexts and support the concurrent display of
appropriate behaviors (e.g., Vernon, 2014). In turn, such
mechanisms can enable robotic teammates to dynamically
work with humans toward shared goals (Hoffman &
Breazeal, 2004) and function adaptively in an inherently
information rich social environment (Dautenhahn,
Ogden, & Quick, 2002).
Some of the more well-known artificial general intelligence (AGI) models, or cognitive architectures, began as
a subset of theories before evolving into the AGI models
of today. It is in a similar fashion that we present our ideas
as an integration of theory and modeling recommendations
to advance the social capabilities of robots. Attempts to
develop integrated theories of human cognition that have
detailed functional mechanisms, and can be instantiated
in artificial systems, are often presented as cognitive architectures (e.g., Langley, Laird, & Rogers, 2009). These
computationally-based architectures serve as models for
artificial intelligence (AI) and detail the constructs incorporated within a given theory of human cognition. Several
different iterations of these cognitive architectures exist,
reflecting the diversity of perspectives regarding human
cognition and cognitive processes. Examples of these
AGI models include cognitive architectures like: the Learning Intelligent Distribution Agent (LIDA; Ramamurthy,
Baars, D’Mello, & Franklin, 2006), Soar (Laird, Newell,
& Rosenbloom, 1987), and the Adaptive Control of
Thought – Rational architecture (ACT-R; Anderson,
1993; Anderson et al., 2004).1 Each of these AGI models is grounded in a series of conceptual commitments, which
allows for integration of the theoretical components into
computational mechanisms that characterize each model.
Toward a similar end, but focusing on the social-cognitive components of artificial intelligence, this paper
represents an elaboration of our previous work regarding
social-cognitive mechanisms in robots (Wiltshire, Barber,
& Fiore, 2013). This paper details progress on key components of our approach to engineering human social cognition and elaborates on each of the modeling recommendations.
1 LIDA is considered a working model of cognition designed to imitate human-like learning (i.e., episodic learning), as it utilizes artificial feelings and emotions to learn to act in a human-like manner. On the other hand, Soar represents a much less specialized form of cognitive architecture designed to be capable of general intelligence, which is characterized by processes such as problem solving, learning, and intelligent behavior. Conversely, ACT-R integrates theories of human cognition, visual attention, and motor movement to enable researchers to create models pertaining to various tasks (e.g., learning and memory, problem solving and decision making, language and communication, perception and attention) by supporting the collection and comparison of quantitative measures (e.g., accuracy, time elapsed, neurological data) generated from the ACT-R model and human participants (ACT-R Research Group, 2013; Anderson et al., 2004).
While our aim is not to propose a new
AGI system per se, we are proposing a systematic set of
conceptual commitments that we hope aid in the development of socially intelligent machines.
As such, the first section of our paper will introduce
human social cognition and its associated theoretical mechanisms. In the second section, relevant sub-components of
cognition are defined in further detail to emphasize the significance of their role in our approach to engineering
human social cognition and shed light on the importance
of their inclusion in robotic design. From this, the third
section will specify the framework of recommendations
we make based on numerous disciplines including HRI,
philosophy, psychology, robotics, and neuroscience. For
these, we draw from theories of embodied cognition,
dual-process theory, ecological psychology, and dynamical
systems theory. Our recommendations do not just represent singular ideas from each of these areas, but instead
outline a hierarchical approach to modeling social-cognitive mechanisms in robots by synthesizing ideas
across disciplinary perspectives to facilitate their future
instantiation. As a final point, we review potential next
steps toward engineering human social cognition in robots
and expound on future research opportunities.
2. Human social cognition
Social cognition encompasses the perceptions, actions,
and cognitive processes involved in the observation of,
and interaction with, others (Frith & Frith, 2007, 2008,
2012). Research in social cognition is primarily concerned
with the mechanisms through which humans are able to
understand their own and others’ mental states (e.g., intentions, emotions, beliefs, desires). An understanding of mental states facilitates the explanation and prediction of the
behavior of others and, in turn, the ability to act accordingly. This is often termed theory of mind, mentalizing,
or mindreading. Whereas some approaches argue social
cognition is primarily for figuring out the minds of others,
other approaches suggest it is for interacting with others
(e.g., Di Paolo & De Jaegher, 2012) and shaping our relationships with them (Bohl, 2015; Fiske & Haslam, 1996).
Thus, there is currently a lack of consensus on the specific
functions served by social cognition.
In part, this lack of consensus is due to a tendency for
research to overlook the distinction between online versus
offline social cognition. Online social cognition is characterized by actual social interactions where a reciprocal
exchange of behaviors between two or more agents leads
to a recursive change in each agent’s mental states and in
turn, their behaviors (Przyrembel, Smallwood, Pauen, &
Singer, 2012). Offline social cognition is characterized by
cases of passive observation where one agent is merely
observing another (Ibid.). This distinction is relevant in
that recent research has shown support for distinct neural
activation during online versus offline social cognition
(e.g., Tylén, Allen, Hunter, & Roepstorff, 2012). As such,
there may also be distinct social cognitive mechanisms at
play in service of the different functions needed for these
types of interactions. Further, both behavioral and neuroscientific research in social cognition has tended to study
offline social cognition (e.g., Przyrembel et al., 2012;
Schilbach, 2014). Therefore, this distinction between online
versus offline social cognition can lend precision to
approaches for developing artificial social intelligence (see
also Pezzulo, 2012; Wiltshire, Lobato, Jentsch, & Fiore,
2013).
An important debate in this domain concerns the fundamental mechanisms of social cognition. One of the first posited mechanisms comes from the Theory of Mind (ToM) approach, which asserts that
mental states are primarily understood through the use of
theoretical inference mechanisms (e.g., Gopnik &
Wellman, 1992; Premack & Woodruff, 1978). According
to this theory, given some perceptual social information,
for example, the social cues of a furrowed brow and pursed
lips, an inferential mechanism is required to probabilistically determine the mental state of the person displaying
those cues (e.g., anger). Another proposed mechanism is
perceptual-motor simulation routines (Blakemore &
Decety, 2001; Goldman, 2006). In this case, upon encountering the aforementioned cues, an individual would
employ a simulation mechanism to assess the mental state
they experience under conditions where they displayed
those observable cues and, from this, attribute that state
to the other person. Lastly, in cases of actual social interaction (i.e., during online social cognition), a direct perception mechanism posits that there is enough information
afforded by the embodiment of an interactor that their
mental states can be understood without the need for inferential or simulative mechanisms (De Jaegher, 2009; De
Jaegher & Di Paolo, 2007; Gallagher, 2007, 2008;
Gangopadhyay & Schilbach, 2011; Wiltshire, Lobato,
McConnell, & Fiore, 2015).
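To make the ToM-style inferential mechanism concrete, the mapping from observed cues to a probabilistically attributed mental state can be sketched as a tiny Bayesian model. This is our own illustrative construction, not a model from the cited work; the cue names, candidate states, and probability values are invented for the example:

```python
# Hypothetical toy model of a ToM inferential mechanism: attribute the most
# probable mental state given observed binary social cues, via Bayes' rule.
# All states, cues, and probabilities below are invented for illustration.

PRIORS = {"anger": 0.2, "concentration": 0.5, "neutral": 0.3}
LIKELIHOODS = {
    "anger":         {"furrowed_brow": 0.8, "pursed_lips": 0.7},
    "concentration": {"furrowed_brow": 0.6, "pursed_lips": 0.2},
    "neutral":       {"furrowed_brow": 0.1, "pursed_lips": 0.1},
}

def infer_state(observed_cues):
    """Return a normalized posterior over mental states given a set of cues."""
    posterior = {}
    for state, prior in PRIORS.items():
        p = prior
        for cue in observed_cues:
            # Small default likelihood for cues the model has no entry for.
            p *= LIKELIHOODS[state].get(cue, 0.05)
        posterior[state] = p
    total = sum(posterior.values())
    return {state: p / total for state, p in posterior.items()}

dist = infer_state({"furrowed_brow", "pursed_lips"})
print(max(dist, key=dist.get))  # → anger
```

Under these toy numbers, observing both a furrowed brow and pursed lips makes anger the most probable attribution. A simulation or direct-perception mechanism, by contrast, would replace this explicit inference step rather than extend it.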
In short, there is a rich history of theoretical and empirical research in social cognition, with various nuanced
associations and commitments aligned with differing
approaches (see Table 1 for an overview), and this continues to be an active area of research. We offered this brief summary of key social-cognitive mechanisms to provide initial conceptual grounding for our discussion of how to
conceptualize engineering social cognition in artificial systems. In addition to the references provided in Table 1, a
number of integrative reviews and meta-analyses have also
explored the various positions in social cognition (Adolphs,
1999; Bodenhausen & Todd, 2010; Bohl & van den Bos,
2012; Fiske & Haslam, 1996; Macrae & Bodenhausen,
2000; Schilbach et al., 2013; Van Overwalle, 2009;
Wiltshire et al., 2015).
Given the complex nature of social cognition, we next
suggest that explanatory pluralism is needed in this area
of inquiry (e.g., Dale, 2008; Dale, Dietrich, & Chemero,
2009). Specifically, we argue that a better understanding
of the mechanisms of social cognition might be obtainable
through the integration of competing theories into meta-level frameworks that sustain the co-existence of each
(Dale, 2008). With the growing body of evidence for the
various proposed mechanisms, it seems increasingly likely
that humans may have the capacity to employ each of them
depending upon the social cognitive functions needed (see
Wiltshire et al., 2015). However, with so many diverse perspectives, much work remains to integrate these mechanisms and empirically demonstrate the contexts and
situations in which one mechanism would be adopted as
opposed to another.
For instance, in economic exchange situations, where
decision-making has financial consequences, ToM mechanisms endow an individual with the ability to recognize
others with whom a trusting relationship can be formed
(McCabe, Houser, Ryan, Smith, & Trouard, 2001). Additionally, ToM mechanisms assist in the detection of cheaters through recognition of (non)cooperative or deceptive
cues (Dunbar, 1998). Relatedly, direct perception would
serve to prime and potentially reinforce ToM mechanisms
in that an individual would be able to see cooperative or
competitive intentions simply by observing kinematic
movement (Sartori, Becchio, & Castiello, 2011). Simulation
mechanisms would enable an individual to model and
anticipate the behavior of others across multiple outcomes
(Kourtis, Sebanz, & Knoblich, 2010). In the case of social
situations, for example, this would allow for prediction of
friendly or threatening behavior.
Recently, dual-process theorizing was used to integrate
the distinctions between online/offline cognition and the
varied findings and proposed mechanisms associated with
theory of mind, direct perception, and simulation routines
(Bohl & van den Bos, 2012). Dual-process theories of cognition are supported by decades of research in social and
cognitive psychology, and, more recently, by findings in
cognitive neuroscience. Overall, this research provides evidence for two distinct types of cognitive processes that
adhere to different neural pathways and that are evident
at both the neural and functional levels (e.g., Bargh,
1984; Chaiken & Trope, 1999; Satpute & Lieberman,
2006). Type 1 (T1) processes are often characterized as implicit, automatic, reflexive, and stimulus-driven, with primacy assigned to action, and thus qualify as online (cf. Wilson, 2002). In contrast, Type 2 (T2) processes are explicit, controlled, reflective, and characteristically offline (e.g., Bohl & van den Bos, 2012; Chaiken & Trope, 1999). While the overlap of T1 processes with online social cognition, and of T2 processes with offline social cognition, is clear, the relationship between dual-process theory and the aforementioned social-cognitive mechanisms is less so (cf. Wilkinson & Ball, 2013).
Within Bohl and van den Bos’ (2012) integrative framework, T1 processes align with the direct perception mechanism of social cognition whereby mirror neurons and other
sensory-motor areas contribute to the understanding of
others’ mental states. Several researchers argue that this
type of mechanism allows for rapid access to the intentional and affective states of other agents (e.g., De Jaegher, 2009; Gallagher, 2008). T2 processes, in this account, align with both the inferential and simulative mechanisms of social cognition, which are supported by distinct regions that have been referred to as the Theory of Mind system (see also the X system in Satpute & Lieberman, 2006). Both types of processes are interdependent, but the general function and form of each is distinct. What is also worth emphasizing is that actual social interaction is complex and can be characterized by a number of dimensions: verbal and nonverbal behavior, varying contexts, quantity of participants, and strict timing demands for reciprocal and joint activity (De Jaegher, Di Paolo, & Gallagher, 2010). Thus, both T1 and T2 processes, and their underpinning mechanisms, are required to successfully navigate the complex social environment.

Table 1
Social cognition and its mechanisms.

Social Cognition
  Definition: The perceptions, actions, and cognitive processes involved in observing and understanding others, as well as interacting with others and shaping our relationships with them
  References: Bohl (2015), Di Paolo and De Jaegher (2012), Fiske and Haslam (1996), and Frith and Frith (2007, 2008, 2012)

Theory of Mind (i.e., mentalizing, mindreading)
  Definition: Theoretical inference mechanisms use social information to probabilistically determine mental states
  References: Gopnik and Wellman (1992) and Premack and Woodruff (1978)

Perceptual-motor Simulation Routines
  Definition: Simulation mechanisms enable the modeling and attribution of a mental state based upon the social cues exhibited
  References: Blakemore and Decety (2001) and Goldman (2006)

Direct Perception
  Definition: Mental states can be perceived through the embodiment of the interactor without additional cognitive mechanisms
  References: De Jaegher and Di Paolo (2007), De Jaegher (2009), Gallagher (2007, 2008), Gangopadhyay and Schilbach (2011), and Wiltshire et al. (2015)

We have only briefly outlined the theorizing on mechanisms and functions of social cognition. But the aforementioned ideas provide the foundation on which we build our approach for developing artificial social cognition. We next detail the theoretical underpinnings for defining our approach to Engineering Human Social Cognition such that the design of machines capable of engaging in actual social interaction and collaborative teamwork with humans will one day become a reality.

3. Engineering human social cognition

In this section, we advance and extend our approach, termed Engineering Human Social Cognition (EHSC), which aims to leverage an interdisciplinary and multi-level understanding of human social-cognitive processes for the development and design of robotic systems that possess social intelligence (Streater, Bockelman Morrow, & Fiore, 2012; Wiltshire, Barber et al., 2013). Our goal with EHSC is to address a number of longstanding problems in human-robot and human-machine interaction such that robots and machines can interact more naturally with people as effective and collaborative teammates (e.g., Fiore et al., 2011; Wiltshire, Smith, & Keebler, 2013). For example, EHSC can contribute to the development of trustworthy
human-machine social interfaces (Atkinson & Clark,
2013), and also provide robots with the capacity to convey
important aspects of their status and intentions while also
being able to interpret the intentions of their human teammates (Klein, Woods, Bradshaw, Hoffman, & Feltovich,
2004). Similarly, EHSC can aid development of agents capable of playing the role of teammate in the context of complex collaborative problem solving (Fiore et al., 2008;
Wiltshire & Fiore, 2014). This approach aims to support
inter-predictability between robots and human teammates
and, thus, enable effective human-robot coordination
(e.g., Bradshaw et al., 2008; Bradshaw et al., 2009).
The EHSC approach comprises four key components crucial to robotic design. The combination of these
will facilitate the transition of robotic agents perceived as
tools to robotic agents perceived as teammates. First, the
field of social signal processing (SSP; e.g., Vinciarelli
et al., 2012) lays the foundation for the creation of machines that can better understand humans and in turn, communicate more effectively with them. Our approach
integrates SSP with dual-processing theories to account
for the functional differences of T1 and T2 processes at
play during social interaction. Second, our emphasis on
social robotics (i.e., robots as socially interactive agents)
builds upon and incorporates the mechanisms enabled by
SSP to allow for more natural and automatic interaction
in HRIs. Here, we have a specific focus on the processes
that motivate the verbal and nonverbal communicative
cues supporting interaction. Third, to achieve more automatic interaction in HRI, we also attend to embodied cognition theory as this provides a more natural grounding for
the expression and comprehension of interactive behaviors
between humans and robots. Finally, the convergence of
these components will help lead to development of a more
autonomous robotic system, capable of coordinating with
teammates, and, to some degree, self-regulation of behaviors in the team context. This, we suggest, is crucial to
the goal of transitioning from tool to teammate. We next
elaborate on each of the key components of the
EHSC approach (see Table 2 for a definition of each
component).
Table 2
Components of the engineering human social cognition approach.

Social Signal Processing
  Definition: A multi-disciplinary field focused on developing social mechanisms for AI capable of interpreting social signals from social cues

Dual Social Signal Processing System
  Definition: A social signal processing system that integrates mechanisms characteristic of T1 and T2 processes to respectively enable more reflexive and analytic forms of artificial cognition

Socially Interactive Robots
  Definition: Physically embodied agents with a level of autonomy that enables meaningful interaction with humans

Embodied Cognition
  Definition: An interaction of the brain, body, and environment that influences cognitive processes

Autonomous Robotic System
  Definition: An independent, embodied system that operates without the need for external control as it works towards its goals and maintains itself
3.1. Social signal processing
EHSC draws heavily from the multi-disciplinary field of
Social Signal Processing (SSP). EHSC shares SSP’s aim of
providing social mechanisms for computers that are able to
interpret high-level social signals (i.e., mental states) from
combinations of low-level social cues (see Vinciarelli et al.
(2012) for review). In this case, social cues are the physiological and observable activities that are apparent in a person or group of people, and social signals are meaningful
interpretations of these cues as a function of the mental
states attributed to said agents (see also Fiore et al.,
2013; Lobato, Warta, Wiltshire, & Fiore, 2015; Lobato,
Wiltshire, Hudak, & Fiore, 2014; Wiltshire, Lobato,
Wedell et al., 2013; Wiltshire et al., 2015; for more on this
distinction). The interpretation of social cues, in turn, can
be characterized by the type of cognitive process used
(i.e., T1 or T2) and the specific characteristics of the social
situation (see Wiltshire, Snow, Lobato, & Fiore, 2014).
Interestingly, the social cues and signals distinction applies
to both robots interpreting humans and humans interpreting robots capable of displaying social cues (e.g., DeSteno
et al., 2012; Fiore et al., 2013; Warta, 2015; Wiltshire,
Barber et al., 2013). Therefore, expanding upon the above description of dual-process theory, further explanation is needed of the dual ways in which social-cognitive processes occur in biological systems and the ways they can be implemented in artificial cognitive systems.
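The two-level structure just described, in which low-level measurements are discretized into social cues and cue sets are then interpreted as higher-level social signals, can be sketched as follows. The feature names, thresholds, and interpretations are our own hypothetical choices for illustration, not values from the SSP literature:

```python
# Illustrative sketch of the cue/signal distinction in social signal
# processing. Raw measurements -> discrete social cues -> attributed signal.
# All feature names, thresholds, and labels below are hypothetical.

def extract_cues(features):
    """Map raw measurements (dict of floats) to a set of discrete social cues."""
    cues = set()
    if features.get("gaze_on_partner_s", 0.0) > 2.0:
        cues.add("sustained_gaze")
    if features.get("interpersonal_distance_m", 10.0) < 1.2:
        cues.add("close_proximity")
    if features.get("smile_intensity", 0.0) > 0.5:
        cues.add("smile")
    return cues

def interpret_signal(cues):
    """Attribute a social signal (a mental-state interpretation) to a cue set."""
    if {"sustained_gaze", "smile"} <= cues:
        return "affiliative_interest"
    if "close_proximity" in cues and "smile" not in cues:
        return "possible_confrontation"
    return "unclear"

cues = extract_cues({"gaze_on_partner_s": 3.1, "smile_intensity": 0.8})
print(interpret_signal(cues))  # → affiliative_interest
```

The point of the separation is that the same cue-extraction front end can feed different interpretation back ends, which is where the T1/T2 distinction of the next section enters.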
3.2. Dual paths for social signal processing
Mechanisms characteristic of T1 processes in humans
allow for a more direct understanding of, and automatic
interaction with, humans as well as the environment. Additionally, mechanisms characteristic of T2 processes in
humans allow for more complex and deliberate forms of
cognition, such as mental simulation and theoretical inferences that support the prediction and interpretation of
complex and novel social situations (see Bohl & van den
Bos, 2012; Wiltshire et al., 2015). For example, a robot
interacting with a teammate during situations with high
temporal demands can leverage T1 processes in service of
directly engaging in actions appropriate for the situation.
However, when temporal demand is lower, or if a robot
is taking on a passive role, such as during a surveillance
task, T2 processes would allow for more analytic forms
of cognition that may enable the robot and, in turn, its
teammates to better understand and predict the future
states of a situation. To the best of our knowledge, no such
approach currently exists that explicitly attempts to provide a foundation for creating a dual social signal processing system in an embodied robot.
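At its simplest, a dual social signal processing system of the kind described above might route between a fast, reactive T1-like path and a slower, analytic T2-like path. The following is our own minimal sketch, not an existing system; the cue names, time-budget threshold, and routing heuristic are all hypothetical:

```python
# Minimal sketch of dual-path routing for a robot's social responses:
# a fast T1-like reflexive path under tight time budgets and active roles,
# and a slower T2-like analytic path otherwise. All names are hypothetical.

def respond_t1(cues):
    """Fast, stimulus-driven cue-to-action mapping (reflex-like)."""
    reflexes = {"raised_hand": "stop", "beckoning": "approach"}
    for cue in cues:
        if cue in reflexes:
            return reflexes[cue]
    return "continue"

def respond_t2(cues, context):
    """Slower, analytic path: reason over cues in context (stub)."""
    hypotheses = [f"{cue} given {context}" for cue in sorted(cues)]
    return "report:" + ";".join(hypotheses)

def route(cues, context, time_budget_s, passive_role=False):
    """Select the T1 path under high temporal demand in an active role;
    otherwise fall back to the analytic T2 path."""
    if time_budget_s < 0.5 and not passive_role:
        return respond_t1(cues)
    return respond_t2(cues, context)

print(route({"raised_hand"}, "patrol", time_budget_s=0.2))  # → stop
```

In a fuller system, the T2 path would carry the inferential and simulative machinery discussed earlier, while the T1 path would be grounded in direct sensorimotor couplings.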
3.3. Socially interactive robotic teammates
Central to our approach is an emphasis on social
robotics since we view this as an enabling factor in the
establishment of human-robot teams (Wiltshire, Smith
et al., 2013). We take the position that, for robots to be
teammates, they must, first and foremost, be socially interactive (e.g., Gallagher, 2007, 2013; Wiltshire, Barber et al.,
2013). We draw from recent neuro-philosophical accounts
that posit the primacy of social interaction for the emergence of more complex social-cognitive mechanisms (Di
Paolo & De Jaegher, 2012). With this in mind, social robots
are defined as: (a) physically embodied agents that, (b)
function at some level of autonomy, and (c) interact and
communicate with humans by (d) adhering to normative
and expected social behaviors (Bartneck & Forlizzi,
2004). Extending upon this definition, robots can be
described as socially interactive if they are able to: (a)
express or perceive emotions, (b) use high-level dialogue,
(c) learn about, and recognize, other agents, (d) establish
and maintain social relationships with humans, and (e)
use and perceive natural social cues (Fong, Nourbakhsh,
& Dautenhahn, 2003). Naturally, any single component
of these definitions poses a significant challenge to designers of robotic systems.
Given the above, we suggest that, in attempting to develop
socially interactive robots, a combinatorial approach is
required that ultimately focuses on enabling meaningful
interaction. To do so requires the understanding and integration of perceptual, motor, and cognitive factors. A key example of such an approach is outlined in Pezzulo’s (2012)
description of the Interaction Engine. In particular, Pezzulo
(2012) describes the design of a computational system that
not only accounts for the online and offline distinction
(though listed as observer versus actor), but also specifies
the tasks required for communication in increasingly complex interaction scenarios (see Table 3 for more details).
Table 3
Overview of the social robotics interaction engine (adapted from Pezzulo, 2012). Observer tasks correspond to offline social cognition; Actor tasks correspond to online social cognition.

Individual Scenario
  Tasks of perceptual processes: Estimating the state of the observed system
  Tasks of action processes: Achieving goals related to the environment

Interaction Scenario (Non-Communicative)
  Tasks of the Observer: Mindreading (estimating cognitive variables of another agent)
  Tasks of the Actor: Achieving goals relative to another's actions

Interaction Scenario (Communicative)
  Tasks of the Observer: Mindreading for recognizing communicative intentions
  Tasks of the Actor: Achieving goals relative to another's internal variables (changing mental states of other agents)

Joint Action Scenario
  Tasks of the Observer: Formation of shared representations
  Tasks of the Actor: Joint action control (takes the joint goal into consideration, uses shared representations)

Linguistic Scenario
  Tasks of the Observer: Mindreading for recognizing communicative intentions in speech acts
  Tasks of the Actor: Using language to achieve goals relative to another's internal variables; common ground formation

Built upon Levinson's (2006) ideas, Pezzulo's (2012) Interaction Engine provides the foundational basis for both linguistic and nonlinguistic interaction. Several mechanisms comprise the Interaction Engine, but the mindreading, communicative, and representation-sharing mechanisms, three components grounded in predictive processes, play a fundamental role in meaningful interaction. Whereas mindreading mechanisms enable abilities akin to a theory of mind and allow an individual to assess the beliefs and intentions of others, communicative mechanisms accomplish interactional goals, such as manipulating others' beliefs and intentions. Lastly, representation-sharing mechanisms allow for conveying mental models or representations to others, which, in turn, enhances the effectiveness of an individual's actions when operating alone in pursuit of an individual goal or coupled with the actions of others in pursuit of a joint goal.

Broadly, approaches such as the Interaction Engine and its mechanisms are applicable to a number of contexts encountered in the social environment, namely observation and interaction scenarios, and, in the case of mindreading tasks, offer the promise of a greater understanding of human social cognition and its instantiation in artificial cognitive systems.

3.4. Embodied cognitive systems

Within embodied cognition, the brain, body, and environment interact in such a way as to allow for the formation of cognitive processes (Varela, Thompson, & Rosch, 1991). In framing our approach, we suggest that robots be designed with serious consideration of their embodiment. Motivation for our commitment to embodiment, when developing social-cognitive mechanisms, comes from work in cognitive robotics through the idea that "when two cognitive systems interact or couple, the shared consensus of meaning...will only be semantically similar if the experiences of the two systems are compatible" (Vernon, 2010, p. 93). For example, spatial orientation concepts such as up or down only have meaning if the system can directly experience these concepts through its own physical body
(Lakoff & Johnson, 1980). There is increasing support for
the modeling of artificial cognitive systems, and social
robots, on commitments of embodied cognition across disciplines including AI, HRI, philosophy, psychology,
robotics, and neuroscience (e.g., Anderson, 2003;
Barsalou, 2008; Breazeal, Gray, & Berlin, 2009;
Chaminade & Cheng, 2009; Dautenhahn et al., 2002;
Fiore et al., 2008; Franklin, Strain, McCall, & Baars,
2013; Gallagher, 2007, 2013; Hoffman, 2012; Pezzulo
et al., 2013; Pfeifer, Lungarella, & Iida, 2007). Through
our commitment to embodied cognition, we mean that
the designers of artificial cognitive systems must account
for: (a) the environment or ecological niche and the associated physical laws in which (b) the body and morphological
structures of the agent are grounded, (c) the sensorimotor
couplings (i.e., the relations between sensors and effectors),
as a function of the agent’s morphology, that shapes the
dynamic interactions between the agent and the environment, and (d) the situatedness of the agents’ cognitive processes as a function of varying contexts (e.g., Pezzulo et al.,
2013; Pfeifer et al., 2007; Wiltshire, Barber et al., 2013).
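The four commitments (a)-(d) above could be captured as an explicit design artifact that a system architect fills in for a given platform. The sketch below is one minimal way to do so; the structure and example values are our own hypothetical illustration, not a schema from the cited work:

```python
# Hypothetical data sketch of the four embodiment commitments:
# (a) ecological niche, (b) morphology, (c) sensorimotor couplings,
# (d) situatedness across contexts. Example values are invented.
from dataclasses import dataclass, field

@dataclass
class SensorimotorCoupling:
    sensor: str
    effector: str
    relation: str  # how sensing modulates action, given the morphology

@dataclass
class EmbodimentSpec:
    niche: str                             # (a) environment and its physics
    morphology: list[str]                  # (b) body / morphological structures
    couplings: list[SensorimotorCoupling]  # (c) sensor-effector relations
    contexts: list[str] = field(default_factory=list)  # (d) situatedness

spec = EmbodimentSpec(
    niche="indoor office, level floors, human-scale furniture",
    morphology=["wheeled base", "pan-tilt camera head", "single arm"],
    couplings=[SensorimotorCoupling("camera", "base", "gaze-contingent steering")],
    contexts=["hallway escort", "tabletop handover"],
)
print(len(spec.couplings))  # → 1
```

Making these commitments explicit per platform is one way to check, at design time, whether a robot's embodiment is similar enough to a human teammate's for the shared-meaning argument above to apply.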
This commitment to embodiment is central to social
cognition in humans (e.g., Barsalou, Niedenthal, Barbey,
& Ruppert, 2003; Bohl & van den Bos, 2012). Here, committing to an embodied view of social cognition is crucial,
especially when trying to instantiate social-cognitive mechanisms in robots. Moreover, given that the intent is for humans and robots to collaborate with each other, this commitment increases in importance. In other words, if humans and robots are to understand
each other and work effectively together, the more similar
their embodiment is and, in turn, their cognition, the easier
it is for interaction between these biological and artificial
cognitive systems to occur (Vernon, 2010).
3.5. Autonomous robotic teammates
Given that one of the long-term goals for designers of
robotic teammates, and EHSC, is for robots to function
autonomously, it is essential to frame our approach with
a definition of autonomous agents. Autonomous agents
are characterized as an embodied system that is designed to
‘‘satisfy internal and external goals by its own actions while
in continuous long-term interaction with the environment
in which it is situated” (Beer, 1995, p. 173). Although there
are inherent challenges associated with automated systems
performing as team members (cf. Klein et al., 2004), we
submit that providing foundational social-cognitive mechanisms is a necessary step towards mitigating many of these
issues. Our aim with this paper is not to articulate the challenges that could be solved by the instantiation of such
mechanisms, but rather to provide an outline of such a
system.
We suggest that leveraging the efforts of SSP, incorporating dual-process theories of cognition, supporting social
interaction in the design of robotic agents, grounding
robotic design in an embodied approach to cognition,
and integrating the aforementioned in an effort to achieve autonomy in the way robots interact with people, will provide a roadmap for EHSC. In
turn, we expect these concepts to improve human-robot
interaction as a function of robots being provided with necessary social-cognitive mechanisms (Wiltshire, Barber
et al., 2013). If these mechanisms can be computationally
modeled, we argue that it may be a step toward approximating the sophisticated and flexible nature of human
social-cognitive processes. It is our position that adopting
the factors associated with the EHSC approach would ultimately lead not only to a more effective robot, but also to a more effective artificial teammate. Now that we have outlined the key components of EHSC, in the remainder of our
paper we provide the recommendations for modeling an
artificial social-cognitive system based on our approach.
4. Recommendations for modeling social-cognitive
mechanisms in robots
The recommendations we next advance are drawn from
multiple disciplines including HRI, philosophy, psychology, robotics, and neuroscience, as well as from theories
of embodied cognition, dual-process theory, ecological psychology, and dynamical systems theory. We provide a
novel means for conceptualizing ideas across disciplines
and theories by organizing them into an integrated set of
hierarchical recommendations for engineering human
social cognition (i.e., an attempt toward the goals of
explanatory pluralism; see Dale, 2008; Dale et al., 2009).
Notably, we draw heavily from, and build upon, the ideas
detailed in the Computational Grounded Cognition approach
(Pezzulo et al., 2013), by enriching this account with a conceptualization for embodied cognition drawn from ecological psychology as well as human social cognition. At this
point, our goal is not to provide computational formalisms
as many of these are included in the references we cite.
Because design recommendations for artificial social-cognitive systems are sparse (see, however, Vernon, 2014), we provide interdisciplinary and multi-theoretic recommendations that can aid in the design and development of robots that will one day function, and be perceived, as
socially interactive and effective teammates. The focus of
these recommendations is on developing the perceptual,
motor, and cognitive architecture for such a system. Taken
together, these recommendations serve as an ambitious first
step toward improving human-robot interaction and
teaming.
The subsequent architectural design recommendations
are presented in the following manner: first, relevant background information and a brief description of the recommendation will be outlined to better orient and frame the
given rationale in light of the EHSC approach. Then, several example cases are provided to further illustrate the relevance and real-world application of the presented
recommendation. As such, future efforts can leverage these
recommendations to adopt a technical approach; therefore,
what is presented here is meant to be more illustrative than
exhaustive.
In Fig. 1, we provide a schematic to offer a visual frame
of reference for the recommendations that follow. While
Fig. 1 is a clear oversimplification, it is useful in conveying
several aspects related to the organization of our modeling
recommendations. Specifically, the arrows represent the
flow of social information through the system. In cases
where our recommendations are most associated with T1
processes, there is a more direct relationship between the
system’s receptors and its effectors. Some information must
be stored for use by the system (see recommendation 4) to
be used in more computationally intensive processing.
Thus, the left side of the figure represents the least computationally demanding recommendations and the right side
represents the most demanding. Note that there are plausible bidirectional and coupled relationships between modeling recommendations and components of the system (that
would be represented in part by recommendation 5); however, the figure was constructed in a parsimonious fashion
for legibility. Recommendation 5 is thus represented by the
arrows in the figure although in a simplified form. Fig. 1 is
best used in conjunction with the summary of the modeling
recommendations included in Table 4.
4.1. Leverage the ecological approach to robotics
Many extant computational approaches to embodied
cognition (e.g., Pezzulo et al., 2013) do not incorporate
insights from ecological psychology and its translational
work in robotics, despite the fact that ecological psychology was fundamental to embodied approaches to cognition
(Wilson, 2002). In robotics, the ecological approach is
characterized by the following principles: (a) treatment of
the agent and the environment as a system, (b) behavior
emerges from the dynamics of the agent-environment system, (c) a direct coupling exists between perception and
action, (d) information for adaptive behavior is available
in the environment, and (e) an agent does not always need
Fig. 1. Schematic overview of the flow of social information and the relations between T1 and T2 processes, and the EHSC modeling recommendations.
The arrows represent recommendation 5.
to represent the environment in a centralized model
(Duchon, Kaelbling, & Warren, 1998; see also Gibson,
1979; Wiltshire, Barber et al., 2013).
Accordingly, we suggest that artificial cognitive systems
should be designed to provide the opportunity for direct
and non-representational interaction with the physical
and social environment (Wiltshire, Barber et al., 2013).
We argue that the rationale behind leveraging the ecological approach, in light of EHSC, is that it provides a framework to computationally instantiate T1 processes in
robots. That is, incorporating elements of this approach
provides the mechanisms for direct interaction with the
physical and social environment. Such an approach would
also align with the principle of ecological balance (Pfeifer &
Scheier, 1999), which relies on matching task environment
complexity to the proper embodied model. The potential
benefits of adhering to this recommendation are that it
may serve to minimize the expense of computational
resources and reduce latency in the behavioral responses
of the robot.
Example instantiations of efforts in robotics related to
this recommendation can be found in Brooks (1999) as well
as Duchon et al. (1998). Duchon et al. (1998) pioneered
ecological robotics and developed robotic systems that
could navigate the environment solely through optic flow
rather than construction of an internal model of the world.
Likewise, Brooks (1999) developed the subsumption architecture, where each control sub-system for the robot was
added to more basic sub-systems without interfering with
previous systems. This approach was also one of the first
efforts in robotics to emphasize direct perception-action
links, which provided real-time mechanisms for interaction
with the environment that did not rely on a centralized control model (Bermúdez, 2010). It is evident, however, that
both of these instantiations only emphasized interaction
with the physical environment. Therefore, efforts are
needed to develop mechanisms for interaction with the
social environment.
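To make the subsumption idea concrete, a layered controller in the spirit of Brooks (1999) can be sketched as follows; behaviors are arranged in layers and an arbitration scheme lets one layer's output suppress the others'. The layer names, sensor fields, and motor commands here are hypothetical illustrations, and a simple fixed priority ordering stands in for Brooks' suppression/inhibition wiring:

```python
# Sketch of a subsumption-style controller: each layer maps sensor
# readings directly to motor commands, with no central world model.
# A fixed priority ordering arbitrates between layers (illustrative).

from dataclasses import dataclass
from typing import Optional

@dataclass
class Sensors:
    obstacle_distance: float  # meters to the nearest obstacle
    person_detected: bool     # a simple "social" percept

def avoid_layer(s: Sensors) -> Optional[str]:
    # Reactive obstacle avoidance fires only when an obstacle is close.
    return "turn_away" if s.obstacle_distance < 0.5 else None

def approach_person_layer(s: Sensors) -> Optional[str]:
    # A socially oriented behavior that suppresses wandering.
    return "approach_person" if s.person_detected else None

def wander_layer(s: Sensors) -> Optional[str]:
    # Default behavior when nothing else fires.
    return "wander"

# Ordered from highest priority to lowest; the first layer that
# produces a command wins, suppressing the layers below it.
LAYERS = [avoid_layer, approach_person_layer, wander_layer]

def control(s: Sensors) -> str:
    for layer in LAYERS:
        command = layer(s)
        if command is not None:
            return command
    return "idle"
```

With these toy layers, `control(Sensors(0.3, True))` yields the avoidance command even though a person is detected, illustrating how a lower-latency reactive layer can act without any deliberative model.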
4.2. Utilize physical and social affordances
Drawing yet again on the ecological approach, we
emphasize the utilization of affordances. Affordances can
be defined as the directly perceivable opportunities for
Table 4
Modeling recommendations with potential computational formalisms and representative references.

1. Leverage the ecological approach to robotics
Description: Design system to provide the opportunity for direct and non-representational interaction with the physical and social environment.
Rationale: Provides a framework for instantiating T1 processes in robots.
Benefits: To minimize the expense of computational resources, reduce latency in responses.
Example instantiations: Subsumption architectures; optical flow.
References: Brooks (1999) and Duchon et al. (1998).

2. Utilize physical and social affordances
Description: Design system for detecting affordances - directly perceivable opportunities for action or interaction arising between the agent and environment across objects, substances, surfaces, and other agents.
Rationale: Provides robot with more direct or T1 processes for action and interaction.
Benefits: Robot behavioral control mechanisms that specify the relations between the robot and the environment, physical objects, and human interactors; visuo-spatial perspective taking, effort and affordance analysis.
Example instantiations: Multiperspectival affordance control mechanisms (effect, (entity, behavior)).
References: Hafner and Kaplan (2008), Pandey and Alami (2012), Şahin et al. (2007), and Uyanik et al. (2013).

3. Incorporate analysis of interaction dynamics
Description: System should be designed with mechanisms for analyzing as well as synchronizing and complementing emergent interaction dynamics.
Rationale: Could provide a T1 mechanism for analysis of interaction dynamics unfolding between robot and human teammates.
Benefits: Improvements in human-robot and robot-robot coordination required for joint actions.
Example instantiations: Dynamical systems modeling techniques; HKB circuits and equations.
References: Ansermin et al. (2016), Beer (1995), Kelso et al. (2009), Marsh et al. (2009), and Treur (2013).

4. Instantiate modal perceptual and motor representations
Description: To the extent that the robot cannot rely on non-representational interaction with the physical and social environment, the embodied cognitive system of a social robot must rely on multi-modal sensory and motor representations.
Rationale: Commitment of Grounded Computational Cognition - that incoming information from the robot’s sensors must be represented in a form that is linked to its modality (visual, auditory, etc.).
Benefits: Provides the foundation from which a robot could begin to manipulate modal perceptions to not only interpret a human teammate, but also to form concepts, memories, and make decisions.
Example instantiations: Afferent and efferent modality streams, nodes, action networks.
References: Breazeal et al. (2009), Hoffman (2012), and Pezzulo et al. (2013).

5. Couple perception, action, and cognition
Description: Modal representations require integration and association with one another to couple perceptual sensors with motor effectors.
Rationale: Help enable both T1 and T2 processes; provide a basis for perceptual learning.
Benefits: May lead to more fluent interaction between robots and human teammates.
Example instantiations: Convergence zones; action-perception activation networks.
References: Brooks (1999), Damasio (1989), Hoffman (2012), and Simmons and Barsalou (2003).

6. Provide motor and perceptual resonance mechanisms
Description: System should be designed with motor and perceptual resonance mechanisms that allow for recursive entrainment and behavior matching.
Rationale: Provide T1 motor and perceptual resonance mechanisms akin to those elicited in humans by the mirror neuron system.
Benefits: Bi-directionality allows robots to engage in social interactions more similar to those occurring in humans, which would likely: (a) improve overall social competence, (b) contribute to a better understanding of human teammates’ intentions, and (c) lead to more coordinative joint actions.
Example instantiations: Mirror circuits; Hebbian learning networks associating visual and motor representations.
References: Amit and Mataric (2002), Barakova and Lourens (2009), Billard (2002), Chaminade and Cheng (2009), Elias (2008), Ito and Tani (2004), and Schütz-Bosbach and Prinz (2007).

7. Abstract from modal experiences
Description: Provide a simulation mechanism that allows for the ‘‘reenactment of perceptual, motor, and introspective states acquired during experience with the world, body, and mind” (Barsalou, 2008).
Rationale: Instantiate a T2 simulation/inference process for going beyond information directly available in the environment.
Benefits: Provide the robot with a key mechanism for engaging in mental state attributions of others, explaining current events, predicting future events, and imagining new events.
Example instantiations: Generation and simulation modes; Dynamic Bayesian Networks.
References: Breazeal et al. (2009), Johnson and Demiris (2005), Pezzulo (2012), and Vernon (2010).

8. Leverage simulation-based top-down perceptual biasing
Description: System should be designed to provide a means for a robot to predict future states in service of anticipating and thus engaging in coordinative actions with the physical and social environment.
Rationale: Constitutes an essential link between T1 and T2 processes that stimulates and triggers the motor system towards the selection of appropriate actions while also informing direct perception-action links.
Benefits: ‘‘Simulation-based top-down perceptual biasing may specifically be the key to more fluent coordination between humans and robots working together in a socially structured interaction” (Hoffman, 2012).
Example instantiations: Markov-chain Bayesian anticipatory simulations; intermodal Hebbian reinforcing; weighted feature maps.
References: Hoffman (2012), Hoffman and Breazeal (2010), and Schütz-Bosbach and Prinz (2007).
action or interaction arising between the agent and the environment, where any number of objects, substances, surfaces, and other agents comprise the interaction (Chemero, 2003;
Gibson, 1979). Direct perception in the ecological sense
typically means perception that does not rely on mental
representations (Chemero, 2009, 2013), an interpretation
echoed in the direct perception approaches to social cognition (see De Jaegher, 2009; De Jaegher & Di Paolo, 2007;
Gallagher, 2007, 2008; Gangopadhyay & Schilbach,
2011). We have advanced the notion of ‘‘social affordances” to describe how physically embodied cues experienced during interaction are sometimes enough to make
rapid mental state attributions (Best, Warta, Kapalo, &
Fiore, 2016). As such, affordances regarding both the physical and social environment are dependent upon the appropriate structuring of the environment and the agents in that
environment. The combination of these provides information perceivable to an agent through its optic array but is
dependent upon the form of embodiment the agent maintains and the sensory modalities available to it (e.g.,
Chemero, 2003; Kono, 2009). Incorporating affordances
in a robotic system is significant in that such integration
is essential to providing a robot with direct or T1 processes
for interaction, which offers the underlying rationale for
this recommendation. The potential benefit of doing so will
be a positive influence on the behavioral control mechanisms in a robot such that it will be possible to specify
the relations between the robot, the environment, physical
objects, and human interactors (e.g., Şahin, Çakmak, Doğar, Uğur, & Üçoluk, 2007), which may allow for more
fluent interaction.
As far as example instantiations of affordances are concerned, mechanisms for control of autonomous robots
have been utilized based on affordances that specify relations between the agent and the environment and have proven useful for navigation through a physical environment
(e.g., Şahin et al., 2007). Much of the affordance-based
efforts in robotics are primarily of this nature; that is,
focused on the physical environment with little emphasis
on the social environment. Although not clearly following
an ecological approach, Pandey and Alami (2012) developed a system to allow for a set of complex social-cognitive behaviors that includes mechanisms for the analysis of affordances not only between an agent, objects, and
the environment, but also between multiple agents. They
highlighted three key capabilities – visuo-spatial perspective taking, effort analysis, and affordance analysis – as necessary for providing a robot with social-cognitive
mechanisms. In combination, these mechanisms allow a
robot to interpret what a teammate visually perceives at
a given point in time, determine the amount of effort
required by a teammate to execute a certain task given
the current situation, positioning and morphology of a
teammate’s body, and then identify opportunities for
action existing between agent-agent, agent-location,
agent-object, and object-agent (see Pandey and Alami
(2012) for details). Likewise, Hafner and Kaplan (2008)
developed an affordance-based mechanism to allow interaction behaviors in robots. Their approach centered on
the creation of interpersonal maps whereby an agent is able
to directly map its own body structure onto that of an
observed body, thus allowing for derivation of action possibilities for the conglomerate of the two agents. More
recently, Uyanik et al. (2013) developed a mechanism for
a robot to learn social affordances (albeit in a simple context) that allowed the robot to manipulate
objects and request the assistance of a human as well as
create multi-step plans to accomplish a collaborative goal.
Naturally, many of these implementations of affordances
for robots are subject to contention as some maintain that
affordances, even in robotics, must be non-representational
(e.g., Chemero & Turvey, 2007). Addressing this argument
is beyond the scope of this paper; instead, we merely posit
that there may be room for both representational and non-representational approaches and that ultimately, both may
be necessary in providing an account of cognition capable
of explaining complex cognitive processes, such as those
characteristic of human social cognition (cf. Dale, 2010;
Horton, Chakraborty, & Amant, 2012; Wiltshire et al.,
2015). Regardless, further research is needed to extend
these affordance-based efforts to focus more on the nuances
of social interaction.
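As an illustration of the affordance formalization cited in Table 4, where an affordance is a relation of the form (effect, (entity, behavior)), a minimal sketch might look as follows; the entities, behaviors, and query helper are our own hypothetical examples rather than anything drawn from Şahin et al. (2007):

```python
# Sketch of an affordance as an (effect, (entity, behavior)) relation.
# The repertoire below mixes physical and "social" relations; all of the
# labels are illustrative assumptions.

from collections import namedtuple

Affordance = namedtuple("Affordance", ["effect", "entity", "behavior"])

# A small learned repertoire of affordance relations.
REPERTOIRE = [
    Affordance("lifted",      "light-object",  "lift"),
    Affordance("traversed",   "open-corridor", "drive-forward"),
    Affordance("handed-over", "nearby-human",  "extend-gripper"),  # social
]

def behaviors_affording(effect: str, entity: str) -> list[str]:
    """Which behaviors on this entity are predicted to produce the effect?"""
    return [a.behavior for a in REPERTOIRE
            if a.effect == effect and a.entity == entity]
```

Querying the repertoire, e.g. `behaviors_affording("handed-over", "nearby-human")`, directly grounds action selection in agent-environment relations rather than in a centralized world model, which is the point of the ecological treatment above.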
4.3. Incorporate analysis of interaction dynamics
From neural activity within an individual to the behavioral coordination of multiple individuals interacting in an
environment and even on socio-cultural levels, there are
emergent and dynamic self-organizational patterns that
prescribe and constrain interaction possibilities as they
occur across temporal and spatial dimensions (e.g., Coey,
Varlet, & Richardson, 2012; Eiler, Kallen, Harrison, &
Richardson, 2013; Marsh, Richardson, & Schmidt, 2009;
Pfeifer et al., 2007). Many of these interaction dynamics
are emergent when humans engage in joint action and
can take on many forms of coordination (e.g., Knoblich,
Butterfill, & Sebanz, 2011). In accord with this growing
body of research, we suggest the system be designed with
the capacity to analyze and either synchronize with or complement interaction dynamics. The rationale here is that
this could provide the robot with another mechanism characteristic of T1 processes, which would enable the analyses
of interaction dynamics. Such mechanisms could prove
beneficial in that both human-robot and robot-robot coordinative mechanisms, which are crucial in supporting joint
action, could be enhanced to provide more effective and
efficient interactions.
One example instantiation relevant to this recommendation is drawn from the tools of dynamical systems theory
and has been used for analyzing human interaction dynamics from motion sensing data (e.g., Marsh et al., 2009).
With such a mechanism, robots could conceivably analyze
interaction dynamics unfolding between teammates and in
turn, couple with these dynamics to perform effectively.
Another example comes from Kelso, de Guzman,
Reveley, and Tognoli (2009) who developed the Virtual
Partner Interaction (VPI) Paradigm. This paradigm leveraged the Haken-Kelso-Bunz circuit (e.g., Haken, Kelso,
& Bunz, 1985) to allow the virtual partner (i.e., the computer) to couple its interaction with humans. Further, the
authors suggest that the VPI serves as a foundational
instantiation that can facilitate more complex forms of
human-machine coordination. In a more general sense,
Delaherche et al. (2012) review methods for identifying
interpersonal synchronization and articulate how such
mechanisms, when instantiated in machines, have the
potential to favorably advance the social capabilities of
artificial cognitive systems. An example instantiation that
models robot synchronization with humans can be found
in Ansermin, Mostafaoui, Beaussé, and Gaussier (2016).
The major takeaway here is that many forms of
human interaction occur over time (i.e., dynamically) and
socially intelligent machines must be able to attune to this
information to interact accordingly.
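The HKB dynamics mentioned above can be illustrated numerically. A minimal sketch, assuming the standard relative-phase equation dφ/dt = δω − a·sin(φ) − 2b·sin(2φ) and illustrative parameter values (the Euler step and constants are our choices, not Kelso et al.'s):

```python
import math

# Numerical sketch of the Haken-Kelso-Bunz (HKB) relative-phase equation,
#   dphi/dt = delta_omega - a*sin(phi) - 2*b*sin(2*phi),
# which models coordination between two rhythmically moving agents.

def hkb_trajectory(phi0: float, a: float = 1.0, b: float = 1.0,
                   delta_omega: float = 0.0, dt: float = 0.01,
                   steps: int = 5000) -> float:
    """Euler-integrate the relative phase and return its final value."""
    phi = phi0
    for _ in range(steps):
        dphi = delta_omega - a * math.sin(phi) - 2 * b * math.sin(2 * phi)
        phi += dphi * dt
    return phi

# With a weak second coupling term (b/a below 0.25), anti-phase
# coordination is unstable and the relative phase relaxes to the
# in-phase pattern (phi ~ 0) from a wide range of starting phases.
final_phi = hkb_trajectory(phi0=2.5, a=1.0, b=0.2)
```

Running the same trajectory with a stronger second term (e.g. `b=1.0`) instead settles near φ = π, reproducing the bistability (in-phase and anti-phase attractors) that makes the HKB model a useful substrate for machine coupling of the kind used in the VPI paradigm.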
4.4. Instantiate modal perceptual and motor representations
While the essence of the previously outlined recommendations is that these mechanisms are minimally reliant or non-reliant on representations, as noted above, the system
may need to rely on representations to support more complex forms of cognition (Dale, 2010; Horton et al., 2012).
Toward this end, the artificial cognitive system should rely
on multi-modal sensory and motor representations (e.g.,
Hoffman, 2012). According to Pezzulo et al. (2013), modal
perceptual and motor representations mean that incoming
sensory information is represented in a form linked to its
modality (e.g., visual, auditory, etc.). The rationale for this
recommendation and several of the following is that they
are commitments of the Grounded Computational Cognition approach (Pezzulo et al., 2013). Additionally, in
accord with our account, modal perceptual and motor representations can be an essential form of input for both T1
and T2 processes.
Cognitive concepts of a more complex nature would
necessarily entail processes characteristic of autonomous
agents such as affect and motivation, internal modalities
that a system must be capable of independently constructing in pursuit of goal-oriented qualities (Hoffman, 2012;
Pezzulo et al., 2013). Social-cognitive mechanisms concerning such a non-representational concept as affect may actually endow a system with the ability to imitate the affective
states of its human teammates and furthermore, understand them. Some researchers have actually argued for
the primacy of social learning and understanding through
robotic imitation of humans (Breazeal & Scassellati,
2002). The benefits of this recommendation are that this
type of grounding provides the foundation from which a
robot could begin to manipulate perceptual information
in order to interpret humans (Breazeal et al., 2009) as well
as form concepts, memories, and make decisions (Hoffman,
2012).
An example instance of this recommendation is
Hoffman’s (2012) use of modality streams and action networks to link perceptual data with motor action activation.
Modality streams are connected to action networks and
comprised of multiple perceptual process nodes, each representing a different form of perceptual data ranging from
raw sensory input to features, properties, or higher-level
concepts. Action nodes, which trigger motor actions, make
up the structure of the action network. Activation of the
modality stream can progress in either an afferent direction, from the sensory system to concepts and actions, or
efferent direction, which follows the inverse path. Ultimately, these efforts would facilitate modeling of affective
and cognitive states while taking their dynamic and interdependent nature into account (e.g., Pezzulo, 2012;
Treur, 2013).
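A toy sketch of the modality-stream idea can make the afferent direction concrete; the node labels, weights, and propagation rule below are our own assumptions (Hoffman's actual networks are considerably richer):

```python
# Illustrative sketch of a modality stream feeding an action network
# (cf. Hoffman, 2012): perceptual process nodes pass activation
# afferently from raw sensory input toward concepts, which in turn
# activate action nodes. All labels and weights are hypothetical.

# A visual modality stream: an ordered chain of perceptual nodes, from
# raw input up to features, properties, and a higher-level concept.
visual_stream = ["raw-pixels", "face-features", "smile-property", "happy-concept"]

# Afferent weights between successive nodes in the stream.
afferent_weights = {("raw-pixels", "face-features"): 0.9,
                    ("face-features", "smile-property"): 0.8,
                    ("smile-property", "happy-concept"): 0.7}

# Concept-to-action links into the action network.
action_links = {("happy-concept", "approach"): 0.9,
                ("happy-concept", "greet"): 0.8}

def propagate(stream: list, input_activation: float) -> dict:
    """Afferent pass: input activation attenuates through the stream."""
    activations = {stream[0]: input_activation}
    for prev, nxt in zip(stream, stream[1:]):
        activations[nxt] = activations[prev] * afferent_weights[(prev, nxt)]
    return activations

def select_action(concept_activations: dict) -> str:
    """Pick the action node receiving the most activation."""
    scores = {}
    for (concept, action), weight in action_links.items():
        scores[action] = scores.get(action, 0.0) + \
            concept_activations.get(concept, 0.0) * weight
    return max(scores, key=scores.get)
```

An efferent pass would run the same links in reverse, from an active concept or action back toward modality-specific expectations, which is the inverse path described above.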
4.5. Couple perception, action, and cognition
Modal representations require integration and association with one another to couple motor effectors with perceptual sensors, effectively supporting both T1 and T2
processes. This notion is supported by ecological psychology theory in which Gibson (1979) first suggested that cognitive processes were substantiated by the convergence of
perception and action. Neisser (1976), though he takes a
contrasting approach to Gibson in regards to reasoning
about the environment, also put forth the argument that
perception could not be separated from action. However,
more recent work in neuroscience only serves to reinforce Gibson’s speculation (e.g., Gangopadhyay & Schilbach, 2011; Knoblich & Sebanz, 2006). The rationale
for this recommendation is that designing such a system would provide a basis for perceptual learning and enable both T1 and T2 processes. A further benefit of designing a system in line with this recommendation is that it may lead to more fluent interaction between robots and human teammates (Hoffman, 2012). In essence, this would create
an architecture in which perception (i.e., sensors), action
(i.e., actuators), and cognition function as interacting components within an interdependent system. These architectural components would be layered in such a way that
they would function in parallel with one another, allowing
for information exchange and more efficient behavior. This
architecture would incorporate the concept of Convergence
Zones (Damasio, 1989; Simmons & Barsalou, 2003), one of
the example instantiations pertaining to this recommendation. In architectures possessing convergence zones, the
perceptual and motor layers of the models overlap and
interact to enable the simulation and generation of behavior (e.g., Lallee & Dominey, 2013).
Brooks’ (1999) subsumption architecture was one of the
first robotics efforts emphasizing direct perception-action
links, designed out of necessity to address issues the
Sense-Plan-Act (SPA) paradigm faced (Nilsson, 1980).
Where the SPA paradigm took a considerable amount of
time to plan an action and largely relied on internal models, rather than perception, the subsumption architecture
supported the execution of motor commands in direct
response to sensory input, allowing the robot to engage
in real-time interaction with a dynamic environment in a quicker and more reactive manner. Symmetrical
action-perception activation networks, a more recent example of this recommendation, were proposed by Hoffman
(2012) to achieve multi-modal integration. Within these
networks, perceptions influence higher-level associations
that contribute to the selection of actions; however, perceptions are conversely biased through motor activities. As
such, this line of thinking has become increasingly adopted
in robotics through recognition of the reciprocal interdependencies of the perception-action continuum. The
perception-action continuum is similar in concept to the
action-perception cycle in which perception is recognized
as the fundamental basis for an agent capable of interacting with its environment (Murphy, 2000).
4.6. Provide motor and perceptual resonance mechanisms
Motor and perceptual resonance mechanisms implemented in robots would allow for recursive entrainment
and behavior matching. The rationale for this recommendation is that these mechanisms would be akin to the T1
motor and perceptual resonance mechanisms elicited in
humans by the mirror neuron system (Elias & Fiore,
2008). The mirror neuron system, a biological structure
found in primates and humans (di Pellegrino, Fadiga,
Fogassi, Gallese, & Rizzolatti, 1992; Gallese & Goldman,
1998) is argued to be involved in predicting others’ mental
states and behaviors through simulation and imitative
capabilities (Gallese & Goldman, 1998). Imitation was later
proposed as the connection between mirror neurons and
‘‘mindreading” skills (Meltzoff & Decety, 2003). One benefit of modeling resonance mechanisms after the mirror neuron system is that it would enable design of a mechanism
that could be trained to perceive and perform any number
of novel actions. Such structures would not be limited in
recognizing the wide array of possible actions (Arbib,
Metta, & van der Smagt, 2008), thus holding the potential
for supporting T1 processes (Bohl & van den Bos, 2012).
Further benefits of such a system would entail a greater
understanding of human teammates’ intentions as they
moved through the environment and provide an improvement in the overall social competence of robots as a function of enabling social interactions that better reflect those
occurring in humans (e.g., Chaminade & Cheng, 2009;
Schütz-Bosbach & Prinz, 2007). Motor and perceptual resonance mechanisms would ultimately facilitate better-coordinated joint actions (e.g., Chaminade & Cheng,
2009; Schütz-Bosbach & Prinz, 2007).
Barakova and Lourens (2009) implemented one example
of a mirror neuron framework that provided simulated and
embodied robots with a mechanism for synchronizing
movements and entraining neuronal firing patterns. This
facilitated turn taking and other teaming related behaviors
between two agents. Additionally, Ito and Tani (2004, see
also Ito, Noda, Hoshino, & Tani, 2006) utilized an
approach based on dynamical systems to design a mechanism capable of enabling a humanoid robot to learn
through imitative interactions. While the system uses a
recurrent neural network – which ‘‘encodes” sensorimotor
trajectories for later recall – to learn behaviors, it can also
be interpreted in terms of a mirror neuron system in that
the encoding and imitation of the action are analogous to
observation and motor generation. Additionally, robot
imitation learning has drawn from the concept of mirror
neurons in the past, imitating these systems through hidden
Markov models (Amit & Mataric, 2002) or neural networks (Billard, 2002). In an interactive context, such a
mechanism could enable a robot to synchronize its movements with a human team member, adapting to dynamic
situations and generating learned behaviors as required.
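A minimal Hebbian sketch of such a visual-motor resonance mechanism, in the spirit of (but far simpler than) the neural-network models cited above; the unit labels, pairing regime, and learning rate are our assumptions:

```python
# Minimal Hebbian sketch of a mirror-like mechanism: weights between
# visual units (observed movements) and motor units (executed
# movements) strengthen when both are co-active, so that later
# observation alone "resonates" in the motor system.

VISUAL = ["see-grasp", "see-point"]
MOTOR = ["do-grasp", "do-point"]

# Start with no visual-motor associations.
weights = {v: {m: 0.0 for m in MOTOR} for v in VISUAL}

def hebbian_update(visual_act: dict, motor_act: dict, lr: float = 0.5):
    """Strengthen connections between co-active visual and motor units."""
    for v in VISUAL:
        for m in MOTOR:
            weights[v][m] += lr * visual_act.get(v, 0.0) * motor_act.get(m, 0.0)

# Self-observation during execution pairs seeing and doing the same act.
for _ in range(4):
    hebbian_update({"see-grasp": 1.0}, {"do-grasp": 1.0})
    hebbian_update({"see-point": 1.0}, {"do-point": 1.0})

def motor_resonance(visual_act: dict) -> str:
    """Observation alone now activates the matching motor unit."""
    scores = {m: sum(visual_act.get(v, 0.0) * weights[v][m] for v in VISUAL)
              for m in MOTOR}
    return max(scores, key=scores.get)
```

After training, observing a grasp activates the grasp motor unit most strongly, the behavior-matching property that the resonance recommendation aims to provide.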
One example instantiation is the use of Dynamic Bayesian Networks for understanding mental states (Pezzulo, 2012). Similarly, Johnson and Demiris
(2005) designed the Hierarchical Attentive Multiple Models
for Execution and Recognition (HAMMER) architecture,
grounding it in the structure of mirror neurons, to employ
simulation theory in the recognition and imitation of
actions. The HAMMER architecture enables action recognition through observation of another agent’s actions while
simultaneously utilizing a simulation mechanism to compare a set of models (analogous to motor programs) to the
observed action. Essentially, their approach provides a
robot with the ability to observe and recognize the actions
of another and in turn, simulate and generate an imitation
of the observed action by temporarily adopting the other
agent’s perspective and engaging the motor system in the
recognition process. While the HAMMER architecture is
only applied to action recognition, this architecture could
be modified for use in a dynamic social environment to allow
for the inference of mental states as well.
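The parallel-model comparison at the core of this style of recognition can be sketched as follows; the toy linear "motor programs", names, and error measure are illustrative assumptions, not the HAMMER architecture's actual models:

```python
# Hedged sketch of HAMMER-style recognition (cf. Johnson & Demiris,
# 2005): every candidate motor program simulates the observed agent's
# next states, and the program with the lowest prediction error is
# taken as the recognized action.
from typing import Callable, Dict, List

def recognize_action(observed: List[float],
                     models: Dict[str, Callable[[float], float]]) -> str:
    """Return the name of the model whose one-step predictions best match."""
    def prediction_error(step: Callable[[float], float]) -> float:
        # Run each model forward from every observed state and compare
        # its prediction with the next actual observation.
        return sum((step(s) - s_next) ** 2
                   for s, s_next in zip(observed, observed[1:]))
    return min(models, key=lambda name: prediction_error(models[name]))

# Toy motor programs: "reach" moves steadily forward, "retract" backward.
models = {
    "reach":   lambda s: s + 0.1,
    "retract": lambda s: s - 0.1,
}
trajectory = [0.0, 0.1, 0.2, 0.3]            # states observed in another agent
print(recognize_action(trajectory, models))  # → reach
```

Engaging every candidate model in the comparison is the computational analogue of recruiting the motor system during recognition, as described above.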
4.7. Abstract from modal experiences
Mechanisms such as simulation have the ability to
abstract upon modal representations, which Barsalou
(2008) defines as the ‘‘reenactment of perceptual, motor,
and introspective states acquired during experience with
the world, body, and mind” (pp. 618–619). According to
several interpretations of grounded cognition, simulation
plays a central role in supporting cognitive processes
(Barsalou, 1999; Decety & Grèzes, 2006; Goldman, 2006).
Thus, the rationale for this recommendation is that the
instantiation of simulation or inference mechanisms capable of abstracting from modal experiences would support
T2 processes for use when information was not directly
available in the environment. The benefits of designing a
system to support these mechanisms would be that a robot
could better engage in mental state attributions of others as
well as explain current events, predict future events, and
imagine new events (Vernon, 2010). Essentially, this would
provide the robot with a mechanism for engaging in the
mental state attributions of others as a function of initial
exposure to a mental state and storage of the associated
multimodal sensory cues (e.g., in the case of anger: a raised
voice, facial expression, etc.). In the event that the robot
encounters the same mental state again, the associated cues
can be retrieved and used to simulate the mental state, enabling first recognition and then attribution.
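This exposure, storage, and retrieval cycle can be illustrated with a minimal sketch; the cue names and the similarity measure (Jaccard overlap) are illustrative assumptions rather than a claim about how such a memory should be implemented:

```python
# Illustrative sketch: a robot stores the multimodal cue constellation
# that accompanied a mental state on first exposure, then re-uses
# ("simulates") the stored cues to recognize and attribute that state.
from typing import Dict, FrozenSet

class ModalMemory:
    def __init__(self) -> None:
        self._states: Dict[str, FrozenSet[str]] = {}

    def store(self, state: str, cues: FrozenSet[str]) -> None:
        # Initial exposure: associate the state with its sensory cues.
        self._states[state] = cues

    def attribute(self, observed: FrozenSet[str]) -> str:
        # Retrieval: simulate each stored state and pick the one whose
        # cue constellation best overlaps the current observation
        # (Jaccard similarity).
        def overlap(state: str) -> float:
            stored = self._states[state]
            return len(stored & observed) / len(stored | observed)
        return max(self._states, key=overlap)

memory = ModalMemory()
memory.store("anger", frozenset({"raised_voice", "furrowed_brow"}))
memory.store("happiness", frozenset({"smile", "relaxed_posture"}))
print(memory.attribute(frozenset({"raised_voice", "clenched_fist"})))  # → anger
```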
An example instantiation related to this recommendation
is Breazeal et al.’s (2009) design of an embodied social robot
system comprised of interconnected perceptual, motor,
belief, and intention modules. From these modules, the
robot generates its own states and re-uses them to simulate
and infer the perspective and intentions of humans during
an interaction, which in part rests on a foundation of social
learning through imitation (Breazeal & Scassellati, 2002).

4.8. Leverage simulation-based top-down perceptual biasing

Simulation mechanisms also provide a means for a
robot to predict future states in service of engaging in coordinative action with the physical and social environment
(Hoffman & Breazeal, 2004). Research findings focusing
on human visual psychophysics indicate that scene analysis, and by extension, analysis of a robot’s working environment, are heavily influenced by top-down processes
(Wang, 2003). The ability to perceive one’s environment
is, in itself, intrinsically linked to action. For example,
Noton and Stark's (1971) Scanpath Theory of Attention states that top-down cognitive control drives the eye movements that, in turn, enable perception. Essentially, the primary influence on perception is a cognitive model based on expectations; thus, an individual often perceives only what they expect to see. The rationale behind this
recommendation is that it would provide a link between T1
and T2 processes, essential to stimulating and triggering
the motor system toward the selection of appropriate
actions while also informing direct perception-action links.
The benefits of this recommendation are that it would support the evolution of robots from artificial systems perceived as tools to artificial systems perceived as
teammates. In particular, such simulation-based top-down perceptual biasing ‘‘may specifically be the key to
more fluent coordination between humans and robots
working together in a socially structured interaction”
(Hoffman, 2012, p. 6). Such a mechanism would additionally contribute to the learning and perception of affordances. Specifically, a mechanism grounded in the
scanpath theory would enable a robot to analyze its environment more efficiently by observing specific details, in
accordance with their saliency, that would support
enhancement of the original model.
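A toy sketch of such expectation-driven scene analysis follows; the region names and weights are invented for illustration, and a real system would derive its expectations from the cognitive model rather than a hand-written prior:

```python
# Minimal sketch in the spirit of scanpath theory: fixations are
# ordered top-down by the current model's expectations, weighted by
# bottom-up saliency, so the robot samples first the regions most
# likely to confirm or correct its model.

def plan_scanpath(expectation: dict, saliency: dict) -> list:
    """Order scene regions by expected relevance (prior x saliency)."""
    relevance = {region: expectation[region] * saliency[region]
                 for region in expectation}
    return sorted(relevance, key=relevance.get, reverse=True)

expectation = {"face": 0.7, "hands": 0.2, "background": 0.1}  # model prior
saliency = {"face": 0.5, "hands": 0.9, "background": 0.2}     # bottom-up
print(plan_scanpath(expectation, saliency))  # → ['face', 'hands', 'background']
```

Note how the prior dominates here: the hands are the most salient region bottom-up, but the model's expectation still drives the first fixation to the face.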
An example of a simulation-based top-down approach
to such a mechanism would be perceptual priming, which
essentially stimulates and triggers the motor system toward
the selection of the most appropriate action in a given situation (Marsh et al., 2009; Schütz-Bosbach & Prinz, 2007).
Hoffman (2012) describes two subsystems that support
simulation-based top-down perceptual biasing as perceptual priming mechanisms: Markov-chain Bayesian anticipatory simulations and Intermodal Hebbian reinforcing.
Markov-chain Bayesian anticipatory simulation mechanisms allow the system to probabilistically anticipate the
activation of a state to reduce reaction latency whereas
Intermodal Hebbian reinforcing strengthens the connections between activation nodes. For example, if a happy
mental state is generally attributed when a social agent
exhibits social cues resembling a smile, then the perception
of a smile will activate the pathway responsible for the
judgment of happiness. The activation of this pathway,
when employing a Markov-chain Bayesian anticipatory
simulation mechanism, serves to prime the robot’s perceptual system in that the sensors responsible for detecting the
features of a happy mental state are now more responsive
toward these social cues, which reduces the delay of a
happy mental state attribution. The application of Intermodal Hebbian reinforcing will influence the decision-making process as exposure to a given constellation of social cues (i.e., multiple co-occurring social cues) increases. If sadness is commonly communicated with social cues resembling a frown and heavy-lidded gaze (i.e., eyes not completely open), then the connection between the attribution of a sad mental state and these two co-occurring social cues is strengthened. In essence, a mechanism
grounded in these two computational subsystems will trigger the simulation of a high-level concept, like a mental
state, and prime the sensors responsible for detecting
low-level features, like social cues.
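The two subsystems can be combined in a minimal sketch; the transition prior, learning rate, and cue names are invented for illustration, and the scoring rule is a deliberate simplification of a true Bayesian update:

```python
# Hedged sketch of the two subsystems described above: a transition
# prior anticipates the next mental state (priming its cues), while a
# Hebbian rule strengthens cue-to-state connections on co-occurrence.

class PrimedAttributor:
    def __init__(self, transitions: dict) -> None:
        self.transitions = transitions   # P(next state | current state)
        self.weights: dict = {}          # (cue, state) -> connection strength

    def hebbian_update(self, cues: frozenset, state: str, lr: float = 0.1) -> None:
        # Intermodal Hebbian reinforcing: each co-occurrence strengthens
        # the connection between a cue and the attributed state.
        for cue in cues:
            key = (cue, state)
            self.weights[key] = self.weights.get(key, 0.0) + lr

    def attribute(self, cues: frozenset, prev_state: str) -> str:
        # Anticipatory simulation: the transition prior primes likely
        # states; learned cue weights supply the bottom-up evidence.
        prior = self.transitions[prev_state]
        def score(state: str) -> float:
            evidence = sum(self.weights.get((c, state), 0.0) for c in cues)
            return prior.get(state, 0.0) * (1.0 + evidence)
        return max(prior, key=score)

attributor = PrimedAttributor({"neutral": {"happiness": 0.6, "sadness": 0.4}})
for _ in range(5):  # repeated exposure reinforces the smile-happiness link
    attributor.hebbian_update(frozenset({"smile"}), "happiness")
print(attributor.attribute(frozenset({"smile"}), "neutral"))  # → happiness
```

Here the prior plays the priming role (reducing attribution latency in a fuller system), while the accumulated weights capture the strengthened cue constellation.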
5. Discussion and conclusions
For robots to become more capable of complex social
interactions, it is essential to advance the state of the art
in HRI through innovations made in social-cognitive
mechanisms featured in artificial systems (Christensen
et al., 2013). Such innovations would enable more effective
human-robot coordination and cooperation, and shift the
perception of robots as tools to robots viewed as teammates, collaborators, or partners. The most well-known
artificial general intelligence (AGI) models – LIDA
(Ramamurthy, Baars, D’Mello, & Franklin, 2006), Soar
(Laird et al., 1987), and ACT-R (Anderson, 1993) – began
as a series of theories that could be applied to design an
artificial cognitive model. Eventually, those theories
evolved into conceptual commitments that computational
efforts would adhere to in instantiating such work, resulting in the current AGI models of today. Our goal with this
paper was to establish a roadmap intended for engineering
human social cognition. To that end, we provided a theoretical foundation for modeling social-cognitive mechanisms in robotic agents such that these artificial systems
can be designed to function, and be perceived, as effective
teammates.
It is our primary position that understanding the many
facets of human social cognition will be foundational in
the design of artificial social intelligence systems. This is but one step in research for
advancing the social capabilities of robotic agents. With
advancements in artificial social intelligence, concomitant
considerations will also need to be made regarding, for
example, artificial moral capabilities (e.g., Wallach &
Allen, 2008; Wiltshire, 2015). Throughout this paper, we
have outlined an integrated set of mechanisms that are
key to instantiating the EHSC approach, what we suggest
as a path towards artificial social intelligence. Unraveling
human social cognition is vital to mapping how people
understand others’ mental states, explain as well as predict
their behavior, determine the most appropriate behavioral
response, and maintain relationships.
As we have detailed, although the fundamental mechanisms of social cognition remain debated, they likely comprise some combination of the Theory of Mind approach (Gopnik & Wellman, 1992), perceptual-motor simulation routines (Blakemore & Decety, 2001;
Goldman, 2006), and direct perception (De Jaegher,
2009; De Jaegher & Di Paolo, 2007; Gallagher, 2007,
2008; Gangopadhyay & Schilbach, 2011; Wiltshire
et al., 2015). Meta-level approaches, such as dual-process
theories of cognition, have integrated the aforementioned
social-cognitive mechanisms in an attempt to more fully
explain social cognition (Bohl & van den Bos, 2012;
Wiltshire et al., 2015). Building upon this, EHSC emphasizes the potential functional efficacy of leveraging the dual-process approach to social signal processing. In this way, we provided a framework that integrates competing theories with
the goal of differentiating mechanisms of social cognition
to make their computational instantiation more tractable.
The EHSC approach is unique in its identification and
advancement of four specific components that we believe
will contribute to the design of robotic systems that possess
social intelligence. First, we identified the importance of
SSP in enabling more effective communication between
human and robotic teammates and its contribution to providing the basis for mechanisms that will allow a robot to
gain an improved understanding of humans. Maintaining
this emphasis on social robotics places the objective of
attaining more natural and automatic interaction at the
forefront of our approach, with a focus on verbal and nonverbal communicative cues. However, we also stress that,
for more natural and automatic interaction to be possible
in HRI, embodied cognition must be taken into account
given that this will establish the foundation for humans
and robots to construct a shared understanding of the
world. Finally, we outlined our commitment to autonomous robotic systems, one of the long-term goals for
EHSC and robotics designers alike. We propose that leveraging these four key components in the EHSC approach
will provide an initial roadmap toward modeling social-
cognitive mechanisms that will give rise to, not only a more
effective robot, but also a more effective artificial teammate.
Our modeling recommendations have centered primarily on the perceptual, motor, and cognitive modeling of a
robotic system that spans disciplinary perspectives. Indeed,
this is the area that will require extensive work in the
future. However, while we have posited these recommendations as aligned with the components of our approach, they
can also be viewed as open challenges for consideration in
the instantiation of artificial social intelligence for embodied agents (cf. Vinciarelli et al., 2015). Admittedly, the aim
of our paper was not to outline the challenges that will present themselves given the instantiation and integration of
these mechanisms, but such challenges will certainly need
to be addressed to ensure that robots will perform as effective team members (cf. Klein et al., 2004). As such, the next
steps in this area must include both research and modeling
efforts that assess the issues and challenges of integrating
the proposed types of models and formalisms. That effort
can aid in the development of an integrated and working
system based on these recommendations. These recommendations, if instantiated, would provide some very basic perceptual, motor, and cognitive abilities, but future efforts
should address whether these would also support more
complex forms of social interaction. Such a capability
would permit an artificial system to better express or perceive emotions while interacting and communicating with
humans (cf. Pezzulo, 2012) in even more complex social
scenarios requiring shared decision-making and problem-solving. Table 4 lists the EHSC modeling recommendations
outlined within this paper, their relation to T1 & T2 processes, examples of associated computational formalisms
supporting their instantiation, and representative references essential for consideration in their implementation.
In sum, our goal has been to outline the basic engineering of human social cognition to illustrate how an embodied social robot can be designed to function autonomously
as an efficient teammate. Adopting the EHSC recommendations as an approach to modeling social-cognitive mechanisms in robots will not only provide a sophisticated and flexible perceptual, motor, and cognitive architecture for robots, but will also allow for a more direct understanding of,
and natural interaction with, the environment and human
teammates. It also provides a mechanism for better understanding human behavior and mental states, and allows for the prediction and interpretation of novel and complex social situations. Social robots exist within environments built for humans; similar embodiment under these conditions therefore suggests that humans and robots should draw on similar cognitive mechanisms in order to coexist.
Acknowledgements
The authors wish to thank Christian Mosbæk Johannessen for his graphic design recommendations regarding
Fig. 1. This work was partially supported by the Army
Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-10-2-0016. Views
contained here are of the authors and should not be interpreted as representing official policies, either expressed or
implied, of the Army Research Laboratory, the U.S.
Government or the University of Central Florida. The U.S. Government is authorized to reproduce and distribute
reprints for Government purposes notwithstanding any
copyright notation.
References
ACT-R Research Group (2013). About ACT-R. Retrieved from <http://act-r.psy.cmu.edu/about/>.
Adolphs, R. (1999). Social cognition and the human brain. Trends in
Cognitive Sciences, 3(12), 469–479.
Amit, R., & Mataric, M. (2002). Learning movement sequences from
demonstration. In Proceedings of the 2nd international conference on
development and learning (pp. 203–208). IEEE.
Anderson, J. R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Anderson, M. L. (2003). Embodied cognition: A field guide. Artificial
Intelligence, 149(1), 91–130.
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., &
Qin, Y. (2004). An integrated theory of the mind. Psychological
Review, 111(4), 1036.
Ansermin, E., Mostafaoui, G., Beaussé, N., & Gaussier, P. (2016).
Learning to synchronously imitate gestures using entrainment effect. In
International conference on simulation of adaptive behavior
(pp. 219–231). Springer International Publishing. August.
Arbib, M. A., Metta, G., & van der Smagt, P. (2008). Neurorobotics:
From vision to action. In B. Sicilliano & O. Khatib (Eds.), Springer
handbook of robotics (pp. 1453–1480). Springer.
Atkinson, D. J., & Clark, M. H. (2013). Autonomous agents and human
interpersonal trust: Can we engineer a human-machine social interface
for trust? Technical Report No SS-13-07. In Trust and autonomous
systems: Papers from the 2013 AAAI spring symposium. Menlo Park,
CA: AAAI Press, March.
Barakova, E. I., & Lourens, T. (2009). Mirror neuron framework yields
representations for robot interaction. Neurocomputing, 72(4), 895–900.
Bargh, J. A. (1984). Automatic and conscious processing of social
information. In R. S. Wyer, Jr. & T. K. Srull (Eds.). Handbook of
social cognition (Vol. 3, pp. 1–43). Hillsdale, NJ: Lawrence Erlbaum
Associates.
Barsalou, L. W. (1999). Perceptual symbol systems. Behavioral and Brain
Sciences, 22(4), 577–660.
Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59, 617–645.
Barsalou, L. W., Niedenthal, P. M., Barbey, A. K., & Ruppert, J. A.
(2003). Social embodiment. Psychology of Learning and Motivation, 43,
43–92.
Bartneck, C., & Forlizzi, J. (2004). A design-centered framework for social
human-robot interaction. Proceedings of the Ro-Man, 2004, 591–594.
http://dx.doi.org/10.1109/ROMAN.2004.1374827.
Beer, R. D. (1995). A dynamical systems perspective on agent-environment interaction. Artificial Intelligence, 72(1–2), 173–215.
Bermúdez, J. L. (2010). Cognitive science: An introduction to the science of
the mind. New York: Cambridge University Press.
Best, A., Warta, S. F., Kapalo, K. A., & Fiore, S. M. (2016). Of mental states and machine learning: How social cues and signals can help develop artificial social intelligence. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 60(1), 1362–1366.
Billard, A. (2002). Imitation: A means to enhance learning of a synthetic
protolanguage in autonomous robots. In K. Dautenhahn & C. L.
Nehaniv (Eds.), Imitation in animals and artifacts (pp. 281–310).
Cambridge, MA: MIT Press.
Blakemore, S. J., & Decety, J. (2001). From the perception of action to the
understanding of intention. Nature Reviews Neuroscience, 2(8),
561–567.
Bodenhausen, G. V., & Todd, A. R. (2010). Social cognition. Wiley
Interdisciplinary Reviews: Cognitive Science, 1(2), 160–171.
Bohl, V. (2015). We read minds to shape relationships. Philosophical
Psychology, 28(5), 674–694.
Bohl, V., & van den Bos, W. (2012). Toward an integrative account of
social cognition: Marrying theory of mind and interactionism to study
the interplay of Type 1 and Type 2 processes. Frontiers in Human
Neuroscience, 6, 1–15. http://dx.doi.org/10.3389/fnhum.2012.00274.
Bradshaw, J. M., Feltovich, P. J., Johnson, M. J., Breedy, M., Bunch, L.,
Eskridge, T. C., ... van Diggelen, J. (2009). From tools to teammates:
Joint activity in human-agent-robot teams. In Human centered design
(pp. 935–944). Berlin, Heidelberg: Springer.
Bradshaw, J. M., Feltovich, P. J., Johnson, M., Bunch, L., Breedy, M.,
Eskridge, T., ... Uszok, A. (2008). Coordination in human-agent-robot
teamwork. In International symposium on collaborative technologies and
systems (pp. 467–476). IEEE.
Breazeal, C. (2004). Social interactions in HRI: The robot view. IEEE
Transactions on Systems, Man, and Cybernetics, Part C: Applications
and Reviews, 34(2), 181–186.
Breazeal, C., Gray, J., & Berlin, M. (2009). An embodied cognition
approach to mindreading skills for socially intelligent robots. The
International Journal of Robotics Research, 28(5), 656–680.
Breazeal, C., & Scassellati, B. (2002). Robots that imitate humans. Trends
in Cognitive Sciences, 6(11), 481–487.
Brooks, R. A. (1999). Cambrian intelligence: The early history of the new
AI. Cambridge, MA: MIT Press.
Chaiken, S., & Trope, Y. (1999). Dual-process theories in social psychology.
New York: Guilford Press.
Chaminade, T., & Cheng, G. (2009). Social cognitive neuroscience and
humanoid robotics. Journal of Physiology-Paris, 103(3), 286–295.
Chemero, A. (2003). An outline of a theory of affordances. Ecological
Psychology, 15(2), 181–195.
Chemero, A. (2009). Radical embodied cognitive science. Cambridge, MA:
MIT Press.
Chemero, A. (2013). Radical embodied cognitive science. Review of
General Psychology, 17(2), 145–150.
Chemero, A., & Turvey, M. T. (2007). Gibsonian affordances for
roboticists. Adaptive Behavior, 15(4), 473–480.
Christensen, H., Batzinger, T., Bekris, K., Bohringer, K., Bordogna, J.,
Bradski, G., ... Zhang, M. (2013). A roadmap for US robotics: From
internet to robotics. Washington, DC, US: Computing Community
Consortium and Computing Research Association.
Coey, C. A., Vartlet, M., & Richardson, M. J. (2012). Coordination
dynamics in a socially situated nervous system. Frontiers in Human
Neuroscience, 6, 1–12. http://dx.doi.org/10.3389/fnhum.2012.00164.
Dale, R. (2008). The possibility of a pluralist cognitive science. Journal of
Experimental and Theoretical Artificial Intelligence, 20(3), 155–179.
Dale, R. (2010). Review of radical embodied cognitive science. Journal of
Mind and Behavior, 31(1–2), 127–140.
Dale, R., Dietrich, E., & Chemero, A. (2009). Explanatory pluralism in
cognitive science. Cognitive Science, 33(5), 739–742.
Damasio, A. R. (1989). Time-locked multiregional retroactivation: A
systems-level proposal for the neural substrates of recall and recognition. Cognition, 33(1), 25–62.
Dautenhahn, K., Ogden, B., & Quick, T. (2002). From embodied to
socially embedded agents – Implications for interaction-aware robots.
Cognitive Systems Research, 3(3), 397–428.
De Jaegher, H. (2009). Social understanding through direct perception?
Yes, by interacting. Consciousness and Cognition, 18(2), 535–542.
De Jaegher, H., & Di Paolo, E. (2007). Participatory sense-making.
Phenomenology and the Cognitive Sciences, 6(4), 485–507.
De Jaegher, H., Di Paolo, E. D., & Gallagher, S. (2010). Can social
interaction constitute social cognition? Trends in Cognitive Science, 14
(10), 441–447.
Decety, J., & Grèzes, J. (2006). The power of simulation: Imagining one’s
own and other’s behavior. Brain Research, 1079(1), 4–14.
Delaherche, E., Chetouani, M., Mahdhaoui, A., Saint-Georges, C., Viaux,
S., & Cohen, D. (2012). Interpersonal synchrony: A survey of
evaluation methods across disciplines. IEEE Transactions on Affective
Computing, 3(3), 349–365.
DeSteno, D., Breazeal, C., Frank, R. H., Pizarro, D., Baumann, J.,
Dickens, L., & Lee, J. J. (2012). Detecting the trustworthiness of novel
partners in economic exchange. Psychological Science, 23(12),
1549–1556.
Di Paolo, E., & De Jaegher, H. (2012). The interactive brain hypothesis.
Frontiers in Human Neuroscience, 6, 1–16. http://dx.doi.org/10.3389/
fnhum.2012.00163.
di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., & Rizzolatti, G.
(1992). Understanding motor events: A neurophysiological study.
Experimental Brain Research, 91(1), 176–180.
Duchon, A. P., Kaelbling, L. P., & Warren, W. H. (1998). Ecological
robotics. Adaptive Behavior, 6(3–4), 473–507.
Dunbar, R. I. M. (1998). The social brain hypothesis. Evolutionary
Anthropology, 6, 178–190.
Eiler, B. A., Kallen, R. W., Harrison, S. J., & Richardson, M. J. (2013).
Origins of order in joint activity and social behavior. Ecological
Psychology, 25(3), 316–326.
Elias, J. & Fiore, S. M. (2008, May). From psychology, to neuroscience, to
robots: An interdisciplinary approach to bio-inspired robotics. Presented at the 20th annual convention of the American Psychological Society, Chicago, IL.
Fiore, S. M., Badler, N. L., Boloni, L., Goodrich, M. A., Wu, A. S., &
Chen, J. (2011). Human-robot teams collaborating socially, organizationally, and culturally. Proceedings of the Human Factors and
Ergonomics Society Annual Meeting, 55(1), 465–469. http://dx.doi.
org/10.1177/1071181311551096.
Fiore, S. M., Elias, J., Gallagher, S., & Jentsch, F. (2008). Cognition and
coordination: Applying cognitive science to understand macrocognition in human-agent teams. In Proceedings of the 8th annual symposium
on human interaction with complex systems. Norfolk, VA.
Fiore, S. M., Wiltshire, T. J., Lobato, E. J. C., Jentsch, F. G., Huang, W.
H., & Axelrod, B. (2013). Toward understanding social cues and
signals in human-robot interaction: Effects of robot gaze and
proxemics behavior. Frontiers in Psychology, 4, 1–15. http://dx.doi.
org/10.3389/fpsyg.2013.00859.
Fiske, A. P., & Haslam, N. (1996). Social cognition is thinking about
relationships. Current Directions in Psychological Science, 5(5),
143–148.
Fong, T., Nourbakhsh, I., & Dautenhahn, K. (2003). A survey of socially
interactive robots. Robotics and Autonomous Systems, 42(3), 143–166.
Franklin, S., Strain, S., McCall, R., & Baars, B. (2013). Conceptual
commitments of the LIDA model of cognition. Journal of Artificial
General Intelligence, 4(2), 1–22.
Frith, C. D., & Frith, U. (2007). Social cognition in humans. Current
Biology, 17(16), R724–R732.
Frith, C. D., & Frith, U. (2008). Implicit and explicit processes in social
cognition. Neuron, 60(3), 503–510.
Frith, C. D., & Frith, U. (2012). Mechanisms of social cognition. Annual
Review of Psychology, 63, 287–313.
Gallagher, S. (2007). Social cognition and social robots. Pragmatics &
Cognition, 15(3), 435–453.
Gallagher, S. (2008). Direct perception in the intersubjective context.
Consciousness and Cognition, 17(2), 535–543.
Gallagher, S. (2013). You and I, robot. AI & Society, 28(4), 455–460.
Gallese, V., & Goldman, A. (1998). Mirror neurons and the simulation
theory of mind-reading. Trends in Cognitive Sciences, 2(12), 493–501.
Gangopadhyay, N., & Schilbach, L. (2011). Seeing minds: A neurophilosophical investigation of the role of perception-action coupling in social perception. Social Neuroscience, 7(4), 410–423.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston:
Houghton Mifflin.
Goldman, A. I. (2006). Simulating minds: The philosophy, psychology, and
neuroscience of mindreading. Oxford, England: Oxford University
Press.
Gopnik, A., & Wellman, H. M. (1992). Why the child’s theory of mind
really is a theory. Mind & Language, 7(1–2), 145–171.
Hafner, V. V., & Kaplan, F. (2008). Interpersonal maps: How to map
affordances for interaction behaviour. In Towards affordance-based
robot control (pp. 1–15). Berlin, Heidelberg: Springer.
Haken, H., Kelso, J. A. S., & Bunz, H. (1985). A theoretical model of
phase transitions in human hand movements. Biological Cybernetics,
51(5), 347–356.
Hancock, P. A., Billings, D. R., Schaefer, K. E., Chen, J. Y., De Visser, E.
J., & Parasuraman, R. (2011). A meta-analysis of factors affecting trust
in human-robot interaction. Human Factors: The Journal of the Human
Factors and Ergonomics Society, 53(5), 517–527.
Hoffman, G. (2012). Embodied cognition for autonomous interactive
robots. Topics in Cognitive Science, 4(4), 759–772.
Hoffman, G., & Breazeal, C. (2004). Collaboration in human-robot teams.
In Proceedings of the AIAA 1st intelligent systems technical conference
(pp. 1–18). Chicago, IL: AIAA.
Hoffman, G., & Breazeal, C. (2010). Effects of anticipatory perceptual
simulation on practiced human-robot tasks. Autonomous Robots, 28(4),
403–423.
Horton, T. E., Chakraborty, A., & Amant, R. S. (2012). Affordances for
robots: A brief survey. AVANT, 3(2), 70–84.
Ito, M., Noda, K., Hoshino, Y., & Tani, J. (2006). Dynamic and
interactive generation of object handling behaviors by a small
humanoid robot using a dynamic neural network model. Neural
Networks, 19(3), 323–337.
Ito, M., & Tani, J. (2004). On-line imitative interaction with a humanoid
robot using a dynamic neural network model of a mirror system.
Adaptive Behavior, 12(2), 93–115.
Johnson, M., & Demiris, Y. (2005). Perceptual perspective taking and
action recognition. International Journal of Advanced Robotic Systems,
2(4), 301–308.
Kelso, J. A. S., de Guzman, G. C., Reveley, C., & Tognoli, E. (2009).
Virtual partner interaction (VPI): Exploring novel behaviors via
coordination dynamics. PLoS One, 4(6). http://dx.doi.org/10.1371/
journal.pone.0005749.
Klein, G., Woods, D. D., Bradshaw, J. M., Hoffman, R. R., & Feltovich,
P. J. (2004). Ten challenges for making automation a ‘‘team player” in
joint human-agent activity. IEEE Intelligent Systems, 19(6), 91–95.
Knoblich, G., Butterfill, S., & Sebanz, N. (2011). Psychological research
on joint action: Theory and data. In The psychology of learning and
motivation: Advances in research and theory (pp. 59–101). San Diego,
CA: Elsevier Academic Press, Inc.
Knoblich, G., & Sebanz, N. (2006). The social nature of perception and
action. Current Directions in Psychological Science, 15(3), 99–104.
Kono, T. (2009). Social affordances and the possibility of ecological
linguistics. Integrative Psychological and Behavioral Science, 43(4),
356–373.
Kourtis, D., Sebanz, N., & Knoblich, G. (2010). Favoritism in the motor
system: Social interaction modulates action simulation. Biology
Letters, 6(6), 758–761.
Lackey, S., Barber, D., Reinerman, L., Badler, N. I., & Hudson, I. (2011).
Defining next-generation multi-modal communication in human-robot
interaction. Proceedings of the human factors and ergonomics society
annual meeting (Vol. 55(1), pp. 461–464). http://dx.doi.org/10.1177/
1071181311551095.
Laird, J. E., Newell, A., & Rosenbloom, P. S. (1987). Soar: An
architecture for general intelligence. Artificial Intelligence, 33(1), 1–64.
Lakoff, G., & Johnson, M. (1980). Conceptual metaphor in everyday
language. The Journal of Philosophy, 77(8), 453–486.
Lallee, S., & Dominey, P. F. (2013). Multi-modal convergence maps:
From body schema and self-representation to mental imagery.
Adaptive Behavior, 21(4), 274–285.
Langley, P., Laird, J. E., & Rogers, S. (2009). Cognitive architectures:
Research issues and challenges. Cognitive Systems Research, 10(2),
141–160.
Levinson, S. C. (2006). On the human ‘Interaction Engine’. In N. J.
Enfield & S. C. Levinson (Eds.), Roots of human sociality: Culture,
cognition and interaction (pp. 39–69). Oxford, UK: Berg.
Lindblom, J., & Andreasson, R. (2016). Current challenges for UX
evaluation of human-robot interaction. In Advances in ergonomics of
manufacturing: Managing the enterprise of the future (pp. 267–277).
Lobato, E. J. C., Warta, S. F., Wiltshire, T. J., & Fiore, S. M. (2015).
Varying social cue constellations results in different attributed social
signals in a simulated surveillance task. In Proceedings of the twentyeighth international Florida artificial intelligence research society
conference (pp. 61–66). Hollywood, FL: AAAI.
Lobato, E. J., Wiltshire, T. J., Hudak, S., & Fiore, S. M. (2014). No time, no problem: Mental state attributions made quickly or after reflection
do not differ. Proceedings of the human factors and ergonomics society
annual meeting (Vol. 58(1), pp. 1341–1345). SAGE Publications.
Macrae, C. N., & Bodenhausen, G. V. (2000). Social cognition: Thinking
categorically about others. Annual Review of Psychology, 51(1), 93–120.
Marsh, K. L., Richardson, M. J., & Schmidt, R. C. (2009). Social
connection through joint action and interpersonal coordination.
Topics in Cognitive Science, 1(2), 320–339.
McCabe, K., Houser, D., Ryan, L., Smith, V., & Trouard, T. (2001). A
functional imaging study of cooperation in two-person reciprocal
exchange. Proceedings of the National Academy of Sciences, 98(20),
11832–11835. http://dx.doi.org/10.1073/pnas.211415698.
Meltzoff, A. N., & Decety, J. (2003). What imitation tells us about social
cognition: A rapprochement between developmental psychology and
cognitive neuroscience. Philosophical Transactions of the Royal Society
of London, Series B: Biological Sciences, 358(1431), 491–500.
Murphy, R. R. (2000). Introduction to AI robotics. Cambridge, MA: MIT
Press.
Neisser, U. (1976). Cognition and reality: Principles and implications of
cognitive psychology. New York, NY: WH Freeman/Times Books/
Henry Holt & Co.
Nilsson, N. J. (1980). Principles of artificial intelligence. San Francisco,
CA: Morgan Kaufmann Publishers, Inc.
Noton, D., & Stark, L. (1971). Scanpaths in eye movements during pattern
perception. Science, 171(3968), 308–311.
Pandey, A. K., & Alami, R. (2012, July). Visuo-spatial ability, effort and
affordance analyses: Towards building blocks for robot’s complex
socio-cognitive behaviors. In Workshops at the twenty-sixth AAAI
conference on artificial intelligence <http://www.aaai.org/ocs/index.
php/WS/AAAIW12/paper/view/5270>.
Pezzulo, G. (2012). The "Interaction Engine": A common pragmatic competence across linguistic and nonlinguistic interactions. IEEE Transactions on Autonomous Mental Development, 4(2), 105–123.
Pezzulo, G., Barsalou, L. W., Cangelosi, A., Fischer, M. H., McRae, K.,
& Spivey, M. (2013). Computational Grounded Cognition: A new
alliance between grounded cognition and computational modeling.
Frontiers in Psychology, 3, 1–11. http://dx.doi.org/10.3389/
fpsyg.2012.00612.
Pfeifer, R., Lungarella, M., & Iida, F. (2007). Self-organization, embodiment, and biologically inspired robotics. Science, 318(5853), 1088–1093.
Pfeifer, R., & Scheier, C. (1999). Understanding intelligence. Cambridge,
MA: MIT Press.
Phillips, E., Ososky, S., Grove, J., & Jentsch, F. (2011). From tools to
teammates: Toward the development of appropriate mental models for
intelligent robots. Proceedings of the Human Factors and Ergonomics
Society Annual Meeting, 55(1), 1491–1495. http://dx.doi.org/10.1177/
1071181311551310.
Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioral and Brain Sciences, 1(04), 515–526.
Przyrembel, M., Smallwood, J., Pauen, M., & Singer, T. (2012).
Illuminating the dark matter of social neuroscience: Considering the
problem of social interaction from philosophical, psychological, and
neuroscientific perspectives. Frontiers in Human Neuroscience, 6, 1–15.
http://dx.doi.org/10.3389/fnhum.2012.00190.
Ramamurthy, U., Baars, B. J., D’Mello, S. K., & Franklin, S. (2006).
LIDA: A working model of cognition. In Proceedings of the 7th
international conference on cognitive modeling (pp. 244–249). Trieste:
Edizioni Goliardiche.
Şahin, E., Çakmak, M., Doğar, M. R., Uğur, E., & Üçoluk, G. (2007). To
afford or not to afford: A new formalization of affordances toward
affordance-based robot control. Adaptive Behavior, 15(4), 447–472.
Sartori, L., Becchio, C., & Castiello, U. (2011). Cues to intention: The role
of movement information. Cognition, 119(2), 242–252.
Satpute, A. B., & Lieberman, M. D. (2006). Integrating automatic and
controlled processes into neurocognitive models of social cognition.
Brain Research, 1079(1), 86–97.
Schilbach, L. (2014). On the relationship of online and offline social
cognition. Frontiers in Human Neuroscience, 8, 1–8. http://dx.doi.org/
10.3389/fnhum.2014.00278.
Schilbach, L., Timmermans, B., Reddy, V., Costall, A., Bente, G., Schlicht, T., & Vogeley, K. (2013). Toward a second-person neuroscience. Behavioral and Brain Sciences, 36(04), 393–414.
Schütz-Bosbach, S., & Prinz, W. (2007). Perceptual resonance: Action-induced modulation of perception. Trends in Cognitive Sciences, 11(8), 349–355.
Sheridan, T. B. (2016). Human–robot interaction: Status and challenges. Human Factors: The Journal of the Human Factors and Ergonomics Society, 58(4), 525–532.
Simmons, W. K., & Barsalou, L. W. (2003). The similarity-in-topography principle: Reconciling theories of conceptual deficits. Cognitive Neuropsychology, 20(3–6), 451–486.
Streater, J. P., Bockelman Morrow, P., & Fiore, S. M. (2012, October). Making things that understand people: The beginnings of an interdisciplinary approach for engineering computational social intelligence. Presented at the 56th annual meeting of the Human Factors and Ergonomics Society, Boston, MA.
Treur, J. (2013). An integrative dynamical systems perspective on
emotions. Biologically Inspired Cognitive Architectures, 4,
27–40.
Tsarouchi, P., Makris, S., & Chryssolouris, G. (2016). Human–robot
interaction review and challenges on task planning and programming.
International Journal of Computer Integrated Manufacturing, 29(8),
916–931.
Tylén, K., Allen, M., Hunter, B. K., & Roepstorff, A. (2012). Interaction vs.
observation: Distinctive modes of social cognition in human brain and
behavior? A combined fMRI and eye-tracking study. Frontiers in Human
Neuroscience, 6, 1–11. http://dx.doi.org/10.3389/fnhum.2012.00331.
Uyanik, K. F., Caliskan, Y., Bozcuoglu, A. K., Yuruten, O., Kalkan, S.,
& Sahin, E. (2013). Learning social affordances and using them for
planning. In Proceedings of the cognitive science society annual meeting
(pp. 3604–3609). Berlin, Germany: Cognitive Science Society.
Van Overwalle, F. (2009). Social cognition and the brain: A meta-analysis.
Human Brain Mapping, 30(3), 829–858.
Varela, F. J., Thompson, E., & Rosch, E. (1991). The embodied mind:
Cognitive science and human experience. Cambridge, MA: MIT Press.
Vernon, D. (2010). Enaction as a conceptual framework for developmental cognitive robotics. Paladyn, Journal of Behavioral Robotics, 1(2),
89–98.
Vernon, D. (2014). Artificial cognitive systems: A primer. MIT Press.
Vinciarelli, A., Esposito, A., André, E., Bonin, F., Chetouani, M., Cohn,
J. F., ... Heylen, D. (2015). Open challenges in modelling, analysis and
synthesis of human behaviour in human–human and human–machine
interactions. Cognitive Computation, 7(4), 397–413.
Vinciarelli, A., Pantic, M., Heylen, D., Pelachaud, C., Poggi, I., D'Errico, F., & Schröder, M. (2012). Bridging the gap between social animal and unsocial machine: A survey of social signal processing. IEEE Transactions on Affective Computing, 3(1), 69–87.
Wallach, W., & Allen, C. (2008). Moral machines: Teaching robots right
from wrong. New York, NY: Oxford University Press.
Wang, D. (2003). Visual scene segmentation. In M. A. Arbib (Ed.), The
handbook of brain theory and neural networks (pp. 1215–1219).
Cambridge, MA: MIT Press.
Warta, S. F. (2015). If a Robot did "The Robot", would it still be called "The Robot" or just Dancing? Perceptual and social factors in human-robot interactions. Proceedings of the human factors and ergonomics society annual meeting (Vol. 59(1), pp. 796–800). SAGE Publications.
Warta, S. F., Kapalo, K. A., Best, A., & Fiore, S. M. (2016). Similarity, complementarity, and agency in HRI: Theoretical issues in shifting the perception of robots from tools to teammates. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 60(1), 1230–1234.
Wilkinson, M. R., & Ball, L. J. (2013). Dual processes in mental state
understanding: Is theorising synonymous with intuitive thinking and is
simulation synonymous with reflective thinking? In Proceedings of the
35th annual conference of the cognitive science society (pp. 3771–3776).
Austin, TX: Cognitive Science Society.
Wilson, M. (2002). Six views of embodied cognition. Psychonomic Bulletin
& Review, 9(4), 625–636.
Wiltshire, T. J. (2015). A prospective framework for the design of ideal
artificial moral agents: Insights from the science of heroism in humans.
Minds and Machines, 25(1), 57–71.
Wiltshire, T. J., Barber, D., & Fiore, S. M. (2013). Towards modeling social-cognitive mechanisms in robots to facilitate human-robot teaming. Proceedings of the human factors and ergonomics society annual meeting (Vol. 57(1), pp. 1278–1282). http://dx.doi.org/10.1177/1541931213571283.
Wiltshire, T. J., & Fiore, S. M. (2014). Social cognitive and affective
neuroscience in human-machine systems: A roadmap for improving
training, human-robot interaction, and team performance. IEEE
Transactions on Human-Machine Systems, 44(6), 779–787.
Wiltshire, T. J., Lobato, E. J. C., Jentsch, F. G., & Fiore, S. M. (2013). Will (dis)embodied LIDA agents be socially interactive? A commentary on the target article entitled "Conceptual commitments of the LIDA model of cognition". Journal of Artificial General Intelligence, 4(2), 23–58.
Wiltshire, T. J., Lobato, E. J., McConnell, D. S., & Fiore, S. M. (2015).
Prospects for direct social perception: A multi-theoretical integration
to further the science of social cognition. Frontiers in Human
Neuroscience, 8, 1–22. http://dx.doi.org/10.3389/fnhum.2014.01007.
Wiltshire, T. J., Lobato, E. J. C., Wedell, A. V., Huang, W., Axelrod, B., & Fiore, S. M. (2013). Effects of robot gaze and proxemic behavior on perceived social presence during a hallway navigation scenario. Proceedings of the human factors and ergonomics society annual meeting (Vol. 57(1), pp. 1273–1277). http://dx.doi.org/10.1177/1541931213571282.
Wiltshire, T. J., Smith, D. C., & Keebler, J. R. (2013). Cybernetic teams:
Towards the implementation of team heuristics in HRI. In Virtual
augmented and mixed reality designing and developing augmented and
virtual environments. Lecture notes in computer science (Vol. 8021,
pp. 321–330). Berlin, Heidelberg: Springer.
Wiltshire, T. J., Snow, S. L., Lobato, E. J., & Fiore, S. M. (2014). Leveraging social judgment theory to examine the relationship between social cues and signals in human-robot interactions. Proceedings of the human factors and ergonomics society annual meeting (Vol. 58(1), pp. 1336–1340). http://dx.doi.org/10.1177/1541931214581279.