Preprint version; final version available at http://ieeexplore.ieee.org/document/7350228
IEEE Transactions on Cognitive and Developmental Systems (2016), vol: 8(3), pp: 201-213
DOI: 10.1109/TAMD.2015.2507439
Lifelong Augmentation of Multi-Modal
Streaming Autobiographical Memories
Maxime Petit*, Tobias Fischer*, and Yiannis Demiris
Abstract—Robot systems that interact with humans over extended periods of time will benefit from storing and recalling
large amounts of accumulated sensorimotor and interaction
data. We provide a principled framework for the cumulative
organisation of streaming autobiographical data so that data can
be continuously processed and augmented as the processing and
reasoning abilities of the agent develop and further interactions
with humans take place. As an example, we show how a
kinematic structure learning algorithm reasons a-posteriori about
the skeleton of a human hand. A partner can be asked to
provide feedback about the augmented memories, which can in
turn be supplied to the reasoning processes in order to adapt
their parameters. We employ active, multi-modal remembering, so that both the robot and humans can gain insights into the original and the augmented memories. Our framework is capable of
storing discrete and continuous data in real-time. The data can
cover multiple modalities and several layers of abstraction (e.g.
from raw sound signals over sentences to extracted meanings).
We show a typical interaction with a human partner using
an iCub humanoid robot. The framework is implemented in
a platform-independent manner. In particular, we validate its multi-platform capabilities using the iCub, Baxter and NAO robots. We also provide an interface to cloud-based services, which allows the automatic annotation of episodes. Our framework
is geared towards the developmental robotics community, as it
1) provides a variety of interfaces for other modules, 2) unifies
previous works on autobiographical memory, and 3) is licensed
as open source software.
Index Terms—Autobiographical Memory, Reasoning, Remembering, Developmental Robotics, Human Feedback, Robotics
I. INTRODUCTION

Humans have the ability to mentally travel in time.
They can go back in time by remembering their past
experiences, as well as predicting the consequences of future
actions based on these past events with their reasoning capabilities. This ability is crucial for autonomous robots [1, 2], as it allows them to adapt to the current situation. The important
cognitive component involved in this process is the autobiographical memory, which contains one’s past experiences. The
autobiographical memory is based on lifelong episodic and
semantic memories. The episodic memory stores past personal
The authors are with the Personal Robotics Lab, Department of Electrical
and Electronic Engineering, Imperial College London, UK. E-mail: {m.petit,
t.fischer, y.demiris}@imperial.ac.uk.
*M. Petit and T. Fischer contributed equally to this work.
Manuscript received August 10, 2015; revised October 13, 2015; accepted
November 18, 2015; date of publication December 09, 2015.
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org
Digital Object Identifier 10.1109/TAMD.2015.2507439
experiences, precisely defined in space and time using multimodal perception; whereas the semantic memory contains general knowledge, e.g. facts or laws about the world [3–5]. Both
of these memories are declarative (explicit), i.e. consciously
accessible information. This is opposed to non-declarative
(implicit) memories, which are based on the collection of
non-consciously learnt capabilities [6]. One example of a non-declarative memory is the procedural memory, which contains
skills and habits.
In this paper, we present an implementation of a dynamic
autobiographical memory system1 . We focus on the episodic
part, and are interested in the continuous data perceived during
an episode. Our framework is generic (i.e. independent of the
robotic platform, programming language, and modern desktop
operating system), and stores a full episodic memory. We
store as much data as possible in a continuous data stream
in contrast to other solutions [7]. This means that our system
gathers data during events in real-time (tested up to 100 Hz).
The data can then be replayed at a later time without losing
any information; and, more importantly, can be used as input
to reasoning modules. The data can originate from multiple
modalities, and span across several layers of abstraction.
Storing as much data as possible is especially useful in cases
of incremental development of reasoning algorithms, where an
initial implementation might only employ a single modality,
whereas a more advanced implementation might fuse multiple
modalities. If only the single modality was remembered in
the first place, additional data acquisition would be needed
for further development, whereas our system bypasses this
problem.
We base our work on a cognitive framework used to extract
regularities in human-robot interaction [8–10]. The framework
was used to learn knowledge about rules of games [8], spatial
and temporal properties [9], and pronoun identification [10].
These applications required the storage of data (such as object locations, agent skeletons, and relations) only at two key moments: at
the beginning (for pre-conditions) and at the end of an episode
(for consequences). This was sufficient for knowledge-based
reasoning [11], where the data about the action itself is not
relevant.
However, the continuous data recorded during an episode
is crucial for many other applications, such as autonomous
robots that learn from motor babbling [12] and imitation [13].
Furthermore, the storage of multiple modalities (such as joint
positions, camera images, tactile sensor values) is needed for
1 The software developed for this paper is available open source at
http://www.imperial.ac.uk/PersonalRobotics
[Figure 1 diagram: sensors (visual, proprioception, tactile, speech, body schema, face tracker) stream discrete and continuous data through an input interface into the ABM, which stores annotations, main and argument content, the OPC working memory (entities, agents, objects, actions, emotions) and the continuous data rows; an output interface serves annotation modules, reasoning modules (e.g. kinematic structure) and multi-modal remembering (image and tactile visualisation, proprioception, dialogue interface).]
Fig. 1. Framework overview. The autobiographical memory (ABM) receives continuous data from the sensors of the robot (e.g. images, proprioceptive data, tactile information) and external devices (e.g. images and skeleton information from the Kinect, and distance measurements from a laser scanner) during an episode. This data is saved in a SQL database together with the respective annotations, which are typically provided by action modules. At the beginning and end of an episode, the contents of the working memory (e.g. emotions of the robot) are additionally stored in the ABM. After an episode is finished, external reasoning modules can access the data and provide augmented memories. For example, one of the modules retrieves a stream of images, then reasons about the kinematic structure of the contained objects, and finally stores the augmented images back in the ABM. Our framework also supports cloud-based services; in this figure, we show cloud-based a-posteriori face recognition. The data can then be reproduced as an action, i.e. by visualising the images and spoken language and recalling motor commands. The robot can make use of the a-posteriori provided annotations (e.g. naming a previously unknown person) and augmented images (e.g. the skeleton of an object). The robot can ask a partner to judge the quality of the augmented images, which can then be used to improve the underlying algorithms.
various reasoning processes, such as kinematic structure extraction [14]. Also, computationally expensive computer vision
algorithms might benefit from our framework, as some of these
algorithms cannot currently run in real-time, which reduces
their usage in the area of robotics. The system also provides a spoken language interaction interface to allow the robot to ask
for feedback about the quality of the results of such reasoning
algorithms. This is especially useful if the quality of the output
is hard to quantify automatically (e.g. the kinematic structure
of a human hand), which precludes reinforcement learning.
The reasoning algorithms can then provide candidates from
earlier episodes to a human, potentially several hours or days
after the actual episode happened. We propose a generic
autobiographical memory which is able to store and provide
streaming data for these and related applications.
Our cognitive framework respects the requirements detailed
in [15], a recent version of the popular Soar architecture:
i) store knowledge of memories; ii) extract, select, combine,
and store features; iii) represent the stored knowledge using a
syntax. An overview of our framework is shown in Figure 1
and will be detailed in Section III. In our framework, we
separate the memory from the inference systems, i.e. the data
storage is independent of the data processing, following the
principle of [16]. This allows the memory to be used in a
wider range of applications instead of being task, domain or
robot specific.
The first contribution of this paper is the ability to remember
episodes in the visual, spoken and/or proprioceptive modalities. It has been shown that the addition of visual information to language can help provide a memory prosthesis for elderly users [17]. In our work, we extend this concept by also
recalling the proprioceptive data. This is feasible as we have
a full episodic memory which allows the robot to re-live past
episodes by replaying previously executed joint sequences.
We propose that employing a wider range of modalities when
remembering episodes leads to a more natural interaction. In
this respect, we are mirroring the work of Perzanowski et al. [18], who proposed that a robot should recognise multiple modalities of a human to improve the interaction.
The second contribution is providing data of episodes to allow a-posteriori reasoning. This requires providing data to the
external module, retrieving the reasoned data, and linking them
to the original episode. This is especially useful for inference
algorithms that cannot be used in real-time. Also, this implies
our memory is dynamic with newly acquired knowledge used
to revisit past episodes which are reinterpreted according to
the new information. The robot can then be considered as a
constantly adapting developmental agent. In this article, we
show the integration of an external inference algorithm which computes the kinematic structure of articulated objects [14].
The autobiographical memory can provide a stream of annotated images to the module, which sends augmented images
(original images superimposed with the kinematic structure) in
return. Thus, our autobiographical memory framework allows
a robot to visualise the results of time-consuming reasoning
algorithms as if the reasoning had happened in real-time.
As a third contribution, our framework allows data exchange with cloud-based services. As a proof of concept,
we use the face recognition and scene understanding services
provided by ReKognition2 and Imagga3 to gain information
about the contents of images captured by the robot cameras.
The fourth contribution is the capability to obtain and
store spoken feedback from a human about the quality of
reasoned visual data. In a typical scenario, the robot performs
or observes an action, which is stored as memory. Then, these
memories are provided to reasoning algorithms, which create
augmented memories in return (see second contribution). As
the quality of these reasoning processes is hard to judge or rank for the robot itself, the robot asks humans for feedback. Importantly, as the episodes are stored as memory, the
feedback can be given to episodes far in the past, even if
the reasoning process takes a long time. As described in the
first contribution, the human is not only presented with the
augmented memory to be ranked, but the episode as a whole
is relived. The ranking is then provided back to the reasoning
module, and can be used to improve the parameters as known
from reinforcement learning.
As suggested by Baxter and Belpaeme [19] as well as
Lallee et al. [20], social robots require a long-term memory
to improve long-term interaction with humans. Therefore, as
our fifth contribution, we introduce human-robot interaction
capabilities for all of the previously mentioned contributions. That is, we allow humans to interact with the robot in a multi-modal, multi-temporal (interaction in the present about past
experiences) manner. The robot makes use of the reasoned
data, and accesses the cloud based services, when interacting
with the human.
II. RELATED WORK
One of the first concrete attempts to provide a cognitive architecture for general intelligence is Soar [21]. This framework
implements a long-term memory storing production rules,
and is thus forming a procedural non-declarative memory.
The framework also contains a short-term memory using a
2 http://rekognition.com
3 http://imagga.com
symbolic graph to represent object properties and their relations. More recently, this framework has been extended with
semantic and episodic long-term memories, that can process
low-level, non-symbolic representations of knowledge [15].
Another important advance in providing a generic memory
module was made by Tecuci and Porter [16]. In order to avoid
bias due to a specific domain or task, the memory system was
separated and was made completely independent from the inference mechanisms. The provided episodic memory can then
be attached to different applications which use the retrieved
memories in different manners depending on the specific task.
Episodes were divided in context (initial situation), outcome
(effect of the episode) and contents (sequences of action). The
sequences of actions to describe an episode are a step towards recording a stream of data; however, the description is sparse, as the basic components are actions rather than sensor data (high-level vs. low-level information). In our framework,
as we aim to visually and actively remember even unknown
actions/episodes, we additionally require low-level data which
is more dense.
More recently, the Multilevel Darwinist Brain [22] was
based on artificial neural networks and was tested with several
robots (Pioneer 2, AIBO and Hermes II). The networks
emerged from an artificial evolution algorithm for automatic
acquisition of knowledge, including both a short-term and
long-term memory. The short-term memory network stores
action-perception pairs (“the facts”), whereas the long-term
memory network gathers post-treatment knowledge (“situations” and “behaviours”) from the short-term memory data.
The long-term memory is therefore procedural, i.e. the links
to the original past events are lost, and thus individual events
cannot be remembered.
The previous architectures stored mainly high level discrete
data, such as pre-defined action labels, applied to either artificial agents or non-humanoid robots. The Extended Interaction
History Architecture [23] makes use of the iCub, a complex
humanoid robot. The architecture allows the robot to learn
a successful interaction from simple actions sequences. The
learning is based on the immediate feedback from a human in collaborative tasks, which affects the social drive of the iCub. The feedback is non-declarative, based on social
engagement cues such as visual attention or human-robot
synchronicity. In contrast to our method, the feedback is given
to a recent action, rather than giving feedback about the
outcome of a reasoning process. Although there is an intention
to store continuous variables, they are discretised (into 8 bins
over the range of each variable) or reduced (intensity image
of 64 pixels, audio stream filtered to extract drumbeats). This
could prove too restrictive for reasoning modules requesting
high-frequency, high-resolution data.
In 2012, a generic robot database was developed based on the MongoDB database and the ROS middleware [24].
The database provides a long term memory to store and
maintain raw data. Arbitrary meta-data can be linked to the
data, allowing multi-modal aggregation. This memory system
was successfully integrated in a cognitive robotic architecture called Cognitive Robot Abstract Machine (CRAMm) [25, 26].
As in [16], CRAMm is based on a separation between the
knowledge and continuous database. The continuous database
is thus an episodic long term memory. It is able to store
low level data, as well as complex motions and their effects.
However, images are stored only at key moments, usually
at the beginning and the end of an action. This limits
their a-posteriori reasoning to non-vision based algorithms.
Furthermore, CRAMm is targeted towards the robot system
itself rather than human-robot interaction, and the memory is
queried using Prolog.
To summarise, we subscribe to the idea of separating the
episodic long term memory from reasoning and inference
mechanisms [16, 24]. Our memory is generic, allowing different robots to access a full episodic memory [7]. In contrast
to an explicit (often procedural) long-term memory [21, 22] or
reduced episodic memory [23, 26], we propose a full episodic
memory system which may be useful to a larger spectrum
of reasoning applications, as well as being able to reproduce
actions in an exact manner. We store all available data at
a high frequency (sensor dependent, tested up to 100Hz)
from multiple modalities during the whole episode, including
original and augmented images. That allows the usage of
newly emerged or implemented reasoning modules to work aposteriori and store augmented memories. It is also possible to
retrieve and recall precise episodes, using multiple modalities
including visual imagery of the scene.
III. AUTOBIOGRAPHICAL MEMORY FRAMEWORK
In this section, we introduce the design of our autobiographical memory (ABM) framework, as well as its implementation. Discrete data about the situation (e.g. localisation
of objects, human positions) present in a short-term memory
are transferred to the long-term memory at the beginning and
at the end of an episode, following [9]. Episodes are delineated
in two ways, depending on who is acting. When the iCub is
acting (e.g. doing motor babbling), the corresponding module
automatically provides the begin and end of an episode. When
the human is acting, spoken cues are extracted to delineate and
annotate the episode (e.g. “I am doing babbling with my right
hand” as beginning cue, and “I finished babbling” as cue for
the end).
We will present the framework’s features (unifying previous
works), linked to the possibility to handle high-frequency
streaming data (e.g. joint angles, camera images, tactile sensor
values). We will discuss its A) platform independence, B)
performance, C) multi-modality, with each modality covering
multiple levels of abstraction, and D) synchronisation of data.
A. Platform and robot independence
The platform independence is conditional on the libraries
used in our framework, as well as how the data is represented
within the framework. PostgreSQL4 and Yet Another Robot
Platform (YARP) [27] are the two dependencies of the ABM.
PostgreSQL is a SQL database management system, and
YARP is a robotic middleware used for the communication
with external modules. Both dependencies are known to work on Windows, Linux and OS X, amongst others.
4 http://www.postgresql.org/
Fig. 2. iCub setup. Our autobiographical memory framework acquires data
from various sensors of the robot, namely of the two eye cameras, the state of
the joints, and tactile information. The range of sensors can be easily extended; in this setup we use an external Kinect camera. When recalling episodes, the
robot visualises the original memories together with augmented memories if
they are available. Also, the iCub relives the episode using its motor system.
The human can interact with the iCub, and for example can provide feedback
about remembered data.
YARP allows algorithms running on different machines to
exchange data. The data is transmitted through connection
points called ports (equivalent to topics in the Robot Operating
System [ROS]). The programs do not need to know details
about the underlying operating system or protocol, and can
be relocated across the different computers on the network.
The communication is therefore platform and transport layer
independent. The interfaces between modules are specified by
a) the port names, b) type of data these ports are receiving /
sending and c) the connections between these ports (e.g. port
/writer will be connected to port /receiver). The connections
used in our framework are represented by arrows in Figure 1.
We also propose an optional human-robot interface system based on spoken language interaction. We use Microsoft Speech Recognition, which detects the spoken utterance, and
also provides the semantic roles of the words, as defined
by a given grammar. The grammar allows the extraction of
keywords corresponding to the semantic role. For example,
“Can you remember the last time HyungJin showed you motor
babbling?” is recognised using the grammar rule “Can you
remember the <temporal cue> time <agent cue> showed you
<action cue>?”. Therefore, not only the sentence as a whole
is sent to the ABM, but also the role of semantic words. The
question is about an action “motor babbling” done by an agent
called “HyungJin” for the “last” time. This annotation can then
be used to retrieve the instance of a specific episode from the
memory using SQL queries (see Section IV and Figure 1:
Annotation Modules).
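The mapping from extracted semantic roles to a SQL retrieval can be sketched as follows. This is illustrative only: the ABM uses PostgreSQL, whereas here an in-memory sqlite3 database stands in, with a simplified main-information table (cf. Table IV); the `agent` column and the example rows are assumptions made for the sake of the example.

```python
import sqlite3

# Simplified stand-in for the ABM's main-information table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main (instance INTEGER, time TEXT, "
             "activityname TEXT, activitytype TEXT, agent TEXT)")
conn.executemany("INSERT INTO main VALUES (?, ?, ?, ?, ?)", [
    (1042, "2015-01-15 13:51:15", "motor babbling", "action", "HyungJin"),
    (1043, "2015-01-15 13:51:26", "motor babbling", "action", "HyungJin"),
    (1044, "2015-01-15 13:55:02", "pointing", "action", "Maxime"),
])

def last_episode(activity, agent):
    """Map the semantic roles (<action cue>, <agent cue>, temporal cue
    'last') extracted by the grammar to a SQL query over the episode
    annotations, returning the most recent matching instance."""
    row = conn.execute(
        "SELECT instance FROM main WHERE activityname = ? AND agent = ? "
        "ORDER BY time DESC LIMIT 1", (activity, agent)).fetchone()
    return row[0] if row else None
```

With these example rows, the question "Can you remember the last time HyungJin showed you motor babbling?" would resolve to the episode with the most recent matching timestamp.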
In our experiments, we focus on the iCub humanoid robot
platform [28], which is built upon YARP. We represent its
internal state by saving joint positions and velocities, as well
as pressure sensed by its skin. Furthermore, we store images
captured by the two eye cameras and the Kinect camera
mounted above the robot, as well as the spoken utterance of
the human partner. See Figure 2 for our set-up with iCub.
Using the Baxter and NAO robots, we show that our
framework is generic and not bound to a specific robot. The
Preprint version; final version available at http://ieeexplore.ieee.org/document/7350228
IEEE Transactions on Cognitive and Developmental Systems (2016), vol: 8(3), pp: 201-213
DOI: 10.1109/TAMD.2015.2507439
TABLE I
OVERVIEW OF STORED DATA FOR THE DIFFERENT ROBOTIC SYSTEMS, INCLUDING THE SENSOR FREQUENCY

| Robot  | Data                   | Type                | Number/frame | Frequency (Hz) |
|--------|------------------------|---------------------|--------------|----------------|
| All    | RGB Kinect             | RGB Image (640*480) | 1            | 30             |
| iCub   | Joints                 | Double              | 53           | 100            |
| iCub   | RGB cameras (high res) | RGB Image (640*480) | 2            | 15             |
| iCub   | RGB cameras (low res)  | RGB Image (320*240) | 2            | 25             |
| iCub   | Skin                   | Double              | 4224         | 50             |
| NAO    | Joints                 | Double              | 25           | 50             |
| NAO    | RGB camera             | RGB Image (320*240) | 1            | 15             |
| Baxter | Joints                 | Double              | 16           | 100            |
| Baxter | RGB head camera        | RGB Image (640*400) | 1            | 25             |
robots differ in various aspects: a) joints (number, position,
range), b) cameras (number, resolution), c) tactile sensors are
absent from the Baxter and NAO robots, d) the underlying
middleware (YARP for the iCub, ROS for the Baxter, NAOqi
for the NAO). See Table I for an overview of stored data for
the different robotic systems.
The platform independence is achieved by choosing a data
representation that is generic. Internally, the continuous data
are stored in two database tables, one for textual/numerical
data and another one for binary data (e.g. images and sound).
We suggest separating these data due to their different nature: binary data carry a very high amount of information, whereas the other table stores vectors of numbers or text.
Table II shows a representation of the database table which
is designed to store data in a general manner using key-value
pairs. To anchor one row in the database to other data, the
instance of the episode, and the time the data was acquired are
needed. This allows the bundling of data which was acquired
at the same time. In addition, we save the port where the data
originated. For each port, there are N rows representing the
N values acquired from the sensor. Instead of pre-coding a specific subset of values (e.g. 16 joints for the left arm of the iCub, against 6 for the NAO and 7 for the Baxter), the streaming data groups are split (with keys from 1 to N) and the values are stored one by one, allowing as many values
per source as needed. Therefore, the key-value representation
is independent of the number of robot limbs, tactile pressure
sensors, etc.
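The key-value layout can be illustrated with a small helper. This is a sketch only: the function name is hypothetical, and the actual framework writes these rows to PostgreSQL through its C++ interface; here the rows are simply returned as tuples matching the columns of Table II.

```python
from datetime import datetime

def to_keyvalue_rows(instance, port, values, time=None):
    """Split one streaming data group (e.g. all joint angles read from a
    single port) into key-value rows, one row per value, following the
    generic layout (instance, time, port, id, value)."""
    time = time or datetime.now().isoformat(sep=" ")
    return [(instance, time, port, i + 1, float(v))
            for i, v in enumerate(values)]
```

The same helper works unchanged whether the port delivers 16 iCub arm joints, 6 NAO joints, or 4224 skin taxel values, which is exactly the robot-independence argument made above.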
Table III shows the storage of binary data. The structure is
similar to that of Table II, but the value column is replaced
by the columns relative path and oid (object identifier). The
former is used to store the path to a file on the hard disk containing binary data such as an image or sound, to either import this file into the database or export a file with the identifier oid to the hard drive. The latter is a link to an object
file within the database. Images can be stored using any image
format (we use tif ). The additional column augmented is used
to indicate the origin of this image (a sensor or an a-posteriori
reasoning module). Sound (e.g. utterance said by the human)
is stored as one uncompressed .wav file per sentence. The
sound serves as discrete meta-information, and is therefore
not treated as continuous data.
The framework, written in C++, can be extended using
a variety of programming languages which are currently
supported by YARP: Java, Python, Perl, C#, Lisp, TCL, Ruby,
MATLAB, etc. As a proof of concept, we use the algorithm
presented in [14], which is written in MATLAB, with our
memory framework. The data is exchanged via YARP ports.
The algorithm along with the minor modifications needed for
our framework is described in Section IV-A.
B. Performance optimisation
A common problem is the storage of images in a database in
real time [26]. This is due to the limited computational power
of an embedded system, especially when multiple modules are
running as typically during interactions. However, recording
images in real-time is desirable to be able to replay them
without lag, as well as to provide them to reasoning modules in
a continuous manner. We suggest to store the images temporarily as image files on the hard drive (in folder relative path)
while an episode is memorised. Once the episode has ended,
the images are then stored as objects in the database (oid
links to the object), which allows shared memories, but is
computationally and input/output expensive. Similarly, we use
this concept of delayed processing to store textual/numerical
data first in memory (rather than the hard drive), and record
it in the database after the end of an episode.
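The delayed-processing idea can be sketched as a small buffered recorder. This is an illustrative outline, not the framework's actual implementation: `db` stands for any object exposing a hypothetical `insert_rows()` method (the real system talks to PostgreSQL).

```python
class EpisodeRecorder:
    """Buffer rows in memory while the episode runs; commit them to the
    database only once the episode has ended."""

    def __init__(self, db):
        self.db = db
        self.buffer = []

    def record(self, row):
        # Cheap append during the episode; no database input/output here.
        self.buffer.append(row)

    def end_episode(self):
        # The expensive database insertion happens only after the
        # episode is over, when real-time constraints no longer apply.
        self.db.insert_rows(self.buffer)
        self.buffer = []
```

During the episode, `record()` costs only a list append, so high-frequency sensor streams can be captured without stalling on database writes.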
Another performance gain is achieved by discarding data which is below a baseline. For example, storing the values of all 4224 skin taxels of the iCub skin at 50 Hz is compu-
[Figure 3 bar chart: frames per second, as a percentage of the sensor-dependent maximum, for camera, joints, Kinect and skin data under no optimisation, each optimisation individually, and all optimisations combined.]
Fig. 3. Performance comparison. The obtained frames per second of the non-optimised version are compared with each optimisation individually first, and then with a combination of the optimisation techniques. Only the combination of all optimisation techniques allows storing all data at the highest frequency. We plot the average frames per second as a percentage of the maximum, which depends on the sensor type (see Table I).
Preprint version; final version available at http://ieeexplore.ieee.org/document/7350228
IEEE Transactions on Cognitive and Developmental Systems (2016), vol: 8(3), pp: 201-213
DOI: 10.1109/TAMD.2015.2507439
TABLE II
A BM S QL TABLE FOR TEXTUAL AND NUMERICAL STREAMING DATA
instance
1042
1042
...
1042
1042
...
1042
...
time
2015-01-15 13:51:15.186846
2015-01-15 13:51:15.186846
...
2015-01-15 13:51:15.186846
2015-01-15 13:51:15.186846
...
2015-01-15 13:51:15.186846
...
port
/icub/head/state:o
/icub/head/state:o
...
/icub/skin/torso comp
/icub/skin/torso comp
...
/kinect/skeleton:o
...
id
1
2
...
1
2
...
1
...
value
12.0
42.0
...
198.0
174.0
...
3.2
...
relative path
1042/camLeft1.tif
1042/camRight1.tif
...
1042/kinstructure1.tif
...
oid
46889
46890
...
51511
...
TABLE III
A BM S QL TABLE FOR BINARY DATA
instance
1042
1042
...
1042
...
time
2015-01-15 13:51:15.186846
2015-01-15 13:51:15.186846
...
2015-01-15 13:51:15.186846
...
port
/icub/camLeft
/icub/camRight
...
/icub/camLeft/kinstructure
...
augmented
...
kinematic structure
...
TABLE IV
A BM S QL TABLES FOR EPISODES MAIN INFORMATION AND ANNOTATION
Annotation table
Main-information table
instance
1042
1043
...
time
2015-01-15 13:51:15.186846
2015-01-15 13:51:26.924655
...
activityname
motor babbling
motor babbling
...
activitytype
action
action
...
tationally expensive. However, most of the time the iCub is
either not touched at all or just on a small sub-part of the skin.
Therefore, the output vi of the skin taxel i is near a known
baseline θ when the skin is not touched; and these data are
not stored in the autobiographical memory:
⇢
vi if vi > θ
.
vi,memory =
; otherwise
The memory is still complete: all values can be recovered, as
the value is equal to the baseline if no value is found in the
memory:
⇢
vi,memory if 9vi,memory
vi,recall =
.
θ
otherwise
We further improve the performance of the framework by
parallelising the data acquisition from the network ports and
subsequent data pre-processing and storage. We suggest one
thread for the binary data providers, and another thread for
the textual/numerical data providers. As shown in Figure 3,
this allows recording at maximum frequency for all providers.
If the maximum performance still cannot be achieved, the
framework is designed in a way that the data acquisition can
be performed using one thread per incoming port, which can
further increase the performance if needed.
Sometimes we do not want to acquire data at the maximum
frequency fmax due to computational and memory limitations
in robotic systems. In our framework, we can enforce the
throttling of high-frequency data, similar to [26]. Rather than
requesting data from the incoming ports as often as possible,
data is requested only flimit times per second. This reduces
the amount of stored data by
N
1 X fmax,i
,
N i=1 flimit
begin
TRUE
FALSE
...
instance
1042
1042
...
argument
HyungJin
hand
...
role
agent
part
...
where fmax,i is the maximum frequency of sensor i 2
{1, . . . , N } (assuming flimit fmax,i , 8i).
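The skin-baseline filter and its lossless recall described above can be sketched as follows (θ and the taxel values are made up for illustration):

```python
# Minimal sketch (not the framework's actual code) of the skin-taxel baseline
# filter: values at the known resting baseline theta are skipped on storage
# and reconstructed on recall, so the memory stays complete.
THETA = 0.0  # illustrative baseline value

def store(taxels, theta=THETA):
    # Keep only taxels whose value exceeds the baseline.
    return {i: v for i, v in enumerate(taxels) if v > theta}

def recall(memory, n_taxels, theta=THETA):
    # Missing taxels are, by construction, at the baseline.
    return [memory.get(i, theta) for i in range(n_taxels)]

taxels = [0.0, 0.0, 198.0, 0.0, 174.0]
mem = store(taxels)
print(len(mem))        # 2 -- only the touched taxels are stored
print(recall(mem, 5))  # [0.0, 0.0, 198.0, 0.0, 174.0] -- lossless recall
```

Since the iCub skin is untouched most of the time, the dictionary stays far smaller than the full taxel vector.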
C. Multiple Modalities and Abstraction Levels
Recent robotic platforms such as the iCub have a vast number of sensors to perceive the environment, as well as dozens of actuators to interact with it. It is desirable to record the
output of all sensors, as well as the state of the actuators, in
the full autobiographical memory. This allows remembering
events in an exact manner, which is especially important
in case algorithms are asked to reason about these events.
Saving all this information is also crucial to be able to recall
knowledge in a multi-modal manner. If information is missing,
the visualisation of images would be laggy, and recalling
motor states would be jerky. Perzanowski et al. propose the
recognition of multiple modalities of a human in order to
ease the interaction with the robot [18]. We believe that the
interaction is further improved if the robot itself employs
multiple modalities. Therefore, we think that using the wide
range of data stored in the autobiographical memory will lead
to a more natural human-robot interaction in the future.
Using our framework, we are able to store data from all
common sensors in real-time: images from cameras, sentences
acquired by speech recognition, localisation of the robot,
pressure exerted on the skin, values read by the encoders of
the motors, as well as force applied to the limbs (see Figure 1:
Sensors). Due to its modular design, the framework can easily
be extended should additional data be required.
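One plausible way such a modular recorder could be pointed at new data sources is sketched below; the registry format and names are assumptions for illustration, not the framework's actual configuration:

```python
# Illustrative sketch: each data provider is just a port name plus a data
# kind, so extending the framework amounts to registering one more entry.
PROVIDERS = [
    {"port": "/icub/camLeft",         "kind": "binary"},   # camera images
    {"port": "/icub/head/state:o",    "kind": "numeric"},  # joint encoders
    {"port": "/icub/skin/torso_comp", "kind": "numeric"},  # skin taxels
]

def register(port, kind):
    # Only two kinds exist, matching the two storage paths of the framework.
    if kind not in ("binary", "numeric"):
        raise ValueError("unknown data kind: " + kind)
    PROVIDERS.append({"port": port, "kind": kind})

# Adding a new sensor is a one-line extension:
register("/kinect/skeleton:o", "numeric")
print(len(PROVIDERS))  # 4
```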
More interestingly, our framework supports multiple abstraction levels for each of the modalities, from raw data (e.g. microphone recordings), over intermediate complexity (e.g. sentences), to refined complexity (e.g. semantic annotations); all data are stored in a single memory. Figure 4 provides an overview of the supported data so far.

Fig. 4. Overview of supported modalities, with examples given for each abstraction level. As discussed in the text, the framework is not limited to these data, and can be easily extended due to the underlying YARP middleware. Typically, the framework just needs to be made aware of the name of the new YARP port (or ROS topic) the data is incoming from. [Diagram axes: modality (proprioception, visual, language) versus level of abstraction; examples range from joint positions, RGB and depth images, and raw sound signals, through joint velocities, processed images, and sentences, up to action predictions, labeled scenes, and extracted meanings.]
D. Synchronisation
Anchoring is the problem of establishing “a correspondence between the names of things and their perceptual image” [29]. A similar problem is faced when building a memory system, as a correspondence between the i ∈ {1, ..., N} sensor inputs is desirable. We solve this problem by referring to the time T_requested,i, which denotes the time the data was requested for sensor i. We propose to request all sensor data at the same time T_requested, and therefore:

    T_requested,i = T_requested    ∀i ∈ S ⊆ {1, ..., N}.

This provides the advantage that at time T_requested, the whole state of the robot is known (S_full = {1, ..., N}). This is in contrast with storing the time T_published,i, which denotes the time the data was sent to the port by sensor i, and which generally differs across sensors:

    T_published,i ≠ T_published,j    for i ≠ j; i, j ∈ {1, ..., N}.

Therefore, there is no single time T_published at which the state of the robot is known as a whole.
Given the synchronisation by Trequested , external modules
can acquire the full robot state for given times. For example,
one could imagine a module applying sensor fusion to decrease
the uncertainty of sensory inputs [30].
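The request-time synchronisation described above can be sketched as follows; read_fns and the sensor names are illustrative stand-ins for the actual YARP port readers:

```python
# Sketch of synchronisation by request time: all sensors are read in one
# sweep and stamped with the same T_requested, so the full robot state is
# defined at that single instant, regardless of when each sensor last
# published its value.
from datetime import datetime

def snapshot(read_fns):
    t_requested = datetime.utcnow().isoformat()
    # Every sensor sample shares the same timestamp.
    return {name: (t_requested, read()) for name, read in read_fns.items()}

state = snapshot({
    "head":  lambda: [12.0, 42.0],
    "torso": lambda: [198.0, 174.0],
})
times = {t for t, _ in state.values()}
print(len(times))  # 1 -- a single T_requested for the whole state
```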
IV. A-P OSTERIORI R EASONING AND M ULTI -M ODAL
R EMEMBERING
So far, we have discussed how the data is stored within
the autobiographical memory framework. In this section, we
describe ways of accessing and modifying the data, including
the interfaces allowing for the communication between the
ABM and other modules (typically reasoning modules). Examples are provided for each of the interfaces, including cloud-based solutions for face recognition and scene understanding. In particular, we focus on modules providing augmented memories using a-posteriori reasoning, the ability to actively recall past episodes, as well as the capacity to ask for human feedback.

A. A-Posteriori Reasoning in a Lifelong, Complete Memory
In order to be able to reason about past episodes in a
framework with separated memories and reasoning algorithms,
the memories have to be accessible by the reasoning algorithms. First, we show how data about past episodes can
be extracted, and we then describe the process of adding
augmented memories.
Annotation of Episodes: Our framework provides a synergy between a complete memory and automatically annotated
data. As defined earlier, a complete memory requires that
all available data is stored. Our framework also allows the storage of meta information. The meta information is currently acquired automatically from a spoken language dialogue; however, any other module can annotate data (see Figure 1: Annotation Modules). For example, a face recognition module might annotate when a specific person enters or leaves a scene. Also, reasoning modules might refine or add meta information to episodes.
Extracting Knowledge for Modules: The autobiographical memory is implemented as an SQL database. The framework thus benefits from the intrinsic qualities of such a language for storing and extracting knowledge, from creating a subset of episodes sharing common properties to finding a unique instance based on specific cues. Extracting knowledge requires sending the data related to the requested episode. Note that the amount of data
which can be accessed is substantially higher in our framework
compared to other works, due to the higher frequency of data
which is being recorded.
Two steps are needed to extract knowledge: i) The first step
is finding the desired identifiers of the episodes which are of
interest. This depends on the problem at hand. For example,
after recognising a person, the identifier of the episode when
the robot first met this person might be needed by the
module. In another scenario, all episodes where a specific
object was involved might be of interest. ii) The second
step is the retrieval of data related to these episodes using
SQL queries. For instance, based on the main and annotation
tables linked to episodes, “Can you remember the <last>
time <HyungJin> showed you <motor babbling>?” spoken
request from the human can be translated to the SQL request
in Listing 1, targeting the database tables shown in Table IV:
-- Extract the instance and the time
SELECT main.instance, main.time
-- Both the main and annotation tables are involved
FROM main INNER JOIN annotation
  ON annotation.instance = main.instance
-- Looking for episodes involving 'motor babbling',
-- where 'HyungJin' was the 'agent'
WHERE main.activityname = 'motor babbling'
  AND annotation.argument = 'HyungJin'
  AND annotation.role = 'agent'
-- Retrieve the last instance of such an episode
ORDER BY instance DESC LIMIT 1;

Listing 1. Example of an SQL request showing the extraction of an instance for a specific agent performing a certain action.
This query is designed to retrieve only a unique and precise
episode, but a request can also provide subsets of episodes. For
example, to extract regularities a reasoning algorithm might
want to extract all the “motor babbling” actions done by
“HyungJin”. It can then use the previous SQL query without
the “ORDER BY instance DESC LIMIT 1” (which limits the
result to the row with the highest instance number). Using the same principle, we implemented the ability to either retrieve all related sensor data S_full, or a subset S_sub ⊂ S_full (e.g. just the joint positions, or just the images from the left camera) from one or several episodes. An interface for the
most frequently used queries is provided in the framework,
e.g. finding episodes where something failed or extracting
augmented images for episodes of a certain activity.
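The two-step extraction can be sketched against a toy in-memory version of the tables in Table IV; the schema here is a reconstruction for illustration only:

```python
# Sketch of the two extraction steps using sqlite3 and a toy schema modelled
# on Table IV. Step i) finds the episode identifier; step ii) would then
# fetch the related sensor data for that identifier.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE main (instance INTEGER, time TEXT, activityname TEXT,
                   activitytype TEXT, "begin" BOOLEAN);
CREATE TABLE annotation (instance INTEGER, argument TEXT, role TEXT);
INSERT INTO main VALUES (1040, '2015-01-10', 'motor babbling', 'action', 1);
INSERT INTO main VALUES (1042, '2015-01-15', 'motor babbling', 'action', 1);
INSERT INTO annotation VALUES (1040, 'HyungJin', 'agent');
INSERT INTO annotation VALUES (1042, 'HyungJin', 'agent');
INSERT INTO annotation VALUES (1042, 'hand', 'part');
""")

# Step i): find the identifier of the episode of interest (here: the *last*
# 'motor babbling' episode where 'HyungJin' was the agent, as in Listing 1).
row = db.execute("""
    SELECT main.instance, main.time
    FROM main INNER JOIN annotation
      ON annotation.instance = main.instance
    WHERE main.activityname = 'motor babbling'
      AND annotation.argument = 'HyungJin'
      AND annotation.role = 'agent'
    ORDER BY main.instance DESC LIMIT 1""").fetchone()
print(row)  # (1042, '2015-01-15')
```

Dropping the ORDER BY ... LIMIT 1 clause returns the whole subset of matching episodes instead of a single one.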
This knowledge is usually extracted by action modules. For
example, a module for visual search of an object retrieves
previous locations of the desired object. The module could
then adapt its search strategy for the object accordingly.
In this scenario, the autobiographical memories could help
reducing the search time by providing information about past
episodes [31].
A-Posteriori Reasoning: As the memory is fully episodic, i.e. storing all data without forgetting, one can take advantage of a-posteriori reasoning about episodes in the lifelong memory. This is in contrast to non-complete memory systems, where ad-hoc experiments have to be created because data is missing. A-posteriori reasoning in our framework
requires three steps. First, the data for specified episode(s)
needs to be acquired (as above). Second, the data needs to
be processed, i.e. the actual reasoning step (in an external
module). Third, the reasoned data is sent back to the autobiographical memory. In comparison to previous works, the data
is not limited to the state of the world before and after the
episode. In our framework, modules can access the state of
the world in a continuous manner, and include this knowledge
in their reasoning.
As many algorithms in a robotics environment employ batch
learning rather than online learning [32], it is crucial that past
episodes (serving as inputs) can be associated with the result
of these algorithms. In our system, we allow accessing past
episodes in a standardised form, supplying meta information
together with the actual streaming data (see Figure 1: Request
Data). The meta information contains the instance, port and
time. The algorithm can then perform the required computations, and return the result along with the meta information
so it can be associated to the original data. This allows the
outcome of offline reasoning algorithms to be stored, and more
importantly recalled, alongside the original data (see Figure 1:
Add Reasoned Data and Memories).
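A minimal sketch of this meta-information round trip follows; the dictionary format and function names are assumptions, not the real interface:

```python
# Sketch: each frame travels with its meta information (instance, port,
# time); the offline reasoning result is returned under the same meta data,
# plus a label, so the ABM can file it next to the original data.
def reason_offline(frames, process):
    augmented = []
    for meta, image in frames:
        result = process(image)
        # Re-attach the original meta information, adding only a label.
        augmented.append((dict(meta, augmented="kinematic structure"), result))
    return augmented

frames = [({"instance": 1042, "port": "/icub/camLeft",
            "time": "2015-01-15 13:51:15.186846"}, "raw-img-1")]
out = reason_offline(frames, lambda img: "skeleton(" + img + ")")
print(out[0][0]["instance"], out[0][0]["augmented"])
# 1042 kinematic structure
```

Because the meta information survives the round trip unchanged, the reasoning can take minutes or days without losing the association to the original episode.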
This feature is crucial to provide human feedback as a reward function to reinforcement learning algorithms. It allows the robot to explore and learn from unsupervised experiences, as well as to show the most promising results to a human. Such a strategy has already been employed in robotics to learn navigation [33] and manipulation [34] tasks, as well as the rules of the rock-paper-scissors game [35]. However, in these cases, the algorithm was able to produce the computed candidate action in real-time. Here, we also allow algorithms which work as offline
inst = 1042; % Specify instance number
% 1) Open port to ABM and port to receive images
port2ABM.open('/kinStruct/to_ABM');
Network.connect('/kinStruct/to_ABM', '/ABM/rpc');
portIncoming.open('/kinStruct/img_in');
Network.connect('/ABM/icub/cam/left', '/kinStruct/img_in');
portIncoming.setStrict() % Keep all images in buffer
% 2) Start streaming and extract number of images for inst
bStartStreaming = ['triggerStreaming', inst];
port2ABM.write(bStartStreaming, bResponseStreaming);
num_images = bResponseStreaming.get(0);
% 3) Extract meta information and raw image one by one
for frame = 1:num_images
    bRawImages{frame} = portIncoming.read();
    bImageMeta{frame} = portIncoming.getEnvelope();
end
% 4) Process images
bAugmentedImages = getkinStructure(bRawImages);
% 5) Send augmented images back
portOutgoing.open('/kinStruct/img_out');
Network.connect('/kinStruct/img_out', '/ABM/augmented_in');
for frame = 1:num_images
    % Provide label
    bImageMeta{frame}.addString('kinematic structure')
    portOutgoing.setEnvelope(bImageMeta{frame});
    portOutgoing.write(bAugmentedImages{frame});
end

Listing 2. Simplified MATLAB code showing the interaction between the kinematic structure learning algorithm and the ABM.
reasoning processes, i.e. where computing the proposed model
takes a substantial amount of time. By remembering original
and a-posteriori created augmented memories at the same
time, the robot can obtain the human feedback/reward as if the
reasoning happened in real-time. Moreover, the framework is
also compatible with the active learning approach where the
learner queries the data which is labelled by a human oracle.
This is known to achieve better accuracy with less examples
compared to passive learning implementations [36].
One offline reasoning algorithm which was incorporated in the framework is that of [14]. It uses a stream of images to extract the kinematic structure of objects. The whole algorithm was written in MATLAB, and was extended to work with our framework after the implementation of the core algorithm had already been completed. The extraction of the kinematic structure for a 30-second video sequence takes approximately three minutes.
The extensions needed to connect a reasoning algorithm to the autobiographical memory are as follows: 1) A YARP port is opened to communicate with the autobiographical memory framework. 2) The number of images related to the desired episode is retrieved. 3) Each image is retrieved in a loop, together with the corresponding meta information (which is stored in a temporary variable). 4) Then, the kinematic structure of the object in the video is computed. Note that this is the only step which depends on the algorithm used; all other steps remain the same. 5) The augmented images are sent to the ABM, one by one together with the appropriate meta information. The code related to this communication can be found in Listing 2 and contains only small additions to a typical reasoning module.
We also implement the provision of images to cloud-based services which are based on RESTful APIs and exchange data via JSON (see Figure 1: Reasoning Modules - Recognition). RESTful APIs using JSON are a widely used combination for cloud-based services (for a thorough list, see http://www.mashape.com). The reasoning with cloud-based
services differs in several ways from the kinematic structure learning algorithm. First, the data communication is done via JSON rather than YARP ports. Second, instead of augmented images as in the kinematic structure learning, the cloud-based service is employed to gain additional meta information. We use the services provided by ReKognition to automatically annotate images with the name of the person as soon as a person greets the iCub. This requires a training step where a small number of images (in the range of 5-10) of the people to recognise are uploaded to the web service. Then, the service provides the name of the recognised person together with a certainty. If the certainty is above a threshold θ = 0.85, a new row in the annotation database table (see Table IV) is inserted with “role=agent” and “argument=name”, where “name” is the name of the recognised person. This information is then used in return to greet the person. Please note that the annotation of the images therefore happens fully automatically and in a short amount of time (~1 second). Furthermore, we integrated the “tagging” service by Imagga, which allows the automated annotation of images. This can in turn be used by the robot to proactively engage in a conversation with a human by describing the scene the robot is currently perceiving, thus improving the human-robot interaction.
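The thresholded annotation step can be sketched as follows; recognise() is a stand-in for the actual ReKognition web call, and the returned row mirrors the annotation table of Table IV:

```python
# Hedged sketch of the thresholded annotation step: the cloud recogniser
# returns (name, certainty); only results above theta = 0.85 produce an
# annotation row. recognise() stands in for the actual web-service call.
THETA = 0.85

def annotate(instance, recognise):
    name, certainty = recognise()
    if certainty > THETA:
        # Would translate to:
        #   INSERT INTO annotation (instance, role, argument)
        #   VALUES (instance, 'agent', name)
        return {"instance": instance, "role": "agent", "argument": name}
    return None  # person unknown -- ask for the name instead

row = annotate(1042, lambda: ("Martina", 0.92))
print(row)  # {'instance': 1042, 'role': 'agent', 'argument': 'Martina'}
print(annotate(1043, lambda: ("?", 0.40)))  # None
```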
B. Multi-Modal Remembering

We provide data in a multi-modal, natural manner. The robot expresses itself using language, visual information, as well as its body (see Figure 1: Multi-Modal Remembering). Language is an important modality for developmental robots, as it is a natural form for humans to express knowledge and is needed in a large range of social learning applications. For example, using language humans can teach behaviours [37], categories [38] or shared plans [39] to a robot.

In addition to language, streams of videos associated with an episode can be recalled. This has previously been used to provide a memory prosthesis for elderly users [17]. We go further by also allowing for proprioceptive recalling: the action which was performed in an episode can be performed again using the actuators of the robot. Using all these modalities, a robot can act in a more expressive manner.

V. EXAMPLE HUMAN-ROBOT INTERACTION

To examine the usability of our framework, we have tested it in a human-robot interaction (video available online: http://www.imperial.ac.uk/PersonalRobotics). In the designed scenario, the iCub retrieves knowledge about past episodes from the ABM, and uses language, visual imagery, as well as motor actions to express itself. We also show how feedback acquired from a human can be used to improve the reasoning skills of the robot. It is important to note that this dialogue is just an example, and can easily be extended using additional external modules, which are beyond the scope of this paper.

A. Interaction A: Greeting the human

A1) Human walks into the scene. As soon as the face of the human is detected, an episode NewPerson is triggered automatically. The new episode is annotated with the current time and the state of the environment, including an image from one of the robot's eye cameras. Furthermore, an agent whose name is “unknown” is linked to the episode. Then, the autobiographical memory module automatically provides the face recognition module with the image of the human. The image is uploaded to ReKognition, which provides the name of the human and the confidence of the recognition in return (e.g. <Martina, θ = 0.92>).
A2) Based on the recognition in A1), the human is greeted.
• In case of a high confidence that the person is known (θ > 0.85), the iCub answers:
“Hello Martina. What do you want to discuss?”
• In case of a low confidence (i.e. the person is not known), the iCub asks the human for his/her name:
“Hello. I have not met you yet, what is your name?”
In this example, we follow the case where the human is known. From this point, four kinds of interactions are supported: B) remembering a unique event, with and without augmented memories, C) remembering a subset of events, including active recalling with the robot's body, D) creation of new memories using an action module, and E) acquiring human feedback about the quality of reasoning results. We will show each of them in this order.
B. Interaction B: Remembering a unique event, with and
without augmented memories
B1 ) Martina: “Do you remember the last time Hyung-Jin
showed you motor babbling?”
The iCub extracts the semantic role and words using the
Microsoft Speech Recognition software. Based on a given
grammar, the iCub detects it is asked for the last time
(rather than e.g. first or second time) a specific agent
(Hyung-Jin) did a certain action (motor babbling).
B2 ) An SQL query is generated based on B1 , to get information about the referred episode, and the iCub answers:
iCub: “Yes, it was one month ago, on January 26th”. The
iCub then goes ahead and uses visual imagery to clarify
its memories, only using original data by default.
• Martina: “Have you extracted a kinematic structure of
his hand?”
The iCub detects that Martina still refers to the same
episode, rather than asking about a new episode, as no
time point is mentioned.
• The iCub would answer accordingly if instead asked
the following at this point (not further followed here):
Martina: “Do you remember the first time I showed
you motor babbling?”
B3 ) Based on the same episode identifier, the autobiographical
memory is then queried to retrieve images where the
“augmented” column equals “kinematic structure”.
iCub: “Yes, let me show you.”
Then, the original stream of images (as above), as well
as the stream of augmented images is visualised. Both
streams are replayed synchronised, so that a human can
clearly see how the images relate to each other.
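One plausible way the parsed semantic roles from B1) could be translated into such a query is sketched below; the grammar output format and the ordinal handling are assumptions for illustration:

```python
# Illustrative sketch: the ordinal ('first', 'last', 'second', ...) from the
# parsed utterance selects the sort direction and offset of the query, while
# agent and action fill the WHERE clause, as in Listing 1.
def build_query(ordinal, agent, action):
    order = "DESC" if ordinal == "last" else "ASC"
    offsets = {"first": 0, "last": 0, "second": 1, "third": 2}
    return ("SELECT main.instance, main.time "
            "FROM main INNER JOIN annotation "
            "ON annotation.instance = main.instance "
            f"WHERE main.activityname = '{action}' "
            f"AND annotation.argument = '{agent}' "
            f"AND annotation.role = 'agent' "
            f"ORDER BY main.instance {order} "
            f"LIMIT 1 OFFSET {offsets[ordinal]}")

q = build_query("last", "HyungJin", "motor babbling")
print("DESC" in q)  # True -- 'last' selects the highest instance number
```

A production system would pass agent and action as bound parameters rather than interpolating them into the string.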
Fig. 5. Different views of the human-robot interaction example described in Section V. (a) External view: experimental setup with a human partner interacting with the iCub. (b) Current view from the iCub eye: the iCub doing motor babbling at the time of the experiment. (c) View from the iCub eye from a past episode, synchronised with (b): the remembered image from when the iCub first did motor babbling is visualised by the iCub; this can serve as a memory prosthesis for a human partner. (d) View from the iCub eye from a past episode, overlaid with the kinematic structure defined a-posteriori: here, this image was used to reason about the kinematic structure of the iCub. As this reasoning algorithm cannot run in real-time, the autobiographical memory is needed to store the augmented images provided by an external module. In this particular case, the algorithm needs a stream of images at a relatively high frame rate to work; with previous attempts at implementing an autobiographical memory, this requirement was not fulfilled. Note that the background was removed in (c) and (d) for better contrast with the iCub hand.

C. Interaction C: Remembering a subset of events, including active recalling with the robot's body

C1) Martina: “Have you done motor babbling yourself in the past?” Compared to the question asked in the third item, this question is more general, as it does not specify a time point.
C2) Therefore, information about all episodes where the iCub did motor babbling is acquired from the memory. Also, the iCub answers in a more general way.
iCub: “Yes, I have already done this action 20 times. The first time I did motor babbling was one week ago. Do you want me to remember?” Here, the iCub is picking one unique event among the subset defined by the general request. This allows this interaction to share a common sub-part with interaction A from this point onwards.
C3) Martina: “Yes, please do it.”
Here, Martina could also have said:
Martina: “No, thank you.”
C4) Following the principle of items B2) & B3), visual imagery is used to replay the recorded episode. Also, as the iCub was the actor of the episode requested in item C2), the iCub can use its motors to replay the recorded motor states. In addition to the original image stream, the iCub also visualises the kinematic structure of its own hand. Therefore, remembering the past episode also allows the iCub to reason about its own body, by combining proprioceptive data and augmented images.
iCub: “Okay, let's recall.”
The images which are visualised are shown in Figure 5.

D. Interaction D: Creation of new memories

D1) Martina: “Can you do motor babbling with your arm instead of your hand?”
D2) As this question does not involve remembering, an action module is executed. Rather than doing motor babbling with the hand, as in item C4), the babbling is done with the arm in this instance. The action module requests the autobiographical memory to record the following episode, automatically providing the following annotations for the episode: agent name, action label, and the body part used. Then, the ABM records the state of the iCub, including images from the cameras, joint positions, and so forth.
iCub: “Yes, let me show you.”
E. Interaction E: Acquiring human feedback about the quality of reasoning results

E1) iCub: “Hello Martina, are you willing to give me some feedback about one of my reasoning processes?”
E2) Martina: “Yes, of course.”
E3) iCub: “Can you rank this kinematic structure between 1 and 10, please?”
The robot collects augmented memories of kinematic structures without feedback from Martina. The iCub then presents the original episode along with the kinematic structure to allow Martina to rank the quality of the kinematic structure. The iCub also provides the currently best kinematic structure (if one exists) as a comparison at the same time, and the robot will for example say “The currently best structure is shown on your left. The ranking is 7.”
E4) Martina: “It is an 8.”
E5) iCub: “So for you, this kinematic structure is an 8?”
E6) Martina: “Yes, this is correct.”
The iCub asks Martina for confirmation in order to avoid speech misrecognition. The feedback is stored in the autobiographical memory of the iCub, and linked to the corresponding kinematic structure. In case of misrecognition, Martina could have said “No, this is incorrect”, and the system would have gone back to E3).
E7) iCub: “Thank you, I have improved my skills. The best rank is now 8!”
The iCub updates the currently best kinematic structure and rank, in order to propose a suitable comparison for a future E3) step.
E8) iCub: “Do you want to provide feedback for another kinematic structure?”
If the robot has other augmented memories of kinematic structures that lack Martina's feedback, it asks whether to continue. If all kinematic structures are ranked, the system jumps to step E10).
E9) Martina: “Yes, please show me another one.”
Martina agrees to continue, and the interaction loops to step E3) with the next kinematic structure to be ranked. If Martina wants to stop the interaction, she can say “No, thank you” and the iCub will return to Interaction A2).
E10) iCub: “I have no more kinematic structures to rank. Thank you for your feedback!”
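The feedback loop of interaction E) can be sketched as follows (dialogue handling omitted; function names are illustrative):

```python
# Sketch of the interaction-E feedback loop: unranked kinematic structures
# are presented one by one, the confirmed rank is stored, and the running
# best is updated so it can be shown as a comparison for future rankings.
def collect_feedback(structures, ask_rank):
    best = None  # (structure_id, rank) of the currently best structure
    for sid in structures:
        rank = ask_rank(sid, best)  # E3-E6: present, ask, confirm
        if best is None or rank > best[1]:
            best = (sid, rank)      # E7: update the best rank
    return best

best = collect_feedback(
    ["a", "b", "c"],
    ask_rank=lambda sid, best: {"a": 4, "b": 8, "c": 7}[sid])
print(best)  # ('b', 8)
```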
F. Improving reasoning algorithms using interaction E)
We have shown how the robot provides access to its autobiographical memory for a human in the previously detailed human-robot interactions. Additionally, the autobiographical memory can be used to improve the reasoning capabilities of the robot. In particular, the robot can employ human feedback to estimate the quality of the results of reasoning algorithms. As an example, the robot can reason about the kinematic structure of a human hand and provide its guesses to a human expert, until an emerged kinematic structure fulfils some heuristics and is thus validated.
We used interaction E) with 5 subjects, where the robot asks for feedback about 4 different kinematic structures per session (i.e. per day). If no kinematic structure reaches a sufficiently high quality, 4 new kinematic structures are generated and a new session is conducted. Each session lasted approximately 4 minutes, of which the visualisation of the kinematic structures took roughly half. The kinematic structures were obtained with randomly introduced noise to generate variability in the quality. The generated augmented images for one kinematic structure take ∼30 megabytes of space in the memory. As shown in Figure 6, the score distribution is not normal (see e.g. kinematic structures f and o). Therefore, a non-parametric pairwise Wilcoxon signed-rank test is used to choose between candidates. As we are only interested in high quality candidates, the results are filtered using a heuristic: we use thresholds on the quartiles, Q1 ≥ 6 and Q2 ≥ 7, as estimators of the data.
For the first day, kinematic structure d (Q1,d = 4, Q2,d = 4) is the best one. However, it does not meet the criteria described above and thus new kinematic structures are created. On the second day, kinematic structure h (Q1,h = 5, Q2,h = 6) shows improved results, however still not meeting the quality criteria. On day three, the reasoning algorithm does not provide a kinematic structure better than h, so h is kept as the best kinematic structure. On day four, kinematic structures n (Q1,n = 6, Q2,n = 8) and p (Q1,p = 6, Q2,p = 7) are above our heuristic threshold. The Wilcoxon signed-rank test provides p = 0.5862, showing that both kinematic structures are equivalent. Therefore, both candidates are kept as templates for a high quality human hand kinematic structure.

Fig. 6. Plot of the scores provided by five human partners for the 16 different kinematic structures (a-p, grouped by days 1-4) which were obtained. The upper images show the best kinematic structure found so far. The bottom line shows the threshold for Q1, and the upper line the threshold for Q2. Only kinematic structures n and p fulfil both criteria, thus indicating that these kinematic structures possess the highest quality among all of them.

G. Summary of the human-robot interaction studies
The main features of our framework are as follows: In A), we show the integration of a cloud-based reasoning algorithm. In B1) and B2), an episode is remembered visually, using only the original memories. In B3), the memories are extended by augmented images. In interaction C), the iCub employs active recalling with its body in addition to visual imagery. In interaction D), we show how new memories can emerge using an action module which annotates the episode. We used interaction E) to acquire feedback about 16 different kinematic structures of a human hand with the help of 5 human partners.

VI. DISCUSSION AND FUTURE WORKS
Robotic memory frameworks often come with the ability of forgetting [40, 41]. In our current work, such a feature is not
implemented, as we aim for a full episodic memory which allows a-posteriori reasoning. However, in future work our framework will be extended such that compressed copies of the eidetic (full) memory are created using different forgetting mechanisms, which will allow a comparison of these mechanisms. Our framework is particularly compatible with a forgetting mechanism based on abstraction. As shown in Figure 4, our framework supports linking low-level abstractions with high-level abstractions. Discarding the low-level abstractions, which are typically memory intensive, while maintaining the high-level abstractions allows tackling scalability issues (especially that of an unbounded memory size) while still maintaining a coherent memory. We aim to integrate our framework with our previous work [42], where we proposed a memory compaction based on abstraction using context-free grammars, which allowed learning reusable structures from visual input streams.
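Such an abstraction-based forgetting mechanism could, for instance, discard the raw low-level stream of an episode once a high-level annotation has been linked to it. A minimal sketch, assuming a hypothetical episode layout (the field names below are illustrative, not the actual ABM schema):

```python
# Sketch: abstraction-based forgetting. Each episode links low-level
# data (memory-intensive raw frames) with a high-level abstraction
# (a compact annotation). Forgetting drops the former, keeps the latter.
# The episode structure is hypothetical, not the actual ABM schema.

def forget_by_abstraction(episodes):
    """Discard raw low-level data of episodes that already carry a
    high-level abstraction; return the number of bytes freed."""
    freed = 0
    for ep in episodes:
        if ep["high_level"] is not None and ep["low_level"] is not None:
            freed += ep["low_level"]["size_bytes"]
            ep["low_level"] = None  # drop the memory-intensive part
    return freed

episodes = [
    {"id": 1, "low_level": {"size_bytes": 30_000_000},
     "high_level": "kinematic structure of the hand"},
    {"id": 2, "low_level": {"size_bytes": 12_000_000},
     "high_level": None},  # not yet abstracted: keep the raw data
]
freed = forget_by_abstraction(episodes)
```

Episodes without a high-level abstraction are left untouched, so the memory stays coherent: nothing is forgotten before it has been summarised.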
Another forgetting mechanism we plan to compare against is the forgetting view proposed by Gurrin et al. [41], which manages most of the memory retrieval. It is based on a simplified version of Experience Merging, and the complete memory is only accessed when it is absolutely needed (e.g. when remembering a precise, particular or novel event).
In the future, we will 1) use the ABM to provide the training data for faces, 2) use cloud-based services to gain an understanding of the gist of a scene, as well as to 3) recognise objects. We think that the integration of such cloud-based services will be a great advantage in the future, as they offer a wide range of different services without the need to implement any of them locally.
We will also add more reasoning algorithms. It will be
interesting to see how parameter tuning can be improved by the ability to provide feedback, as the kinematic structure algorithm is parameter-free. Our heuristic based on the quartiles of the scores might be extended to allow not only qualitative but also descriptive feedback. The heuristic will then be used to determine when the parameter tuning has converged, i.e. when the quality of the reasoning is high enough. Then, the robot can focus on another learning task.
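The convergence check based on the quartile heuristic can be sketched with Python's standard library; the score values below are made up for illustration and are not the experimental data:

```python
# Sketch of the quartile-based quality heuristic: a candidate kinematic
# structure is accepted once the first quartile of its feedback scores
# is >= 6 and the median (Q2) is >= 7. Scores here are illustrative.
from statistics import quantiles

def converged(scores, q1_min=6, q2_min=7):
    """Return True if the score distribution passes both thresholds."""
    q1, q2, _ = quantiles(scores, n=4)  # [Q1, median, Q3]
    return q1 >= q1_min and q2 >= q2_min

low_quality = [4, 4, 5, 6, 6]    # Q1 below threshold: keep tuning
high_quality = [6, 7, 7, 8, 8]   # passes both thresholds: converged
```

When several candidates pass the thresholds, they could additionally be compared pairwise with a Wilcoxon signed-rank test (e.g. `scipy.stats.wilcoxon`), as done in the experiment above.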
Eventually, we are also interested in designing a human-robot interaction study to identify and tune parameters during a remembering interaction in order to provide the best experience for a human. Previously, Brom et al. [43] found that humans prefer fuzzy categories for temporal notations in episodic memories, as opposed to precise timing information. We would like to expand this hypothesis to other notions; among others, key aspects are the speed of the remembering and the amplitude of the movement. Currently, we reproduce with high fidelity what happened, but increasing the speed and/or reducing the amplitude might shorten the episode time without any degradation in the message quality and thus produce a more appealing interaction.
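Scaling the speed and amplitude of a remembered movement can be sketched as subsampling the stored joint trajectory and shrinking its excursions around the mean posture. This is a toy illustration under those assumptions, not the actual replay pipeline (which streams joint values through YARP), and the trajectory values are invented:

```python
# Toy sketch: replay a remembered joint trajectory faster (subsampling
# by an integer factor) and with reduced amplitude (scaling excursions
# around the mean posture). Values are illustrative, not real iCub data.

def scale_episode(trajectory, speed=2, amplitude=0.5):
    """Subsample by `speed` and scale excursions around the mean
    by `amplitude`, shortening the replay of the episode."""
    mean = sum(trajectory) / len(trajectory)
    subsampled = trajectory[::speed]
    return [mean + amplitude * (v - mean) for v in subsampled]

original = [0.0, 10.0, 20.0, 10.0, 0.0, -10.0, -20.0, -10.0]
replayed = scale_episode(original, speed=2, amplitude=0.5)
# half the samples (shorter replay), half the excursion
```

The open question for the proposed study is how far `speed` and `amplitude` can be pushed before the replay stops conveying the original episode.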
VII. CONCLUSION
We have presented a generic framework for a full autobiographical memory, which can be used for a large variety
of applications in autonomous systems because of a unique
set of features. The framework is not dependent on an exact
configuration of sensors from a specific robotic platform; we
have shown that it can be used for different robots, including
the iCub, the Baxter and the NAO. It is simple to store data
from external sources, e.g. images from a Kinect camera or
distance measurements from an external laser scanner, as long
as a YARP interface is established. A bridge for data published on ROS topics is also available, substantially extending the number of supported robots.
The ABM can store different kinds of data, and is not
specific to a task: angles of the robot joints and images
from embedded cameras may be the most commonly used
(e.g. for imitation), but high frequency and numerous tactile
pressure values have also been successfully stored (e.g. for
tactile human feedback).
The long-term memory can gather high frequency data (tested up to 100Hz) with a high information content (4224 tactile values, 640×480 RGB images from multiple sources, etc.) without losing information. The data are organised into episodes and can then be retrieved at a later time (even after several months or years) using the annotations linked to these events (time, agent, action, objects, etc.). For example, we have shown a complex request identifying the precise episode behind "Can you remember the <temporal cue> time <agent cue> showed you <action cue>?".
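A retrieval request of this kind amounts to a query over annotated episodes. A minimal sketch using an in-memory SQLite table; the schema, table name and annotation values are hypothetical simplifications, not the actual ABM database:

```python
# Sketch: retrieving an episode by its annotations (time, agent, action).
# The schema and sample rows are illustrative; the real ABM uses a
# full relational store, this only shows the lookup principle.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE episodes (id INTEGER, time TEXT, agent TEXT, action TEXT)")
db.executemany("INSERT INTO episodes VALUES (?, ?, ?, ?)", [
    (1, "yesterday", "Maxime", "waving"),
    (2, "yesterday", "Tobias", "hand-babbling"),
    (3, "last week", "Tobias", "waving"),
])

# "Can you remember the <temporal cue> time <agent cue> showed you <action cue>?"
row = db.execute(
    "SELECT id FROM episodes WHERE time = ? AND agent = ? AND action = ?",
    ("yesterday", "Tobias", "hand-babbling"),
).fetchone()
```

Each cue of the spoken request maps to one annotation filter, and the matching episode id is then used to stream back the stored sensory data.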
Not only can the robot remember these precise events, it can also relive them, providing images from the cameras and reproducing the movements performed. The robot can even add information about a memory event, using the knowledge of a-posteriori reasoning modules that have had access to this episode between its origin and the time of recall. This new information is automatically linked to the events it originates from, using meta-information attached to each event. For instance, we have shown how the kinematic structure of a limb can be added to a memory of hand-babbling and remembered when a human asks for it. Thus, our ABM provides a way to revisit both original and augmented memories side by side in real-time, where the augmented memories might originate from algorithms that cannot run in real-time.
ACKNOWLEDGEMENTS
This research was funded in part by the EU projects WYSIWYD (grant FP7-ICT-612139) and PAL (grant H2020-PHC-643783). The authors thank the members of the Personal Robotics Lab for their continued support, particularly Hyung Jin Chang (providing the kinematic structure learning module) and Martina Zambelli for helping with the human-robot interaction.
REFERENCES
[1] D. Vernon, G. Metta, and G. Sandini, “A Survey of Artificial Cognitive
Systems: Implications for the Autonomous Development of Mental Capabilities in Computational Agents,” IEEE Transactions on Evolutionary
Computation, vol. 11, no. 2, pp. 151–180, 2007.
[2] P. F. Verschure, “Distributed Adaptive Control: A theory of the Mind,
Brain, Body Nexus,” Biologically Inspired Cognitive Architectures,
vol. 1, pp. 55–72, Jul. 2012.
[3] E. Tulving, “Episodic and Semantic Memory,” in Organization of
Memory. New York: Academic Press, 1972, pp. 381–403.
[4] M. A. Conway and C. W. Pleydell-Pearce, “The Construction of
Autobiographical Memories in the Self-Memory System,” Psychological
Review, vol. 107, no. 2, pp. 261–288, 2000.
[5] R. Wood, P. Baxter, and T. Belpaeme, “A review of long-term memory
in natural and synthetic systems,” Adaptive Behavior, vol. 20, no. 2, pp.
81–103, 2012.
[6] L. Squire and S. Zola-Morgan, “The medial temporal lobe memory
system,” Science, vol. 253, no. 5026, pp. 1380–1386, 1991.
[7] C. Brom and J. Lukavský, “Towards Virtual Characters with a Full
Episodic Memory II: The Episodic Memory Strikes Back,” in International Conference on Autonomous Agents and Multiagent Systems,
Budapest, Hungary, 2009.
[8] G. Pointeau, M. Petit, and P. F. Dominey, “Robot Learning Rules of
Games by Extraction of Intrinsic Properties,” in International Conference on Advances in Computer-Human Interactions, Nice, France, 2013,
pp. 109–116.
[9] ——, “Successive Developmental Levels of Autobiographical Memory
for Learning Through Social Interaction,” IEEE Transactions on Autonomous Mental Development, vol. 6, no. 3, pp. 200–212, 2014.
[10] G. Pointeau, M. Petit, G. Gibert, and P. F. Dominey, “Emergence
of the Use of Pronouns and Names in Triadic Human-Robot Spoken
Interaction,” in International Conference on Development and Learning
and on Epigenetic Robotics, Genoa, Italy, 2014, pp. 85–91.
[11] R. Davis, H. Shrobe, and P. Szolovits, “What Is a Knowledge Representation?” AI Magazine, vol. 14, no. 1, p. 17, 1993.
[12] M. Zambelli and Y. Demiris, “Online Ensemble Learning of Sensorimotor Contingencies,” in IEEE/RSJ International Conference on Intelligent Robots and Systems Workshop on Sensorimotor Contingencies for Robotics, 2015, to be published.
[13] Y. Demiris and A. Dearden, “From motor babbling to hierarchical
learning by imitation: a robot developmental pathway,” in International
Workshop on Epigenetic Robotics: Modeling Cognitive Development in
Robotic Systems, Nara, Japan, 2005, pp. 31–37.
[14] H. J. Chang and Y. Demiris, “Unsupervised Learning of Complex Articulated Kinematic Structures combining Motion and Skeleton Information,” in IEEE Conference on Computer Vision and Pattern Recognition,
Boston, MA, USA, 2015, pp. 3138–3146.
[15] J. E. Laird, “Extending the Soar Cognitive Architecture,” in Conference
on Artificial General Intelligence, Memphis, TN, USA, 2008, pp. 224–
235.
[16] D. G. Tecuci and B. W. Porter, “A Generic Memory Module for
Events,” in International Florida Artificial Intelligence Research Society
Conference, Key West, FL, USA, 2007, pp. 152–157.
[17] W. Ho, K. Dautenhahn, N. Burke, J. Saunders, and J. Saez-Pons,
Episodic memory visualization in robot companions providing a memory
prosthesis for elderly users. IOS Press, 2013, vol. 33, pp. 120–125.
[18] D. Perzanowski, A. C. Schultz, W. Adams, E. Marsh, and M. Bugajska, “Building a Multimodal Human-Robot Interface,” IEEE Intelligent
Systems and Their Applications, vol. 16, no. 1, pp. 16–21, 2001.
[19] P. Baxter and T. Belpaeme, “Pervasive Memory: the Future of Long-Term Social HRI Lies in the Past,” in International Symposium on New Frontiers in Human-Robot Interaction at AISB, London, UK, 2014.
[20] S. Lallee, V. Vouloutsi, M. B. Munoz, K. Grechuta, J.-Y. P. Llobet,
M. Sarda, and P. F. Verschure, “Towards the synthetic self: Making others perceive me as an other,” Paladyn, Journal of Behavioral Robotics,
vol. 6, no. 1, pp. 136–164, 2015.
[21] J. Laird, A. Newell, and P. S. Rosenbloom, “SOAR: An Architecture for
General Intelligence,” Artificial Intelligence, vol. 33, pp. 1–64, 1987.
[22] F. Bellas, R. J. Duro, A. Faiña, and D. Souto, “Multilevel Darwinist
Brain (MDB): Artificial Evolution in a Cognitive Architecture for Real
Robots,” IEEE Transactions on Autonomous Mental Development, vol. 2,
no. 4, pp. 340–354, 2010.
[23] F. Broz, C. L. Nehaniv, H. Kose-Bagci, and K. Dautenhahn, “Interaction
histories and short term memory: Enactive development of turn-taking
behaviors in a childlike humanoid robot,” CoRR, vol. abs/1202.5600,
2012.
[24] T. D. Niemueller, G. Lakemeyer, and S. Srinivasa, “A Generic Robot
Database and its Application in Fault Analysis and Performance Evaluation,” in IEEE International Conference on Intelligent Robots and
Systems, Vilamoura, Portugal, 2012, pp. 364–369.
[25] M. Beetz, M. Lorenz, and M. Tenorth, “CRAM - A Cognitive Robot Abstract Machine for Everyday Manipulation in Human Environments,” in
IEEE/RSJ International Conference on Intelligent Robots and Systems,
Taipei, Taiwan, 2010, pp. 1012–1017.
[26] J. Winkler, M. Tenorth, A. K. Bozcuoglu, and M. Beetz, “CRAMm
- Memories for Robots Performing Everyday Manipulation Activities,”
Advances in Cognitive Systems, vol. 3, pp. 47–66, 2014.
[27] P. Fitzpatrick, G. Metta, and L. Natale, “YARP: Yet Another Robot
Platform,” International Journal of Advanced Robotic Systems, vol. 3,
no. 1, pp. 43–48, 2006.
[28] G. Metta, L. Natale, F. Nori, G. Sandini, D. Vernon, L. Fadiga, C. von
Hofsten, K. Rosander, M. Lopes, J. Santos-Victor, A. Bernardino, and
L. Montesano, “The iCub humanoid robot: An open-systems platform
for research in cognitive development,” Neural Networks, vol. 23, no.
8-9, pp. 1125–1134, 2010.
[29] S. Coradeschi and A. Saffiotti, “An Introduction to the Anchoring
Problem,” Robotics and Autonomous Systems, vol. 43, no. 2-3, pp. 85–
96, 2003.
[30] R. C. Luo, C.-C. Yih, and K. L. Su, “Multisensor Fusion and Integration: Approaches, Applications, and Future Research Directions,” IEEE
Sensors Journal, vol. 2, no. 2, pp. 107–119, 2002.
[31] M. Samadi, T. Kollar, and M. Veloso, “Using the Web to Interactively
Learn to Find Objects,” in AAAI Conference on Artificial Intelligence,
Toronto, Canada, 2012, pp. 2074–2080.
[32] J. Cunha, R. Serra, N. Lau, L. S. Lopes, and A. J. R. Neves, “Batch Reinforcement Learning for Robotic Soccer Using the Q-Batch Update-Rule,” Journal of Intelligent & Robotic Systems, 2015.
[33] W. B. Knox, P. Stone, and C. Breazeal, “Training a Robot via Human Feedback: A Case Study,” in International Conference on Social
Robotics, Bristol, UK, 2013, pp. 460–470.
[34] A. León, E. F. Morales, L. Altamirano, and J. R. Ruiz, “Teaching a
Robot to Perform Task through Imitation and On-line Feedback,” in
Progress in Pattern Recognition, Image Analysis, Computer Vision, and
Applications, 2011, pp. 549–556.
[35] A. Soltoggio, F. Reinhart, A. Lemme, and J. Steil, “Learning the
rules of a game: Neural conditioning in human-robot interaction with
delayed rewards,” in IEEE International Conference on Development
and Learning and Epigenetic Robotics, Osaka, Japan, 2013.
[36] B. Settles, “From Theories to Queries: Active Learning in Practice,”
JMLR: Workshop and Conference Proceedings, vol. 16, pp. 1–18, 2011.
[37] P. E. Rybski, K. Yoon, J. Stolarz, and M. M. Veloso, “Interactive
Robot Task Training through Dialog and Demonstration,” in ACM/IEEE
International Conference on Human-Robot Interaction, Washington DC,
USA, 2007, pp. 49–56.
[38] L. Steels and T. Belpaeme, “Coordinating perceptually grounded categories through language: A case study for colour,” Behavioral and Brain
Sciences, vol. 28, no. 4, pp. 469–489, 2005.
[39] M. Petit, S. Lallee, J.-D. Boucher, G. Pointeau, P. Cheminade, D. Ognibene, E. Chinellato, U. Pattacini, I. Gori, U. Martinez-Hernandez,
H. Barron-Gonzalez, M. Inderbitzin, A. Luvizotto, V. Vouloutsi,
Y. Demiris, G. Metta, and P. F. Dominey, “The Coordinating Role of
Language in Real-Time Multimodal Learning of Cooperative Tasks,”
IEEE Transactions on Autonomous Mental Development, vol. 5, no. 1,
pp. 3–17, 2013.
[40] W. C. Ho, M. Y. Lim, P. A. Vargas, S. Enz, K. Dautenhahn, and
R. Aylett, “An Initial Memory Model for Virtual and Robot Companions
Supporting Migration and Long-term Interaction,” in IEEE International
Symposium on Robot and Human Interactive Communication, Toyama,
Japan, 2009, pp. 277–284.
[41] C. Gurrin, H. Lee, and J. Hayes, “iForgot: A Model of Forgetting
in Robotic Memories,” in International Conference on Human-Robot
Interaction, Osaka, Japan, 2010, pp. 93–94.
[42] K. Lee, Y. Su, T.-K. Kim, and Y. Demiris, “A syntactic approach to robot
imitation learning using probabilistic activity grammars,” Robotics and
Autonomous Systems, vol. 61, pp. 1323–1334, 2013.
[43] C. Brom, O. Burkert, and R. Kadlec, “Timing in Episodic Memory for
Virtual Characters,” in IEEE Conference on Computational Intelligence
and Games, Copenhagen, Denmark, 2010, pp. 305–312.
Maxime Petit received the M.Sc. degree in computer science from the University of Paris-Sud,
France, and an engineering degree in biosciences
(bio-informatics and modelling) from the National
Institute of Applied Sciences (INSA) Lyon, France,
both in 2010. In 2014, he received a Ph.D. in
Neurosciences from the National Institute of Health and Medical Research (INSERM), in the Stem-Cell
and Brain Research Institute (SBRI) in Lyon, within
the Robot Cognition Laboratory (RCL). He is now
a Research Associate at the Personal Robotics Lab,
Imperial College London.
His research interests include developmental robotics, memory and reasoning in robotics, especially linked to social interaction through spoken language
with a human.
Tobias Fischer received the B.Sc. degree from
the Ilmenau University of Technology, Germany, in
2013, and the M.Sc. degree in Artificial Intelligence
from the University of Edinburgh, United Kingdom,
in 2014. He is currently pursuing the Ph.D. degree in
robotics under Yiannis Demiris’ supervision with the
Personal Robotics Lab at Imperial College London.
His research interests include a variety of topics
including both computer vision and human vision,
visual attention, machine learning and computational
neuroscience. Tobias is interested in applying this
knowledge to robotics, to imitate human-like behaviour.
Yiannis Demiris is a Reader (Associate Professor) at Imperial College London, where he heads the Personal Robotics Laboratory. His research interests include assistive robotics, cognitive and developmental robotics, multi-robot systems, robot-human interaction, and applications of intelligent robotics in healthcare. His research is funded by the EU FP7 and H2020 programs through the projects WYSIWYD and PAL, both addressing novel machine learning approaches to human-robot interaction. He received the Rector's Award for Teaching Excellence, and the Faculty of Engineering Award for Excellence in Engineering Education in 2012. He is a Fellow of the Royal Statistical Society (FRSS), the British Computer Society (FBCS) and the Institution of Engineering and Technology (FIET).