INTERSPEECH 2005, September 4-8, Lisbon, Portugal
From Robust Spoken Language Understanding
to Knowledge Acquisition and Management
Luís Seabra Lopes, António J. S. Teixeira, Marcelo Quinderé, Mário Rodrigues
IEETA, Departamento de Electrónica e Telecomunicações
Universidade de Aveiro, 3810-193 Aveiro, Portugal
{lsl, ajst, marcelo, mfr}@ieeta.pt
Abstract

The recent evolution of Carl, an intelligent mobile robot, is presented. The paper focuses on robust spoken language understanding (SLU) and on knowledge representation and reasoning (KRR). Robustness in SLU is achieved through the combination of deep and shallow parsing, tolerating non-grammatical utterances. The KRR module supports the integration of information coming from different interlocutors and is capable of handling contradictory facts. The knowledge representation language is based on semantic networks. Question answering is based on deductive as well as inductive inference. A preliminary evaluation of the efficiency of the SLU/KRR system, for the purpose of knowledge acquisition, is presented.

1. Introduction

Due to successive advances in robotics-related technologies, robots are getting closer to humans. Intelligent service robots capable of performing useful work in close interaction with humans are the next generation. However, in order to reach that phase, it is necessary to include in their design such basic capabilities as linguistic communication, reasoning, reactivity and learning.

"Integrated Intelligence" identifies an approach to building intelligent artificial agents in which the integration of all those aspects is considered [10]. This is the scope of CARL (Communication, Action, Reasoning and Learning in Robotics), a research project started in our institute in 1999, in the framework of which a robot prototype, Carl, was developed [12, 11, 7].

The software architecture of Carl is based on a set of Linux processes. One of them handles general perception and action, including navigation. Other processes are dedicated to speech processing and touch screen interaction. An animated face displays appropriate emotions. High-level reasoning and natural language generation are mostly based on the Prolog engine. The central manager is an event-driven system.

The human-robot communication process is modeled as the exchange of messages, much as is done in multi-agent systems. The currently supported set of performatives, or message types, includes tell, ask, ask if, and achieve.

The latest developments of Carl concern the SLU and KRR modules. Designed to enable a future enhancement of the dialogue capabilities, the new KRR module adopts a semantic network representation style and is capable of integrating information obtained from different interlocutors, even if it is contradictory. The robust SLU module uses a new hybrid deep/shallow approach. This paper focuses on the application of the developed SLU/KRR integration for knowledge acquisition.

Some other projects have attempted to use spoken natural language for human-robot interaction, including Godot [2] and Jijo-2 [5]. In comparison with these works, we explore the knowledge acquisition and management aspects in greater depth, namely by using inductive, deductive and analogical inference. Although there are various works on KR for robotics, few of them focus on supporting the dialogue system. Skubic et al. developed a framework to use spatial language in human-robot dialogues [13]. However, they do not focus on acquiring information from different interlocutors and do not consider the acquisition of non-spatial knowledge.

The paper is structured as follows. Section 2 describes the KRR module. Section 3 presents the robust SLU module. Section 4 presents some preliminary experimental results. Section 5 concludes the paper with reference to future work.

2. Knowledge Representation and Reasoning Module

The scenario in which the Carl robot must act involves dialogue with various interlocutors and the exchange of information with them. The module should provide functionalities for knowledge acquisition and question answering. It is important to note that the agent should be able to start with no prior knowledge. Since the robot can acquire data from different sources, the KRR module must handle contradictory pieces of information. Moreover, since spoken language input will be used, the recognition confidence should also be taken into account.

These problems have, in part, been addressed in other contexts. Benferhat et al. presented strategies for conflict resolution developed to deal with exception handling and iterated belief revision, but which could also be applied in merging information from different sources [1]. Brazdil and Torgo developed a method to construct an integrated knowledge base from several separate ones [3].
2.1. Knowledge Representation Language
Complex domains, like that of a dialogue system in an intelligent mobile robot, require a general and flexible knowledge representation (KR) [9]. The definition of our KR language is based on typical definitions of semantic networks and on the class and object diagrams of UML (Unified Modeling Language).

Semantic networks [14] address the main representation requirements for supporting high-level dialogue. Entities are very easy to represent with them, and inference is simple: all one has to do is follow the relations between entities. Semantic networks not only provide ways for efficient computing, they also provide a very intelligible layout. Using semantic networks, the
rule of inheritance can be easily applied. In this rule, all properties of the supertype are copied to the subtypes, except if there is a redefinition of the property in the subtype.

This language assumes that objects are identified by system-generated identifiers. The predicates that provide information on specific objects include instance(ObjID, Type), name(ObjID, Name), function(ObjID, Function) and attribute(ObjID, Attribute, Value). Generalization is declared with subtype/2. Standard UML relations (composition, association) are also supported.

2.2. Basic Inference Mechanisms

There are three main types of inference: deduction, induction and analogy [6]. Take the logical entailment (1), in which P represents the premise, BK the background knowledge and C the consequence:

    P ∪ BK |= C .    (1)

Deduction is truth preserving. It derives consequence C from a given premise P and background knowledge BK. Here, this kind of inference is used when the type has an attribute with a default value and a question is asked about the object's attribute.

Given BK and C, induction can be used to hypothesize a premise P. In this work, inductive inference is used when the objects of a type have some attribute information that the type itself does not have.

Figure 1: Analogy (a semantic network in which Tom, Jim and Bob are instances of the type Cat, and Tom and Jim both have the property Likes: Milk).

Finally, analogy is a combination of deduction and induction. Given the knowledge in Fig. 1, suppose someone asks "what does Bob like?". Since this information is given neither on the object Bob itself nor on the type Cat, the module has to: first use induction from the objects Tom and Jim to the type Cat; then use deduction from the type Cat to the object Bob.

2.3. The Reasoning System

The functionalities of the module are provided mainly by the following two procedures:

• tell(Int, Fact, Conf) – the interlocutor Int tells Fact to the system; Conf is the interlocutor's confidence in Fact; in the integrated system, this is the Automatic Speech Recognizer (ASR) confidence;

• ask(Fact, Answer, Conf) – the system is asked about Fact; the confidence in the Answer, Conf, is returned.

The procedure tell simply stores the information given by the interlocutors. Inference is used in ask in order to provide the most suitable answer.

Confidences are calculated as follows. Consider a property of an object or type for which a certain value is supported by N interlocutors, and suppose that a total of T interlocutors provided values for this property. In this case, the confidence that the mentioned value is the correct one is given by:

    Conf(N, T) = (N / T) · (1 − 1/(T + 1)) .    (2)

Note that the confidence of answers based on few statements is reduced.

If the question is about an attribute value in a type, Algorithm 1 is used to determine the value. The tree traversal step of the algorithm computes the frequency of occurrence of the possible values of the Attribute in the type and all its subtypes (and sub-subtypes, etc.) and respective objects. If there is a supertype, the result of the tree traversal step is combined with a similar result inherited from the supertype.

Algorithm 1: GET VALUE(Type, Attribute)
begin
    V1, .., Vk ← possible values of Attribute
    In a tree traversal, compute the frequencies of occurrence, N1, .., Nk, of each possible value of Attribute in Type and all its subtypes (and sub-subtypes, etc.) and respective objects
    T ← Σi Ni (total number of statements about values of Attribute)
    for i ← 1 to k do
        Ci ← Conf(Ni, T) (according to Equation 2)
    if Type has no supertype then
        return ((V1, C1), .., (Vk, Ck))
    ST ← supertype of Type
    ((V1, C1'), .., (Vk, Ck')) ← GET VALUE(ST, Attribute)
    for i ← 1 to k do
        Ci'' ← (Ci + Ci')/2
    return ((V1, C1''), .., (Vk, Ck''))
end

In this algorithm, for simplicity, it is assumed that the input confidences are always 1.0. If that is not the case, each frequency of occurrence, Ni, should be replaced by the sum of the input confidences on value Vi.

If the question is about an attribute value in an object, confidences for all possible values are computed in the object (Equation 2) and in the type (Algorithm 1). The value with the highest combined confidence is returned.

If the question is about a conjunction of facts, the global confidence is the product of the confidences of the individual facts.

The implementation of this module was done in Prolog.

3. Robust Spoken Language Understanding

In order for a robot to acquire knowledge from human interlocutors through spoken language interaction, a robust spoken language understanding (SLU) capability is necessary.

SLU systems generally include a natural language understanding (NLU) module attached to an Automatic Speech Recognizer (ASR). The ASR decodes voice inputs into sequences of words. These sequences are then processed by the NLU module. NLU is usually performed in two stages: first a syntactic structure is built and then semantic information is extracted.
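The predicates and the inheritance rule can be sketched as follows (a minimal illustration in Python; the dictionary-based store and the sample Cat/Animal data are assumptions made here for clarity, since the actual module is implemented in Prolog):

```python
# Illustrative stand-in for the KR predicates of the paper; the real
# knowledge base is a Prolog fact base, not a Python dictionary.
facts = {
    "instance": [("obj1", "Cat")],         # instance(ObjID, Type)
    "name":     [("obj1", "Tom")],         # name(ObjID, Name)
    "subtype":  [("Cat", "Animal")],       # subtype(SubType, Type)
    "attribute": [("Animal", "legs", 4)],  # attribute(Id, Attribute, Value)
}

def inherited_value(typ, attr):
    """Inheritance rule: a property of the supertype is copied to the
    subtype, except if the subtype redefines it."""
    for t, a, v in facts["attribute"]:
        if t == typ and a == attr:  # a redefinition in the subtype wins
            return v
    for sub, sup in facts["subtype"]:
        if sub == typ:              # otherwise, ask the supertype
            return inherited_value(sup, attr)
    return None

print(inherited_value("Cat", "legs"))  # Cat inherits legs = 4 from Animal
```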
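The two-step analogy over Fig. 1 can be sketched in a few lines (Python used here purely for illustration; the flat data layout is an assumption, and the real module performs this over the semantic network in Prolog):

```python
# Fig. 1 as flat data: Tom, Jim and Bob are instances of Cat;
# a Likes value is stated for Tom and Jim but not for Bob.
instances = {"Tom": "Cat", "Jim": "Cat", "Bob": "Cat"}
likes = {"Tom": "Milk", "Jim": "Milk"}

def what_does_like(obj):
    if obj in likes:                      # directly stated on the object
        return likes[obj]
    typ = instances[obj]
    # induction: lift the values stated on the other objects to the type
    values = [likes[o] for o, t in instances.items()
              if t == typ and o in likes]
    # deduction: apply the induced type-level property to the object
    return max(set(values), key=values.count) if values else None

print(what_does_like("Bob"))  # -> Milk
```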
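Equation 2 and the combination step of Algorithm 1 can be sketched as below (an illustrative Python version under assumed frequency tables; the real module obtains the frequencies by a tree traversal of the semantic network and is written in Prolog):

```python
def conf(n, t):
    """Equation 2: Conf(N, T) = (N/T) * (1 - 1/(T+1)).
    Answers based on few statements get reduced confidence."""
    return (n / t) * (1 - 1 / (t + 1))

# value -> frequency of occurrence in the type, its subtypes and objects
# (hypothetical sample data; input confidences assumed to be 1.0)
frequencies = {"Cat": {"milk": 3, "fish": 1}, "Animal": {"milk": 1, "fish": 1}}
supertype = {"Cat": "Animal"}

def get_value(typ):
    counts = frequencies[typ]
    t = sum(counts.values())            # total statements about the attribute
    result = {v: conf(n, t) for v, n in counts.items()}
    if typ in supertype:                # combine with supertype: C'' = (C + C')/2
        inherited = get_value(supertype[typ])
        result = {v: (c + inherited.get(v, 0.0)) / 2
                  for v, c in result.items()}
    return result
```

Note how a single supporting statement yields only Conf(1, 1) = 0.5, matching the low average confidences observed in the "utopian" experiment of Section 4.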
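The division of labour between the two procedures (tell only stores; ask infers) can be sketched as follows. This is an illustrative Python fragment, not the module itself: the (entity, property) key and the store are assumptions, and the confidence computed is the Equation-2 value over the stored statements.

```python
from collections import defaultdict

# (entity, property) -> list of (interlocutor, value, input_confidence)
kb = defaultdict(list)

def tell(interlocutor, entity, prop, value, conf=1.0):
    """Store the fact; conf is the ASR confidence in the integrated system."""
    kb[(entity, prop)].append((interlocutor, value, conf))

def ask(entity, prop):
    """Return the best-supported value and its Equation-2 confidence."""
    entries = kb[(entity, prop)]
    if not entries:
        return None, 0.0
    t = sum(c for _, _, c in entries)        # total support for the property
    support = defaultdict(float)
    for _, value, c in entries:
        support[value] += c                  # N_i as a sum of confidences
    value, n = max(support.items(), key=lambda kv: kv[1])
    return value, (n / t) * (1 - 1 / (t + 1))

tell("ana", "carl", "color", "black")
tell("rui", "carl", "color", "black")
print(ask("carl", "color"))  # two agreeing statements: Conf(2, 2) = 2/3
```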
It is important to extract meaning from every ASR output instead of ignoring it, even if it is not grammatical. To be able to get the meaningful part of such utterances, the NLU module of the SLU system should be robust to several types of errors.

3.1. System Description

Carl uses LCFLEX [8], a flexible left-corner parser designed for efficient robust interpretation, based on the Lexical Functional Grammar (LFG) formalism for grammar definition. LCFLEX's goal is to extract the most complete interpretation possible from the inputs. It supports word skipping, insertion of non-terminal grammar rules and selective flexible feature unification.

Although LCFLEX can interpret utterances not completely described by the system grammar, if the utterance deviates greatly from it, the results can miss the meaningful part of the utterance. For those utterances, a complementary memory-based learning (MBL) approach is used.

MBL is based on the idea that intelligent behavior can be obtained by analogical reasoning. MBL algorithms take a set of examples as input and produce a classifier capable of classifying new input patterns. If the system learns to detect important information from examples, it should be able to identify meaningful information in any utterance. The TiMBL [4] tool is used for this purpose.

The NLU system architecture is represented in Fig. 2. The priority of the system is to use LCFLEX for syntactic analysis. The parser can perform a deeper analysis than TiMBL, extracting more information from the utterances. If the parser fails or if the sentence deviates greatly from the grammar, TiMBL is used to extract meaningful information.

Figure 2: NLU system architecture.

To build the syntactic structure of a sentence, the information used is the morpho-syntactic role of each word in the sentence instead of the word itself. So the lexical entries of the grammar used by LCFLEX are part-of-speech (POS) tags instead of words. The POS tags are assigned on-the-fly by two taggers.

The taggers use the Penn Treebank tagset, which has 36 tags, so the lexicon of our grammar has just 36 entries, which allows us to have a very small, easily maintained grammar.

If both taggers agree on more than 75% of the tags and the sentence is classified by TiMBL as grammatical, the sentence is passed to LCFLEX for a deep syntactic analysis; otherwise it is passed to another instance of TiMBL for a shallow analysis.

If the sentence syntactic structure returned by LCFLEX includes more than 75% of the total words of the sentence, the analysis is passed to the semantics extraction module. Otherwise the system considers that the sentence deviates too much from the grammar, and the final analysis is shallow and made by the second instance of TiMBL. This instance of TiMBL can also ignore the sentence if it cannot produce a valid analysis.

With this hybrid architecture, a system was developed that is capable of performing deep analysis if the sentence is mostly acceptable by the grammar, but also capable of performing a shallow analysis if the sentence has severe errors. A more detailed explanation of the SLU system is given in [7].

3.2. Semantics

In order to integrate the SLU module with the new KRR module into the existing software architecture, it is crucial that the two modules use the same language. Therefore, a new semantics analysis module was developed, which receives as input the syntactic structure given by the ASR and parsing modules.

For example, if an interlocutor tells Carl that "Professor James is in France", the system extracts the following semantics: [name(X, james), function(X, professor), association(be in, X, Y), name(Y, france)].

It is important to note that the semantics analysis module is robust with respect to some misrecognition problems. For instance, consider the relation subtype(SubType, Type). Suppose a phrase is misrecognized and, instead of "a car is a machine", the ASR gets "the car is a machine". This would generate the semantics [instance(X, car), subtype(X, machine)]. However, since the relation subtype is only supposed to link two types, the misrecognized instance X is ignored and the semantics becomes: subtype(car, machine).
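The deep/shallow routing described above can be summarized in a short sketch (the 75% thresholds are from the paper; the function arguments standing in for the two taggers, the TiMBL classifier and the LCFLEX parse coverage are hypothetical simplifications):

```python
def route(tags_a, tags_b, timbl_says_grammatical, lcflex_coverage):
    """Decide between deep (LCFLEX) and shallow (TiMBL) analysis.

    tags_a, tags_b: POS tag sequences produced by the two taggers;
    lcflex_coverage: fraction of the sentence words covered by the
    syntactic structure returned by LCFLEX."""
    agreement = sum(a == b for a, b in zip(tags_a, tags_b)) / len(tags_a)
    if agreement > 0.75 and timbl_says_grammatical:
        if lcflex_coverage > 0.75:
            return "deep"      # hand the structure to semantics extraction
    return "shallow"           # fall back to the second TiMBL instance
```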
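The repair step can be illustrated as follows (Python sketch; the tuple encoding of the relations is an assumption made for the example, as is the helper name):

```python
# Hypothetical repair step: subtype/2 may only link two types, so a
# misrecognized instance variable appearing in a subtype relation is
# dropped and the relation is grounded on the type name.
def repair(relations):
    instances = {var: typ for rel, var, typ in relations if rel == "instance"}
    out = []
    for rel, a, b in relations:
        if rel == "subtype" and a in instances:
            out.append(("subtype", instances[a], b))   # X -> its type name
        elif rel == "instance" and any(r == "subtype" and v == a
                                       for r, v, _ in relations):
            continue                                   # drop the bogus instance
        else:
            out.append((rel, a, b))
    return out

# "the car is a machine" misrecognized for "a car is a machine":
print(repair([("instance", "X", "car"), ("subtype", "X", "machine")]))
# -> [('subtype', 'car', 'machine')]
```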
4. Experimental Results

In Table 1, the results of an experiment performed with the SLU system are presented. One speaker read 41 sentences aloud, repeating each one 3 times, for a total of 123 sentences with 312 relations. The sentences were selected in order to generate a minimum of 20 samples of each semantic relation. The average number of words per sentence is 5.0 and the average number of semantic relations correctly representing the semantics of the sentences is 2.54. This experiment was carried out on the Carl robot in an environment with some background noise (a common research laboratory). Nuance 8.0 with a trigram language model was used as the ASR system.

Two performance measures are used. Precision is the number of correct relations extracted by the system divided by the total number of relations extracted (= Right/(Right+Wrong)). Recall is the number of correct relations extracted divided by the total number of correct relations that could have been extracted (= Right/Target). The system had a Recall of 84.3% and a Precision of 80.9%.

Table 1: Semantic Analysis Results.

Relation      Target  Right  Wrong  Precision  Recall
Association     87     74     19     79.6%     85.1%
Composition     21      7      0    100.0%     33.3%
Name            93     82      1     98.8%     88.2%
Instance        39     36     39     48.0%     92.3%
Subtype         21     21      1     95.5%    100.0%
Function        51     43      2     95.6%     84.3%
Total          312    263     62     80.9%     84.3%

A closer look at results not presented here shows that the final result sometimes does not contain enough information for the robot to make a decision, but it at least gives an idea of the sentence content, allowing the robot to ask for clarification later.

Another experiment was carried out to test the performance of the integrated SLU/KRR system. First, a "utopian" state of the knowledge base was created, using the same 41 sentences of the previous test. For every phrase, the right semantics was generated manually and added to the base with a call to the predicate tell, using 100.0% as a fictitious ASR confidence. Then, 41 questions based on the original phrases were submitted using the predicate ask. Of course the system could respond to all of them, but since the confidence of an answer depends on the number of statements that support it, the average confidence is considerably lower than 100%.

To finish the second experiment, the same 41 questions were put to the knowledge base actually produced in the first recognition test. Table 2 shows the results of this comparison between reality and utopia. Taking the confidence result of reality (29.4%) and dividing it by the respective result of utopia (43.7%) gives a good measure of the efficiency of the system (67.2%). The Performance numbers indicate the percentage of questions that could be answered.

Table 2: Results of the KRR on the second experiment - utopia vs. reality.

Data        Questions  Answers  Performance  Confidence
Utopia          41        41      100.0%       43.7%
Reality         41        37       90.2%       29.4%
Efficiency       -         -       90.2%       67.2%

Despite the small test size, the results show that the system is suitable for supporting knowledge acquisition by an intelligent mobile robot.

5. Conclusion and Future Work

In this paper, the recent evolution of the intelligent mobile robot Carl was presented. The paper focused on the robust SLU module and on the new KRR module. The KR language is based on semantic networks and incorporates some notions from UML. The reasoning system combines deductive and inductive inference.

With respect to SLU, a new hybrid approach is used, which combines a robust parser with case-based shallow parsing (MBL). With this hybrid architecture we have developed an interface capable of performing deep analysis if the utterance is completely or almost completely accepted by the grammar, but also a shallow analysis if the utterance has severe errors.

The experimental results show that, in a laboratory test, the system achieved a recall of 84.3% of the total relations spoken by the user. Another experiment compared the performance of the KRR module after the first test to a utopian state. This showed that the KRR module could answer 90.2% of the questions, with an average confidence of 29.4%, which is 67.2% of the utopian confidence.

With respect to the KRR system, some important aspects have not yet been addressed. One of them is induction on attributes with continuous values. Another problem is that inheritance does not take into account the confidence in the generalization (subtype) links. A third problem is that, when evaluating the values of an attribute, it is assumed that only one is correct, which is not always the case.

6. References

[1] Benferhat, S., et al.: Weakening Conflicting Information for Iterated Revision and Knowledge Integration. Artificial Intelligence, 153(1-2) (2004) 339-371.
[2] Bos, J., E. Klein and T. Oka: Meaningful Conversation with a Mobile Robot. Proceedings of EACL (2003).
[3] Brazdil, P. and L. Torgo: Knowledge Acquisition via Knowledge Integration. Current Trends in AI, B. Wielenga et al. (eds.), IOS Press, Amsterdam (1990).
[4] Daelemans, W., J. Zavrel, A. van den Bosch and K. van der Sloot: TiMBL: Tilburg Memory-Based Learner Reference Guide, version 4.2. Technical report, Tilburg, The Netherlands (2002).
[5] Matsui, T., et al.: Integrated Natural Spoken Dialogue System of Jijo-2 Mobile Robot for Office Services. Proceedings of AAAI (1999) 621-627.
[6] Michalski, R.S.: Inferential Theory of Learning: Developing Foundations for Multistrategy Learning. Machine Learning: A Multistrategy Approach (1994).
[7] Rodrigues, M., A. Teixeira and L. Seabra Lopes: An Hybrid Approach for Spoken Natural Language Understanding Applied to a Mobile Intelligent Robot. Proc. Natural Language Understanding and Cognitive Science (NLUCS) (2004) 145-150.
[8] Rosé, C.P. and A. Lavie: LCFlex: An Efficient Robust Left-Corner Parser. Technical report, University of Pittsburgh (1998).
[9] Russell, S. and P. Norvig: Artificial Intelligence: A Modern Approach. Prentice Hall, 1st edition (1995); 2nd edition (2003).
[10] Seabra Lopes, L. and J.H. Connell (eds.): Semisentient Robots. Special issue of IEEE Intelligent Systems, 16(5) (2001) 10-14.
[11] Seabra Lopes, L., A. Teixeira, M. Rodrigues, D. Gomes, C. Teixeira, L. Ferreira, P. Soares, J. Girão and N. Sénica: Towards a Personal Robot with Language Interface. Proc. Eurospeech'2003 (2003) 2205-2208.
[12] Seabra Lopes, L., A. Teixeira and M. Quinderé: A Knowledge Representation and Reasoning Module for a Dialogue System in a Mobile Robot. Proc. Natural Language Understanding and Cognitive Science (NLUCS) (2005), to appear.
[13] Skubic, M., D. Perzanowski, et al.: Spatial Language for Human-Robot Dialogs. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 34(2) (2004) 154-167.
[14] Sowa, J.F.: Semantic Networks. MIT Encyclopedia of Cognitive Science, www.jfsowa.com/pubs/semnet.htm.
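The two measures can be checked numerically against the totals of Table 1 (Right = 263, Wrong = 62, Target = 312):

```python
def precision(right, wrong):
    # correct relations extracted / all relations extracted
    return right / (right + wrong)

def recall(right, target):
    # correct relations extracted / relations that could have been extracted
    return right / target

print(round(100 * precision(263, 62), 1))  # -> 80.9
print(round(100 * recall(263, 312), 1))    # -> 84.3
```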