1. Introduction
Human societies rely on communication. Human–agent communication in a collaborative virtual environment (VE), where the human and the agent collaborate to complete a shared goal, is particularly challenging because the communication must be real-time and goal-driven, and verbal and non-verbal communication must be interleaved to achieve the common goal [1]. Most natural communication situations require integrating multiple communication channels [2]. Much of the agent-based research concerned with communication and teamwork focuses on agent-to-agent communication [3]. Agent-to-agent communication tends to be manageable, especially if the agents involved are designed around Beliefs, Desires, and Intentions (BDI). The BDI approach was introduced as a means of modelling agents and allowing them to perform complex and autonomous reasoning [4]. While there are some models to aid human–agent communication, these models tend to consider only a single communication channel, i.e., either the verbal (e.g., [5]) or the non-verbal communication channel (e.g., [6]). The few models that combine both channels are either situation-dependent (e.g., Babu, Schmugge [7] offer a model to be used in a role-specific static environment) or lack an objective evaluation of their effectiveness (e.g., [3]).
Speech Act Theory (SAT) is a well-known theory in the philosophy of language that studies the performative function of language and communication. SAT proposes that any form of expression, whether verbal or textual, constitutes an act, and that some utterances call for actions to be performed by the receiver [8]. In other words, SAT tries to understand how an utterance can be used to achieve actions. Austin [9] described three characteristics, or acts, of human speech: locutionary acts, illocutionary acts, and perlocutionary acts. Locutionary acts refer to the uttering of a sentence. Illocutionary acts denote the action of expressing the speaker’s intention. Perlocutionary acts refer to what is achieved by saying something.
In order to evaluate the impact of the communication model on human-agent collaboration performance, SAT is used in this paper as a novel evaluation technique. The foundation of SAT will be used to analyse the components of an agent’s speech and find the possible impact on collaboration performance with a human teammate. To investigate the influence of the agent’s verbal communication, as evaluated by SAT, on human–agent collaboration, the following research questions are proposed:
Using SAT concepts, how can an agent’s locutionary act be evaluated?
Using SAT concepts, how can an agent’s illocutionary act be evaluated?
Using SAT concepts, how can an agent’s perlocutionary act be evaluated?
Could the agent’s evaluated illocutionary and perlocutionary acts impact collaboration performance?
2. Background
Communication between agents is widely recognized as a challenging research area [10]. Horvitz [1] identifies a number of challenges in human–machine interaction, including seeking mutual understanding or grounding of a joint activity, recognizing problem-solving opportunities, decomposing problems into sub-problems, solving sub-problems, combining solutions found by humans and machines, and maintaining natural communication and coordination during these processes. Ferguson and Allen [11] state that true human–agent collaborative behaviour requires an agent to possess a number of capabilities, including reasoning, communication, planning, execution, and learning.
Many researchers aim to design systems and models that involve human–agent communication. Some of these works focus only on verbal communication. For example, Luin, Akker [12] present a natural-language-accessible navigation agent for a virtual theatre environment, in which users can navigate through the environment and the agents in the system answer users’ questions. Other researchers are concerned with non-verbal communication, such as facial expressions, gestures, and body movement. For example, Miao, Hoppe [13] present a system that trains people to handle abnormal situations while driving by interacting non-verbally with an agent, so that they learn from or avoid abnormal driving behaviours.
Much of the work on human–agent communication consists of prototypes that lack a generic model usable in different scenarios. Among the few models presented, Oijen and Dignum [3] proposed a model for realizing believable human-like interaction between virtual agents in a multi-agent system (MAS), but their work handled communication between agents in the same virtual environment, where agents shared the same resources. The authors designed a layer to emulate the human way of perceiving, which mediated between the input behaviour received by the agent and the agent’s output reaction to that behaviour. The model was designed for agent–agent communication and did not demonstrate how to handle multimodal input to or output from the model. Traum, Rickel [14] sketched a spoken negotiation model for a peacemaker scenario to handle communication involving tasks in hybrid human–agent teams designed for training purposes. Their negotiation model extended previous work on a conversational planning agent [15]. Their model focused on verbal rather than non-verbal communication and was restricted to a training situation in which the agent’s role is to give oral instructions.
3. Speech Act Theory
The meaning of utterances has been studied in many disciplines, such as philosophy, linguistics, the social sciences, and artificial intelligence. Theories that study the meaning of utterances are called theories of meaning. Lemaître and Fallah-Seghrouchni [16] present three categories of theories of meaning that influence multi-agent communication formalisms. The first is classical formal semantics, which studies the conditions under which the uttered proposition is true or false; it focuses on the linguistic expression rather than on the relationship between the sender and the receiver or on the communication situation. The second is intentionalistic semantics, which focuses on what the speaker meant to say: meaning is conveyed by the speaker’s intention. The third is the use-theory of meaning, which defines the meaning of language on the basis of how it is used in the communicative situation.
Building on the use-theory of meaning, Austin [9] presented Speech Act Theory (SAT). The main idea of SAT is that, during communication, people do not simply utter propositions to be answered with acceptance or rejection. Instead, every sentence exchanged in a communication situation carries the speaker’s intention to accomplish something, such as requesting or advising. Austin described three characteristics of statements, or acts, which begin with the building blocks of words and end with the effects those words have on an audience.
Locutionary acts: the physical act of uttering the sentence.
Illocutionary acts: the action of conveying the speaker’s intention, such as informing, ordering, warning, and undertaking.
Perlocutionary acts: what we achieve by saying something, such as persuading or convincing. The perlocutionary effect of an utterance is what is actually achieved by the locution; it could be informing someone of a possible next step or of the accomplishment of a task, persuading someone of one’s point of view, etc.
There are different speech act taxonomies for classifying the literal and pragmatic meaning of utterances, such as Verbal Response Modes (VRM) [17] and Searle’s taxonomy. Searle’s taxonomy is more commonly used because it covers a wider variety of utterance intentions. Searle [18] set up the following classification of illocutionary speech acts:
Commissives—speech acts that commit a speaker to performing an action, e.g., promises.
Declarations—speech acts that bring something about in the world, e.g., pronouncing something.
Directives—speech acts that influence the listener to take a particular action, e.g., requests, commands, and advice.
Expressives—speech acts that express the speaker’s psychological state or attitude towards a proposition and that have an impact on the listener, e.g., congratulations, excuses, and thanks.
Representatives—speech acts that commit the speaker to the truth of the expressed proposition, e.g., stating or asserting a fact.
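When utterances are coded, Searle’s five classes act as a fixed label set. The Python sketch below is illustrative only: `IllocutionaryClass`, `LABELLED_UTTERANCES`, and `class_counts` are hypothetical names, and the three labelled utterances are the example messages and classifications given for the Omosa scenario in Section 5.2.

```python
from collections import Counter
from enum import Enum


class IllocutionaryClass(Enum):
    """Searle's five classes of illocutionary acts."""
    COMMISSIVE = "commissive"
    DECLARATION = "declaration"
    DIRECTIVE = "directive"
    EXPRESSIVE = "expressive"
    REPRESENTATIVE = "representative"


# Example messages from the Omosa scenario, labelled as described in the text.
LABELLED_UTTERANCES = [
    ("Why don't you go to region x and I will go to region y",
     IllocutionaryClass.DIRECTIVE),
    ("I like this idea", IllocutionaryClass.REPRESENTATIVE),
    ("I do not like this idea", IllocutionaryClass.REPRESENTATIVE),
]


def class_counts(labelled):
    """Count how often each illocutionary class occurs in a labelled corpus."""
    return Counter(label for _, label in labelled)


print(class_counts(LABELLED_UTTERANCES))
```

Keeping the label set as an enum rather than free-text strings makes typos in manual coding fail loudly instead of silently creating a sixth class.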
An important assumption of speech act theory is that effective communication requires the accurate recognition of the speech acts exchanged between participants [19].
Speech Act Theory and Agent Communication
When it was first introduced, speech act theory was meant to be a tool for interpreting verbal communication between human beings. SAT has since been used to analyse different forms of communication, including questionnaires [20], written messages in forum posts [17], event logs [21], and email messages [22]. In the 1990s, artificial intelligence (AI) researchers adopted SAT as a design tool for AI and inter-agent communication [23,24,25,26]. SAT has also been used to aid the design and interpretation of communication between humans and agents. In most research that combines SAT and agent communication, SAT has been used as a reference for designing the agent’s communication language, for understanding the human–agent exchange of messages, or for analysing mutual understanding.
There are two main Agent Communication Languages (ACLs) based on SAT. The first, FIPA-ACL, was proposed by the Foundation for Intelligent Physical Agents; the other, the Knowledge Query and Manipulation Language (KQML), was defined by Finin [22] as an ACL standard. Moreira, Vieira [27] used SAT as a foundation for giving semantics to messages received by an AgentSpeak (L) (which stands for Agents Speak Out in a Logical Computable Language) agent, to fill the gap concerning aspects neglected in agent programming languages, such as communication primitives. Speech-act-based communication has been used to enable agents to communicate arguments to other agents, to share their internal states, and to influence other agents’ states. In a more recent study [28], an approach named the Agent Communication Model for Interacting Crowd Simulation (ACMICS) was presented to simulate communication between agents in a crowd simulation system. This approach used a message structure based on an SAT-based ACL.
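To make the link between a speech act and an ACL message concrete, the Python sketch below builds a simplified message with a few of the standard FIPA-ACL parameters (performative, sender, receiver, content, language) and renders it in the style of the FIPA-ACL string encoding. It is a sketch under stated assumptions: `ACLMessage`, `to_string`, `searle_class`, and the agent names are hypothetical, and the performative-to-Searle mapping is illustrative and not part of the FIPA specification.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative mapping from a few FIPA-ACL performatives to Searle's classes.
# This is an assumption for exposition, not part of the FIPA specification.
PERFORMATIVE_TO_SEARLE = {
    "inform": "representative",
    "request": "directive",
    "agree": "commissive",
}


@dataclass
class ACLMessage:
    """A simplified FIPA-ACL-style message with a subset of the standard parameters."""
    performative: str
    sender: str
    receiver: str
    content: str
    language: Optional[str] = None

    def to_string(self) -> str:
        """Render the message in the style of the FIPA-ACL string encoding."""
        # Escape backslashes and quotes so the content stays a single string token.
        escaped = self.content.replace("\\", "\\\\").replace('"', '\\"')
        parts = [
            f"({self.performative}",
            f"  :sender (agent-identifier :name {self.sender})",
            f"  :receiver (set (agent-identifier :name {self.receiver}))",
            f'  :content "{escaped}"',
        ]
        if self.language is not None:
            parts.append(f"  :language {self.language}")
        return "\n".join(parts) + ")"

    def searle_class(self) -> Optional[str]:
        """Look up the (assumed) Searle class of this message's performative."""
        return PERFORMATIVE_TO_SEARLE.get(self.performative)


msg = ACLMessage("request", "biologist", "participant", "go to region x")
print(msg.to_string())
print(msg.searle_class())
```

The performative carries the illocutionary force explicitly, which is what makes SAT-based ACLs convenient to analyse: the class of each message is recorded rather than inferred.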
Jiang and Zhou [29] provide a general automated agent negotiation protocol for MAS based on speech act theory. Chien and Soo [17] designed a speech act model using a dynamic Bayesian network (DBN) that provides a communication bridge to help virtual agents reason about different dialogue contexts, including norms, social relations, emotion, personality, intentions, or goals, among agents in a dialogue scene. Dragone, Holz [30] make use of the accurate and expressive communication mechanism of SAT for agent–agent communication in a multi-agent system: agents send messages (such as requesting, ordering, informing, or promising) to socially capable acquaintances in order to affect their mental states, using an ACL designed on the basis of SAT. Cohen and Perrault [31] used SAT as a model to understand the human–agent exchange of speech acts in a plan-based situation. Traum and Allen [32] used SAT to analyse how participants in a conversation achieve mutual understanding, i.e., grounding.
4. Evaluation of Human–Agent Communication Using SAT
We first investigated which methods exist to evaluate human–agent communication (Section 4.1). We found that the few existing evaluation techniques are neither integrated nor comprehensive: they do not evaluate both the verbal and non-verbal communication channels, and even those that evaluate verbal interaction do not cover the syntactic and semantic features of the utterances. The lack of an integrated evaluation tool led us to propose SAT as a communication evaluation tool (Section 4.2).
4.1. Methodologies to Evaluate Agent Communication
Various methods have been used to evaluate the effectiveness of an agent’s verbal or non-verbal communication in achieving the goal of communication. These methods can be classified into methods that investigate psychological aspects to estimate the personality and internal state of the agent [33] and methods that use specific criteria to measure how the agent expresses its intentions and internal states [34]. Among the methods that investigate psychological aspects, Allwood [35] discussed the features of successful verbal and non-verbal communication and stated nine goals, which are not mutually exclusive and which support flexibility and conflict prevention in communication: mutual friendliness; lack of tension (tension release); lack of a need to defend a position; admitting weakness or uncertainty; lack of attempts to overtly impose opinions on others; coordination of attention and movements; giving and eliciting feedback expressing mutual support and agreement; showing consideration and interest; and invoking mutual awareness and beliefs. McRorie, Sneddon [33] evaluated human viewers’ perceptions of agents’ personalities and credibility, using Eysenck’s theoretical basis [36] to explain aspects of the characterization of four different agents. To evaluate the effectiveness of their agent’s speech, Granström and House [34] used intelligibility and information presentation, visual cues for prominence, prosody and interaction, visual cues to sentence mode, and agent expressiveness and attitude.
Fabri et al. [37] introduced the concept of “richness of experience” to evaluate the user’s experience and the non-verbal communication between the human user and an avatar. The authors postulated that a richer experience would manifest itself through greater involvement in the task, greater enjoyment of the experience, a higher sense of presence, and a higher sense of co-presence. The study evaluated the effect of the agent’s facial expressions of emotion (happiness, surprise, anger, fear, sadness, and disgust) on the user’s richness of experience.
4.2. Speech Act Theory as an Evaluation Tool
Speech Act Theory has been used to interpret communication. In [21], a method based on Speech Act Theory was introduced to discover the relationships between speech acts in human conversations and to foster the analysis of business process management performance. In another study [38], SAT was used to examine the use of speech acts in computer-mediated communication. The study applied qualitative analysis to investigate the types of speech acts manifested in the status messages of the WhatsApp mobile application and identified four speech acts in WhatsApp status messages, with different frequencies of occurrence. Although that study used SAT to evaluate the distribution of WhatsApp status messages, it used only 86 status messages posted by 23 participants. Moreover, its analysis was descriptive and did not go beyond reporting how frequently messages of each type of act appeared. In another study [39], SAT was used as a tool to support the supervised classification of Twitter messages.
Another study [8] presented a framework that used SAT as a tool to evaluate deception in computer-mediated communication. The framework analysed emails in business-to-business environments to detect deceptive business messages, helping managers and decision makers determine whether their business partners were being deceptive. It assessed cues at three levels: word use, message development, and intertextual exchange.
5. Evaluating Human–Agent Communication Using SAT
In this study, we aim to use SAT concepts as a tool to evaluate agent verbal communication and its ability to empower human–agent collaboration toward achieving a common task. Each act of an agent’s message (locutionary, illocutionary, and perlocutionary) should be evaluated individually to reach a final analysis of the agent’s overall communication. Each act will be evaluated as follows:
Locutionary acts: equivalent to uttering a sentence with a particular structure. A locutionary act could be evaluated by asking the interlocutor about his/her impression of the structure of the utterances, for example whether the sentences are clear, easy to understand, and natural. Another way to evaluate the locutionary act is to ask a third party to review and evaluate the form of the sentences.
Illocutionary acts: the intention the speaker meant to convey through his/her locutionary act. The illocutionary force could be informing, ordering, requesting, warning, undertaking, etc. The aim of evaluating this act is to determine whether the illocutionary force is clear to the interlocutor. The illocutionary act could be evaluated in two ways: by checking that the different classes of illocutionary act are used in an appropriate ratio, so that the conversation has various tones, and by surveying whether the interlocutor perceives the intention behind the locutionary acts uttered by the speaker.
Perlocutionary acts: what is achieved by saying something, such as convincing, persuading, deterring, surprising, or misleading. Whereas evaluating the locutionary and illocutionary acts requires surveying the interlocutor’s impression and opinion of the form of the utterances and the meaning behind them, evaluating the perlocutionary act should go beyond questioning the interlocutor and investigate the actual effect of the locutionary act as demonstrated by the behaviour that follows it (see Table 1).
To track the behaviour of participants in a virtual world, their actions should be carefully logged as part of the design and implementation. The next sections describe the materials used for implementation, the scenario designed, the procedure followed in the study, and the data collected.
5.1. Materials
To evaluate the proposed communication model, we extended an existing 3D virtual world known as Omosa Virtual World to include a collaborative activity. Omosa is an ecosystem for an imaginary island designed to help secondary school students learn scientific knowledge and science inquiry skills. To gain scientific knowledge and skills, students are given the goal of determining why the fictitious animals, known as Yernt, are dying out. The authors’ focus is on creating a world that encourages collaboration between the agents and the human, and between the humans. The island consists of four different locations the human can visit. We plan to develop multiple collaboration scenarios, one for each of the four locations on Omosa (village, research lab, hunting ground, and weather station). In this study, we chose one of the areas and designed a scenario we thought provided a compelling reason to collaborate: trapping one of the Yernt with the virtual biologist so that the animal could be studied more closely. To achieve this, the goal of the developed scene is to build a fence together around a virtual animal.
5.2. Scenario from Omosa Virtual World
The communication model is illustrated in a scenario where the user and the virtual agent have a shared goal: to capture a Yernt by surrounding the animal with an octagon-shaped fence. In this task, the human and the agent must collaborate. After the human or the agent draws the first line near the animal, they take alternating turns, each drawing a line that begins from one of the two ends of the shape drawn so far, until the shape is closed and the animal is trapped inside. For practicality, the Yernt is tranquilised before the collaborative activity begins so that it remains calm for a pre-determined amount of time before it runs away. The animal must therefore be captured before this deadline is reached.
Figure 1 shows a snapshot from Omosa Virtual World.
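The turn-taking rule can be sketched as a small state check. The Python code below is an illustration under stated assumptions: the eight regions are numbered 0 to 7 clockwise around the Yernt, the lines drawn so far always form one contiguous arc, and the human makes the opening selection (as in the first cycle described in Section 5.3). The names `valid_next_regions`, `next_mover`, and `fence_closed` are hypothetical.

```python
NUM_REGIONS = 8  # eight regions around the Yernt, one per side of the octagon


def valid_next_regions(selected, n=NUM_REGIONS):
    """Regions that extend the fence from either end of the already-drawn arc.

    Assumes regions are numbered 0..n-1 around the animal and that `selected`
    forms one contiguous arc (which holds if every previous move was valid).
    """
    if not selected:
        return set(range(n))  # the first line may be drawn at any region
    return {
        r for r in range(n)
        if r not in selected
        and ((r - 1) % n in selected or (r + 1) % n in selected)
    }


def next_mover(num_selected, first_mover="human"):
    """Turns alternate; `first_mover` makes the opening selection."""
    other = "agent" if first_mover == "human" else "human"
    return first_mover if num_selected % 2 == 0 else other


def fence_closed(selected, n=NUM_REGIONS):
    """The animal is trapped once every side of the octagon has been drawn."""
    return len(selected) == n


print(valid_next_regions({0}))        # both neighbours of the first line
print(next_mover(1))
```

Because a contiguous arc has only two open ends, at most two regions are ever valid, which is what gives the agent a natural proposal to make (“you take one end, I take the other”).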
The design of the scenario is noteworthy for the following features:
Collaborative, with an explicit goal where the user can observe the changes in the situation.
A two-way multimodal interaction: The agent (virtual biologist) interacts with the human participant through speech and visual actions, and vice versa.
SAT is used as an evaluation tool to measure the effectiveness of communication in achieving a designated goal.
Real-time: The agent’s plan is generated in real-time as the human participant’s actions may vary and lead to unexpected changes in the environment.
The task itself shows the effect of both verbal and non-verbal communication through the human–agent social interaction.
As an example of agent messages, the agent may make a request of its human teammate using a directive illocutionary act: “Why don’t you go to region x and I will go to region y”. Before replying to the agent’s request, the human has the option of asking the agent to explain the proposed request. The agent replies using a directive illocutionary act: “Why don’t you go to region x and I will go to region y”. The human participant can then either accept or reject the agent’s proposition using a representative illocutionary act, saying “I like this idea” or “I do not like this idea”. Appendix A lists a sample of human–agent messages together with the illocutionary classification of each message; the classification was performed and reviewed by the authors of this study. To evaluate the perlocutionary act of an agent’s request and explanation, the human’s replies and actual actions are recorded to check whether the human’s reply, accepting or rejecting the agent’s proposal, is honoured.
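Checking whether a reply is honoured amounts to comparing what the human said with what the human did next. The Python sketch below assumes a hypothetical log schema (`Exchange` with `requested_region`, `reply`, and `selected_region`); the field names and the helpers `is_honoured` and `honoured_rate` are illustrative, not the study’s actual logging format.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """One agent proposal, the human's verbal reply, and the human's next action.

    Field names are hypothetical stand-ins for whatever the logs record.
    """
    requested_region: int  # region the agent asked the human to take
    reply: str             # "accept" or "reject"
    selected_region: int   # region the human actually selected next


def is_honoured(ex: Exchange) -> bool:
    """A reply is honoured if the subsequent action matches what the human said."""
    followed = ex.selected_region == ex.requested_region
    if ex.reply == "accept":
        return followed
    if ex.reply == "reject":
        return not followed
    raise ValueError(f"unknown reply: {ex.reply!r}")


def honoured_rate(exchanges):
    """Fraction of exchanges in which the human's reply was honoured."""
    return sum(is_honoured(e) for e in exchanges) / len(exchanges)


log = [
    Exchange(2, "accept", 2),  # accepted and followed: honoured
    Exchange(3, "reject", 5),  # rejected and did something else: honoured
    Exchange(4, "accept", 6),  # accepted but did not follow: not honoured
]
print(honoured_rate(log))
```

Separating the verbal reply from the action is what allows the perlocutionary effect to be measured from behaviour rather than from self-report.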
5.3. Data Collection and Data Processing
Collecting data on the events that take place in a VE may require different methods depending on the goals of the VE. Hanna et al. [40] provided a taxonomy of data collection forms and of when each is appropriate. For the purposes of this study, data were collected in two ways:
For objective data analysis: automatic data logging to track the human’s and the agent’s behaviours, messages, and selections and to register any problems experienced by the participants.
For subjective data analysis: surveys, one before the scenario to collect biographical data and another one after the scenario to evaluate users’ experience.
The biographical survey included questions about participants’ linguistic skills, level of computing skills, and experience with computer games and other 3D applications. The second, post-scenario survey (10 Likert-scale items) elicited users’ opinions about the agent’s verbal and non-verbal communication and whether it was relevant to the collaborative situation.
In the VE scenario, the task was to trap a Yernt surrounded by eight regions. The human and the agent take turns selecting one region at a time and observe each other’s actions, while exchanging verbal messages to convey their intentions and to request a recommended selection from the other party. We call the selection of each of the four pairs of regions a “cycle”. In each of the four cycles, the human and the agent each select a region and, except in the first cycle, exchange verbal requests and replies. In the first cycle, i.e., the initial selections by the human and the agent, the human freely selects any of the eight regions and no verbal messages are exchanged. Dividing the log data into cycles helps to show the effect of continuous communication on the achievement of successive cycles.
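Splitting a log into cycles can be sketched as counting region selections: every second selection closes a cycle. The Python code below assumes a hypothetical event schema (`Event` with a timestamp in seconds from task start, an actor, an event kind, and an optional region) and assumes that a cycle’s duration runs from the close of the previous cycle (or the task start) to its own closing selection. These definitions are illustrative, not the study’s logging format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    t: float                      # seconds since the task started (hypothetical field)
    actor: str                    # "human" or "agent"
    kind: str                     # e.g. "select_region" or "message" (hypothetical types)
    region: Optional[int] = None  # set only for region selections


def split_into_cycles(events, selections_per_cycle=2) -> Tuple[List[List[Event]], List[float]]:
    """Group time-ordered events into cycles.

    A cycle closes at every `selections_per_cycle`-th region selection.
    Returns (cycles, end_times); events after the last closing selection
    form an open trailing cycle with no entry in end_times.
    """
    cycles: List[List[Event]] = [[]]
    end_times: List[float] = []
    selections = 0
    for ev in sorted(events, key=lambda e: e.t):
        cycles[-1].append(ev)
        if ev.kind == "select_region":
            selections += 1
            if selections % selections_per_cycle == 0:
                end_times.append(ev.t)
                cycles.append([])
    if not cycles[-1]:
        cycles.pop()  # drop the empty cycle opened after the final selection
    return cycles, end_times


def cycle_durations(end_times, start=0.0):
    """Duration of each completed cycle, measured from the previous cycle's end."""
    starts = [start] + end_times[:-1]
    return [end - s for s, end in zip(starts, end_times)]


events = [
    Event(5.0, "human", "select_region", 1),
    Event(9.0, "agent", "select_region", 2),
    Event(20.0, "agent", "message"),
    Event(30.0, "human", "select_region", 3),
    Event(40.0, "agent", "select_region", 4),
]
cycles, ends = split_into_cycles(events)
print(cycle_durations(ends))
```

Keeping messages inside the cycle in which they occur lets later analyses relate each request to the selection that follows it.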
6. Results
6.1. Participants
Seventy-three undergraduate students took part in the study. Seven of them did not complete the collaborative task for technical reasons, and their data were excluded from the evaluation of the communication model, leaving 66 participants. Participants were aged between 18 and 49 years (mean = 21.9; SD = 5.12).
Participants’ linguistic and computer skills were surveyed to explore whether any difficulty in communicating with the virtual agent could be attributed to a lack of linguistic or computer skills. Concerning linguistic skills, 92.42% of participants were native English speakers; the non-native speakers had been speaking English daily for 14.4 years on average. Of the participants, 21.21% described themselves as having basic computer skills, 16.67% as having advanced skills, and 62.12% as having proficient computer skills. Concerning their experience with games and other 3D applications, participants’ answers to the question “How many hours a week do you play computer games?” ranged from 0 to 30 hours (mean = 4.24, SD = 6.66).
6.2. Evaluating Agent Communication
This paper presents an analysis of the verbal communication exchanged between the human and the agent while achieving a collaborative task. To this end, the agent’s verbal communication was analysed using SAT. The results that answer the first, second, and third research questions are presented in Section 6.2.1, Section 6.2.2, and Section 6.2.3, respectively, and the results that address the fourth research question are presented in Section 6.2.4.
6.2.1. How Can an Agent’s Locutionary Act Be Evaluated?
To evaluate the locutionary act of the agent’s speech, a subjective analysis of the data was undertaken (refer to Table 1). We relied on surveying participants’ perceptions of their interlocutor’s locutionary act to determine the plausibility of the speech structure. The survey asked participants about their perception of the structure of the messages used by the agent (rather than the messages’ content or meaning). Participants selected one or more keywords from the following: clear, ambiguous, natural, awkward, nice, ugly, too short, and too long. Ideally, the agent’s messages should be clear, natural, nice, and expressed in as few words as possible. The results show that the “clear” property was selected by 42.42% of the participants (see Figure 2). Nevertheless, three times as many participants found the messages “awkward” rather than “natural”, and twice as many found them “too short” rather than “too long”.
To test the significance of the differences in participants’ perceptions of the agent’s messages, a chi-square test was used. The percentage of participants who perceived the agent’s verbal communication as “nice” was significantly different from the percentage who perceived it as “ugly”, χ²(2, N = 66) = 3.636, p < 0.01. Additionally, the percentage of participants who perceived it as “too short” was significantly different from the percentage who perceived it as “too long”, χ²(2, N = 66) = 66.091, p < 0.01. Moreover, the percentage of participants who perceived it as “awkward” was significantly different from the percentage who perceived it as “natural”, χ²(2, N = 66) = 10.182, p < 0.01. Although the percentage of participants who deemed the agent’s messages “clear” was higher than the percentage who found them “ambiguous”, this difference was not significant (χ²(2, N = 66) = 2.545, p > 0.01).
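For readers who want to reproduce this kind of test, the Python sketch below computes a Pearson chi-square goodness-of-fit statistic from category counts and its p-value, using only the standard library. It is a sketch, not a reconstruction of the authors’ analysis: the text does not report the raw counts or exactly which categories entered each test, so the example counts are hypothetical. The function names `chi_square_gof` and `chi2_sf` are illustrative; the degrees of freedom are the number of categories minus one, and the p-value is the upper tail of the chi-square distribution, computed via the power series of the regularized lower incomplete gamma function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Computed as 1 - P(df/2, x/2), where P is the regularized lower incomplete
    gamma function evaluated by its power series.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-16 and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square_gof(observed, expected=None):
    """Pearson chi-square goodness-of-fit test.

    With expected=None, all categories are assumed equally likely.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    k = len(observed)
    if expected is None:
        expected = [sum(observed) / k] * k
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = k - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts for three response categories (not the study's data).
print(chi_square_gof([20, 10, 0]))
```

In practice, `scipy.stats.chisquare(f_obs, f_exp=None)` performs the same test; the hand-rolled version only makes the steps explicit.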
6.2.2. How Can an Agent’s Illocutionary Act Be Evaluated?
The second component of human speech according to SAT is the illocutionary act. To evaluate the speaker’s intention in delivering an utterance, both objective and subjective evaluation methods were used. The objective evaluation of the agent’s illocutionary acts aimed to check that the agent’s utterances were balanced across the classes of Searle’s taxonomy, and to validate that the dominant class of illocutionary act matched the goal of the situation. To this end, the agent’s verbal utterances were classified according to Searle’s taxonomy, with each utterance labelled according to the meaning and intention that the agent intended the interlocutor to receive.
The results, shown in Figure 3, indicate that the class “directives” accounted for 16.67% of the coded utterances, while the class “representatives”, which give replies to requests or state a fact about the surrounding environment, accounted for 33.33%. Declarations, expressives, and commissives comprised 27.78%, 16.67%, and 5.56% of utterances, respectively. The overall result of the objective evaluation of the agent’s illocutionary acts showed the dominant representation of both representatives and directives. This dominance suits the nature of this collaborative situation, in which the human and agent teammates exchange requests and replies about the common task.
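The class distribution is a simple proportion over coded utterances. The text does not state the raw counts; the counts in the Python sketch below are an inference, since 18 is the smallest number of coded utterances consistent with all five reported percentages, and should be read as an assumption. The helper name `class_percentages` is hypothetical.

```python
# Assumed counts: 18 coded utterances is the smallest total that reproduces
# all five reported percentages; the actual counts are not given in the text.
counts = {
    "directives": 3,
    "representatives": 6,
    "declarations": 5,
    "expressives": 3,
    "commissives": 1,
}


def class_percentages(class_counts):
    """Percentage of coded utterances in each illocutionary class, to 2 d.p."""
    total = sum(class_counts.values())
    return {cls: round(100 * n / total, 2) for cls, n in class_counts.items()}


print(class_percentages(counts))
```

Reporting the raw counts alongside the percentages makes it easier for readers to judge how stable the distribution is, given that a single reclassified utterance shifts a class by about 5.6 percentage points.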
6.2.3. How Can an Agent’s Perlocutionary Act Be Evaluated?
SAT holds that the effect of the illocutionary act appears in the perlocutionary act. Therefore, to check the effect of the agent’s verbal messages, we need to determine their influence on the ongoing collaborative situation. To evaluate this influence, objective and subjective analysis methods were used. In the objective evaluation of the agent’s perlocutionary act, the goal was to measure how successful each of the agent’s speech acts was in achieving the agent’s intention, i.e., its illocutionary act. To conduct this evaluation, the automatic log files that track the human’s actions were analysed to check whether the user demonstrated understanding of, and responded positively to, each verbal message. The idea behind this technique is to determine whether, after the agent has expressed its intention, the human then takes a decision based on that intention. The results show that human behaviour reflected acceptance of the agent’s requests in 64.41%, 67.80%, and 70.69% of cases in cycle 1, cycle 2, and cycle 3, respectively (see Figure 4).
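The per-cycle acceptance rate can be computed directly from logged request/selection pairs. The Python sketch below assumes a hypothetical log schema of `(cycle, requested_region, selected_region)` tuples and counts a request as accepted when the human’s next selection equals the requested region; the function name `acceptance_by_cycle` and the example records are illustrative, not the study’s data.

```python
from collections import defaultdict


def acceptance_by_cycle(records):
    """Percentage of agent requests accepted in each cycle.

    `records` is an iterable of (cycle, requested_region, selected_region)
    tuples (a hypothetical log schema). A request counts as accepted when the
    human's next selection equals the region the agent requested.
    """
    totals = defaultdict(int)
    accepted = defaultdict(int)
    for cycle, requested, selected in records:
        totals[cycle] += 1
        accepted[cycle] += int(selected == requested)
    return {c: 100 * accepted[c] / totals[c] for c in sorted(totals)}


# Hypothetical records from two cycles.
records = [(1, 2, 2), (1, 3, 4), (2, 5, 5), (2, 6, 6), (2, 1, 7)]
print(acceptance_by_cycle(records))
```

Because the denominator is the number of requests logged in each cycle, participants whose logs lack a cycle simply do not contribute to it, so per-cycle totals may differ.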
6.2.4. Could the Evaluated Agent’s Illocutionary and Perlocutionary Acts Impact Collaboration Performance?
In Section 6.2, SAT was used as a tool to evaluate the agent’s locutionary, illocutionary, and perlocutionary acts. The fourth research question proposes a further use of SAT concepts: evaluating the impact of the agent’s verbal communication on the performance of the human–agent collaboration. The objective of this research question is to investigate whether humans’ perceptions of the agent’s illocutionary and perlocutionary acts are associated with human–agent performance. To evaluate the performance of human–agent teamwork objectively, the time to complete each cycle was logged while participants used the virtual system. Analysis of the log files showed that the average time to complete consecutive cycles decreased from 51.1 s in the first cycle to 40.66 s in the second cycle and 33.78 s in the last cycle. To measure the performance of human–agent collaboration subjectively, human perception of performance was surveyed: participants were asked two questions to estimate the final performance of the collaboration.
The first question asked the participants how appropriate they found the flow of collaboration from the agent’s side. The results showed that 64.06% and 14.06% of the participants agreed and strongly agreed, respectively, that the agent’s flow of actions was appropriate. A t-test on the participants’ responses revealed a significant difference between participants in their responses to the survey questions about the perception of the agent’s role in collaboration performance (t(65) = 27.54, p < 0.01; see Table 2). The second question asked the participants how appropriate they found the agent’s reaction to the human’s role in the collaboration. The results showed that 59.38% and 9.38% of the participants agreed and strongly agreed, respectively, that the agent’s reactions toward the human’s role in the collaboration were appropriate. A t-test on these responses likewise revealed a significant difference between participants in their perception of the agent’s reactions to the human’s role in the collaboration (t(65) = 24.39, p < 0.01).
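The text does not state which t-test variant was used. If, as an assumption, each question’s responses were coded on a 5-point Likert scale and compared against the neutral midpoint of 3, the statistic would be a one-sample t-test, sketched below. The function name and the midpoint are illustrative; the p-value would then come from the t distribution with n − 1 degrees of freedom (for example via `scipy.stats.ttest_1samp`).

```python
import math
import statistics


def one_sample_t(responses: list[float], mu0: float) -> tuple[float, int]:
    """One-sample t statistic and degrees of freedom for H0: mean(responses) == mu0."""
    n = len(responses)
    mean = statistics.mean(responses)
    sd = statistics.stdev(responses)   # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)             # standard error of the mean
    return (mean - mu0) / se, n - 1
```

For instance, the hypothetical responses `[4, 4, 5, 3, 4]` against a midpoint of 3 give t = √10 ≈ 3.16 with 4 degrees of freedom.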
To measure the strength and direction of the association between human perceptions of the agent’s illocutionary act, its perlocutionary act, and collaboration performance, Spearman’s rho correlation was used. Spearman’s rho was selected because it does not assume normally distributed data, which makes it more appropriate for small samples and non-normally distributed responses. Human perception of the agent’s illocutionary act was significantly positively related to human perception of the agent’s perlocutionary act (r = 0.620, p < 0.05). Moreover, human–agent collaboration performance was significantly positively related to human perception of both the agent’s illocutionary act (r = 0.927, p < 0.01) and its perlocutionary act (r = 0.633, p < 0.01).
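Spearman’s rho is the Pearson correlation computed on ranks, with tied values (very common in Likert-scale data) assigned the average of the ranks they span. The pure-Python sketch below illustrates that definition only; the helper names are hypothetical, and in practice `scipy.stats.spearmanr` computes the same coefficient together with a p-value. The coefficient is undefined when either variable is constant.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Using average ranks for ties keeps the coefficient within [−1, 1] and matches the usual textbook treatment of tied ordinal responses.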
7. Discussion
This paper aims to study the plausibility of using SAT concepts to analyse the verbal communication between a human and an agent in a collaborative virtual environment. To go beyond merely exploring the hypothesized relationship between the agent’s verbal communication and collaboration performance, the agent’s verbal communication was broken down into its fundamental components, and each component was evaluated individually and in relation to the others. SAT is a well-known theory for understanding human speech, particularly speech used to accomplish a task. Although SAT has been used to explore human speech while humans accomplish a mission in collaboration with other humans, it has not previously been used to understand an agent’s verbal communication and the impact of that communication on the outcome of a collaboration with a human teammate. We consider this to be one of the contributions of this paper.
The first research question asked how to evaluate the agent’s locutionary acts. The results showed that the participants generally had a positive view of the structure of the agent’s utterances. We sought to investigate why many nevertheless found the messages “awkward”. By reviewing the script of the sentences and the participants’ comments, we found that some of the agent’s utterances seemed formal rather than like natural everyday conversation. Researchers have found that humans’ expectations of the abilities of collaborative virtual agents are lower for robot-like agents than for human-like agents. Nishio and Ishiguro [41] found that the appearance of a virtual agent can have a strong effect on human evaluations of the agent’s capabilities. If the virtual agent looks like a human, humans will expect it to have other human capabilities, such as natural human speech. The structure of the sentences uttered by the agent therefore needs to be revised continuously to make sure they satisfy human expectations.
The second research question asked how the agent’s illocutionary acts can be measured. The results of measuring illocutionary acts stressed the importance of balance in the interlocutors’ use of Searle’s five classes: effective speech should not be restricted to a single tone, e.g., representative or declarative. The results showed that the five classes were used in different ratios in the dialogue between the agent and the human. The representative class was dominant because the nature of the dialogue requires both the human and the agent to reply to each request for feedback. A subjective evaluation measured the extent to which the participants considered the agent’s intention, as conveyed by its requests, replies, and feedback, to be clear. Findings in social neuroscience research have demonstrated that understanding the intentions of co-actors is fundamental for successful social collaboration [42]. Similarly, in collaborations involving agents, researchers have found that understanding intentions is fundamental in collaborative tasks [43].
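Once each utterance in a transcript has been labelled with one of Searle’s five classes (as in the sample messages of Table A1), the class ratios discussed above are simple proportions. The sketch below assumes the labels are already available as plain strings; `class_ratios` is a hypothetical helper, and how the labelling itself is done is outside its scope.

```python
from collections import Counter

# Searle's five illocutionary classes, as used in this paper.
SEARLE_CLASSES = ("representative", "directive", "commissive", "expressive", "declarative")


def class_ratios(labels: list[str]) -> dict[str, float]:
    """Share of utterances in each of Searle's five classes (absent classes get 0.0)."""
    counts = Counter(label.strip().lower() for label in labels)
    unknown = set(counts) - set(SEARLE_CLASSES)
    if unknown:
        raise ValueError(f"unrecognised class labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labelled utterances")
    return {c: counts[c] / total for c in SEARLE_CLASSES}
```

Reporting all five classes, including those with a share of zero, makes an unbalanced dialogue (for example, one dominated by a single class) easy to spot.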
The third research question asked how to evaluate the agent’s perlocutionary acts. The analysis of tracking data in the log files showed that the participants’ acceptance of the agent’s requests increased from 64.41% in the first cycle to 70.69% in the last cycle. This increase could be explained by the participants’ growing exposure to the agent’s verbal communication, which may have led to the development of a shared understanding between the human participant and the agent, or what is called a shared mental model. A number of studies have argued that exposure to communication positively affects the degree of coordinated performance attained by teammates.
Communication has been found to play an important role in teamwork achievements [44]. According to some research, communication in computer-based environments does not differ from face-to-face communication in terms of the capabilities for social information exchange [45]. The result of our study is in line with previous studies that showed the positive impact of communication on team performance [46]. In [47], communication was found to reveal uncertainties that team members may have, increasing team effectiveness and improving team performance.
The fourth research question asked whether human understanding of the agent’s intention is associated with human perceptions of the consequences of the agent’s utterances. The results revealed that humans’ perceptions of the agent’s illocutionary acts and their perceptions of the agent’s perlocutionary acts are significantly correlated. Moreover, the results showed that human understanding of the agent’s intention through speech is likely to be a predictor of human perceptions of the consequences of that speech. The literature contains conflicting opinions about whether illocutionary acts are sufficient to develop the interlocutor’s understanding of the consequences of speech. Some researchers argue that perception of the collaborative environment is hard to establish by exchanging messages, because message exchange can fail at any time [48]. Other work goes to the other extreme and treats the understanding of uttered messages as a key factor in forming an understanding of other interlocutors and events in a collaborative situation [49]. Although our experiments did not investigate other factors that might contribute to the participants’ understanding of the agent’s perlocutionary acts, the results showed the importance of participants’ understanding of the agent’s intention, as expressed in its locutionary acts, in supporting their expectations of the consequences of verbal communication in a collaborative situation.
The results for the last research question demonstrated significant correlations among human perception of the agent’s intention in verbal communication, human perception of the consequences of this communication, and human perception of collaboration performance. In addition, the results revealed that human perceptions of the agent’s intention and of the consequences of its speech are likely to be predictors of human perceptions of the collaboration outcome. The answer to the second part of the fourth question revealed that participants’ perceptions of the agent’s illocutionary acts were more strongly associated with their perception of collaboration performance (r = 0.927) than their perceptions of its perlocutionary acts (r = 0.633). To the best of our knowledge, no research in the literature has studied an agent’s illocutionary and perlocutionary acts and their impact on a task-oriented collaboration with humans. Nevertheless, a number of studies argue for the importance of communication as a facilitator of successful teamwork [50] and of improved coordinated performance [51]. Sycara and Lewis [52] stress the importance of agents assisting their human partners in their activities via communication.
8. Conclusions and Limitations
To study the requirements of natural, enriched, and effective task-oriented communication with agents in a collaborative virtual world, we designed a communication model and developed a scenario in which the human and the virtual agent have to collaborate. SAT was used as a tool to evaluate how effectively the verbal communication conveyed the agent’s intentions. This study aimed to present SAT concepts as applicable tools for evaluating an agent’s verbal communication. Different studies and scenarios involving human–agent communication will use their own sets of exchanged messages to achieve their own objectives; thus, no single method fits all cases and scenarios. The objective of the current study is therefore to propose a general-purpose framework, built on SAT concepts, that can be adjusted to match each study. The major findings of our study are as follows:
- Besides serving as a reference in designing a communication model, SAT can be used as an evaluation tool that provides an understanding of verbal communication from different perspectives, including the structure of utterances, the ratios of the different intentions used, the clarity of the intention behind the utterances, and the effect of that clarity on the decisions taken.
- The results show that the verbal component was effective in conveying requests and intentions in the collaborative virtual environment.
- There is a positive relationship between participants’ understanding of the agent’s intention conveyed in its requests and their acceptance of those requests and, in turn, their understanding of the events in the collaborative situation.
- There is a positive relationship between the mutual understanding of the human and the agent in the collaborative environment and the human participants’ perceptions of the flow of actions and overall performance.
- There is a positive relationship between the agent’s consideration of the human, as represented in its utterances and actions, and the level of understanding the human has of these utterances and actions.
This study had a number of limitations and challenges. The study showed that SAT concepts can be used to evaluate an agent’s verbal communication; however, this is a general framework that must be adapted to fit each scenario and its messages. Another challenge is how to assess the performance of a team that includes a human and an agent: because each scenario has its own objective, evaluating human–agent teamwork performance will depend on the nature of the scenario. Finally, the observed improvement in human–agent teamwork may also involve factors other than verbal communication, which this study did not investigate.
Author Contributions
Conceptualization, N.H. and D.R.; methodology, N.H. and D.R.; software, N.H.; validation, N.H. and D.R.; formal analysis, N.H.; investigation, N.H. and D.R.; resources, D.R.; data curation, N.H.; writing—original draft preparation, N.H.; writing—review and editing, N.H. and D.R.; visualization, N.H.; supervision, D.R.; project administration, D.R.; funding acquisition, D.R.
Funding
This research was funded by an Australian Research Council (ARC) Discovery Project number DP1093170.
Acknowledgments
The authors would like to thank John Porte and Meredith Porte for their technical assistance in designing the virtual reality system and in running the experiments. We would also like to thank Jennifer Clarke–Mackessy for allowing us to conduct one of the experiments in her class.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
Table A1.
Sample of human–agent messages.
| Actor | Speech Act | Illocutionary Act Classification |
|---|---|---|
| Agent | It is my turn. | Declarative |
| | Do you want to suggest which region I should go to? | Commissive |
| Human | I have something | Declarative |
| | Nothing in mind | Declarative |
| Human | I am thinking about regions (1, 2,…, n) | Directive |
| Agent | Wow, the requested x was what I was thinking about. | Representative |
| | The requested region x is a possible choice but far to go to. | Expressive |
| | Your proposed region x has already been taken before. | Representative |
| | Well, your proposed region x is not possible because it is not directly connected to an edge region. | Representative |
| Agent | It is your turn. | Declarative |
| | Why don’t you go to region x and I will go to region y. What do you think? | Directive |
| Agent (reason) | I prefer you to go to x because I am closer to y and so I will save the time to move to the far region x. | Directive |
| Human | I like this idea. | Representative |
| | I do not like this idea. | Representative |
| Agent | Thanks for accepting my request and going to region x. | Expressive |
| | We are on the right track. | Declarative |
| Agent | It seems you have another opinion. | Representative |
| | I have to hurry to another region. That really cost me time. | Expressive |
References
- Horvitz, E. Principles of Mixed-Initiative User Interfaces. In Proceedings of the SIGCHI conference on Human factors in computing systems (SIGCHI’99), Pittsburgh, PA, USA, 15–20 May 1999; pp. 159–166.
- Clark, H.H. Using Language; Cambridge University Press: New York, NY, USA, 1996.
- Van Oijen, J.; Dignum, F. Agent Communication for Believable Human-Like Interactions between Virtual Characters. In Proceedings of the International Workshop on Emotional and Empathic Agents, AAMAS’12, Valencia, Spain, 4 June 2012; pp. 1181–1182.
- Caillou, P.; Gaudou, B.; Grignard, A.; Truong, C.Q.; Taillandier, P. A Simple-to-Use BDI Architecture for Agent-Based Modeling and Simulation. In Advances in Social Simulation 2015; Springer International Publishing: Cham, Switzerland, 2017.
- Yuasa, M.; Mukawa, N.; Kimura, K.; Tokunaga, H.; Terai, H. An Utterance Attitude Model in Human-Agent Communication: From Good Turn-Taking to Better Human-Agent Understanding. In CHI’10 Extended Abstracts on Human Factors in Computing Systems; ACM: Atlanta, GA, USA, 2010; pp. 3919–3924.
- Sharpanskykh, A.; Treur, J. An Ambient Agent Model for Automated Mindreading by Identifying and Monitoring Representation Relations. In Proceedings of the 1st international conference on PErvasive Technologies Related to Assistive Environments (PETRA’08), Athens, Greece, 16–19 July 2008; pp. 1–9.
- Babu, S.; Schmugge, S.; Barnes, T.; Hodges, L. “What Would You Like to Talk About?” An Evaluation of Social Conversations with a Virtual Receptionist. In Intelligent Virtual Agents; Gratch, J., Young, M., Aylett, R., Ballin, D., Olivier, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 169–180.
- Ludwig, S.; Van Laer, T.; De Ruyter, K.; Friedman, M. Untangling a Web of Lies: Exploring Automated Detection of Deception in Computer-Mediated Communication. J. Manag. Inf. Syst. 2016, 33, 511–541.
- Austin, J.L. How to Do Things with Words; Harvard University Press: Cambridge, MA, USA, 1975.
- Chaib-draa, B.; Dignum, F. Trends in Agent Communication Language. Comput. Intell. 2002, 18, 89–101.
- Ferguson, G.; Allen, J. Mixed-Initiative Dialogue Systems for Collaborative Problem-Solving. AI Mag. 2007, 28, 23–32.
- van Luin, J.; op den Akker, R.; Nijholt, A. A Dialogue Agent for Navigation Support in Virtual Reality. In Proceedings of the Conference on Human Factors in Computing Systems (CHI’01), Seattle, WA, USA, 31 March–5 April 2001; pp. 117–118.
- Miao, Y.; Hoppe, U.; Pinkwart, N. Naughty Agents Can Be Helpful: Training Drivers to Handle Dangerous Situations in Virtual Reality. In Proceedings of the 6th International Conference on Advanced Learning Technologies (ICALT’06), Kerkrade, The Netherlands, 5–7 July 2006.
- Traum, D.; Rickel, J.; Gratch, J.; Marsella, S. Negotiation over Tasks in Hybrid Human-Agent Teams for Simulation-Based Training. In Proceedings of the 2nd International Joint Conference on Autonomous Agents and Multiagent Systems, Melbourne, VIC, Australia, 14–18 July 2003; pp. 441–448.
- Nijholt, A. Issues in Multimodal Nonverbal Communication and Emotion in Embodied (Conversational) Agents. In Proceedings of the 6th World Multiconference on Systemics, Cybernetics and Informatics, Volume II: Concepts and Applications of Systemics, Cybernetics and Informatics, International Institute of Informatics and Systemics, Orlando, FL, USA, 14–18 July 2002.
- Lemaître, C.; Fallah-Seghrouchni, A. A Multiagent Systems Theory of Meaning Based on the Habermas/Bühler Communicative Action Theory. In Advances in Artificial Intelligence; Monard, M., Sichman, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2000; pp. 116–125.
- Chien, A.-H.; Soo, V.-W. Inferring Pragmatics from Dialogue Contexts in Simulated Virtual Agent Games. In Agents for Educational Games and Simulations; Beer, M., Brom, C., Dignum, F., Soo, V.-W., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 123–138.
- Searle, J. Speech Acts: An Essay in the Philosophy of Language; Cambridge University Press: New York, NY, USA, 1969.
- Eppler, M.; Mengis, J. The Concept of Information Overload-A Review of Literature from Organization Science, Accounting, Marketing, MIS, and Related Disciplines. Inf. Soc. 2004, 20, 325–344.
- Rose, K.R. Speech Acts and Questionnaires: The Effect of Hearer Response. J. Pragmat. 1992, 17, 49–62.
- Richetti, P.H.P.; de A.R. Gonçalves, J.C.; Baião, F.A.; Santoro, F.M. Analysis of Knowledge-Intensive Processes Focused on the Communication Perspective. In Business Process Management; Springer International Publishing: Cham, Switzerland, 2017.
- Finin, T.; Fritzson, R.; McKay, D.; McEntire, R. KQML as an Agent Communication Language. In Proceedings of the Third International Conference on Information and Knowledge Management, Gaithersburg, MD, USA, 29 November–2 December 1994; ACM: New York, NY, USA, 1994; pp. 456–463.
- Shoham, Y. Agent-Oriented Programming. Artif. Intell. 1993, 60, 51–92.
- Sidner, C.L. An Artificial Discourse Language for Collaborative Negotiation. In Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle, WA, USA, 1–4 August 1994; pp. 814–819.
- Cohen, P.R.; Levesque, H.J. Communicative Actions for Artificial Agents. In Software Agents; Bradshaw, J.M., Ed.; MIT Press: Cambridge, MA, USA, 1997; pp. 419–436.
- Cohen, P.R.; Perrault, C.R. Elements of a Plan-Based Theory of Speech Acts. Cogn. Sci. 1979, 3, 177–212.
- Moreira, Á.; Vieira, R.; Bordini, R. Speech-Act Based Communication: Progress in the Formal Semantics and in the Implementation of Multi-agent Oriented Programming Languages. In Declarative Agent Languages and Technologies IX; Sakama, C., Sardina, S., Vasconcelos, W., Winikoff, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 111–116.
- Kullu, K.; Güdükbay, U.; Manocha, D. ACMICS: An agent communication model for interacting crowd simulation. Auton. Agents Multi-Agent Syst. 2017, 31, 1403–1423.
- Jiang, W.; Zhou, X. Research on a Novel Multi-Agent System Negotiation Strategy and Model. In Proceedings of the 4th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM’08), Dalian, China, 12–14 October 2008.
- Dragone, M.; Holz, T.; Duffy, B.R.; O’Hare, G.M. Social Situated Agents in Virtual, Real and Mixed Reality Environments. In Intelligent Virtual Agents; Panayiotopoulos, T., Gratch, J., Aylett, R., Ballin, D., Olivier, P., Rist, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; pp. 166–177.
- Cohen, P.; Perrault, C.R. Elements of a Plan-Based Theory of Speech Acts. In Communication in Multiagent Systems; Huget, M.-P., Ed.; Springer: Berlin/Heidelberg, Germany, 2003; pp. 1–36.
- Traum, D.R.; Allen, J.F. A Speech Acts Approach to Grounding in Conversation. In Proceedings of the 2nd International Conference on Spoken Language Processing (ICSLP’92), Banff, AB, Canada, 12–16 October 1992.
- McRorie, M.; Sneddon, I.; McKeown, G.; Bevacqua, E.; de Sevin, E.; Pelachaud, C. Evaluation of Four Designed Virtual Agent Personalities. IEEE Trans. Affect. Comput. 2012, 3, 311–322.
- Granström, B.; House, D. Modelling and Evaluating Verbal and Non-Verbal Communication in Talking Animated Interface Agents. In Evaluation of Text and Speech Systems; Dybkjær, L., Hemsen, H., Minker, W., Eds.; Springer: Dordrecht, The Netherlands, 2007; pp. 65–98.
- Allwood, J. Cooperation and Flexibility in Multimodal Communication. In Cooperative Multimodal Communication; Bunt, H., Beun, R.-J., Eds.; Springer: Berlin/Heidelberg, Germany, 2001; pp. 113–124.
- Eysenck, H.J. The Measurement of Personality; University Park Press: Baltimore, MD, USA, 1976.
- Fabri, M.; Elzouki, S.Y.A.; Moore, D. Emotionally expressive avatars for chatting, learning and therapeutic intervention. In Proceedings of the 12th International Conference on Human-Computer Interaction: Intelligent Multimodal Interaction Environments, Beijing, China, 22–27 July 2007; pp. 275–285.
- Faizin, B.; Ramdhani, M.A.; Gunawan, W.; Gojali, D. Speech Acts Analysis in Whatsapp Status Updates. In International Conference on Media and Communication Studies (ICOMACS 2018); Atlantis Press: New York, NY, USA, 2018.
- Vosoughi, S.; Roy, D. Tweet Acts: A Speech Act Classifier for Twitter. In Proceedings of the 10th AAAI Conference on Weblogs and Social Media (ICWSM 2016), Cologne, Germany, 17–20 May 2016.
- Hanna, N.; Richards, D.; Jacobson, M.J. Automatic Acquisition of User Models of Interaction to Evaluate the Usability of Virtual Environments. In Knowledge Management and Acquisition for Intelligent Systems-12th Pacific Rim Knowledge Acquisition Workshop (PKAW’12); Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2012; pp. 43–57.
- Nishio, S.; Ishiguro, H. Attitude Change Induced by Different Appearances of Interaction Agents. Int. J. Mach. Conscious. 2011, 3, 115–126.
- Newman-Norlund, R.D.; Noordzij, M.L.; Meulenbroek, R.G.; Bekkering, H. Exploring the Brain Basis of Joint Action: Co-ordination of Actions, Goals and Intentions. Soc. Neurosci. 2007, 2, 48–65.
- Dindo, H.; Chella, A. What Will You Do Next? A Cognitive Model for Understanding Others’ Intentions Based on Shared Representations. In Virtual Augmented and Mixed Reality. Designing and Developing Augmented and Virtual Environments; Shumaker, R., Ed.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 253–266.
- Conigliaro, J. Teamwork and communication. In Patient Safety; Springer: New York, NY, USA, 2014; pp. 19–33.
- Walther, J.B. Group and interpersonal effects in international computer-mediated collaboration. Hum. Commun. Res. 1997, 23, 342–369.
- Hirokawa, R.Y. The Role of Communication in Group Decision-Making Efficacy: A Task-Contingency Perspective. Small Group Res. 1990, 21, 190–204.
- Jarvenpaa, S.L.; Shaw, T.R.; Staples, D.S. Toward Contextualized Theories of Trust: The Role of Trust in Global Virtual Teams. Inf. Sys. Res. 2004, 15, 250–267.
- Halpern, J.Y.; Moses, Y. Knowledge and Common Knowledge in a Distributed Environment. J. ACM 1990, 37, 549–587.
- Traum, D.R. Speech Acts for Dialogue Agents. In Foundations of Rational Agency; Springer: Dordrecht, The Netherlands, 1999; pp. 169–201.
- Smith-Jentsch, K.A.; Johnston, J.H.; Payne, S.C. Measuring Team-Related Expertise in Complex Environments. In Decision Making under Stress: Implications for Individual and Team Training; Cannon-Bowers, J.A., Salas, E., Eds.; American Psychological Association: Washington, DC, USA, 1998; pp. 61–87.
- Espevik, R.; Johnsen, B.H.; Eid, J.; Thayer, J.F. Shared Mental Models and Operational Effectiveness: Effects on Performance and Team Processes in Submarine Attack Teams. Mil. Psychol. 2006, 18, 23–36.
- Sycara, K.; Lewis, M. Integrating Intelligent Agents into Human Teams. In Team Cognition; Salas, E., Fiore, S.M., Eds.; American Psychological Association: Washington, DC, USA, 2004; pp. 203–231.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).