Abstract
Line-of-sight behaviors such as gaze and eye contact play an important role in enhancing embodied interaction and communication through avatars, and many gaze models and avatar-mediated communication systems that express line-of-sight have been proposed and developed. However, the gaze behaviors generated by these models are not designed to enhance embodied interaction such as activated communication, because the models generate eyeball movements stochastically from observed human gaze behavior. We therefore analyzed the interaction between human gaze behavior and activated communication using line-of-sight measurement devices, and proposed an eye gaze model based on this analysis. In this study, we develop an advanced avatar-mediated communication system in which the proposed eye gaze model is applied to speech-driven embodied entrainment characters called “InterActor.” The system generates the avatar’s eyeball movements, such as gazing and looking away, according to the degree of activated communication, and provides a communication environment in which embodied interaction is promoted. The effectiveness of the system is demonstrated by means of sensory evaluations of 12 pairs of subjects engaged in avatar-mediated communication.
1 Introduction
With the advancements in information technology, it is now possible for humans to communicate in a 3D virtual space over a network using CG characters called avatars. Many studies that support remote communication using CG characters such as avatars and agents have been conducted [1]. However, because current systems express nonverbal behavior through key commands, they do not reproduce the embodied sharing that arises from the synchrony of embodied rhythms, such as nodding and body movements, in human face-to-face communication. In human face-to-face communication, not only verbal messages but also nonverbal behaviors such as nodding, body movement, line-of-sight and facial expression are rhythmically related and mutually synchronized between talkers [2]. This synchrony of embodied rhythms is called entrainment; it unconsciously enhances the sharing of embodiment and empathy in human interaction and accelerates activated communication, in which nonverbal behaviors such as body movements and speech activity increase and the embodied interaction is activated [3].
In our previous work, we analyzed the entrainment between a speaker’s speech and a listener’s nodding motion in face-to-face communication, and developed iRT (InterRobot Technology), which generates a variety of communicative actions and movements, such as nodding, blinking, and movements of the head, arms, and waist, that are coherently related to voice input [4]. In addition, we developed an interactive CG character called “InterActor,” which has the functions of both speaker and listener, and demonstrated that InterActor can effectively support human interaction and communication [4]. Moreover, we developed an estimation model of interaction-activated communication based on the heat conduction equation and demonstrated its effectiveness through an evaluation experiment [5].
On the other hand, line-of-sight behaviors such as eye contact and gaze duration, as well as body movements, play an important role in smooth human face-to-face communication [6]. Moreover, it has been reported that expressing an avatar’s gaze enables smooth communication via avatars. For example, Ishii et al. developed a communication system that controls an avatar’s gaze based on an estimated line-of-sight model, and demonstrated that the model facilitates utterances between talkers in avatar-mediated communication [7]. We also analyzed human eyeball movement through avatars using an embodied virtual communication system with a line-of-sight measurement device, and proposed an eyeball movement model consisting of an eyeball delay movement model and a gaze withdrawal model [8]. In addition, we developed an advanced avatar-mediated communication system by applying the proposed eyeball movement model to InterActors, and demonstrated that it effectively supports embodied interaction and communication. These systems generate the avatar’s eyeball movement with a statistical model based on the characteristics of face-to-face communication. However, such systems can hardly enhance line-of-sight interaction, because the dynamic characteristics of human line-of-sight during activated communication were not taken into account in their design. Therefore, in our previous research, we analyzed the interaction between activated communication and human gaze behavior using a line-of-sight measurement device [8]. On the basis of this analysis, we proposed an eye gaze model consisting of an eyeball delay movement model and a look away model.
In this paper, we develop an advanced avatar-mediated communication system by applying the proposed eye gaze model to InterActors. This system generates the avatar’s eyeball movements, such as gazing and looking away, based on the proposed model using only speech input, and provides a communication environment in which embodied interaction is promoted. The effectiveness of the proposed model and the communication system is demonstrated by means of sensory evaluations in avatar-mediated communication.
2 A Speech-Driven Embodied Communication System Based on an Eye Gaze Model
2.1 InterActor
In order to support human interaction and communication, we developed a speech-driven embodied entrainment character called InterActor, which has the functions of both speaker and listener [4]. The configuration of InterActor is shown in Fig. 1. InterActor has a virtual skeleton structure consisting of the head, eyes, mouth, neck, shoulders, elbows, and hands (Fig. 1(a)). A texture is mapped onto the 3D surface model that contains the virtual skeleton structure (Fig. 1(b)). In addition, various facial expressions are realized by applying the smile model developed in our previous research (Fig. 1(c)) [9, 10].
The listener’s interaction model includes a nodding reaction model, which estimates the nodding timing from the ON–OFF pattern of speech, and a body reaction model linked to the nodding reaction model [4]. The timing of nodding is predicted using a hierarchical model consisting of two stages, macro and micro (Fig. 2). The macro stage estimates whether a nodding response exists in a duration unit, which consists of a talkspurt episode T(i) and the following silence episode S(i) with a hangover value of 4/30 s. The estimator M_u(i) is a moving-average (MA) model, expressed as the weighted sum of the unit speech activity R(i) in Eqs. (1) and (2). A nod is triggered when M_u(i) exceeds a threshold value. In the micro stage, the nodding motion M(i) is estimated by another MA model, the weighted sum of the binary speech signal V(i) in Eq. (3).
\( M_{u}(i) = \sum_{j=1}^{J} a(j)R(i-j) + u(i) \)  (1)

\( R(i) = \dfrac{T(i)}{T(i) + S(i)} \)  (2)

\( M(i) = \sum_{j=1}^{K} b(j)V(i-j) + w(i) \)  (3)

where

- a(j): linear prediction coefficient
- T(i): talkspurt duration in the i-th duration unit
- S(i): silence duration in the i-th duration unit
- u(i): noise
- i: frame number
- b(j): linear prediction coefficient
- V(i): voice (binary speech signal)
- w(i): noise
Body movements are also driven by the speech input: the neck and one of the wrists, elbows, arms, or waist is operated when a body-movement threshold is exceeded. This threshold is set lower than the nodding threshold of the MA model, which is expressed as the weighted sum of the binary speech signal. In other words, when InterActor functions as a listener, the relationship between nodding and the other body movements is governed by the threshold values of the nodding estimation.
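The macro-stage estimation and the two-threshold triggering described above can be sketched as follows. This is an illustrative implementation, not the authors' code: the coefficients a(j) and the two threshold values are hypothetical placeholders, and the noise term u(i) of Eq. (1) is omitted for clarity.

```python
def estimate_nodding(T, S, a, nod_threshold, body_threshold):
    """Macro-stage sketch of the MA nodding estimator (Eqs. (1)-(2)).

    T, S : talkspurt / silence durations per duration unit
    a    : linear prediction coefficients a(j) (hypothetical values)
    The noise term u(i) is omitted for clarity.
    """
    R = [t / (t + s) for t, s in zip(T, S)]      # unit speech activity, Eq. (2)
    J = len(a)
    nods, body_moves = [], []
    for i in range(len(R)):
        # M_u(i) = sum_j a(j) R(i-j), Eq. (1) without the noise term
        window = R[max(0, i - J):i][::-1]        # R(i-1), R(i-2), ...
        Mu = sum(aj * rj for aj, rj in zip(a, window))
        nods.append(Mu > nod_threshold)
        # body movements use a threshold lower than the nodding threshold,
        # so body motions occur at least as often as nods
        body_moves.append(Mu > body_threshold)
    return nods, body_moves
```

Because the body-movement threshold is lower, every unit that triggers a nod also triggers a body movement, which reproduces the dependency between nodding and the other movements described above.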
2.2 Eye Gaze Model
We propose an eye gaze model that generates gaze and looking away movements to enhance embodied communication, based on the characteristics obtained from the analysis of human eyeball movement. The proposed model consists of the eyeball delay movement model from our previous work [8] and a look away model. The outline of the proposed model is as follows:
(1) Eyeball Delay Movement Model
The eyeball delay movement model delays the eyeball movement by 0.13 s with respect to the avatar’s head movement. First, the rotation angle of the avatar’s gaze direction toward the viewpoint in virtual space is calculated using Eq. (4) (Fig. 3(a)). Then, the avatar’s gaze is generated by adding the angle of the avatar’s head movement to the gaze-direction angle of the fourth previous frame, at a frame rate of 30 fps (Eq. (5)). Figure 3(b) shows an example of the eyeball delay movement model in an avatar. When the avatar’s head moves, the eyeball moves in the opposite direction with a delay of 0.13 s with respect to the head movement.
\( \theta_{AG} = \tan^{-1} \dfrac{P_{y} - A_{Ey}}{P_{x} - A_{Ex}} \)  (4)

\( \theta_{G}(i) = \theta_{AG}(i-4) + \theta_{AH}(i) \)  (5)

where

- θ_AG: rotation angle of the gaze direction
- A_Ex, A_Ey: eyeball position of InterActor
- P_x, P_y: position of the viewpoint in virtual space
- θ_G(i): rotation angle of the eyeball movement
- θ_AH(i): rotation angle of InterActor’s head movement
- i: frame number
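A minimal sketch of the delay mechanism is shown below. It assumes a 4-frame ring buffer for the 0.13 s delay at 30 fps, and uses `math.atan2` as a robust stand-in for the arctangent of Eq. (4); the class and function names are our own, not from the paper.

```python
import math
from collections import deque

DELAY_FRAMES = 4  # 0.13 s at 30 fps

def gaze_direction(eye_x, eye_y, view_x, view_y):
    """Eq. (4) sketch: rotation angle from the eyeball to the viewpoint."""
    return math.atan2(view_y - eye_y, view_x - eye_x)

class DelayedEyeball:
    """Eq. (5) sketch: eyeball angle = gaze direction of the 4th previous
    frame plus the current head rotation angle."""
    def __init__(self):
        # holds theta_AG of the last DELAY_FRAMES frames, oldest first
        self.history = deque([0.0] * DELAY_FRAMES, maxlen=DELAY_FRAMES)

    def update(self, theta_ag, theta_ah):
        delayed = self.history[0]        # theta_AG(i - 4)
        self.history.append(theta_ag)    # store theta_AG(i) for later frames
        return delayed + theta_ah        # theta_G(i)
```

A step in the gaze direction only appears in the output four frames later, so while the head turns immediately, the eyeball appears to lag and move relative to the head in the opposite direction, as described above.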
(2) Look Away Model
Our previous analysis of human eyeball movement indicates that direct gaze occupies only about 80% of the total conversation time [8]. Therefore, the look away model in this study generates other eyeball movements, such as gaze withdrawal and blinking, based on that analysis. In a looking away movement, the avatar’s eyeballs are moved widely in the horizontal direction (Fig. 4); the effectiveness of this movement was confirmed in a preliminary experiment. When the estimated degree of interaction-activated communication falls below a threshold value, the model generates the looking away movement (Fig. 5). The avatar’s gaze is thereby modulated so that staring is prevented and impressions of the conversation, such as unification and vividness, are enhanced.
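The switching rule of the look away model can be summarized in a few lines. The threshold value below is a hypothetical placeholder; in the actual system, the activation estimate would come from the heat-conduction-based model of interaction-activated communication [5].

```python
def select_eye_movement(activation_estimate, threshold=0.3):
    """Sketch of the look away model's switching rule.

    activation_estimate: estimated degree of interaction-activated
    communication; the threshold value here is a hypothetical placeholder.
    Returns the eye movement the avatar should generate in this frame.
    """
    if activation_estimate < threshold:
        # communication is less activated: avert the gaze horizontally
        return "look_away"
    # otherwise keep direct gaze (about 80% of conversation time)
    return "direct_gaze"
```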
2.3 Developed System
We developed an advanced communication system in which the proposed model is used with InterActors (Fig. 6). The virtual space was generated using the Microsoft DirectX 9.0 SDK (June 2010) on a Windows 7 workstation (CPU: Core i7, 2.93 GHz; memory: 8 GB; graphics: NVIDIA GeForce GTS 250). The voice was sampled at 11 kHz with 16-bit resolution via a headset (Logicool H330). InterActors were rendered at a frame rate of 30 fps.
When Talker1 speaks to Talker2, InterActor2 responds to Talker1’s utterance with appropriate timing through body movements, including nodding, blinking, and other actions, in a manner similar to the body motions of a listener. A nodding movement is defined as a falling-rising motion in the front–back direction at a speed of 0.15 rad/frame. In addition, InterActor2 generates eyeball movements based on the proposed model. Here, a looking away movement is defined as a left–right motion of the eyeballs at a speed of 0.15 rad/frame, based on the preliminary experiment. InterActor1 likewise generates communicative actions, movements, and eyeball movements as a speaker, using the MA model and the eye gaze model. In this manner, two remote talkers can enjoy a conversation via InterActors within a communication environment in which the sense of unity is shared through embodied entrainment.
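The falling-rising nodding motion can be generated as a simple angle trajectory. The 0.15 rad/frame speed is the paper's value; the nod amplitude is a hypothetical parameter that the paper does not specify.

```python
NOD_SPEED = 0.15  # rad/frame, front-back direction (value from the paper)

def nod_trajectory(amplitude, speed=NOD_SPEED):
    """Head pitch angles for one falling-rising nodding movement (a sketch;
    the amplitude is a hypothetical parameter, not given in the paper)."""
    angles, theta = [], 0.0
    while theta < amplitude - 1e-9:          # falling phase
        theta = min(amplitude, theta + speed)
        angles.append(theta)
    while theta > 1e-9:                      # rising phase back to neutral
        theta = max(0.0, theta - speed)
        angles.append(theta)
    return angles
```

At 30 fps, an amplitude of three speed steps yields a nod lasting six frames (0.2 s), which gives a feel for how quickly the entrained motions play out.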
3 Communication Experiment
A communication experiment was carried out to evaluate the developed system.
3.1 Experimental Method
The experiment involved talkers engaged in free conversation. The following three modes were compared: mode (A), with neither eyeball movement nor facial expression; mode (B), with the smile model only; and mode (C), with the combined smile model and eye gaze model. We recorded the experiment scene using two video cameras and screens, as shown in Fig. 7. The subjects were 12 pairs of talkers (12 males and 12 females).
The experimental procedure was as follows. First, the subjects used the system for around 3 min. Next, they were instructed to perform a paired comparison of the modes, selecting the better mode based on their preferences. Finally, they talked in a free conversation for 3 min in each mode. The questionnaire used a seven-point bipolar rating scale from −3 (not at all) to 3 (extremely), where a score of 0 denotes “moderately.” The conversational topics were not specified. The modes were presented to each pair of talkers in a random order.
3.2 Result
The results of the paired comparison are summarized in Table 1, which shows the number of wins for each mode. For example, mode (A) was preferred six times over mode (B), and nine times in total. Figure 8 shows the results calculated from Table 1 based on the Bradley-Terry model given in Eqs. (6) and (7) [11].
\( p_{ij} = \dfrac{\pi_{i}}{\pi_{i} + \pi_{j}} \)  (6)

\( \sum_{i} \pi_{i} = 100 \; (\mathrm{const.}) \)  (7)

where

- π_i: intensity of mode i
- p_ij: probability that mode i is judged better than mode j
The consistency of the mode matching was confirmed by a goodness-of-fit test \( (\chi^{2}(1,0.05) = 3.84 > \chi_{0}^{2} = 0.28) \) and a likelihood ratio test \( (\chi^{2}(1,0.05) = 3.84 > \chi_{0}^{2} = 0.27) \). The proposed mode (C), with both the smile model and the eye gaze model, was evaluated as the best, followed by mode (B), with the smile model only, and mode (A), with no movement.
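The Bradley-Terry intensities π_i of Eqs. (6) and (7) can be estimated from a win table with the standard minorization-maximization (MM) update. The win counts below are illustrative placeholders, not the paper's Table 1.

```python
def fit_bradley_terry(wins, iters=200):
    """Estimate Bradley-Terry intensities pi_i from a win matrix.

    wins[i][j]: number of times mode i was preferred over mode j.
    Uses the standard MM update; intensities are normalized so that
    their sum is 100, matching Eq. (7).
    """
    n = len(wins)
    pi = [1.0] * n
    for _ in range(iters):
        new_pi = []
        for i in range(n):
            w_i = sum(wins[i])  # total wins of mode i
            # sum over opponents: comparisons n_ij / (pi_i + pi_j)
            denom = sum((wins[i][j] + wins[j][i]) / (pi[i] + pi[j])
                        for j in range(n) if j != i)
            new_pi.append(w_i / denom if denom else pi[i])
        total = sum(new_pi)
        pi = [100.0 * p / total for p in new_pi]  # Eq. (7)
    return pi
```

Each iteration divides a mode's total wins by its expected win rate under the current intensities, so a mode that wins more pairwise comparisons receives a proportionally larger π_i.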
The questionnaire results are shown in Fig. 9. According to the Friedman test and the Wilcoxon signed-rank test, all categories showed a significance level of 1% among modes (A), (B), and (C). In addition, “Enjoyment,” “Interaction-activated communication,” “Vividness,” and “Natural line-of-sight” showed a significance level of 5% between modes (B) and (C).
In both evaluations, mode (C), with the proposed eye gaze model, was rated the best for avatar-mediated communication. These results demonstrate the effectiveness of the proposed eye gaze model in combination with the smile model.
4 Conclusion
In this paper, we developed an advanced avatar-mediated communication system in which our proposed eye gaze model is used by speech-driven embodied entrainment characters called InterActors. The proposed model consists of an eyeball delay movement model and a look away model. The communication system generates eyeball movements based on this model, together with the entrained head and body motions of the InterActors, using only speech input. Sensory evaluations in avatar-mediated communication showed the effectiveness of the proposed eye gaze model and the communication system.
References
Ishii, K., Taniguchi, Y., Osawa, H., Nakadai, K., Imai, M.: Merging viewpoints of user and avatar in telecommunication using image and sound projector. Trans. Inf. Process. Soc. Jpn. 54(4), 1413–1421 (2013)
Condon, W.S., Sander, L.W.: Neonate movement is synchronized with adult speech. Science 183, 99–101 (1974)
Watanabe, T.: Human-entrained embodied interaction and communication technology. In: Fukuda, S. (ed.) Emotional Engineering, pp. 161–177. Springer, Heidelberg (2011)
Watanabe, T., Okubo, M., Nakashige, M., Danbara, R.: InterActor: speech-driven embodied interactive actor. Int. J. Hum.-Comput. Interact. 17(1), 43–60 (2004)
Sejima, Y., Watanabe, T., Jindai, M.: Development of an interaction-activated communication model based on a heat conduction equation in voice communication. In: Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2014), pp. 832–837 (2014)
Argyle, M., Dean, J.: Eye-contact, distance and affiliation. Sociometry 28(3), 289–304 (1965)
Ishii, R., Miyajima, T., Fujita, K.: Avatar’s gaze control to facilitate conversation in virtual-space multi-user voice chat system. Trans. Hum. Interface Soc. 10(3), 87–94 (2007)
Sejima, Y., Watanabe, T., Jindai, M.: An embodied communication system using speech-driven embodied entrainment characters with an eyeball movement model. Trans. Jpn. Soc. Mech. Eng. Ser. C 76(762), 340–350 (2010)
Sejima, Y., Ono, K., Yamamoto, M., Ishii, Y., Watanabe, T.: Development of an embodied communication system with line-of-sight model for speech-driven embodied entrainment character. In: Proceedings of the 25th JSME Design and Systems Conference, no. 1110, pp. 1–9 (2015)
Yamamoto, M., Takabayashi, N., Ono, K., Watanabe, T., Ishii, Y.: Development of a nursing communication education support system using nurse-patient embodied avatars with a smile and eyeball movement model. In: Proceedings of the 2014 IEEE/SICE International Symposium on System Integration (SII 2014), pp. 175–180 (2014)
Luce, R.D.: Individual Choice Behavior: A Theoretical Analysis. Wiley, New York (1959)
Acknowledgments
This work was supported by JSPS KAKENHI Grant Numbers JP16K01560, JP26280077.
© 2017 Springer International Publishing AG
Sejima, Y., Ono, K., Watanabe, T. (2017). A Speech-Driven Embodied Communication System Based on an Eye Gaze Model in Interaction-Activated Communication. In: Yamamoto, S. (eds) Human Interface and the Management of Information: Information, Knowledge and Interaction Design. HIMI 2017. Lecture Notes in Computer Science(), vol 10273. Springer, Cham. https://doi.org/10.1007/978-3-319-58521-5_48