1 Introduction
Social interactions are essential for well-being [
45]. During these interactions, information is sent and received (e.g., spoken words, gestures), including subtle social cues [
84,
90], which offer insights into feelings, intentions, and emotions, playing a vital role in setting conversational tones, reflecting societal norms, and establishing personal boundaries [
37].
Empathetic computing seeks to equip computing systems with capabilities to sense, process, and react empathetically to verbal and nonverbal social cues, making interfaces intuitive, effective, and meaningful [
20]. By using virtual human-like agents (VHAs) that transmit social cues as embodied interfaces (i.e., agents that speak and use gestures), one can build fundamentally social human-computer interactions [
6,
20,
71,
91]. Therefore, the design of empathetic computing systems like VHAs must consider personality traits that change the processing of social information. However, research is limited by either focusing solely on gaze or proxemics [
11,
12], neglecting individual differences, or by studying only the direct effect of personal space or gaze [
31,
32,
74].
For some, social cues carry a profoundly different meaning, as is the case for people with Social Anxiety (SA) [
60]. SA, characterized by the intense fear of others’ evaluation, leads to the development of cognitive and behavioral biases causing individuals to avoid social interactions or endure them with severe unease [
89]. Most importantly, affected individuals show an interpretation bias towards social cues, misinterpreting neutral stimuli as hostile [
104]. These patterns extend into digital domains, bringing both challenges and opportunities for VHA design [
46,
55]: for instance, in Virtual-Reality (VR), individuals with SA tend to walk further around a VHA and stand further away from it [
46,
55]. SA can also amplify the perception of social cues, making interactions feel more intense and leading to compensatory behaviors such as keeping a larger Interpersonal Distance (IPD) or avoiding direct gaze [
57,
89].
In VR one can interact with objects ranging from rudimentary 3D models to complex VHAs [
7,
91]. By emulating human attributes like gestures, speech, and gaze, VHAs act as interfaces, foster natural interactions with users [
6,
20,
71] and create a distinct social context where people communicate and collaborate in shared spaces enhanced by nonverbal cues [
50,
75,
106]. Nonverbal cues, particularly gaze direction [
31,
33] and IPD [
87,
101], are cornerstones of human communication that are ingrained in Metaverse applications such as Social VR [
107]. These are vital if VHAs are to inspire human-like interactions [
33,
56]. Yet, the interpretation of these social cues is subjective and shaped by individual traits, such as SA [
31,
100]. Therefore, the design of empathetic computing systems like VHAs must consider how individual traits change the way social information is processed.
A few exploratory but highly influential studies on Social VR have focused on gaze, proxemics, and their intricate relationship, revealing diverse findings [
11,
12]. One study found nuanced, gender-dependent differences in clearance distance behind a VHA for tracked gaze as compared to static gaze [
12]. Other authors found clearance distance to be enlarged around VHAs with open eyes compared to those with closed eyes [
11]. However, more recent research argues that these methods might not capture the true essence of conversational IPD in virtual interactions [
39,
43,
86]. This is further complicated by the inherent subjectivity of gaze and proxemics due to the influence of individual traits (e.g., [
31,
100]) and mental health conditions like SA [
46].
Existing research on the relationship between gaze and proxemics remains too inconsistent to inform design choices effectively: it either focuses narrowly on gaze and proxemics [
11,
12], neglecting individual differences, or does not examine IPD and gaze in relation to SA [
31,
32,
74]. The present study thus compares preferred IPDs for different dynamic and static animations of centered and averted gaze in interactions with VHAs. It was found, contrary to Bailenson et al. [
11,
12], that participants preferred shorter IPDs under centered gaze, irrespective of dynamic or static displays, thereby underlining the role of direct social gaze as an affiliative signal. Considering SA, the pattern found by Bailenson et al. [
11,
12] was reproduced: with increased SA, averted gaze led to smaller IPDs compared to centered gaze, while with decreased SA, the opposite was found. Thus, by considering participants’ subjective experiences, we could resolve a controversy in the literature on gaze and proxemics. From here, we revised design recommendations for VHAs’ nonverbal behavior to account for individual variation, intending to inform the design of more engaging Social VR applications for those with SA and of embodied conversational interfaces.
3 Method
In this study, we contrast competing interpretations of Equilibrium Theory (ET) on the interplay of gaze and IPD in a VR context, considering SA. On one side, prior research provides evidence for the hypothesis that socially anxious individuals tend to avoid looking at facial regions, referred to as the hypervigilance-avoidance hypothesis [
26,
69]. On the other side, researchers have challenged this account of biased gaze behavior, finding that the biases may fade in real-life interactions with others [
81,
94]. Therefore, we explore the effects of gaze on IPD, in relation to SA. Bailenson et al. [
11,
12] contend that direct gaze amplifies intimacy, leading individuals to increase their IPD to VHAs. Conversely, Argyle and Dean [
9] posit that direct gaze acts as an affiliative cue, resulting in a decreased IPD.
We aimed to replicate Bailenson et al.’s [
11,
12] experiments in a realistic setting using a stop-distance task [
86] (H1). Interactions were initiated with a VHA exhibiting either a centered (0°) or an averted gaze (−15° or +15° from the centered direction).
We hypothesized that socially anxious participants show a larger IPD when the VHA’s gaze is centered (H1.1). Further, we hypothesized that socially anxious individuals keep a larger IPD even if the gaze is averted (H1.2). Next, we assessed the influence of dynamic gaze shifts on IPD [
53,
54]. Based on ET’s social signalling perspective [
9], we hypothesized that a VHA dynamically averting its gaze (from 0° to ±15°, completing at 1 m) leads to an increased IPD compared to one centering its gaze (from ±15° to 0°, completing at 1 m) (H2.1). Again, this effect was hypothesized to be magnified in participants with pronounced SA (H2.2). Conforming to prior research, we predicted a positive correlation between SA and preferred IPD (H3; [
34,
55,
57,
74,
105]). All hypotheses and analyses were pre-registered.
In alignment with Sicorello et al. [
86], we adopted a Bayesian approach to quantify the likelihood of null differences.
3.1 Study Design
A stop-distance task was used in a within-participants design. We measured IPD, our dependent variable, as the frontal distance (in cm) between participant and VHA, logged by participants during their approach. We manipulated
Gaze Dynamics as our independent variable in four conditions (dynamic averting gaze, dynamic centering gaze, static averted gaze, static centered gaze), yielding seven gaze levels, given that in the first three conditions gaze could be directed to the left or right on the horizontal plane. In addition, we used four different VHAs (two male and two female), with the seven gaze levels repeated three times for each VHA (3 repetitions × 7 gaze levels × 4 VHAs), resulting in 84 trials per participant. We decided on these conditions building on prior work, which emphasized that the gaze angle with a neutral facial expression is key and found that the interpretation of these angles is affected by SA, amplifying social stress when gaze is misinterpreted as staring [
82].
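The factorial trial structure described above can be sketched as follows (condition and VHA labels are illustrative placeholders, not identifiers from the study’s materials):

```python
from itertools import product

# Seven gaze levels: three conditions with left/right variants, plus static centered.
conditions = [
    ("dynamic averting", "left"), ("dynamic averting", "right"),
    ("dynamic centering", "left"), ("dynamic centering", "right"),
    ("static averted", "left"), ("static averted", "right"),
    ("static centered", None),  # centered gaze has no left/right variant
]
vhas = ["female 1", "female 2", "male 1", "male 2"]
repetitions = 3

# Full crossing: 3 repetitions x 7 gaze levels x 4 VHAs = 84 trials per participant.
trials = [
    {"vha": vha, "gaze": gaze, "side": side, "repetition": rep}
    for vha, (gaze, side), rep in product(vhas, conditions, range(repetitions))
]

print(len(conditions), len(trials))  # 7 84
```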
We also measured SA using the Liebowitz Social Anxiety Scale (LSAS; [
30,
57]). Since SA is a dimensional trait [
52], and to avoid loss of power in the analyses due to binarising SA, we followed prior research [
46,
100,
101,
103], assessing SA’s effect on proxemics continuously using Bayesian linear mixed models.
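The rationale for keeping SA continuous can be illustrated with a toy simulation (synthetic data, for illustration only; not from the study):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
sa = rng.normal(size=n)              # continuous (standardized) SA scores
ipd = 0.4 * sa + rng.normal(size=n)  # IPD with a true slope of 0.4 on SA

# Association estimated from the continuous predictor...
r_continuous = np.corrcoef(sa, ipd)[0, 1]

# ...versus after a median split, which discards within-group variation
# and attenuates the estimated association.
sa_binary = (sa > np.median(sa)).astype(float)
r_binarised = np.corrcoef(sa_binary, ipd)[0, 1]

print(round(r_continuous, 3), round(r_binarised, 3))
```

The binarised estimate is systematically smaller, which is the loss of power the analysis avoids.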
Furthermore, we explored participants’ gaze behavior when approaching the VHA by dividing our environment into two areas of interest (environment vs. VHA). This study was approved by the research ethics committee of Aalto University (
D/718/03.04/2023). All data and data analyses can be found online.
3.2 Participants
Seventy-nine participants took part in the study. Three were excluded from further analyses due to experiencing motion sickness during the VR immersion, indicated by a score ≥ 14 on the Fast Motion Sickness (FMS) Scale [
49] (remaining sample:
M = 2.62,
SD = 3.30). Two participants were excluded as they did not have normal or corrected-to-normal vision. We also tested for minimal visual acuity (all > LogMar ≤ 1)
3[
10], confirmed by the Landolt C Visual Acuity Test [
10]. Another six participants were excluded, due to data issues. Three participants were removed due to poor questionnaire data (e.g., most left-side options in nearly all questions), two due to one missing value on the LSAS and one because of lost IPD data.
The remaining sample comprised 68 participants (33 male, 34 female, one did not disclose gender,
\(M_\text{age}\) = 26.15,
\(SD_\text{age}\) = 6.31). The detailed demographics are reported in Table
1. Participants were recruited through flyers distributed at Aalto University and in the Helsinki region. Participants received a 20 EUR gift voucher as compensation for their participation.
3.3 Liebowitz Social Anxiety Scale
The Liebowitz Social Anxiety Scale (LSAS) is a questionnaire designed to assess cognitive, behavioral, and somatic manifestations of social phobia and anxiety [
30,
57]. It consists of 24 items, rated on a 4-point Likert scale, with each item answered twice: once rating how anxious or fearful the respondent feels in the situation, from 0 (none) to 3 (severe), and once rating how often the situation is avoided, from 0 (never) to 3 (usually). The LSAS score was obtained by summing all item values, yielding a possible range of 0 to 144. The LSAS has a high test-retest reliability of
r = 0.83 and an internal consistency reliability of Cronbach’s
α = .79 - .92 [
13], which aligns with our empirical data
α = .93 [0.91, 0.95].
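The scoring rule described above can be sketched as follows (a hypothetical helper function, not the study’s analysis code):

```python
# LSAS total: 24 situations, each rated twice (fear 0-3, avoidance 0-3),
# summed into a single score ranging from 0 to 144.
def lsas_total(fear_ratings, avoidance_ratings):
    assert len(fear_ratings) == len(avoidance_ratings) == 24
    assert all(0 <= r <= 3 for r in fear_ratings + avoidance_ratings)
    return sum(fear_ratings) + sum(avoidance_ratings)

print(lsas_total([0] * 24, [0] * 24))  # 0 (minimum)
print(lsas_total([3] * 24, [3] * 24))  # 144 (maximum)
```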
3.4 Apparatus
We used the Meta Quest Pro Head-Mounted Display to render the VR environment, an iPad to present the post-experiment survey, and a computer for data recording. The VR environment was implemented using the Unity game engine 2021.3.16f1 [
93], integrated with the Oculus XR Plugin [
72], and the Ultimate XR Plugin [
98]. Data logging was integrated into the rendering pipeline for efficient performance data collection. While the rendering pipeline generated frames at a rate of 90 fps, data logging was limited to 35.9 Hz. The headset was calibrated for each participant individually by adjusting the interpupillary distance and calibrating the eye tracking.
3.5 Stimuli
The virtual environment was designed to match the dimensions of the physical environment, to allow participants to walk up to the VHA in a natural manner without colliding with objects or walls in the real environment. To measure the sense of presence in the virtual environment, participants filled in the IGroup Presence Questionnaire (IPQ; [
77]) after completing the experiment. The IPQ measures spatial presence, user involvement, and experienced realness of the virtual environment, using 7-point Likert scale items ranging from 0 to 6. The IPQ scores showed that participants felt moderately present in the virtual environment (
M = 3.03,
SD = 1.86).
The VHAs were selected from the Microsoft Rocketbox Avatar library, given its high-definition, fully rigged, human-like avatars that are widely used in AR/VR and HCI research [
35,
61,
103]. We selected four white adult VHAs (two female, two male) previously used in proxemics research in HCI [
43]. Voice responses for each VHA were prerecorded and generated using the Amazon Polly text-to-speech service [
4].
All VHAs were set to have a neutral facial expression (see Figure
1), as validated by participants’ emotionality ratings of each VHA (depicted in Table
1). To control for potential effects of gaze direction [
9,
11,
79,
103], each VHA’s height was dynamically adjusted to match the participant’s height. At the beginning of every trial, participants were positioned in front of the VHA, facing it directly.
3.6 Gaze Visualization
For the static gaze conditions, the VHA’s gaze was held constant during the participants’ approach, fixed at the starting position either centered (0°; the VHA maintained mutual eye contact during the approach) or averted (+15°/left or −15°/right; the VHA’s gaze was directed away from participants). In the dynamic gaze conditions, the VHA’s gaze shifted in response to the participants’ approach. In the dynamic averting condition, the VHA’s gaze was centered at the beginning of the trial and gradually averted to the right (+15°) or left (−15°). In the dynamic centering condition, the VHA’s gaze was averted at the beginning of the trial (+15° or −15°) and gradually centered (0°). The dynamic shift in gaze direction started when participants stood 2.5 m from the VHA and reached its endpoint when participants stood at 1 m (horizontal body movement did not change gaze). Horizontal eye movement was modelled with an inverse lerp function using the following formula:
\[ t = \operatorname{clamp}\!\left(\frac{2.5\,\mathrm{m} - d}{2.5\,\mathrm{m} - 1\,\mathrm{m}},\, 0,\, 1\right), \qquad \theta(d) = \theta_{\text{start}} + t\,\bigl(\theta_{\text{end}} - \theta_{\text{start}}\bigr), \]
where \(d\) denotes the current IPD and \(\theta\) the horizontal gaze angle.
Thus, the VHA’s eyes moved linearly as a function of the IPD. Vertical gaze was not manipulated (e.g., see [
31]) and kept constant at the participants’ eye level [
100].
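A minimal sketch of the inverse-lerp gaze mapping described above (variable names are ours; the study implemented the equivalent logic in Unity):

```python
def gaze_angle(ipd_m, start_deg, end_deg, d_start=2.5, d_end=1.0):
    """Horizontal gaze angle as a linear function of the current IPD."""
    # Inverse lerp: map the IPD from [d_start, d_end] onto t in [0, 1].
    t = (d_start - ipd_m) / (d_start - d_end)
    t = max(0.0, min(1.0, t))  # clamp: gaze holds outside the 2.5 m - 1 m window
    # Lerp: interpolate the gaze angle between its start and end values.
    return start_deg + t * (end_deg - start_deg)

# Dynamic centering condition: gaze shifts from +15 deg to 0 deg.
print(gaze_angle(2.5, 15.0, 0.0))   # 15.0 (shift not yet started)
print(gaze_angle(1.75, 15.0, 0.0))  # 7.5 (halfway through the approach window)
print(gaze_angle(1.0, 15.0, 0.0))   # 0.0 (shift completed)
```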
To validate the gaze manipulation, participants reported their subjective sense of being looked at on a visual analogue scale ranging from 0 ("I don’t feel looked at") to 1 ("I feel looked at"), in increments of 0.01. In the static gaze condition, participants felt more looked at when the VHA had a mutual gaze (0°; M = 0.64, SD = 0.31) compared to an averted gaze (M = 0.18, SD = 0.24). In the dynamic gaze condition, participants felt more looked at when the gaze was centering (15° → 0°; M = 0.66, SD = 0.31) compared to when it was averting (0° → 15°; M = 0.24, SD = 0.28). There was no effect of SA on the subjective feeling of being looked at (all pb > 19.67%).
3.7 Stop-Distance Task
The social situation was standardized to minimize situational effects on IPD (e.g., [
37,
100,
101]). Participants had to imagine a scenario in which they were in an unfamiliar location, asking a stranger for directions. In the VR stop-distance task, participants approached the VHA until a comfortable IPD was reached and pressed the controller’s trigger button. Then, a slider appeared on the participants’ left hand, on which they rated their subjective feeling of being looked at. Afterwards, the VHA instructed participants to press either a white or a black button and disappeared. Participants then took a step forward, which made the buttons appear on the wall. After pressing a button, participants turned around to initiate the next trial. No time constraints were imposed.
3.8 Procedure
In accordance with the Declaration of Helsinki, participants gave written informed consent before starting the study. Participants were informed about the possibility of experiencing motion sickness during the VR immersion, which was monitored using the FMS Scale [
49]. This was followed by 10 practice trials (using a centered-gaze-female VHA), where participants could clarify doubts and adjust the headset volume to control for effects of sound on IPD [
37]. Then, participants completed the stop-distance task. The post-experiment survey was completed on an iPad, provided digitally via Qualtrics [
76]. The survey included the FMS Scale [
49], general demographic information (i.e., age, gender, race, and occupation), previous VR experience, VHAs’ gender and emotionality ratings [
102], the IGroup Presence Questionnaire (IPQ; [
77]) and a measurement of experienced co-presence [
14]; the LSAS [
30,
57], and the Triarchic Psychopathy Measure Screening (TriPM; [
73] for separate replication purposes; not analyzed in this study). A summary of all descriptive statistics can be found in Table
1. Participants were debriefed after the experiment. Participation in the whole study took approximately 60 minutes.
3.9 Bayesian Data analysis
We used Bayesian parameter estimation (for the benefits of a Bayesian approach in HCI, see [
1,
36,
48,
96]) to quantify effects, using brms [
17], a wrapper around the Stan sampler [
18] in R [
92]. We computed 4 Hamiltonian Monte Carlo chains, each with 40,000 iterations and a 20% warm-up. For inference decisions, we used the metric
pb, the posterior probability that the effect is negligible or reversed, which plays a role similar to the traditional
p-value [
42,
64,
85]. A
pb value less than or equal to 2.5% was deemed significant. We also calculated the 95% High-Density Interval (HDI) for all parameters and conducted mean comparisons on standardized outcome variables. We utilized
δt as an effect size estimate, comparable to Cohen’s d [
40,
47].
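The pb decision rule can be illustrated with posterior draws (a simplified Python stand-in for the brms/Stan pipeline, using fabricated draws and a central 95% interval rather than the HDI proper):

```python
import numpy as np

# Fabricated posterior draws of a standardized effect, for illustration only.
rng = np.random.default_rng(0)
draws = rng.normal(loc=0.5, scale=0.2, size=40_000)

# pb: posterior probability that the effect is negligible or reversed,
# here operationalized as the share of draws at or below zero.
pb = np.mean(draws <= 0.0)

# 95% credible interval (central interval as a simple stand-in for the HDI).
lower, upper = np.quantile(draws, [0.025, 0.975])

print(pb <= 0.025)  # True for these draws -> the effect would be deemed significant
```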
5 Discussion
5.1 Summary of the Results
Our study illuminates the interplay between gaze, proxemics, and SA. We used Bayesian parameter estimation to account for the uncertainty in estimating the size of the effects. We found, contrary to Bailenson et al. [
11,
12] and H1.1, that participants preferred shorter IPDs to VHAs with a centered gaze than to VHAs with an averted or dynamically averting gaze. The difference between static averted and centered gaze diminished with increasing SA, up to the point where participants with higher levels of SA preferred closer IPDs when the gaze was averted compared to when it was centered.
Aligning with H2.1, participants preferred larger IPDs when the gaze was averted compared to when it was centered. Notably, dynamically centering gaze produced the smallest IPDs in our study, highlighting that dynamic gaze is less ambiguous and can enrich our interactions with VHAs. However, no support was found for H2.2. We did not find any indication that SA increased IPD (H3); however, our Bayesian analysis showed that we did not have enough data to be conclusive about the effect of SA on IPD.
Additionally, by exploring participants’ gaze behavior using eye tracking, we found that a centered or centering VHA gaze diminished the amount of time participants spent looking at the VHA, and that SA increased the overall time spent looking at the VHAs.
5.2 Explanation of Findings
To reiterate, ET posits that individuals engage in a dynamic balance between intimacy and personal space [
12,
51]. When confronted with cues that increase perceived intimacy (e.g., direct gaze), individuals may increase their IPD to maintain a comfortable equilibrium, thus adopting compensation behaviors [
51]. Argyle and Dean [
9] have previously highlighted gaze’s significance as an affiliative signal, suggesting that it can serve as an invitation for closer interaction, opposite to what Bailenson et al. [
12] proposed.
In our study, for average SA levels, the pattern was evident: centered gaze was preferred over averted gaze, possibly indicating comfort and affiliation, thus supporting Argyle and Dean’s [
9] proposal. Regarding SA’s effect on IPD, no main effect was found, contrasting with previous findings [
34,
46]; however, more data are needed to estimate this effect.
Despite this, a tendency was found among participants with increased SA: a centered gaze seemed to evoke heightened intimacy and arousal, leading to larger IPDs. When looking at the VHA, increased SA possibly led to higher arousal, caused by biased interpretation of social cues. According to ET, individuals try to lower the arousal promoted by the increase in perceived intimacy by increasing their IPD. Arguably, SA moderates gaze-promoted intimacy: direct mutual gaze is not intrinsically positive or negative, since it can either induce feelings of intimacy and signal attention in those with average levels of SA, or promote uncomfortable levels of arousal that lead to compensation behaviors in individuals with increased levels of SA.
This latter interpretation aligns with Bailenson et al. [
11,
12]. Essentially, while Argyle and Dean’s [
9] proposal holds in the broader context, the specific direction of the balance (approach or avoidance) can vary based on, for example, personality traits. Therefore, designers of inclusive social VR experiences have to know their audience and design VHA interactions with ET in mind.
5.3 Limitations and Future Research
Our study, while shedding light on several nuances of VHA interactions, is not without limitations.
First, we must acknowledge the influence of cultural backgrounds on the experience of SA characteristics [
41]. For instance, what might be deemed as an intimate distance in one culture might be perceived as too distant in another [
37,
86,
88]. Future research should delve deeper into how cultural background affects users’ behavior and perception in virtuo.
Secondly, physiological arousal, which could moderate the relationship between gaze, proxemics and SA, was not measured (see e.g., [
19,
58]). Future studies may simulate personal space violations for SA and enhance measurements with physiological sensing (e.g., skin-conductance response; [
43]).
Third, one could argue that the initial gaze, not the final gaze, is critical for IPD and its interplay with SA. Recall that in the static conditions, gaze was either averted or centered throughout, while in the dynamic centering condition, gaze was initially averted and then gradually centered, and vice versa for the dynamic averting condition. IPDs for the static and dynamic conditions resembled each other with respect to the end of the animation, not the beginning (averting/averted > centering/centered), which is mirrored in the gaze ratings. Thus, the effect of gaze on IPD cannot be explained by initial gaze alone. Could initial gaze explain the differences between gaze conditions regarding SA, for example, if participants with SA did not look at the avatars while approaching? This is also unlikely: we found an SA effect only for the static conditions and not for the dynamic conditions, and we found no interaction between SA and gaze condition in our eye-tracking data. Nevertheless, future researchers interested in dynamic gaze patterns should add conditions with fully averting gaze (i.e., moving from looking away to the left to looking away to the right as one approaches).
5.4 Implications
In line with van Berkel and Hornbæk [
97], we highlight theoretical and HCI-oriented implications for the design of Social VR as well as implications for social anxiety research.
5.4.1 Implications for Human-Computer Interaction.
Our research highlights the crucial relationship between gaze and proxemics and their interaction with SA, pointing to three key approaches for improving user experience:
First, dynamic responsiveness is essential: VHAs should adjust their gaze and other behaviors in real time, based on user actions or sensing data such as eye tracking, to foster an engaging environment.
Second, designs should be context-aware, adapting to the unique dynamics of virtual settings. For example, in intimate conversations, VHAs might adjust their gaze and distance differently than they would in traditional face-to-face interactions, taking cues from ET.
Third, introducing training modes could help users unfamiliar or uncomfortable with VHA behaviors acclimate by adjusting settings to their comfort level. Additionally, designers should consider implementing "gaze awareness" features, since a lack of gaze tracking could result in unintentional staring by VHAs, affecting proxemics.
5.4.2 Implications for Social Anxiety.
As shown in earlier work, socially anxious individuals tend to prefer online communication, which allows them to hide from potential evaluation by others. With the broader adoption of better tracking techniques for gaze and other subtle social cues, Social VR may become a challenging environment for socially anxious individuals. Biases learned in the physical world may be transferred and even intensified through online replication, causing more social stress than relief. On the other hand, if socially anxious individuals try to hide their social cues in VR, others may feel discomfort engaging with them, further increasing their SA. Therefore, designers of social VR and empathetic computing systems at large need to carefully consider their design choices and how to present social cues. Our results may help designers of assessment and digital interventions find new ways to harness behavioral data in virtual environments for the early detection of SA, a critical aspect of successful treatment [
89].
5.5 Ethical concerns
The prospect of detecting personality traits in users, especially within virtual environments, raises several ethical concerns. One primary concern is that of consent [
25]. Users might not be aware that their interactions, behaviors, and responses can be indicative of their personality traits. Extracting such information without explicit consent infringes on individual privacy rights [
25]. Miller et al. [
68] could identify people from 5 minutes of motion data with high accuracy and propose that such data should be regarded as personal data. While we were interested in the correlational pattern of IPD and personality to improve the design of virtual environments, given that personality traits can be distinctly linked to stimuli and behavior in VR [
100,
102,
103], we encourage research into privacy-preserving techniques that can be adapted to users’ personality. However, if virtual environments are tailored to cater to identified personality traits, users might end up in echo chambers that reinforce existing beliefs and behaviors instead of deconstructing harmful behavior in users with SA [
29,
57].
6 Conclusion
The present experimental study in the domain of empathetic computing resolves inconsistencies in the literature concerning the interplay of gaze and proxemics for VHAs by considering SA. We found that participants generally prefer shorter distances to VHAs displaying a static centered or dynamically centering gaze as compared to an averted gaze. With an increase in SA, however, this pattern reverses, with participants with SA traits keeping larger distances when being looked at directly, indicating a nuanced interplay of gaze, proxemics, and SA. In the metaverse, understanding the nuances of VHA interaction becomes pivotal for designing rich, inclusive, and comfortable virtual experiences. Our study of the interplay of gaze, proxemics, and SA provides new insights into their complexity. While foundational theories provide overarching frameworks, the intricacies of individual factors can significantly modify social interaction patterns. As our digital and physical worlds continue to merge, researchers and designers must account for these subtleties to guarantee inclusive digital interactions.