1 Introduction

The last decade has seen a significant increase in the popularity of head-mounted display (HMD)-based virtual reality (VR) for commercial, educational, and recreational uses (e.g. Grabowski and Jankowski 2015; Jensen and Konradsen 2018; Munafo et al. 2017). There are now increasing numbers of HMDs on the market (e.g. the Oculus Quest 2, the HP Reverb G2, the HTC Cosmos Elite, the HTC Vive Pro 2, the Pimax Vision 5K/8K, and the Valve Index), each of which can transport its users to richly rendered computer-generated worlds (via PC-VR or standalone-VR). Once they are immersed in HMD VR, users can then explore and interact with their virtual environment in a highly realistic fashion (via tracked physical movements of their heads and hands). Typically they will experience high levels of presence. Compared to other types of VR/simulation, HMD VR tends to produce stronger feelings of spatial presence (i.e. “being there” in the virtual environment; Cummings and Bailenson 2016; Shu et al. 2019). Using an HMD also tends to increase another aspect of presence: the plausibility of the VR simulation (i.e. making the user feel that what appears to be happening is actually happening; see Slater 2009). However, promoting more realistic interactions and generating stronger feelings of presence are just some of the benefits provided by HMD VR. We have only really begun to harness the potential of this revolutionary technology.

At the time of writing, the adoption of HMD VR technology continues to be hindered by user experiences of cybersickness [Footnote 1] (e.g. Rebenitsch and Owen 2016; Teixeira and Palmisano 2021). There is mounting evidence that sickness is more common and severe with HMD VR than with other types of VR/simulation (Dennison et al. 2016; Howarth and Costello 1997; Kim et al. 2014; Sharples et al. 2008; Yildirim 2019a, b). This cybersickness can present as a variety of different signs and unpleasant symptoms. Disorientation appears to be the most commonly reported symptom during HMD VR (Rebenitsch and Owen 2016), whereas vomiting is fortunately quite rare (Kingdon et al. 2001; Rebenitsch and Owen 2016; Stanney et al. 1998a, b). HMD use is often also accompanied by feelings of nausea, stomach awareness, sweating, headaches, dizziness, and oculomotor discomfort (e.g. Gavgani et al. 2017; Lawson 2014; LaViola 2000; McCauley and Sharkey 1992; Rebenitsch and Owen 2016; Stanney et al. 1998a, b). Some of these unpleasant symptoms can persist long after HMD use has ceased (Kennedy and Lilienthal 1994; Kennedy et al. 1994; Merhi et al. 2007). This is presumably why many novice users only try HMD VR a few times before completely giving up on the technology.

Display lag is thought by many to be the main cause of cybersickness during active HMD VR (Golding 2016; Howarth and Finch 1999; Kinsella et al. 2016). The term refers to the time required for the user’s tracked head movements to produce a compensatory change in the scene rendered on their HMD (also known as motion-to-photon latency). There is mounting evidence that display lag plays a major role in cybersickness. Studies have examined the effects of adding constant and time-varying display lags to HMD VR (i.e. introduced as extra lag on top of the system’s baseline lag). When HMD users move their heads (e.g. in order to carry out a virtual search task), adding both types of lag can significantly increase the likelihood and severity of cybersickness (e.g. Caserman et al. 2019; DiZio and Lackner 1997; Feng et al. 2019; Jennings et al. 2000, 2004; Kim et al. 2020; Kinsella et al. 2016; Palmisano et al. 2019; St. Pierre et al. 2015; Stauffert et al. 2018—see Palmisano et al. 2020 for a recent review of this literature).

While considerable efforts have been made to reduce the amount of display lag in modern HMD systems [Footnote 2], some lag will always remain (due to sensing, processing, data smoothing, transmission, rendering and frame rate delays; see Stauffert et al. 2018). During active HMD VR, this display lag will generate spatial discrepancies between the user’s visual and non-visual information, referred to as differences in their virtual and physical head pose (or DVP for short; see Kim et al. 2020 and Palmisano et al. 2020). For example, when a user nods their head to indicate “yes” in HMD VR, the visually perceived orientation of their head will tend to lag behind its true physical orientation (as detected by their inner ears and neck proprioception).

While cybersickness could (in theory) be triggered by an increase in the HMD user’s postural instability (Riccio and Stoffregen 1991), or by them making certain types of eye-movements (Ebenholtz et al. 1994), it is most commonly assumed to be caused by sensory conflict [Footnote 3]. In our discussion below, we will focus on how sensory conflicts during HMD VR (e.g. the DVP described above) might impact the user’s experience of cybersickness.

1.1 Sensory conflict theories of cybersickness

There are many different explanations of motion sickness based on alternative definitions of sensory conflict (e.g. Bles et al. 1998; Claremont 1931; Hettinger et al. 1990; Kim et al. 2020; Palmisano et al. 2020; Prothero and Parker 2003; Reason and Brand 1975; Reason 1978). According to most of these theories, potentially provocative conflicts for sickness can occur whenever two or more sensory systems provide contradictory information (such as when we are stationary and experience a visually induced illusion of self-motion—see Hettinger et al. 1990). Depending on the theory, motion sickness could be triggered either by: (1) an intersensory conflict (e.g. when our visual motion is incompatible with the signals from our inner ears), (2) an intrasensory conflict (e.g. when our otolith organs and semicircular canals provide incompatible information about self-tilt), or (3) an expectancy violation (i.e. when our current sensory information differs from what we were expecting based on past experience and concurrent movement) (see Bles et al. 1998; Oman 1982, 1990; Reason and Brand 1975; Reason 1978). We will start by describing the best known, and most highly cited, sensory conflict theory of motion sickness—the sensory rearrangement theory.

1.1.1 Sensory rearrangement theory and cybersickness

According to Reason (1978): (1) we each have a neural store that contains a record of every pattern of motion stimulation we have experienced before; (2) when we plan to move, the expected sensory stimulation for that movement is selected from this neural store; and (3) any discrepancies between our current and expected sensory stimulation can trigger motion sickness. The theory predicts that cybersickness should be more likely and severe when HMD users enter VR for the first time. At this time, their expected multisensory stimulation should be what they would normally experience in the real world. However, as was noted above, when HMD users move their head in VR, their visual stimulation will be delayed by the display lag. Not only will this display lag generate intersensory conflict (i.e. their visual display will be inconsistent with their non-visual information about head orientation), but it will also result in an expectancy violation (as their overall pattern of multisensory stimulation will be different to what they were expecting). Such conflicts should trigger significant cybersickness in most novice users [Footnote 4]. However, the theory also predicts that the severity of this cybersickness should decrease with repeated exposure to HMD VR, as the neural store gradually recalibrates to these display lag effects.

In recent years, physiological support has been provided for several aspects of this theory, such as its proposed mechanisms for the neural coding of sensory expectancy conflicts, as well as its proposed pathways for mediating nausea and vomiting (e.g. Oman and Cullen 2014; Yates et al. 2014). However, despite the popularity of the theory and this physiological support, it is often criticised for being difficult to test (e.g. Ebenholtz et al. 1994; Keshavarz et al. 2014; Lackner and DiZio 2020; Palmisano et al. 2011, 2020; please see Irmak et al. 2021 for one notable exception [Footnote 5]). While sensory rearrangement theory predicts that cybersickness should increase with the sensory conflict, it is often not clear how to calculate the magnitude of this conflict, since we currently “do not have an adequate understanding of the formation, nature and operation of [the neural store]” (Lackner and DiZio 2020, p. 1212). Thus, researchers attempting to test its predictions are forced to make assumptions about the presence, or degree, of sensory conflict in their different experimental conditions. Another problem with the theory is its definition of what constitutes provocative sensory conflict for motion sickness (Ebenholtz et al. 1994). Based on its rather broad definition, we are likely to be exposed to many potentially provocative sensory conflicts every day (Stoffregen and Riccio 1991). However, our daily experiences of motion sickness (outside of VR) are relatively rare. Thus, for this, or any other, sensory conflict theory to be useful, we need a clearer, more precise definition of what constitutes a provocative conflict for motion sickness. In an attempt to address the latter criticism, Bles and colleagues (1998) revised this theory. Their subjective vertical conflict theory, and its predictions for cybersickness during HMD VR, are outlined below.

1.1.2 Subjective vertical conflict theory and cybersickness

Bles et al. (1998) propose that “all situations which provoke motion sickness are characterised by a condition in which the sensed vertical … is at variance with the subjective vertical as predicted on the basis of previous experience” (pp. 481–482—see also Bles et al. 2000; Bos and Bles 1998, 2002; Bos et al. 2008; de Graaf et al. 1998). While their theory still relies on a hypothetical internal model of sensory processing, it does provide clearer criteria for distinguishing provocative (from non-provocative) conflicts. According to their theory, only sensory conflicts involving the direction of gravity should cause motion sickness. Thus, it predicts that cybersickness should be worse when an upright HMD user moves their head in pitch (e.g. nodding “yes”) and roll (e.g. wobbles their head towards and away from their shoulders) compared to yaw (e.g. gesturing “no”) (see Fig. 1).

Fig. 1 An HMD user making head movements in pitch (top row), yaw (middle row) and roll (bottom row)

This is because pitch and roll movements: (1) tilt the user’s head away from alignment with gravity (whereas the user’s head can remain approximately upright when making yaw movements); and (2) generate patterns of DVP that are more likely to result in misperceptions of the direction of gravity [Footnote 6]. Support for this theory has thus far been mixed (see Bonato et al. 2009; Bubka and Bonato 2003; Diels and Howarth 2011; Lo and So 2001; Keshavarz and Hecht 2011a; Keshavarz et al. 2019). Contrary to its predictions, display lag effects on cybersickness in HMD VR do not appear to be worse during self-generated pitch, compared to yaw, head movements (Feng et al. 2019; Kim et al. 2020; Palmisano et al. 2019). However, to date, such observations have only been based on comparisons made across different studies (which examined either pitch or yaw head movements, not both). So this prediction of subjective vertical conflict theory needs further evaluation, e.g. by comparing the experiences of cybersickness during yaw, pitch, and roll head movements in the same HMD VR study with the same participants.

1.1.3 The DVP hypothesis for cybersickness during active HMD VR

Existing sensory conflict theories have difficulties predicting when, and how much, sickness will be induced during active HMD VR (due to their broad definitions of provocative conflicts and their reliance on hypothetical internal models). Thus, we recently outlined a new sensory conflict-based approach to studying and understanding this cybersickness (Palmisano et al. 2020). We proposed that sensory conflict during active HMD VR can be directly quantified as differences in the user’s virtual and physical head pose (i.e. DVP). When the user moves their head during HMD VR, the sign and amplitude of their DVP will vary throughout the movement (with their visually displayed head orientation sometimes trailing, and other times leading, its true physical orientation). In this particular situation, the instantaneous DVP will depend on both the user’s head velocity and the amount of display lag.
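
As a rough illustration of this dependence, consider an idealised case (an assumption made here for exposition, not a result taken from the cited studies): the head oscillates sinusoidally with amplitude A and frequency f, and the displayed orientation is simply the physical orientation delayed by a constant lag τ. The symbols A, f and τ, and the sign convention (physical minus virtual), are introduced only for this sketch.

$$\theta_{\text{phys}}(t) = A\sin(2\pi f t), \qquad \theta_{\text{virt}}(t) = \theta_{\text{phys}}(t - \tau)$$

$$\text{DVP}(t) = \theta_{\text{phys}}(t) - \theta_{\text{virt}}(t) = 2A\sin(\pi f \tau)\,\cos(2\pi f t - \pi f \tau)$$

$$\text{DVP}(t) \approx \tau\,\dot{\theta}_{\text{phys}}(t) \quad \text{for small } \tau$$

Under these assumptions the DVP changes sign twice per head movement cycle, its peak magnitude is 2A|sin(πfτ)|, and for small lags it is approximately the lag multiplied by the instantaneous head velocity.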

Unlike the theories outlined in 1.1.1–1.1.2, our hypothesis does not attempt to model sensory transduction and encoding effects, or the multisensory cortical interactions, produced by DVP. It also ignores the influence of user expectations (e.g. based on past experience) on this internal processing. It is simply focussed on directly observable factors. Being able to objectively estimate the sensory input conflict in a particular situation should allow us to identify the most provocative patterns of DVP for cybersickness (by comparing patterns of DVP produced in different conditions to the amounts of cybersickness they produce). Based on our empirical observations to date, we proposed that: (1) large-amplitude, time-varying patterns of DVP are the primary trigger for cybersickness during HMD VR; and (2) this cybersickness should increase when the mean, peak, and standard deviation of the user’s DVP all increase (Palmisano et al. 2020).

As noted above, DVP incorporates the physical effects of both display lag and the user’s head movements into a single integrated metric. During active HMD VR, we can directly estimate the user’s instantaneous DVP throughout each exposure (assuming we have access to both their head-tracking data and knowledge of the system’s display lag). Thus far, we have examined the predictions of our DVP hypothesis using data from three previous studies (Feng et al. 2019; Kim et al. 2020; Palmisano et al. 2019). In these studies, HMD users were asked to make continuous oscillatory pitch or yaw head movements under six different levels of constant imposed display lag—ranging from a minimum imposed lag of 0 ms to a maximum imposed lag of 198 ms (Feng et al. 2019; Kim et al. 2020; Palmisano et al. 2019). As the baseline system lag was only ~ 4 ms in these studies, we estimated the DVP during each exposure based on the amount of added/imposed display lag for that trial. Our initial analyses revealed significant positive linear relationships between the mean DVP and cybersickness severity (Kim et al. 2020). However, when we re-examined the data from all three display lag studies (see Palmisano et al. 2020), we found evidence that cybersickness severity was also predicted by the peak and standard deviation of the DVP as well. Thus, the findings of all three studies appear to be highly consistent with our DVP hypothesis for cybersickness.

1.1.4 Perceived scene instability and cybersickness

When HMD users move their heads, they will often consciously perceive the DVP generated as scene instability (i.e. their virtual world will appear to swim and oscillate around them; effects sometimes referred to as “oscillopsia”—see Allison et al. 2001). Like their DVP, these perceptions of scene instability will tend to increase with head movement speed and the amount of imposed display lag (e.g. Allison et al. 2001; Kim et al. 2020, 2021; Palmisano et al. 2019). Taken together, such observations suggest that consciously perceived scene instability might provide confirmatory evidence of provocative DVP/sensory conflicts (see Kim et al. 2020). Contrary to this notion, Stauffert et al. (2018) recently reported that cybersickness was exacerbated by injecting very brief latency spikes into their VR system. Their findings suggested that large-amplitude, time-varying DVP might still be problematic even when they fail to reach the threshold for conscious detection. Nevertheless, we still expect user reports of perceived scene instability to help identify many potentially provocative conditions for cybersickness.

1.2 The current study

While sensory rearrangement and subjective vertical conflict theories attempt to explain all motion sickness, our DVP hypothesis is focussed exclusively on the cybersickness experienced during active HMD VR (see Palmisano et al. 2020). This hypothesis was empirically derived based on the analysis of archival data from three different display lag-based cybersickness studies (Feng et al. 2019; Kim et al. 2020; Palmisano et al. 2019). Thus, one goal of the current study was to experimentally test our prediction that large-magnitude, time-varying patterns of DVP are provocative for cybersickness (on all three cardinal axes and in the same group of participants). To do this, we: (1) had 22 HMD users make continuous (fast/slow) head movements under different levels of imposed display lag; (2) objectively estimated their DVP time series data during each of these trials (based on new, as opposed to archival, head-tracking data), and (3) compared the mean, peak and standard deviation of these DVP to their cybersickness severity ratings for the same trials.

A number of researchers have noted that user perceptions of scene instability also increase with the magnitude of these DVP (Allison et al. 2001; Kim et al. 2020). Thus, there might be a more straightforward way to identify provocative sensory conflicts for cybersickness during HMD VR (compared to objectively estimating the active user’s instantaneous DVP throughout their entire exposure). Kim et al. (2020) suggested that we might be able to identify provocative VR conditions based simply on subjective reports of perceived scene instability. Thus, in order to test this idea, the current study will also examine the possible relationships between: (1) DVP and perceived scene instability, and (2) perceived scene instability and cybersickness severity.

As noted above, subjective vertical conflict theory predicts that cybersickness (due to DVP/display lag) should be less likely and severe when HMD users make yaw, as opposed to pitch or roll, head movements. While we have not yet examined roll head movements during HMD VR, our previous research suggests that cybersickness can be quite similar during pitch and yaw head movements (Kim et al. 2020; Feng et al. 2019; Palmisano et al. 2019). Although this observation appears to be inconsistent with subjective vertical conflict theory, it is based on comparisons made across 3 different studies. Thus, in the current study, we planned to examine cybersickness during self-generated head movements along each of the 3 cardinal axes (i.e. pitch, roll and yaw). This will allow us to examine whether: (1) relationships between DVP and cybersickness hold for all 3 types of head rotation, and (2) cybersickness is worse for head movements assumed to generate greater subjective vertical conflict.

In addition to measuring cybersickness severity and perceived scene instability in this study, we also measured HMD users’ feelings of spatial presence (or “being there” in the virtual environment). HMD users normally want to experience minimal cybersickness and maximum presence during VR. The purpose of including this additional measure was to see how cybersickness and perceived scene instability affect presence—as such feelings are often argued to be crucial for a complete VR experience (see Slater and Wilbur 1997; Clifton and Palmisano 2019).

2 Method

2.1 Participants

Twenty-two participants were initially recruited from the University of Wollongong and the general population. These 12 females and 10 males ranged in age between 18 and 50 years (M = 27 years; SD = 7.8 years). All were healthy (i.e. no visual, neurological, or vestibular impairments) and had either normal or corrected-to-normal vision (if needed participants wore contacts). Ten of them were regular users of HMD VR. The remaining 12 participants were HMD VR novices. One of these 22 participants dropped out early due to sickness (after recording FMS ratings above 15 in block 1). The experiment was approved by the University of Wollongong Human Research Ethics Committee prior to testing. All of these participants provided written informed consent before participating in the study.

2.2 Materials

We used an Oculus Rift CV1 HMD and custom code to present our participants with a “Tron”-like virtual room. This HMD had a resolution of 1080 × 1200 pixels per eye and a refresh rate of 90 Hz using organic light-emitting diode (OLED) technology [Footnote 7]. Its horizontal and vertical fields of view were 87° and 88°, respectively, with 71.15° of binocular overlap. Due to COVID-19 safety protocols in place at the time of testing, all participants were tested using the same interpupillary distance setting (62 mm). The Rift CV1 has an outside-in tracking system, which provides six-degrees-of-freedom (6-DOF) head-tracking data (sampled at 90 Hz and with sub-millimetre accuracy). Head position and orientation data were provided via two external infrared tracking sensors (Constellation), which were placed on a table located directly in front of the participant.

Our custom software was run on a high-performance Dell Precision 5820 computer running Microsoft Windows 10, with an NVIDIA GeForce GTX 1080 graphics card and an Intel 7th generation CPU. This presented the user with a simulated environment consisting of a blue wireframe ground plane (located 1 m below the user’s viewpoint) and a blue wireframe ceiling plane (placed 3 m above their viewpoint) presented on an otherwise black background. The ground and ceiling planes were each 16 m wide by 12 m deep, with the user’s location simulated to be at the very start of the ground plane.

Using previously published methods (Feng et al. 2019), baseline system lag was estimated to be ~ 4 ms with our HMD and custom code. [This was the average effective display lag when running our PC-VR simulation without additional display lag imposed]. On different trials, small constant increments in display lag were added on top of this baseline system lag. This was achieved by constructing a circular memory array to store the participant’s 6-DOF head-tracking data. We were able to adjust the level of display lag during each trial by increasing/decreasing the element length (N) of this memory array (see Fig. 2). Trials with a single element array (N = 1) had 0 ms extra display lag (i.e. scene updates were only delayed by the system’s ~ 4 ms baseline lag). Other trials with element lengths of N = 4, 8, 12, 16, or 20 had constant delays of either 44, 89, 133, 178 or 222 ms (respectively) added on top of this ~ 4 ms baseline lag.

Fig. 2 A circular memory array of length (N) was used to buffer 6-DOF head pose data and impose an additional constant display lag (on top of the system’s baseline lag). Raw head orientation data from the HMD sensors was stored on every frame. In the above figure, new data is currently being stored at index <1>. Next, previously stored data will be read from index <2> and used to update the visual display. Data in <1> will be held until all previously stored data has been used for rendering. Thus, using an N element memory array will increase the effective display lag by N × 11 ms for our 90 Hz HMD
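
The buffering idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the class name `PoseDelayBuffer`, the function `added_lag_ms`, and the use of a bounded `deque` (rather than a manually indexed array) are all hypothetical, and the sketch is parameterised directly by the number of frames of delay because the exact indexing used in the original software is not given in the text.

```python
from collections import deque

REFRESH_HZ = 90.0  # refresh rate of the Rift CV1 reported in the text


def added_lag_ms(delay_frames, refresh_hz=REFRESH_HZ):
    """Extra display lag (ms) produced by delaying the pose by whole frames."""
    return delay_frames * 1000.0 / refresh_hz


class PoseDelayBuffer:
    """Hypothetical ring buffer that returns the pose from `delay_frames` ago."""

    def __init__(self, delay_frames):
        if delay_frames < 0:
            raise ValueError("delay_frames must be non-negative")
        self.delay_frames = delay_frames
        # Holds the newest pose plus `delay_frames` older poses;
        # the oldest entry is discarded automatically once the buffer is full.
        self._poses = deque(maxlen=delay_frames + 1)

    def push(self, pose):
        """Store this frame's tracked pose and return the pose to render with."""
        self._poses.append(pose)
        # Until the buffer has filled, the oldest available pose is used.
        return self._poses[0]


# Usage: integers stand in for 6-DOF pose samples.
buffer = PoseDelayBuffer(delay_frames=4)
rendered = [buffer.push(frame) for frame in range(10)]
```

With a delay of 4, 8, 12, 16 or 20 frames at 90 Hz, `added_lag_ms` gives values that round to the 44, 89, 133, 178 and 222 ms delays reported above.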

2.3 Procedure

The experiment consisted of 3 blocks of 12 experimental trials. Each block examined the effects of self-generated head movements along a different axis (HEAD MOVEMENT AXIS). In one block participants only made pitch head movements, in another block they only made yaw head movements, and in the remaining block they only made roll head movements (see Fig. 1). On each trial, participants made these oscillatory head movements while viewing the “Tron”-like virtual room. Depending on the trial: (1) the requested self-generated head movements were either slow (0.5 Hz) or fast (1.0 Hz) (HEAD MOVEMENT SPEED); and (2) the visual consequences of their head movements were delayed by an extra 0, 44, 89, 133, 178, or 222 ms (DISPLAY LAG). While HEAD MOVEMENT AXIS was (randomly) manipulated across blocks, HEAD MOVEMENT SPEED and DISPLAY LAG were both (randomly) manipulated within blocks.
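
The design described above (3 blocks, each containing every combination of 2 head movement speeds and 6 display lags) can be sketched as follows. This is a minimal illustration: the text states only that the factors were randomly manipulated, so the simple shuffling used here, along with the names `make_schedule`, `AXES`, `SPEEDS_HZ` and `LAGS_MS`, are assumptions rather than the authors' procedure.

```python
import itertools
import random

AXES = ("pitch", "yaw", "roll")          # HEAD MOVEMENT AXIS (one per block)
SPEEDS_HZ = (0.5, 1.0)                   # HEAD MOVEMENT SPEED
LAGS_MS = (0, 44, 89, 133, 178, 222)     # added DISPLAY LAG


def make_schedule(seed=None):
    """Return a list of (axis, trials) blocks with shuffled order.

    Assumption: block order and within-block trial order are each drawn
    by simple shuffling; counterbalancing, if any, is not modelled.
    """
    rng = random.Random(seed)
    block_order = list(AXES)
    rng.shuffle(block_order)
    schedule = []
    for axis in block_order:
        # Every speed x lag combination appears exactly once per block.
        trials = list(itertools.product(SPEEDS_HZ, LAGS_MS))
        rng.shuffle(trials)
        schedule.append((axis, trials))
    return schedule


schedule = make_schedule(seed=1)
```

Each block then holds 12 trials, giving 36 experimental trials per participant.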

Before each experimental block, seated participants first observed the experimenter make the type of angular head movement required for all of the trials in that block. They then attempted to recreate that head movement (matching their head movement amplitude to that of the experimenter). After donning their HMD, they then completed two practice trials (one fast and one slow HEAD MOVEMENT SPEED condition, both with 0 ms DISPLAY LAG). While participants generated oscillatory head movements continuously for 17 s (timed to the sound of an audible metronome), the virtual room was only visible for the last 12 s of each trial (for the first 5 s the screen was black, except for a green fixation point). Then, when the virtual room had disappeared from view, they were prompted to provide ratings of: (1) cybersickness severity (using the Fast Motion Sickness (FMS) Scale: “How sick do you feel?” from 0 = “well” to 20 = “frank sickness”; Keshavarz and Hecht 2011b); (2) perceived scene instability (“How stable does the environment appear to be?” from 0 = “stable” to 20 = “completely unstable”; Kim et al. 2021); and (3) spatial presence (“How much do you feel like you are there?” from 0 = “not at all in the virtual environment” to 20 = “completely present within the virtual environment”; Teixeira and Palmisano 2021). These ratings were each made via a partially filled 2-D virtual rectangular figure shown in the HMD. Pressing the up and down arrow keys on a computer keyboard either filled or emptied this rectangular figure, respectively (to a maximum of 20 or a minimum of 0, in steps of ± 1 with each press).

After a 2-min break, participants then commenced their experimental trials (these were identical to the practice trials, except that the level of DISPLAY LAG varied from trial to trial). After participants had completed their 3 ratings for each trial, there was a delay of at least 90 s before the next trial. During each block of 12 trials, participants remained in HMD VR. When it was time for the next trial, this was indicated to them via a text prompt on their display (there was also a check that they felt well enough to continue). It took approximately 20 min to complete one block of trials. Thus, each block roughly approximated the situation commonly faced by HMD users, where longer periods of minimal conflict are interspersed by brief periods of salient conflict (e.g. due to tracking problems or an under-performing VR system). At the end of each block, there was a minimum 10-min break before the next block of trials began. This study design (with its brief exposures to conditions of imposed display lag, its much longer interstimulus intervals, and its significant breaks between blocks) was aimed at minimising the likelihood of sickness contamination across trials and blocks.

2.4 Estimation of DVP

After testing had completed, we estimated each participant’s instantaneous DVP throughout each trial. We first obtained their yaw, pitch, and roll head orientation at each instant throughout the trial. Head orientation in the Oculus Rift CV1 HMD is reported in terms of yaw-pitch-roll rotations in a right-handed coordinate system (https://developer.oculus.com/documentation/native/pc/dg-sensor/). Yaw rotation occurs around the y axis, pitch rotation occurs around the x axis, and roll rotation occurs around the z axis (note that order matters when applying these different rotations to the virtual camera). In each case, positive rotations occur in a counter-clockwise direction (i.e. pitch-up, yaw-left, roll-left). In our head movement analysis code, we first used the rotation vectors from the participant’s HMD sensor data to build a 4 × 4 view matrix for each eye (accounting for their interocular separation; see Eq. 1). Using information about the forward, up and right vectors, we then calculated the deviation of the user’s current head orientation (in yaw, pitch and roll) from the base coordinate frame (in Euler angles). This was achieved using the mathematical transformations shown in Eqs. 2–4:

$$\text{View matrix} = \left\{ \begin{array}{cccc} \text{right}_{x} & \text{up}_{x} & \text{forward}_{x} & \text{position}_{x} \\ \text{right}_{y} & \text{up}_{y} & \text{forward}_{y} & \text{position}_{y} \\ \text{right}_{z} & \text{up}_{z} & \text{forward}_{z} & \text{position}_{z} \\ 0 & 0 & 0 & 1 \end{array} \right\} \tag{1}$$

$$\text{Yaw}\,(\theta) = \text{atan}\left( -\frac{\text{forward}_{x}}{\text{forward}_{z}} \right) \tag{2}$$

$$\text{Pitch}\,(\phi) = -\text{asin}\left( \text{forward}_{y} \right) \tag{3}$$

$$\text{Roll}\,(\psi) = \text{atan}\left( \frac{\text{right}_{y}}{\text{up}_{y}} \right) \tag{4}$$
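
Eqs. 2–4 can be written out as a short Python function. This is a sketch under stated assumptions: the function name `euler_from_basis` and the tuple layout of the vectors are hypothetical, the vectors are assumed to be unit length, and the test vectors in the usage lines are synthetic values chosen only to exercise the formulas (they make no claim about the sign conventions of the actual view matrix). Because the equations use a plain arctangent, the result is undefined when forward_z or up_y is zero.

```python
import math


def euler_from_basis(right, up, forward):
    """Apply Eqs. 2-4 to unit right/up/forward vectors given as (x, y, z).

    Returns (yaw, pitch, roll) in radians.
    """
    yaw = math.atan(-forward[0] / forward[2])   # Eq. 2
    pitch = -math.asin(forward[1])              # Eq. 3
    roll = math.atan(right[1] / up[1])          # Eq. 4
    return yaw, pitch, roll


# Usage with a synthetic basis whose forward vector is swung by 20 degrees.
angle = math.radians(20.0)
yaw, pitch, roll = euler_from_basis(
    right=(1.0, 0.0, 0.0),
    up=(0.0, 1.0, 0.0),
    forward=(-math.sin(angle), 0.0, math.cos(angle)),
)
yaw_deg = math.degrees(yaw)
```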

In this study, DVP was estimated based only on the participant’s head orientation data around the axis of the instructed head movement. For example, if they were asked to make roll head movements on a particular trial, then we estimated their DVP using their roll head orientation data only (ignoring the other differences in yaw and pitch head orientation) (see Fig. 3A) [Footnote 8]. The virtual room was visible for 12 s on each trial. However, we only calculated DVP for 8 of those 12 s (from 2 s after the room first appeared until 2 s before it disappeared). At each instant during this 8 s sample, the participant’s physical head orientation was estimated as their recorded head orientation along the instructed axis at that time. Their virtual head orientation [Footnote 9] was also estimated using the same head orientation data. However, in this case, these data were phase shifted to account for the display lag (which was either 0, 44, 89, 133, 178 or 222 ms depending on the trial) (see Fig. 3B). The DVP (or sensory input conflict) at each instant during the trial was thus calculated as the difference between these physical and virtual head orientation estimates. Figure 3C shows the unsigned magnitudes of the DVP experienced by participant CN during a 0.5 Hz roll head movement trial when 133 ms display lag was added. Finally, we calculated the mean, peak and standard deviation of these DVP data. These DVP summary measures were then used to test the predictions of our DVP hypothesis for cybersickness (see Sect. 3.4).

Fig. 3 This shows how the DVP time series data were estimated for each trial. (A) Participant CN’s yaw, pitch and roll head orientation data for one trial. On this trial, he was instructed to oscillate his head in roll at 0.5 Hz while an extra 133 ms display lag was added to the simulation. (B) CN’s physical and estimated virtual roll head-orientation data for that trial. (C) The unsigned differences between his virtual and physical roll head orientation (DVP) across the 8 s time sample
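
The estimation steps described in this section can be sketched as below. This is an illustration rather than the analysis code used in the study: the function names `estimate_dvp` and `summarise_dvp` are hypothetical, the phase shift is assumed to be a delay by a whole number of 90 Hz samples, the population form of the standard deviation is an arbitrary choice (the text does not say which form was used), and the head trace is synthetic.

```python
import math
import statistics


def estimate_dvp(physical_deg, lag_ms, sample_hz=90.0):
    """Return unsigned DVP samples (deg) for one head-orientation trace.

    Assumption: the virtual orientation equals the physical orientation
    delayed by the imposed lag, rounded to a whole number of samples.
    """
    shift = round(lag_ms * sample_hz / 1000.0)
    if shift < 0 or shift >= len(physical_deg):
        raise ValueError("lag must be non-negative and shorter than the trace")
    # Skip the first `shift` samples, where no delayed value is available yet.
    return [abs(physical_deg[i] - physical_deg[i - shift])
            for i in range(shift, len(physical_deg))]


def summarise_dvp(dvp):
    """Mean, peak and standard deviation of an unsigned DVP series."""
    return {
        "mean": statistics.fmean(dvp),
        "peak": max(dvp),
        "sd": statistics.pstdev(dvp),  # population SD: an assumption
    }


# Synthetic 0.5 Hz, 10 deg roll oscillation sampled at 90 Hz for 8 s.
trace = [10.0 * math.sin(2 * math.pi * 0.5 * i / 90.0) for i in range(720)]
summary = summarise_dvp(estimate_dvp(trace, lag_ms=133))
```

In practice the trace passed to such a function would be the recorded head orientation around the instructed axis, including enough samples before the analysis window to cover the largest lag.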

3 Results

3.1 Check on head movement compliance

Head movement frequencies and amplitudes were estimated for each trial by fitting a sinusoidal function to each trace (note: all values in yaw, pitch and roll were initially shifted so that their mean values for the trial were 0). Head movement amplitudes were then inspected along all 3 axes to check on participant compliance with instructions. As can be seen in Fig. 4, head movements occurred primarily along the axis of the instructed head movement in each block (with considerably smaller head movement amplitudes observed along the other two orthogonal head movement axes).
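
One simple way to obtain a frequency and amplitude from such a trace is sketched below. The fitting procedure is not specified in the text, so this is only one possible approach: a grid search over candidate frequencies, with a linear least-squares fit of a sine and cosine term at each candidate. The function name `fit_sinusoid`, the candidate grid and the 90 Hz sampling rate are assumptions.

```python
import math


def fit_sinusoid(samples, sample_hz=90.0, candidate_hz=None):
    """Estimate (frequency_hz, amplitude) of a roughly sinusoidal trace."""
    n = len(samples)
    mean = sum(samples) / n
    y = [v - mean for v in samples]  # shift the trace so that its mean is 0
    if candidate_hz is None:
        candidate_hz = [k / 20.0 for k in range(2, 61)]  # 0.10 to 3.00 Hz
    best = None
    for f in candidate_hz:
        s = [math.sin(2 * math.pi * f * i / sample_hz) for i in range(n)]
        c = [math.cos(2 * math.pi * f * i / sample_hz) for i in range(n)]
        ss = sum(v * v for v in s)
        cc = sum(v * v for v in c)
        sc = sum(a * b for a, b in zip(s, c))
        sy = sum(a * b for a, b in zip(s, y))
        cy = sum(a * b for a, b in zip(c, y))
        det = ss * cc - sc * sc
        if abs(det) < 1e-12:
            continue  # degenerate basis for this frequency
        # Solve the 2x2 normal equations for y ~ a*sin + b*cos.
        a = (sy * cc - cy * sc) / det
        b = (cy * ss - sy * sc) / det
        sse = sum((yi - a * si - b * ci) ** 2 for yi, si, ci in zip(y, s, c))
        if best is None or sse < best[0]:
            best = (sse, f, math.hypot(a, b))
    if best is None:
        raise ValueError("no usable candidate frequency")
    return best[1], best[2]


# Usage on a synthetic 1.0 Hz, 15 deg oscillation with a 2 deg offset.
trace = [2.0 + 15.0 * math.sin(2 * math.pi * 1.0 * i / 90.0 + 0.3)
         for i in range(720)]
frequency_hz, amplitude_deg = fit_sinusoid(trace)
```

The frequency resolution of this sketch is limited by the candidate grid (0.05 Hz here); a finer grid or a nonlinear optimiser could be substituted.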

Fig. 4 Mean recorded head movement amplitudes (deg) along all 3 axes for each instructed head movement axis (pitch, yaw or roll) and head speed (0.5 or 1.0 Hz) condition. Error bars represent standard errors of the mean

For example, when participants were asked to make pitch head movements, their yaw and roll head movement amplitudes were on average 84.6% less than those around the pitch axis. Similarly, when participants were asked to make roll head movements, their pitch and yaw head movement amplitudes were 91.3% less than those around the roll axis. Lastly, when participants were asked to make yaw head movements, their roll and pitch head movement amplitudes were on average 91.2% less than those around the yaw axis. Thus, participants clearly attempted to comply with their head movement instructions. However, it should be noted that while they were asked to keep their heads upright during yaw head movement conditions, they still generated unintended head movements in both roll and pitch.

3.2 Effects of head movement axis, head movement speed and display lag on head movements

On each trial, participants moved their heads at one of 2 speeds (0.5 or 1.0 Hz), along one of 3 different axes, and under one of 6 different levels of display lag. We were interested in how their tracked head movements varied across these 36 different experimental conditions. Thus, we conducted separate 3 (HEAD MOVEMENT AXIS) × 2 (HEAD MOVEMENT SPEED) × 6 (DISPLAY LAG) repeated-measures ANOVAs on the frequencies and amplitudes of their recorded head movements (Note: whenever there was a violation of sphericity, Greenhouse–Geisser corrections were applied).

3.2.1 Recorded head movement frequency

As expected, we found a significant main effect of HEAD MOVEMENT SPEED on recorded head movement frequencies, F(1,20) = 372.400, p < 0.0001, ηp2 = 0.949. Head movement frequencies were higher on average in the fast (1.0 Hz), compared to the slow (0.5 Hz), requested head speed conditions (see Fig. 5). We also found a significant main effect of HEAD MOVEMENT AXIS on recorded head movement frequencies, F(1.017,20.343) = 5.660, p = 0.03, ηp2 = 0.221. However, after Bonferroni correction, pairwise comparisons failed to reveal any significant differences based on the axis of the instructed head movement (p > pcritical = 0.025). We also found a significant main effect of DISPLAY LAG on recorded head movement frequencies, F(5,100) = 5.008, p = 0.0001, ηp2 = 0.200. However, again after Bonferroni correction, pairwise comparisons failed to reveal any significant differences based on the level of display lag (p > pcritical = 0.01). No 2- or 3-way interactions were found to reach significance for recorded head movement frequency.
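As a reading aid, the effect sizes reported here can be recovered from each F ratio and its degrees of freedom, because partial eta squared equals F·df_effect / (F·df_effect + df_error). The short sketch below applies that identity to the HEAD MOVEMENT SPEED and DISPLAY LAG effects. The function names are illustrative, and the number of pairwise comparisons passed to the Bonferroni helper is an assumption (five comparisons between adjacent lag levels would match the reported pcritical of 0.01).

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def bonferroni_critical_p(alpha, n_comparisons):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_comparisons


# F ratios and degrees of freedom reported above for head movement frequency.
speed = partial_eta_squared(372.400, 1, 20)
lag = partial_eta_squared(5.008, 5, 100)
print(round(speed, 3), round(lag, 3))

# Assumption: five comparisons between adjacent display lag levels.
print(bonferroni_critical_p(0.05, 5))
```

Rounded to three decimals, the two effect sizes come out at 0.949 and 0.200, in line with the values reported for these effects.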

Fig. 5

Mean head movement frequencies (in Hz) for each movement instruction (pitch, yaw, or roll), display lag (0–222 ms + 4 ms baseline lag) and head speed (1.0 or 0.5 Hz) condition. Error bars represent the standard errors of the mean

3.2.2 Recorded head movement amplitude

As expected, we found a significant main effect of HEAD MOVEMENT SPEED on recorded head movement amplitudes, F(1,20) = 30.872, p < 0.0001, ηp2 = 0.619. Head movement amplitudes were smaller on average in the fast (1.0 Hz), compared to the slow (0.5 Hz), requested head speed conditions (see Fig. 6). We also found a significant main effect of HEAD MOVEMENT AXIS on recorded head movement amplitudes, F(1.510,28.699) = 45.344, p = 0.0001, ηp2 = 0.705. Bonferroni-corrected pairwise comparisons revealed that recorded head movement amplitudes were greater in the yaw axis instruction conditions (M = 38.61°) compared to the roll axis instruction conditions (M = 26.88°) (p < 0.0001), which in turn produced greater head movement amplitudes than the pitch axis instruction conditions (M = 18.97°) (p < 0.0001).Footnote 10 The main effect of DISPLAY LAG did not reach significance for recorded head movement amplitudes, F(5,100) = 2.019, p = 0.083, ηp2 = 0.096. No 2- or 3-way interactions were found to be significant for these recorded head movement amplitude data.

Fig. 6

Mean head movement amplitudes (in deg) for each movement instruction (pitch, yaw, or roll), display lag (0–222 ms + 4 ms baseline lag) and head speed (1.0 or 0.5 Hz) condition. Error bars represent standard errors of the mean

3.3 Effects of head movement axis, speed and display lag on subjective rating data

Separate 3 (HEAD MOVEMENT AXIS) × 2 (HEAD MOVEMENT SPEED) × 6 (DISPLAY LAG) repeated-measures ANOVAs were conducted on participants’ cybersickness severity, perceived scene instability and spatial presence ratings (when there was a violation of sphericity, Greenhouse–Geisser corrections were applied).

3.3.1 Cybersickness severity ratings

The main effects of HEAD MOVEMENT AXIS [F(2,40) = 3.080, p = 0.06, ηp2 = 0.133] and HEAD MOVEMENT SPEED [F(1,20) = 1.252, p = 0.276, ηp2 = 0.059] on cybersickness severity ratings were not significant. However, we did find a significant main effect of DISPLAY LAGFootnote 11 on cybersickness severity ratings, F(1.343,26.865) = 25.076, p = 0.0001, ηp2 = 0.556. Bonferroni-corrected pairwise comparisons revealed that: (1) 93 ms lag (M = 4.33) produced significantly higher cybersickness ratings than 48 ms lag (M = 2.14); and (2) 48 ms lag produced significantly higher cybersickness ratings than ~ 4 ms baseline lag (M = 1.09) (p < 0.0001 in both cases). While Fig. 7 shows a trend for mean cybersickness severity to continue to increase with higher levels of display lag, significant differences were not found when display lag was incremented above 137 ms (after Bonferroni correction was applied, p > pcritical = 0.01). None of the 2- or 3-way interactions were significant.

Fig. 7

Shows the effects of HEAD MOVEMENT AXIS (pitch, yaw or roll) and DISPLAY LAG (0–222 ms + 4 ms baseline lag) on mean cybersickness severity ratings (0–20). The plot on the left shows these effects for slower requested head speeds (~ 0.5 Hz), whereas the plot on the right shows these effects for faster requested head speeds (~ 1.0 Hz). Error bars represent standard errors of the mean

Studies which repeatedly expose participants to potentially provocative motion stimuli run the risk of sickness contamination across trials and blocks. In this study, we attempted to minimise such carry-over effects by using brief 12 s exposures to increased display lag, much longer 90 s interstimulus intervals, and 10 min breaks between blocks. We performed an additional repeated-measures ANOVA on these cybersickness severity data to check for BLOCK NUMBER (1–3) and TRIAL ORDER (1–12) effects. We found a significant main effect of BLOCK NUMBER, F(2,40) = 3.850, p = 0.03, ηp2 = 0.161. However, the main effect of TRIAL ORDER [F(3.949, 78.976) = 1.256, p = 0.252, ηp2 = 0.059] and the BLOCK NUMBER by TRIAL ORDER interaction [F(22,440) = 0.646, p = 0.891, ηp2 = 0.031] were not significant. Post-hoc analyses confirmed that sickness severity was worse in the third/final block of trials (compared to the second block of trials).

3.3.2 Perceived scene instability ratings

The main effects of HEAD MOVEMENT AXIS [F(1,20) = 1.399, p = 0.259, ηp2 = 0.065] and HEAD MOVEMENT SPEED [F(1,20) = 1.294, p = 0.269, ηp2 = 0.061] on scene instability ratings were not significant. However, we did find a significant main effect of DISPLAY LAG on scene instability ratings, F(1.723,34.456) = 95.708, p = 0.0001, ηp2 = 0.827 (see Fig. 8). Bonferroni-corrected pairwise comparisons revealed that: (1) 226 ms lag (M = 11.91) produced higher scene instability ratings than 182 ms lag (M = 10.90) (p = 0.008); (2) 182 ms lag produced higher scene instability ratings than 137 ms lag (M = 9.41) (p = 0.0001); (3) 137 ms lag produced higher scene instability ratings than 93 ms lag (M = 7.50) (p = 0.0001); (4) 93 ms lag produced higher scene instability ratings than 48 ms lag (M = 3.83) (p = 0.0001); and (5) 48 ms lag produced higher scene instability ratings than the 4 ms baseline lag (M = 1.58) (p = 0.0001). None of the 2- or 3-way interactions were significant for perceived scene instability.

Fig. 8

The effects of HEAD MOVEMENT AXIS (pitch, yaw or roll) and DISPLAY LAG (0–222 ms + 4 ms baseline lag) on mean scene instability ratings (0–20). Error bars represent standard errors of the mean

3.3.3 Spatial presence ratings

We found a significant main effect of HEAD MOVEMENT SPEED on spatial presence ratings, F(1,20) = 11.377, p = 0.003, ηp2 = 0.363, with higher presence ratings found for slow (M = 7.82), compared to fast (M = 6.21), requested head speeds (see Fig. 9). We did not find a significant main effect of HEAD MOVEMENT AXIS on spatial presence ratings, F(2,40) = 2.738, p = 0.08, ηp2 = 0.12. However, we did find a significant main effect of DISPLAY LAG, F(1.237,24.737) = 25.898, p = 0.0001, ηp2 = 0.564 (see Fig. 10). Bonferroni-corrected pairwise comparisons revealed that: (1) 4 ms baseline lag conditions (M = 12.09) produced higher presence ratings than the 48 ms lag conditions (M = 9.26) (p = 0.0001); (2) 48 ms lag conditions produced higher presence ratings than the 93 ms lag conditions (M = 6.60) (p = 0.001); and (3) 93 ms lag conditions produced higher presence ratings than the 137 ms lag conditions (M = 5.24) (p = 0.001). However, increasing the display lag above 137 ms did not result in further significant decreases in presence (p > pcritical = 0.01). We also found a significant interaction between HEAD MOVEMENT AXIS and HEAD MOVEMENT SPEED, F(2,40) = 5.169, p = 0.010, ηp2 = 0.205. As can be seen in Fig. 9, mean presence ratings were highest when slow head movements were made in yaw (as opposed to pitch or roll). However, mean presence ratings were similar for all three HEAD MOVEMENT AXIS conditions with faster head movements. None of the other 2- or 3-way interactions were found to reach significance for presence.

Fig. 9

Shows that mean spatial presence decreased as HEAD MOVEMENT SPEED increased (from 0.5 to 1.0 Hz) for instructed head movements along all 3 axes (pitch, yaw and roll). Error bars represent standard errors of the mean

Fig. 10

The effects of HEAD MOVEMENT AXIS (pitch, yaw or roll) and DISPLAY LAG (0–222 ms + 4 ms baseline lag) on mean spatial presence ratings (0–20). Error bars represent standard errors of the mean

3.4 Predicting cybersickness severity based on estimated DVP and ratings of perceived scene instability

While regression-based analyses assume their data represent independent samples, this study had a repeated-measures design. Thus, in order to investigate whether DVP estimates,Footnote 12 or perceived scene instability ratings, could be used to predict cybersickness severity, we adopted an approach recommended by Lorch and Myers (1990). First, we calculated the mean, peak, and standard deviations of the DVP time series data for each trial. We also identified the perceived scene instability rating for each trial. Then, for each participant in each HEAD MOVEMENT AXIS condition, we separately plotted each of these possible predictor variables (i.e. mean, peak, standard deviation of the DVP or perceived scene instability ratings) against their corresponding cybersickness severity ratings for the same trial. This resulted in 12 different scatterplots for each participant (one plot for each of the 4 possible predictors in each of the 3 HEAD MOVEMENT AXIS conditions). We then calculated the slopes of the lines of best fit for each of these scatterplots. These slope data were collated for all 21 participants and subjected to 12 different (Bonferroni-corrected) one-sample t tests. The means and standard deviations of the slope data used in these one-sample t tests are reported in Table 1 (which also provides the means and the standard deviations of the R2 values for these lines of best fit). We found that estimates of our participants’ DVP (along the axis of the instructed head movement) significantly predicted their cybersickness severity in each of the different HEAD MOVEMENT AXIS conditions tested (pitch, yaw, and roll; see Table 1). In each case, positive linear relationships were found between the mean, peak, and standard deviation of these DVP estimates and cybersickness severity ratings. 
We also found that participant ratings of perceived scene instability significantly predicted their cybersickness severity in each of these different HEAD MOVEMENT AXIS conditions (pitch, yaw, and roll—see Table 1). In each case, we found positive linear relationships between their perceived scene instability ratings and their cybersickness severity ratings.
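The slope-based procedure described above can be sketched in two steps: fit a line of best fit per participant, then test whether the resulting slopes differ from zero across participants. The example below is a minimal illustration under stated assumptions. The function names and the four-participant data set are invented for the example (the study collated slopes from 21 participants), and the p value is deliberately not computed because the Python standard library has no t distribution. In practice it would be looked up from a t distribution with n − 1 degrees of freedom and compared against the Bonferroni-corrected threshold.

```python
import math
import statistics


def ols_slope(x, y):
    """Slope of the least-squares line of best fit for y against x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def one_sample_t(values, mu=0.0):
    """t statistic and degrees of freedom for H0: mean(values) == mu."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return (mean - mu) / (sd / math.sqrt(n)), n - 1


# Hypothetical data for one predictor in one HEAD MOVEMENT AXIS condition:
# per participant, (mean DVP per trial in deg, cybersickness rating per trial).
trials = {
    "P01": ([1.0, 3.0, 5.0, 7.0, 9.0, 11.0], [0, 1, 2, 4, 5, 6]),
    "P02": ([1.0, 3.0, 5.0, 7.0, 9.0, 11.0], [1, 1, 2, 2, 3, 4]),
    "P03": ([1.0, 3.0, 5.0, 7.0, 9.0, 11.0], [0, 2, 3, 5, 8, 9]),
    "P04": ([1.0, 3.0, 5.0, 7.0, 9.0, 11.0], [2, 2, 3, 3, 3, 5]),
}

# Step 1: one slope per participant.
slopes = [ols_slope(x, y) for x, y in trials.values()]

# Step 2: test the slopes against zero across participants.
t_value, df = one_sample_t(slopes)

# Bonferroni-corrected threshold for the 12 tests described in the text.
alpha_per_test = 0.05 / 12
print(f"t({df}) = {t_value:.2f}, per-test alpha = {alpha_per_test:.4f}")
```

Because each participant contributes a single slope to the t test, the test operates on independent observations even though the underlying trials are repeated measures.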

Table 1 Means and standard deviations of the regression slope data used in one-sample t tests examining relationships between DVP/scene instability and cybersickness severity

3.5 Relationship between DVP and perceived scene instability

Similar analyses to those outlined in Sect. 3.4 were also conducted to examine the possible relationships between DVP and perceived scene instability. We found significant positive linear relationships between estimates of our participants’ DVP (along the axis of the instructed head movement) and their perceived scene instability ratings (see Table 2). Specifically, significant relationships were found between the mean, peak, and standard deviation of this DVP and perceived scene instability during pitch, yaw, and roll head movements.

Table 2 Means and standard deviations of the regression slope data used in one-sample t tests examining relationships between DVP and perceived scene instability

3.6 Relationship between DVP and spatial presence

Similar analyses to those outlined in Sect. 3.4 were also conducted to examine the possible relationships between DVP/scene instability and spatial presence. We found significant negative linear relationships between estimates of our participants’ DVP (along the axis of the instructed head movement) and their spatial presence ratings (see Table 3). Specifically, significant relationships were found between the mean, peak, and standard deviation of this DVP and spatial presence ratings during pitch, yaw, and roll head movements. We also found significant negative linear relationships between perceived scene instability and spatial presence ratings during each of these three different types of head movements (see Table 3).

Table 3 Means and standard deviations of the regression slope data used in one-sample t tests examining relationships between DVP/scene instability and spatial presence

3.7 Relationship between spatial presence and cybersickness severity

Similar analyses to those outlined in Sects. 3.4–3.6 were also conducted to examine the possible relationships between spatial presence and cybersickness severity. We found significant negative linear relationships between spatial presence and cybersickness severity for pitch and yaw head movements (see Table 4). While there was also a trend for a negative relationship between spatial presence and cybersickness severity during roll head movement conditions, this did not remain significant after Bonferroni correction for multiple comparisons.

Table 4 Means and standard deviations of the regression slope data used in one-sample t tests examining relationships between spatial presence and cybersickness severity

4 Discussion

In this study, participants were asked to make fast or slow head rotations around one of the three cardinal axes while in HMD VR. Head movements were confirmed to occur primarily along the instructed axis (either roll, pitch or yaw) and at approximately the indicated speed (either 0.5 or 1.0 Hz) on each trial. As expected, participant ratings of perceived scene instability and cybersickness severity were both found to increase with the DISPLAY LAG. In Figs. 7 and 8, both types of ratings can be seen to increase as the amount of imposed display lag increased. However, other HEAD MOVEMENT AXIS (i.e. pitch, yaw and roll) and HEAD MOVEMENT SPEED (i.e. ~0.5 Hz and ~1.0 Hz) manipulations were not found to significantly alter either perceived scene instability or cybersickness severity ratings.

We also found significant effects of DISPLAY LAG on participants’ spatial presence ratings. However, in this case, feelings of presence were found to decrease (rather than increase) as the imposed display lag increased. Unlike perceived scene instability and cybersickness severity ratings, HEAD MOVEMENT AXIS and HEAD MOVEMENT SPEED manipulations were both found to alter participants’ spatial presence ratings. In general, slower head movements were found to result in stronger feelings of spatial presence—especially when participants made slow head movements around the yaw axis (as opposed to the pitch or roll axes).

However, the main purpose of the current study was to examine whether objective estimates of the DVP might explain the above effects on cybersickness severity and scene instability ratings (as well as any observed effects on spatial presence ratings). Importantly, we found evidence that the mean, peak and standard deviation of our participants’ DVP could be used to predict the effects of our experimental manipulations on both types of subjective ratings. These results are discussed in more detail below.

4.1 Does DVP predict perceived scene instability and cybersickness severity?

Consistent with the predictions of our DVP Hypothesis, we found significant positive linear relationships between the mean, peak, and standard deviation of our participants’ objectively estimated DVP and their subjective reports of cybersickness severity.Footnote 13 Each of these DVP summary measures appeared to be capable of predicting the cybersickness ratings produced by our DISPLAY LAG and HEAD MOVEMENT SPEED manipulations. Importantly, these relationships were also found to hold for our three different HEAD MOVEMENT AXIS conditions. That is, these DVP summary measures appeared to be capable of predicting increases in cybersickness during all three types of self-generated head rotation (i.e. head movements in yaw, pitch and roll). As can be seen in Table 1, these DVP summary measures were found to account for between 35 and 44% of the variance in cybersickness severity ratings (on average). However, it is important to note that there were also individual differences in the nature of these DVP-cybersickness relationships (indicated by the standard deviations of these slope data—which are also provided in Table 1).

Large-amplitude DVP generated by display lag during active HMD VR is often consciously experienced as perceived scene instability. As expected, we found significant positive linear relationships between the mean, peak and standard deviation of our participants’ DVP and their subjective ratings of perceived scene instability. This suggests that perceived scene instability might serve as a reliable indicator of some provocative sensory conflicts/DVP for cybersickness. Consistent with this idea (originally outlined in Kim et al. 2020), significant positive linear relationships were also found between perceived scene instability and cybersickness severity for all three types of self-generated head rotation (i.e. head movements in yaw, pitch and roll). As can be seen in Table 1, perceived scene instability was found to account for between 51 and 57% of the variance in cybersickness severity ratings. Thus, in the current experiment, subjective ratings of perceived scene instability appeared to be as effective as objectively estimated DVP in identifying problematic HMD VR conditions for cybersickness.

4.2 Is yaw DVP less provocative for cybersickness than roll or pitch DVP?

In our previous studies, we examined display lag effects on cybersickness when HMD users made head movements in either pitch (Kim et al. 2020) or yaw (Feng et al. 2019; Palmisano et al. 2019). Based on subjective vertical conflict theory, we had expected self-generated pitch head movements to be more provocative than self-generated yaw head movements. Instead, our earlier observations (made retrospectively across the three studies) suggested that sickness and display lag effects were similar during yaw and pitch head movements.

In the current study, HEAD MOVEMENT AXIS was deliberately manipulated as a within-subjects factor. Participants were instructed to make head rotations about each of the three cardinal axes (i.e. with sickness due to DVP/display lag during roll head rotations being examined here for the first time). Contrary to the predictions of subjective vertical conflict theory, we found that: (1) cybersickness still increased with the level of display lag during yaw head rotations; and (2) sickness severity ratings were roughly similar (on average) across all three types of head rotation (i.e. yaw, pitch and roll).

However, caution should be used when interpreting the above findings in terms of the subjective vertical conflict theory. In our research, participants were asked to rotate their heads exclusively around a single axis (either roll, pitch, or yaw depending on the block of trials/experiment). Yet, as can be seen in Figs. 3A and 4, they also made smaller (but still potentially significant) head movements along the other two orthogonal axes. Conceivably, the smaller pitch and roll head movements made in our yaw instruction trials could still have generated provocative subjective vertical conflicts (Bles et al. 1998). As can be seen in Fig. 7, there was also a (non-significant) trend for cybersickness severity to be reduced during 0.5 Hz yaw (compared to roll and pitch) head movement conditions. Thus, it still might be possible to explain our findings based on a version of subjective vertical conflict theory. Nevertheless, in apparent contradiction to stricter interpretations of the theory, conditions which predominantly generated yaw DVP were still found to generate cybersickness, and the severity of this cybersickness was found to increase with the amplitude and the variability of this yaw DVP.

4.3 Understanding relationships between DVP and perceived scene instability

In the current experiment, we found that cybersickness severity appeared to be generally predicted by both objective estimates of the DVP and subjective ratings of perceived scene instability. If perceived scene instability is actually the conscious experience of provocative DVP, then this would explain why these ratings also predicted cybersickness severity (in a very similar fashion to the objectively estimated DVP).Footnote 14 But why were correlations between the summary measures of the DVP and perceived scene instability ratings not higher (i.e. close to explaining 100% of the variance in the other variable)? As can be seen in Table 2, the mean, peak, and standard deviation of the DVP accounted for between 45 and 74% of the variance in the perceived scene instability ratings (on average). There are several possible explanations for the remaining unexplained variance, which are each outlined below:

First, the simplest explanation for this unexplained variance was that our DVP and scene instability data were based on objective measurements and subjective reports, respectively (subjective reports typically introduce more variance).

Second, in the current study, we only estimated angular DVP around the axis of the instructed head movement. For example, when participants were asked to make pitch head movements, we only estimated DVP based on angular differences in their pitch head orientation. As a result, our objective estimates ignored additional DVP produced by: (1) unintended head rotations around the other two orthogonal axes; and (2) unintended head translations along all three axes. So our estimates likely underestimated the actual magnitudes of the DVP. By contrast, it was likely that participant ratings of perceived scene instability were based on 6-DOF DVP (i.e. angular DVP around, as well as linear DVP along, each of these three axes). This could explain why perceived scene instability appeared to be somewhat better at predicting cybersickness severity (compared to our objectively estimated DVP around a single axis).

Third, the DVP generated during active HMD VR were complex time series data. However, in the current study, we only calculated, and then subsequently analysed, three summary measures of these DVP data (i.e. the mean, peak and standard deviation of the DVP along the axis of instructed head movement). Unfortunately, none of these DVP summary measures captured how the DVP evolved over time (i.e. its temporal dynamics). They only provide information about the spatial magnitude of the DVP. By contrast, ratings of perceived scene instability were likely based on both the spatial magnitude and the temporal dynamics of the HMD user’s DVP.

Thus, in summary, if subjective ratings of perceived scene instability were influenced by additional information about DVP magnitude and temporal dynamics, then this would explain why they appeared to account for 7 to 18% more of the variance in cybersickness severity ratings (compared to the three summary measures of the objectively estimated DVP that we calculated—which were all focussed only on its spatial magnitude along the axis of the instructed head-movement).

4.4 Understanding relationships between DVP, spatial presence and cybersickness

Participant feelings of spatial presence generally decreased as the mean, peak and standard deviation of their DVP increased. Each of these DVP summary measures appeared to be capable of predicting the presence ratings produced by our DISPLAY LAG and HEAD MOVEMENT SPEED manipulations. These relationships were also found to hold for our three different HEAD MOVEMENT AXIS conditions. As can be seen in Table 3, the summary DVP measures appeared to account for between 40 and 48% of the variance in the spatial presence ratings (on average). We also found similar inverse relationships between perceived scene instability and spatial presence. As can be seen lower in Table 3, perceived scene instability appeared to account for between 54 and 61% of the variance in the presence ratings. We interpreted these findings as follows: During active HMD VR, participants perceived the DVP due to display lag as scene instability, which in turn, weakened their feelings of “being there” in the virtual environment.

Based on a review of the results of 20 VR studies, Weech et al. (2019) recently concluded that there is an inverse relationship between presence and cybersickness. They argued that either: (1) strong feelings of presence suppress this sickness by directing attention away from sensory conflict; or (2) sickness symptoms decrease presence by reducing attention to the virtual environment (see also Usoh et al. 1999, as well as Witmer and Singer 1998).Footnote 15 Consistent with Weech et al.’s conclusion, and past findings on the effects of display lag on these experiences (Kim et al. 2020), we also found negative linear relationships between spatial presence and cybersickness severity in the current study. However, it should be noted that these relationships were only significant during pitch and yaw (not roll) head movements. As can be seen in Table 4, cybersickness severity was found to account for either 45 or 47% of the variability in spatial presence ratings (during pitch and yaw head movement conditions, respectively).

4.5 Limitations of the study

There was some evidence of sickness contamination due to running multiple blocks of trials on the same day. While sickness severity ratings did not vary significantly within each block as a function of trial order, they were found to be higher on the final (relative to the second) block of trials. Nevertheless, we were still able to find clear DISPLAY LAG effects on sickness ratings. However, it is possible that this sickness contamination might have obscured more modest (but expected) effects of (requested) head speed on cybersickness. Alternatively, it is possible that identification of such effects might have benefitted from an increase in the study’s statistical power.

While our participants were instructed to rotate their heads exclusively around a single axis, they also made other unintended head movements (see Figs. 3A, 4). This was not ideal for testing subjective vertical conflict theory (see Sect. 1.1.2), which predicts that display lag should not provoke cybersickness during pure yaw head rotations. In our study, display lag was applied equally to all user head movements made during a trial (irrespective of whether they were intended or not). This meant that the visual consequences of yaw, pitch and roll head movements were all delayed to the same degree. It is therefore quite possible that small pitch and roll head movements in our yaw rotation instruction trials were still capable of generating significant subjective vertical conflicts. Thus, instead of adding extra display lag to all head movements, it might have been better to selectively apply it only to head movements along a single axis (e.g. delay the visual consequences of the HMD user’s yaw head movements, but not the visual consequences of their pitch and roll head movements). Alternatively, if self-generated head movements were not essential, the theory could have been tested by passively rotating our participants around each axis separately while in HMD VR.

As was noted above, in the current study, we only estimated DVP along the axis of the instructed head movement. There are, of course, other ways to calculate these DVP time series data. In the future, we need to determine whether combining DVP across all three head axes, and across both linear and angular head movements, improves: (1) the relationship between DVP and perceived scene instability; and (2) the ability of DVP to predict cybersickness severity and presence.

In the current study, we also calculated the mean, peak and standard deviation of the DVP for every trial. However, these summary measures of DVP magnitude/variability ignore the temporal evolution of DVP over the course of each trial. Thus, future research needs to determine whether summary DVP and temporal dynamics based DVP measures are both required to fully account for user perceptions of scene instability and experiences of cybersickness. However, such studies will require longer exposures to the visual consequences of display lag during active HMD VR (ideally, for Detrended Fluctuation Analysis, or Recurrence Quantification Analysis, the individual head movement exposure trials would need to last for a minimum of 60 s).

This study was focused on the relationship between DVP and cybersickness severity. However, future research still needs to determine the exact nature of the changes in DVP that initially trigger this cybersickness. In such a study, the participants would need to remain in active HMD VR until their first report of cybersickness (or until the simulation timed out). This would allow us to identify triggering changes in the DVP by comparing the sick participant’s DVP just before they dropped out with their DVP at earlier times in the HMD VR exposure.

4.6 Reconciling the DVP hypothesis with other well-known sickness findings

Currently, our DVP hypothesis does not include specific mechanisms to explain two well-known cybersickness findings: (1) sickness severity tends to decrease with repeated exposure to an initially provocative stimulus (e.g. Gavgani et al. 2017; Hill and Howarth 2000; Howarth and Hodder 2008; Risi and Palmisano 2019; Palmisano and Constable 2022); and (2) some HMD users are more likely to become sick, and experience this cybersickness more severely, than others (e.g. Arcioni et al. 2019; Munafo et al. 2017; Risi and Palmisano 2019; Teixeira and Palmisano 2021). As noted previously, our rather pragmatic DVP hypothesis is focussed exclusively on cybersickness during active user HMD VR. In that very specific context, we have shown here that it can be used to objectively predict reported increases in cybersickness. That is, based solely on directly observable information about sensory input conflicts (i.e. without any other information about the user), DVP indices can be used to predict significant amounts of the variability in cybersickness severity ratings (i.e. 35–44%). As was noted above, this DVP-based prediction is likely to improve when we estimate 6-DOF DVP and include temporal dynamics information (i.e. relative to the present predictions, which were only based on simple summary statistics of 1-DOF DVP spatial magnitude). However, it is likely that DVP-based prediction would also be further improved by the inclusion of user-specific information. This might require researchers (and content developers) to obtain additional subjective/qualitative/demographic data about the HMD user (such as estimates of their past exposure history to motion both inside and outside of HMD VR, their expectations about current/future patterns of multisensory stimulation, their biological sex, etc.). 
Alternatively, it might be possible to explain the effects of repeated exposure and individual differences on sickness experiences by obtaining other, more objective types of information about the user. Important user-specific information could potentially be provided by assessments of their postural stability/instability, as pre-exposure sway has been successfully used to predict susceptibility to cybersickness (Arcioni et al. 2019; Munafo et al. 2017; Risi and Palmisano 2019; Teixeira and Palmisano 2021), and sway in the presence of provocative motion stimulation has been shown to be altered by repeated exposure (e.g. Stoffregen et al. 2013). Thus, a modified version of our DVP hypothesis that includes stability-based predictors could potentially also explain individual differences in user susceptibility to cybersickness in HMD VR, as well as account for reductions in this sickness with repeated exposure.

5 Conclusions

In this study, we replicate and extend our recent findings that cybersickness due to display lag scales with differences in an HMD user’s virtual and physical head pose (see also Kim et al. 2020; Palmisano et al. 2020). We treat this sensory conflict, experienced during active user HMD VR, simply as the DVP (i.e. directly observable differences in the stimulation applied to the HMD user’s sense organs; see Footnote 16). Unlike other sensory conflict theories, DVP estimates do not rely on internal models of the user’s sensory processing (i.e. they do not attempt to model any transduction or encoding effects, or multisensory cortical interactions, or the influence of expectations based on the user’s past experience on this internal processing). Even so, it would appear that they can still be used to predict experiences of cybersickness (and spatial presence) during active HMD VR. Consistent with our DVP hypothesis, cybersickness severity in the current study was found to consistently increase with the mean, peak and standard deviation of the DVP. These summary DVP measures were each found to account for between 35 and 44% of the variance in cybersickness ratings (during self-generated yaw, pitch and roll head movements). As this simple, directly observable definition of sensory conflict appears to provide sufficient general predictive power, we believe that it could greatly assist researchers in identifying provocative situations for cybersickness in HMD VR (since it is straightforward to design experiments that systematically manipulate DVP across trials).
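
The summary measures described above can be illustrated with a minimal sketch. It assumes that 1-DOF DVP is the absolute angular difference (in degrees) between the virtual and physical head orientation at each sample, and that variance explained is the R² of a simple linear regression. The function names `dvp_summary` and `r_squared` are hypothetical, and this is not the analysis code used in the study.

```python
import numpy as np


def dvp_summary(physical_deg, virtual_deg):
    """Mean, peak and SD of an assumed 1-DOF DVP time series (degrees).

    DVP is taken here as the absolute difference between virtual and
    physical head orientation, wrapped into [-180, 180) before taking
    the magnitude. This definition is an illustrative assumption.
    """
    physical = np.asarray(physical_deg, dtype=float)
    virtual = np.asarray(virtual_deg, dtype=float)
    diff = (virtual - physical + 180.0) % 360.0 - 180.0  # wrap angle difference
    dvp = np.abs(diff)
    return {"mean": dvp.mean(), "peak": dvp.max(), "sd": dvp.std(ddof=1)}


def r_squared(predictor, rating):
    """Proportion of variance in `rating` explained by a straight-line fit."""
    x = np.asarray(predictor, dtype=float)
    y = np.asarray(rating, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # least-squares line
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

In such a sketch, `dvp_summary` would be applied to each trial's head-pose time series, and `r_squared` to one per-trial DVP measure paired with that trial's sickness rating.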

In the current study, we injected extra constant display lag into our simulation to systematically manipulate the user’s DVP during active HMD VR. The artificial addition of display lag produces DVP that is easy to control and quantify, but provocative DVP for cybersickness can also be generated in the absence of such manipulations. For example, a head-stationary HMD user would experience DVP when viewing a virtual roller coaster ride (because their visually simulated self-acceleration would not be confirmed by their available non-visual stimulation). When they later move their head during the simulation, extra DVP could also be generated by transient tracking problems (even when the VR system lag is minimal). Importantly, it has been shown that cybersickness can be exacerbated by latency spikes that are so brief that they are not consciously detected by the user (see Stauffert et al. 2018). All of the above situations can produce large-magnitude, time-varying patterns of DVP, which our DVP hypothesis predicts should increase the likelihood and severity of cybersickness. The relationship between DVP and cybersickness during brief latency spikes is of particular interest to us (as these latency spikes are similar to the display lags commonly produced by under-performing VR systems). Examination of their effects on cybersickness incidence and severity should therefore be a particular focus of future research.
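
To make the link between display lag and DVP concrete, the sketch below simulates an idealised sinusoidal yaw head oscillation in which the virtual head pose is simply the physical pose delayed by the lag. Under that assumption, a constant lag τ at oscillation frequency f and amplitude A gives a peak DVP of 2A·|sin(πfτ)|. The amplitude, frequency, lag values and function names are illustrative assumptions rather than the parameters of this study, and the same function accepts a time-varying lag to mimic a brief latency spike.

```python
import numpy as np


def simulate_dvp(amplitude_deg, freq_hz, lag_s, duration_s=10.0, rate_hz=1000.0):
    """Signed 1-DOF DVP for a sinusoidal head oscillation (illustrative model).

    `lag_s` may be a scalar (constant lag) or an array with one value per
    sample (time-varying lag, e.g. a latency spike).
    """
    t = np.arange(0.0, duration_s, 1.0 / rate_hz)
    lag = np.broadcast_to(np.asarray(lag_s, dtype=float), t.shape)
    physical = amplitude_deg * np.sin(2.0 * np.pi * freq_hz * t)
    # Assumption: the display shows the head pose from `lag` seconds ago.
    virtual = amplitude_deg * np.sin(2.0 * np.pi * freq_hz * (t - lag))
    return t, virtual - physical


def analytic_peak_dvp(amplitude_deg, freq_hz, lag_s):
    """Peak DVP magnitude for a constant lag: 2A * |sin(pi * f * lag)|."""
    return 2.0 * amplitude_deg * abs(np.sin(np.pi * freq_hz * lag_s))


# Hypothetical example: 20 ms baseline lag with one 100 ms spike to 200 ms.
t, _ = simulate_dvp(20.0, 0.5, 0.02)
spiky_lag = np.full(t.shape, 0.02)
spiky_lag[(t >= 2.0) & (t < 2.1)] = 0.2
_, dvp_spike = simulate_dvp(20.0, 0.5, spiky_lag)
peak_with_spike = np.max(np.abs(dvp_spike))
```

Even in this simplified model, a spike lasting a tenth of a second raises the peak DVP well above the constant-lag baseline, which is the kind of large-magnitude, time-varying pattern discussed above.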

In this study, ratings of perceived scene instability appeared to provide somewhat better predictions of cybersickness severity than our objective estimates of the DVP. When taken together, these findings suggest that it should be possible to: (1) further improve cybersickness prediction using objective estimates of the DVP; and (2) identify many provocative VR conditions based simply on the HMD user’s subjective reports of perceived scene instability.