4.3 Immersive VR View
Inspired by related work [16, 67, 107], the immersive VR view enables the interactive re-experience of AUI studies (see Figure 3). For this, AutoVis replicates the ego-vehicle, other road users, passenger behavior, and environmental context (see 4.1.2).
Analysts can interact with the environment using their VR controllers, either via direct touch (tracked by Unity GameObject collisions) or via raycast for distant objects. Analogous to object selection in the desktop 3D scene panel, analysts can interact with avatars, trajectories, heatmaps, events, and annotations. Selecting an object via direct touch places a context menu next to the selected component in VR, for example, next to an avatar's head. The context menu provides the same features as the desktop view's inspector and overview panel (see 4.2). Selecting a distant object via raycast opens the context menu on the virtual tablet attached to the left controller (see Figure 3 C). This keeps the menu readable despite low VR display resolutions and avoids unnecessarily approaching distant objects. In addition, the tablet displays scene controls, the timeline, study-related metrics, the 2D panel, the event line, and a mini-map (R3) (see Figure 4 c and d).
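As a minimal illustration of this selection logic (not part of AutoVis, and abstracting away the Unity collision and raycast machinery), the following Python sketch shows how the menu placement could be decided from the selection mode; all names are assumptions.

```python
# Sketch: where the context menu opens, depending on how an object was selected.
# Engine-level touch/raycast detection is abstracted; names are illustrative.
def open_context_menu(selection_mode, target):
    if selection_mode == "touch":        # controller collided with the object
        return {"anchor": "world", "near": target}   # menu floats next to the object
    if selection_mode == "raycast":      # distant selection
        return {"anchor": "tablet"}      # menu opens on the left-controller tablet
    raise ValueError(f"unknown selection mode: {selection_mode}")
```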
The VR view hosts 3D visualizations that are adapted from related immersive analytics tools [16, 67, 107] and novel approaches to overcome AUI domain-specific challenges (see Figure 3): avatars (A), trajectories (B), and in-vehicle and environment (D) heatmaps.
Spatio-Temporal Events & Annotations. AUI study analysis considers not only the duration of events but also their location. Inspired by Büschel et al. [16], we propose to visualize such spatio-temporal events in the immersive VR view (see Figure 4 a), indicating the location and orientation of (inter)action, emotion, driving, and activity events, addressing R2-R4. This enables discovering spatial relationships between interactions with in-vehicle UIs, the driving environment, context, and passenger states (e.g., emotion or stress). However, automotive events can visually overlap on a vertical axis if study vehicles drove the same route. To overcome this, inspired by Fouché et al. [53], we propose a vertical-axis explode view for the events of individual participants, triggered via direct touch or raycast (see the sketch below). However, events can be distributed across large distances (e.g., several kilometers, see the challenge in 3.2) and hidden between replicated 3D buildings and trees. Therefore, analysts can visualize on hover (e.g., via raycast) events of the same type (e.g., emotion) on a visual layer of higher priority than the remaining environment to peek through 3D objects.
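The following Python sketch illustrates the idea behind the explode view under simplified assumptions: events are plain records with a participant ID and a world position, and each participant's events are lifted onto their own vertical layer. The data layout and spacing are illustrative, not AutoVis internals.

```python
# Sketch: vertical "explode" of overlapping spatio-temporal events.
from dataclasses import dataclass

@dataclass
class Event:
    participant: int   # tool-wide participant index
    kind: str          # e.g., "emotion", "interaction", "driving"
    position: tuple    # (x, y, z) in world coordinates

def explode_vertically(events, spacing=0.25):
    """Lift each participant's events onto an own vertical layer so that
    co-located events of different participants no longer overlap."""
    layers = {p: i for i, p in enumerate(sorted({e.participant for e in events}))}
    exploded = []
    for e in events:
        x, y, z = e.position
        exploded.append(Event(e.participant, e.kind,
                              (x, y + spacing * layers[e.participant], z)))
    return exploded

if __name__ == "__main__":
    evts = [Event(1, "emotion", (10.0, 1.2, 5.0)),
            Event(2, "emotion", (10.0, 1.2, 5.0))]   # same route, same spot
    for e in explode_vertically(evts):
        print(e)
```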
AutoVis supports creating, editing, and persistently sharing annotations in VR, addressing R6 and R9. Analysts can place annotations in space, similar to 3D markers in MIRIA [16] and MRAT [98], by moving to a specific position and opening the edit menu via the controller. Such annotations are spatio-temporal labels or comments that are linked in space and to the timeline of a dataset (see Figure 4 b). AutoVis visualizes this link by automatically placing labels on the spatial event line. Analysts can use the labels to annotate their dataset, for example, for supervised DL. In contrast, comments can be set and edited anywhere in the 3D environment, for example, to leave hints, descriptions, and opinions about the analysis for oneself (when switching views) or for collaborators.
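A possible shape for such spatio-temporal annotations is sketched below in Python: each annotation is anchored both to a 3D position and to a playback timestamp, and can be persisted for sharing. The field names and the JSON persistence are assumptions for illustration, not the AutoVis format.

```python
# Sketch: annotations linked to a 3D position and to the dataset timeline.
from dataclasses import dataclass, field
import json, time

@dataclass
class Annotation:
    kind: str          # "label" (on the event line) or "comment" (free in 3D)
    text: str
    position: tuple    # (x, y, z) anchor in the replicated scene
    timestamp: float   # playback time the annotation refers to (seconds)
    author: str = "analyst"
    created_at: float = field(default_factory=time.time)

def save_annotations(annotations, path):
    """Persist annotations so they can be shared across views and collaborators."""
    with open(path, "w") as f:
        json.dump([a.__dict__ for a in annotations], f, indent=2)
```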
Avatars. In AutoVis, avatars replicate passengers (see Figure 3 A) from pre-recorded 3D skeleton data of participant movements (R1). Inspired by AvatAR [107], AutoVis updates the avatars in each playback time step. Free VR movement around the avatars enables exploring posture, relation to the vehicle environment, and movement patterns. Analysts can enter an avatar's POV to gain first-person insights into how passengers interact with their surroundings. Such embodied analysis is impossible with non-immersive analysis tools.
AutoVis displays a distinct avatar for each participant. In contrast to AvatAR [107], where avatars replicated room-scale movements, the AutoVis avatars share the same positions (e.g., sitting on the driver and passenger seats). Therefore, we propose an aggregated avatar that combines the positions and movements of the individual avatars to increase visual clarity (see Figure 5 a). For the aggregated avatar's skeleton, AutoVis calculates the average position and rotation of the individual avatars' joints per frame. Using the aggregated avatar, analysts can explore similar passenger behaviors on a meta-level (R3). To further reduce visual clutter, the avatars are semi-opaque, and their colors correspond to the participants' tool-wide colors.
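The per-frame joint averaging could look like the following Python sketch. It assumes joint rotations are stored as unit quaternions and uses a simple sign-aligned mean as an approximation of rotation averaging; the exact representation in AutoVis is not specified here.

```python
# Sketch: per-frame aggregation of several avatars into one aggregated avatar.
import numpy as np

def average_positions(positions):
    """positions: (n_avatars, n_joints, 3) -> (n_joints, 3)"""
    return np.mean(positions, axis=0)

def average_rotations(quats):
    """quats: (n_avatars, n_joints, 4) unit quaternions (x, y, z, w)
    -> (n_joints, 4), averaged per joint via a sign-aligned mean."""
    quats = np.asarray(quats, dtype=float)
    ref = quats[0]                                           # align signs to the first avatar
    signs = np.sign(np.sum(quats * ref, axis=-1, keepdims=True))
    signs[signs == 0] = 1.0
    mean = np.sum(quats * signs, axis=0)
    return mean / np.linalg.norm(mean, axis=-1, keepdims=True)

# Per playback frame: feed every participant's skeleton for that frame and
# drive the aggregated avatar with the returned joint transforms.
```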
Trajectories. Similar to [16, 67, 107], we employ 3D trajectories. The trajectories (see Figure 3 B) correspond to an avatar, providing a different representation of its movements. They replicate hand and head movements for a selected time frame (R1). The trajectories are colored lines matching their avatar's color (see Figure 5 b). They provide an overview of passenger movements within a specific time frame without requiring real-time playback.
In-Vehicle Heatmaps. In-vehicle heatmaps provide an overview of interactions with interior surfaces (R1), such as the windshield display, center console, or dashboard. Instead of classic 2D heatmaps (e.g., as in [107]), AutoVis employs heatmap textures that accurately map onto the 3D interior mesh (see Figure 5 c). Gaze heatmaps visualize where passengers looked, for example, to investigate glances at the dashboard or center console. This can help determine passenger states, such as distraction and cognitive load. Likewise, analysts can use touch heatmaps. Combined touch and gaze heatmaps may indicate modality interdependencies (R3) (see Figure 5 c).
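One way to build such mesh-mapped heatmaps is to accumulate gaze or touch hits into a texture addressed by the UV coordinates of the hit surface points. The following Python sketch illustrates this under that assumption; resolution, splat size, and normalization are illustrative choices, not AutoVis parameters.

```python
# Sketch: accumulating gaze/touch hits into a heatmap texture wrapping a 3D mesh.
# Each hit is assumed to carry the UV coordinate of the intersected surface point
# (e.g., obtained from a raycast against the interior mesh).
import numpy as np

def accumulate_heatmap(hits_uv, resolution=512, sigma_px=6):
    """hits_uv: iterable of (u, v) in [0, 1]^2. Returns a (res, res) float texture;
    splatting a small Gaussian per hit keeps the map smooth."""
    tex = np.zeros((resolution, resolution), dtype=float)
    ax = np.arange(resolution)
    for u, v in hits_uv:
        cx, cy = u * (resolution - 1), v * (resolution - 1)
        gx = np.exp(-((ax - cx) ** 2) / (2 * sigma_px ** 2))
        gy = np.exp(-((ax - cy) ** 2) / (2 * sigma_px ** 2))
        tex += np.outer(gy, gx)              # separable Gaussian splat
    return tex / tex.max() if tex.max() > 0 else tex
```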
Environment Heatmaps. According to our data specification (see 4.1.1), passengers can interact with the vehicle environment via gaze, pointing, and speech. Therefore, AutoVis employs gaze and pointing heatmap textures that accurately map onto the replicated 3D meshes of buildings and other road users (see Figure 5 d). These heatmaps can help determine, for example, driver distraction, or detect movement and gesture patterns (R1). They also highlight correlations between environmental context and gaze/pointing interactions (R4). Moreover, a traffic heatmap displays the positions of other road users (R1) (see Figure 5 d). This enables inferring the current driving context and traffic flow. In addition, analysts can modify the distinct color schemes of each heatmap (R5).
Context Portals. Large distances between objects of interest and volatile in-vehicle and environmental contexts are common challenges in AUI study analysis (see 3.2). We propose context portals to overcome these challenges and to address R3 and R4. Context portals provide a glimpse of the context (i.e., a location or object) referenced in an interaction by showing a spatial portal next to an avatar's finger or head (e.g., as a thought bubble). The portal shows the object or location up close using a render image from an additional virtual camera (see Figure 6). Analysts can activate a context portal by selecting an (inter)action event (see timeline in Appendix B) on the event line in 3D or on the virtual tablet. However, only one portal can be visible at a time. We distinguish two context portal modes: (1) direct and (2) indirect.
AutoVis displays a (1) direct context portal when participants referenced objects or locations in the vicinity using gaze, pointing, or speech. The portal then shows a zoomed view of the referenced entity from the avatar's POV, for example, enabling analysts to determine an object's visibility time during an interaction. To explicitly visualize gaze and pointing targets, an additional ray and hit-point visualization reaches through the direct context portal (see Figure 6 a and b). Regardless of the interaction modality, the referenced object's outline is highlighted to make it stand out against the environment (see Figure 6 b).
The interaction modality determines the 3D position of the direct context portal. In a pointing interaction, the direct context portal is displayed two meters from the fingertip along the extended pointing direction of the respective avatar (see Figure 6 b). Analogously, the avatar's eyes are the reference point for positioning the portal for gaze interactions. For speech interactions, however, the portal appears as a thought bubble next to the respective avatar's head. In addition, a speech bubble is displayed underneath, containing the utterance for the inspected time frame (see Figure 6 d).
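The modality-dependent placement can be summarized as in the Python sketch below. The two-meter offset along the pointing ray follows the text; applying the same distance to gaze and the fixed head offset for the speech thought bubble are assumptions for illustration.

```python
# Sketch: positioning a direct context portal depending on the interaction modality.
import numpy as np

def portal_position(modality, origin=None, direction=None, head_pos=None):
    """origin: fingertip (pointing) or eye position (gaze); all inputs are 3-vectors."""
    if modality in ("pointing", "gaze"):
        d = np.asarray(direction, dtype=float)
        d /= np.linalg.norm(d)
        return np.asarray(origin, dtype=float) + 2.0 * d   # 2 m along the extended ray
    if modality == "speech":                                # thought bubble next to the head
        return np.asarray(head_pos, dtype=float) + np.array([0.3, 0.4, 0.0])
    raise ValueError(f"unknown modality: {modality}")
```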
An (2) indirect context portal visualizes referenced objects or locations that are not present in the environment. AutoVis queries the missing information, for example, from Google Maps, and shows a screenshot of the result in the indirect context portal (see Figure 6 c). Since objects that are not in the vicinity can only be referenced via speech, a thought bubble displays the query result. The direct and indirect context portals avoid having to search for referenced objects or locations (e.g., a landmark) in the environment. Without them, this search can be time-consuming and challenging if the driving environment is unknown, or may end at a barely visible distant object or location.