3.1 Adaptive UI for MAR
After data is prepared for visualisation, the visualisation pipeline comprises three main steps [296]: (1) filtering, (2) mapping, and (3) rendering. Zollmann et al. [296] present a comprehensive review organised around this pipeline. We adopt their approach for classifying the characteristics of visualisation techniques, and we further investigate context-aware adaptive interface techniques that adapt to a person, a task, or the surrounding context. Several classes of algorithms, including machine-learning-based ones, have been applied to adaptive user interfaces to improve human-interface interaction; they are designed to help users reach their goals more efficiently, more easily, or with greater satisfaction. We evaluate these works from user experience measurement perspectives and summarise five adaptive user interface techniques [112, 296] commonly found in existing frameworks, together with the requirements for implementing each of them:
Information Filtering and Clustering (InfoF): Overload and clutter occur when a large amount of information is rendered to users, and complex visualisation of augmented content negatively affects visual search and other measures of visual performance. Two approaches reduce interface complexity by decreasing the amount of displayed content. The first is information filtering, which can be implemented as a filtering algorithm with a culling step followed by a detailed refinement step. The second is information clustering [246], which groups information and represents it by its classification (a minimal clustering sketch follows this list).
Occlusion Representation and Depth Cues (OcclR): Occlusion representation provides depth cues that determine the ordering of objects, so that users can identify the 3D locations of physical objects even when large structures occlude one another. Julier et al. [111] suggest three basic requirements for depth cues: (1) the ability to identify and classify occluding contours; (2) the ability to calculate the level of occlusion of target objects and to parameterise occlusion levels when different parts of an object require this; and (3) the ability to use perceptually identified encodings to draw objects at different levels of occlusion.
Illumination Estimation (IEsti): Light estimation is another source of information for depicting spatial relations in depth perception; it enhances visual coherence in AR applications by providing accurate and temporally coherent estimates of the actual illumination. Common methods for illumination estimation can be classified by whether or not they rely on auxiliary information.
Registration Error Adaptation (RegEA): Registration error adaptation concerns the mapping step of the visualisation pipeline. Trackers are imprecise and registration errors vary over time, so correctly calibrating devices and displays is complex; consequently, graphical content does not always align perfectly with its physical counterpart. The UI should therefore be able to adapt dynamically while visualising information, since ambiguity arises when virtual content does not appear to interact with the contextual environment around the user.
Adaptive Content Placement (AdaCP): AR annotations are dynamic 3D objects that can be rendered on top of physical environments. Text labels must be drawn with respect to the visible part of each rendered object to avoid confusing and ambiguous interactions.
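As noted for InfoF above, clustering reduces clutter by grouping annotations according to their classification and rendering one aggregated item per group. The following minimal Python sketch illustrates the idea; the data fields and the aggregation rule are hypothetical rather than taken from the cited systems.

```python
from collections import defaultdict

def cluster_by_category(annotations):
    """Group annotations by their classification label.
    Each annotation is a dict such as {"label": "Cafe A", "category": "food"}."""
    groups = defaultdict(list)
    for annotation in annotations:
        groups[annotation["category"]].append(annotation)
    return groups

def aggregated_labels(annotations):
    """Render one summary label per cluster instead of every single item."""
    return [f"{category} ({len(items)})"
            for category, items in cluster_by_category(annotations).items()]

# Example: two "food" annotations collapse into the single label "food (2)".
```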
InfoF: Information clutter is prevalent in context-aware AR applications that operate in complex outdoor environments, where exploring and searching for information on AR screens become indispensable tasks for users. Information filtering is therefore a necessary technique in modern AR systems [249], especially in large and complicated environments where information overload is significant; without intelligent, automated filtering and selection tools, the display quickly becomes difficult to read [123]. There are three main methods for filtering information: (1) spatial filters, (2) knowledge-based filters, and (3) location-based filters [249]. Spatial filters select the information displayed on screen or in object space based on physical-dimension rules. They require user interaction to investigate the entirety of the virtual content (for example, users must move their MAR devices to view large 3D models) and thus only work locally within a small region; immersive AR applications commonly apply spatial filters to exclude information outside the user's view. Knowledge-based filters use user preferences as the filtering criteria [249]. Expert-knowledge filters embed behaviour and knowledge in code that operates on the system's data structures to infer recommendations and output the items satisfying user requirements; such knowledge can be encoded in different ways, for example as rules in rule-based systems [195]. Finally, spatial information from location-based filters can be combined with knowledge-based filters into hybrid methods, and as new sensors such as gaze trackers are embedded in modern AR headsets, user bio-information and context information can also be used for filtering [6].
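To make the combination of a culling step and a knowledge-based refinement step concrete, the following minimal Python sketch filters annotations first by distance and then by user preferences; the data model, the distance threshold, and the preference rules are hypothetical and not drawn from the cited systems.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    label: str
    distance_m: float   # distance from the user, in metres
    category: str       # e.g., "restaurant", "transport"
    relevance: float    # 0..1 score, e.g., from a knowledge-based model

def spatial_cull(items, max_distance_m=50.0):
    """Culling step: drop content outside the user's local region."""
    return [a for a in items if a.distance_m <= max_distance_m]

def rule_based_refine(items, preferred_categories, max_labels=10):
    """Refinement step: keep items matching user preferences,
    ranked by relevance and capped to limit clutter."""
    matched = [a for a in items if a.category in preferred_categories]
    matched.sort(key=lambda a: a.relevance, reverse=True)
    return matched[:max_labels]

def hybrid_filter(items, preferred_categories):
    """Hybrid filter: spatial culling followed by rule-based refinement."""
    return rule_based_refine(spatial_cull(items), preferred_categories)
```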
OcclR: Comprehensive AR systems track sparse geometric features and compute depth maps for all pixels when visualising occluded or floating objects in AR [112]. Depth maps provide a depth value for each pixel of the captured scene; they are essential for generating depth cues, helping users understand their environment, and aiding interaction with occluded or hidden objects. Recent AR frameworks, such as Google ARCore [82] and Apple ARKit [16], provide depth map data for enabling depth-cue features in MAR applications [59]. Physical and virtual cues are the two options for producing depth cues that support depth perception in AR applications [296]. Physical cues can be used to rebuild natural pictorial depth cues [278], such as occlusion or shadows; integrating depth maps with RGB camera images can provide the necessary natural pictorial depth cues [296]. Virtual cues and visual aids, in turn, are generated by the application to provide depth cues similar to their physical counterparts; “X-ray vision” is a frequently used virtual-cue technique that lets users perceive graphics as located behind opaque surfaces. DepthLab [59] is an application built on ARCore’s Depth API that enables both physical and virtual cues and helps developers integrate depth into their AR experiences. DepthLab implements depth maps and depth cues for at least six kinds of AR interactions, including (1) oriented reticles and splats, (2) ray-marching-based scene relighting, (3) depth visualisation and particles, (4) geometry-aware collisions, (5) 3D-anchored focus and aperture effect, and (6) occlusion and path planning [59].
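The per-pixel depth test behind such occlusion cues can be sketched as follows. This is a minimal NumPy illustration, assuming the framework already supplies a metric depth map aligned with the camera image and a rendered virtual-depth buffer; it is not the DepthLab implementation itself.

```python
import numpy as np

def occlusion_mask(real_depth_m, virtual_depth_m, eps=0.02):
    """Per-pixel depth test: a virtual fragment is hidden when physical
    geometry lies in front of it (smaller depth), within a tolerance eps."""
    return real_depth_m < (virtual_depth_m - eps)

def composite(camera_rgb, virtual_rgb, real_depth_m, virtual_depth_m):
    """Show virtual pixels only where the physical scene does not occlude them;
    an 'X-ray' style cue could instead blend the two images where hidden."""
    hidden = occlusion_mask(real_depth_m, virtual_depth_m)
    out = virtual_rgb.copy()
    out[hidden] = camera_rgb[hidden]   # occluded regions keep the camera image
    return out
```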
IEsti: Illumination estimation is typically achieved with two traditional approaches. (1) Methods using auxiliary information leverage RGB-D data or information acquired from light probes; they can be active, such as the fisheye camera used by Kán et al. [114], or passive, such as the reflective spheres used by Debevec [57]. (2) Other methods estimate the illumination from an image of the primary AR camera alone, without requiring an arbitrary known object in the scene. The auxiliary information can also take the form of assumptions about image features known to be directly affected by illumination, or of simpler models such as Lambertian illumination. Shadows, the gradient of image brightness [119], and shading [273] are typical image features for estimating the illumination direction.
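As a simple illustration of the non-auxiliary, Lambertian route, a dominant light direction can be recovered from shading alone: given per-pixel surface normals (e.g., derived from a depth map) and observed brightness, a least-squares fit yields the light vector. This sketch is illustrative only, assumes unshadowed Lambertian pixels, and is far simpler than the cited methods.

```python
import numpy as np

def estimate_light_direction(normals, intensities):
    """Least-squares Lambertian fit of I = k * dot(n, l).
    normals:     (N, 3) unit surface normals, e.g., derived from a depth map
    intensities: (N,)   observed brightness of the same pixels
    Returns an estimated unit light-direction vector in camera coordinates."""
    scaled_light, *_ = np.linalg.lstsq(normals, intensities, rcond=None)
    norm = np.linalg.norm(scaled_light)
    return scaled_light / norm if norm > 0 else scaled_light
```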
RegEA: Addressing registration and sensing errors is a fundamental problem in building effective AR systems [24]. Serious registration errors can produce conflicts between a user's visual input and actions, e.g., a stationary user viewing AR content that appears to drift away at a constant velocity. Such conflicts between different human senses may cause motion sickness, so the user interface must adapt automatically to changing registration errors. MacIntyre et al. [150] suggest Level-of-Error (LOE) object filtering, in which different representations of an augmentation are selected automatically as the registration error changes. This approach requires identifying a target object and a set of confusers [151]; the method then calculates the registration errors for the target and all confusers, and registration-error convex hulls are used to bound the geometry of the objects. For example, when hulls are constructed for two disjoint objects in the presence of substantial yaw error, surrounding each object with its hull and a suitable label is sufficient to direct the user to the correct object.
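The LOE idea can be illustrated by switching an augmentation's representation according to the current registration-error estimate. The thresholds and representation names in the sketch below are hypothetical and are not taken from the cited systems.

```python
def select_representation(error_px):
    """Pick a representation whose precision matches the estimated
    screen-space registration error (in pixels)."""
    if error_px < 5:
        return "outline"            # tight overlay hugging the object silhouette
    elif error_px < 30:
        return "highlight_hull"     # hull covering the region the object may occupy
    else:
        return "label_with_leader"  # coarse label with a leader line or arrow
```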
Registration error adaptation is essential in safety-critical AR systems, such as surgical or military applications. Recent AR frameworks provide real-time registration error adaptation by fusing precise IMU tracking data with camera images, which minimises the registration error. Robertson and MacIntyre [217] describe AR visualisation techniques for augmentations that adapt to changing registration errors. The first, more traditional technique provides general visual context for an augmentation in the physical world, helping users recognise its intended target; this is achieved by highlighting features of the parent object and showing more feature detail as the registration error estimate increases. The second technique presents detailed visual relationships between the augmentation and nearby physical objects: a distinctive collection of objects near the augmentation's physical target is highlighted so that the user can differentiate between the augmentation target and similar parts of the physical world.
AdaCP: Major label placement solutions include greedy algorithms, cluster-based methods, and screen subdivision methods [25]. Other methods aim to make the links between objects and their annotations more intuitive, alleviate the depth ambiguity problem, and maintain depth separation [161]. Labels must be drawn with respect to each visible part of an object; otherwise, the results are confusing, ambiguous, or even incorrect. By computing axially aligned approximations of the object projections, visibility can then be determined with simple depth-ordering algorithms [31]. Several works focus on providing appropriate moving-label dynamics so that the temporal behaviour of moving labels preserves legibility. From these works, certain requirements arise: the ability to determine visible objects, the parameterisation of free and open space in the view plane to determine where and how content should be placed, and real-time label animation, since drawing characteristics must be updated on a per-frame basis [112]. These requirements form the core of content placement that adapts to various physical environments for an enhanced user experience.
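A greedy variant of such a placement step might look like the following sketch, which scans candidate offsets around each projected anchor and keeps the first position that stays on screen and does not overlap previously placed labels. It is a schematic, per-frame illustration rather than any cited algorithm.

```python
def overlaps(a, b):
    """Axis-aligned rectangle overlap test; rectangles are (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def place_labels(anchors, label_size, screen,
                 offsets=((8, -8), (8, 8), (-8, -8), (-8, 8))):
    """Greedy placement: anchors are projected 2D points of visible objects.
    Returns one rectangle per label, or None when the label is crowded out."""
    w, h = label_size
    sw, sh = screen
    placed, result = [], []
    for ax, ay in anchors:
        chosen = None
        for dx, dy in offsets:
            rect = (ax + dx, ay + dy, w, h)
            on_screen = (0 <= rect[0] and rect[0] + w <= sw
                         and 0 <= rect[1] and rect[1] + h <= sh)
            if on_screen and not any(overlaps(rect, p) for p in placed):
                chosen = rect
                break
        if chosen:
            placed.append(chosen)
        result.append(chosen)
    return result
```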
View management algorithms address label placement [
25]. Wither et al. [
276,
277] provide an in-depth taxonomy of annotations, especially regarding their location and permanence. Tatzgern et al. [
249] propose a cluster hierarchy-based view management system. Labels are clustered to create a hierarchical representation of the data, which is visualised based on the user’s 3D viewpoint. Their label placement employs the “hedgehog labelling” technique, which places annotations in real-world space to achieve stable layouts. McNamara et al. [
162] illustrate an egocentric view-based management system that arranges and displays AR content based on user attention. Their solution uses a combination of screen position and eye-movement tracking to ensure that label placement does not become distracting.
Current interfaces, however, rarely consider walking scenarios, which a comprehensive MAR system must support. Lages et al. [130] explore different information layout adaptation strategies in immersive AR environments and develop desirable properties for adaptation-based interface techniques. Adaptive content management is implemented in a MAR system in which behaviours function as modules that can be combined to match individual user behaviour to AR visual output, and a final minimal set of useful behaviours is proposed that the user can easily control in a variety of mobile and stationary tasks [92, 109].
3.2 Collaborative UIs in Multi-user and Multi-device AR
Adaptive AR UIs can serve as ubiquitous displays of virtual objects that can appear anywhere in our physical surroundings. That is, virtual objects can float in the air against any physical background and be reached and manipulated by multiple users from their egocentric views [127]. Users engaged in their AR-mediated physical surroundings are encouraged to accomplish tasks in co-facilitated environments with shared and collaborative AR experiences among multiple users [147]. Multiple dimensions of collaborative AR UIs are discussed across various applications, for instance: work-oriented [147, 236] and playful [62, 127] content, local/co-located [180] and remote [210] users, AR-only settings [147, 180] and mixtures of AR and VR [62, 194], co-creation/co-editing by multiple users (i.e., multiple users in front of the AR scene) [147, 180], supported and guided instruction (i.e., multiple users connecting to one user in front of the AR scene) [52, 62], human-to-human interaction [127, 147], and interaction between humans and AR bots, such as tangible robots [186] and digital agent representatives [17].
Multi-user collaborative environments have long been a research topic in human-computer interaction, evolving from sedentary desktop computers [240] to mobile devices and head-worn computers [147, 236]. Successful collaborative environments must cope with several design challenges, including (1) high awareness of others' actions and intentions; (2) high control over the interface; (3) high availability of background information; (4) information transparency among collaborators while preventing leakage of user privacy; and (5) the impact of user features on other users [38, 286], e.g., how users perceive or can interact with other users' ongoing collaborative experience.
These design challenges of awareness, control, availability, transparency, and privacy are fundamental to letting multiple users interact smoothly in collaborative and shared environments. When such environments are deployed in AR, additional features, such as enriched reality-based interaction and high levels of adaptability, are introduced. As discussed previously, the additional design challenges extend from resolving multi-user experiences to reality-based interaction across various AR devices, including AR/VR headsets, smartwatches, smartphones, tablets, large-screen displays, and projectors. These challenges centre on managing multiple devices and their platform restrictions, unifying device-specific sensing and interaction modalities, and connecting collaborative AR environments to physical coordinate systems in shared views [236]. The complexity of managing such heterogeneous devices calls for an AR framework that systematically and automatically enables user collaboration in co-aligned AR UIs. Notably, most existing evaluation frameworks focus on a small number of quantitative metrics, such as completion time and error rate for visual communication cues between an on-site operator and a remote supporting expert [125], and neglect multi-user responses to the physical environment.
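A recurring building block of such co-aligned UIs is expressing every device's content in a common anchor frame. The sketch below assumes each device can estimate the pose of a shared physical anchor in its own coordinate system and converts an object pose from device A's frame to device B's frame via that anchor; the matrix names are hypothetical.

```python
import numpy as np

def pose_in_other_device(pose_in_a, anchor_in_a, anchor_in_b):
    """All arguments are 4x4 homogeneous transforms.
    pose_in_a:   the virtual object's pose in device A's world frame
    anchor_in_a: the shared anchor's pose as tracked by device A
    anchor_in_b: the same anchor's pose as tracked by device B
    Returns the object's pose in device B's world frame."""
    # Express the object relative to the shared anchor, then re-express it in B.
    object_in_anchor = np.linalg.inv(anchor_in_a) @ pose_in_a
    return anchor_in_b @ object_in_anchor
```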
3.3 Evaluation Metrics for AR UIs
AR interfaces were initially considered for industrial applications, and the goodness or effectiveness of such augmentations was limited to work-oriented metrics [56]. A very early example concerns augmented information, such as work instructions for front-line labourers on assembly lines, where productivity, work quality, and work consistency serve as evaluation metrics [71]. However, these work-oriented metrics are not equivalent to user experience (e.g., the ease of handling the augmentation) and neglect critical aspects, especially user-centric metrics. The numerous user-centric metrics are generally inherited from traditional UX design and can be categorised into four elements: (1) user perception of information (e.g., whether the information is comprehensible, understandable, or easily learned), (2) manipulability (i.e., usability, operability), (3) task-oriented outcome (e.g., efficiency, effectiveness, task success), and (4) other subjective metrics (e.g., attractiveness, engagement, satisfaction in use, social presence, user control) [19].
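To make these four categories concrete, an evaluation framework might organise its logged measures along the following lines; the field names are illustrative placeholders rather than a standardised instrument.

```python
from dataclasses import dataclass, field

@dataclass
class ARUXMetrics:
    # (1) user perception of information
    comprehensibility: float = 0.0    # e.g., post-task questionnaire score
    learnability: float = 0.0
    # (2) manipulability
    usability: float = 0.0            # e.g., SUS-style score
    operability: float = 0.0
    # (3) task-oriented outcome
    task_success_rate: float = 0.0
    completion_time_s: float = 0.0
    error_count: int = 0
    # (4) other subjective metrics (engagement, satisfaction, social presence, ...)
    subjective: dict = field(default_factory=dict)
```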
When augmentations are displayed on handheld devices, such as smartphones and tablets, ergonomic issues, such as the ease of manipulating AR content with two-handed and one-handed operation, are further considered [222]. Nowadays, AR is deployed in real-world scenarios, primarily as a marketing tool, and hence business-oriented metrics are further examined, such as utility, aesthetics, enjoyment, and brand perception [207, 257]. Additionally, multi-user collaborative environments require remote connections in AR [291]. Lately, the quality of experience (QoE) achieved through computation offloading to cloud or edge servers has introduced new design challenges for AR UIs [221, 251].
User perception of AR information, which drives the comprehensibility of AR cues and the learnability of AR operations in reality-based interaction, has been further investigated as a problem of multi-modal cues, such as audio, video, and haptics, across the varied situations enabled by AR [77]. The intelligent selection of AR information and the adaptive management of information display and multi-modal cues are crucial to users' perception of AR environments [47, 133]. This can be considered a fundamental issue of interface plasticity: for content mixed between digital and physical realities [19], the plasticity of AR interfaces refers to the compatibility of the information with the physical surroundings as well as with the user's situation (i.e., context-awareness) [78, 143].
After defining evaluation metrics, AR UI practitioners (e.g., software engineers and designers) often examine the dynamic user experience in interactive environments against the metrics mentioned above. There have been attempts to assess the AR experience by building small studio-scale interactive spaces that emulate AR environments [124]. However, such physical setups and iterative assessments are costly and time-consuming, particularly when AR is deployed at large scale (i.e., ubiquitously in our living spaces). Moreover, the increasing number of evaluation metrics calls for systematic evaluation of AR UIs [107]. It is therefore preferable to assess AR UI design metrics through systematic approaches and even automation, with prominent features such as real-time monitoring of system performance, direct information collection via user-device interaction in AR, and more proactive responses to improve user perception of AR information [107]. However, to our knowledge, existing evaluation frameworks are few, and their scopes are limited to specific contexts and scenarios, such as disaster management [175]. More generic evaluation frameworks offering high selectivity over AR evaluation metrics thus present research opportunities in the domain of AR interface design [77, 107].