GestureMark: Shortcut Input Technique using Smartwatch Touch Gestures for XR Glasses
DOI: https://doi.org/10.1145/3652920.3652941
AHs 2024: The Augmented Humans International Conference, Melbourne, VIC, Australia, April 2024
We propose GestureMark, a novel input technique for target selection on XR glasses that uses smartwatch touch gestures as input. As XR glasses become smaller and lighter, their usage is increasing rapidly, raising demand for efficient shortcuts in everyday life. We explored gesture input on the smartwatch touchscreen, including simple swipes, swipe combinations, and bezel-to-bezel (B2B) gestures, as an input modality. Through an experiment with 16 participants, we found that while swipe gestures were efficient for four-choice selections, B2B was superior for 16-choice inputs. Feedback mechanisms did not enhance performance but reduced perceived workload. Our findings highlight the potential of integrating smartwatches as secondary input devices for XR glasses.
ACM Reference Format:
Juyoung Lee, Minju Baeck, Hui-Shyong Yeo, Thad Starner, and Woontack Woo. 2024. GestureMark: Shortcut Input Technique using Smartwatch Touch Gestures for XR Glasses. In The Augmented Humans International Conference (AHs 2024), April 04--06, 2024, Melbourne, VIC, Australia. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3652920.3652941
1 INTRODUCTION
XR glasses, equipped with a display in front of the eye, can be used regardless of context. These advantages make them useful for many use cases, such as training, remote assistance, or teleconferencing. As XR glasses become lighter, their potential extends from specialized scenarios into everyday life. While XR glasses provide information relevant to the current context, there are moments when we need to perform input: for example, navigating information on demand, taking notes, or responding to notifications. The challenge lies in enabling fast and easy selection without disrupting the ongoing context. Many commercial products support touch gestures on the temple, but these offer limited input choices and raise social acceptability concerns, as the gestures are easily visible to others. Another common technique is mid-air hand gestures, which allow more input choices than temple touch. However, they are not ideal for quick responses, raise the same social acceptability issues, and face form-factor limitations, since supporting a sufficient range of hand movements requires multiple sensors. Carrying an additional controller could be a solution, offering an alternative interface that does not require extensive trials or attention, such as navigation keys or raycasting, but carrying it around all day is another burden. For these reasons, we decided to leverage a well-known wearable device, the smartwatch, as an input device for XR glasses.
Indirect-touch pointing, a popular input technique on laptops, could be a suitable solution for XR glasses. However, Camilleri et al. [7] reported that peripheral indirect-touch pointing devices need an appropriate size to be useful, about 112 mm in width and 63 mm in depth. Gilliot et al. [12] also provided guidelines for indirect touch, including keeping the input device visible, and showed that performance is affected by the input-target ratio. Accordingly, a commodity device with a touchpad on the temple cannot offer a sufficient size for indirect pointing, which is why such devices focus on gesture input [24]. To expand the small touch input space, prior research has used gaze [39] or tapping patterns [17]. At the same time, gestural input with hand-worn wearable devices has been explored. DRG-Keyboard [23] applied gesture typing with dual IMU rings. Ahn et al. [1] expanded the input capabilities of XR glasses by using a smartwatch touchscreen for text entry. DigiTap [28] investigated symbolic hand input with a wrist-worn camera to activate shortcuts for AR/VR. To improve the visibility of the possible input set, guidance in 3D space [9] has been explored, originating from marking-menu input such as OctoPocus [5]. However, it is still challenging to quickly select a target on XR glasses from multiple objects without disrupting the context.
We propose ‘GestureMark,’ a marking-menu-style input system for XR glasses that allows target selection with touch gestures on a smartwatch. Following the marking-menu approach, we add gesture guidance to interactable objects to enable shortcut selection. We integrated smartwatch bezel touch gestures, including the four-directional swipe and the bezel-to-bezel (B2B) gesture [22], which offers 16 possible inputs in a single stroke. Previous research has validated its effectiveness in multiple settings, such as eyes-free [33] or encumbered scenarios [30]. Our method offers a quicker way to select objects through shortcut selection with gesture combinations, reducing input trials compared to navigational selection with a highlighted cursor. Additionally, it requires less attention and time, as the user only needs to focus on the intended target rather than passing through all targets.
We designed two input types, B2B and swipe gestures, each with a unique icon representing its touch path. We implemented a gesture detection model using a Random Decision Forest, following a previous study [30]. We conducted an experiment to examine the performance of the proposed input technique and to answer the following research questions:
- First, can B2B touch gestures be executed as efficiently with visual cues as the four-directional swipe?
- Second, does visual and vibration feedback affect performance or user experience?
- Lastly, which is the more effective way to enter the same number of bits: two consecutive four-directional swipes or a single B2B gesture?
The experiment used a 2 (Swipe/B2B) x 2 (with/without feedback) design with 16 participants, repeated twice across two contexts: seated and walking. We ran analyses to compare B2B with swipe gestures and to evaluate the impact of feedback. The study showed that a single swipe gesture worked well for selecting among four options, but two consecutive swipes did not perform as well as B2B for selecting among 16 options. Feedback did not improve performance, but it reduced perceived workload while walking. As participants gained experience with the B2B gesture, their success rate improved to 89.3% with a completion time of only 1.62 seconds. However, swipe gestures showed a learning effect only in completion time, not in success rate. Based on our findings, B2B gestures can be effectively used as input for marking menus on XR glasses.
Overall, our main contributions are 1) the design and implementation of an input technique that combines a smartwatch with XR glasses and 2) empirical results demonstrating the efficacy of the proposed technique under different settings.
2 RELATED WORK
2.1 Input with wearable device
Despite the mobility of smartglasses, there are still limitations in performing input with many options, or subtle input, on the device itself. For this reason, input methods that rely on wearable devices worn on the finger, hand, or arm have become popular. Hsieh et al. investigated hand-gestural interaction techniques for smartglasses in public spaces and proposed gestures with a haptic glove [16]. Extending the use of hand gestures, a series of studies employed different sensors. Opisthenar showed that an embedded wrist camera can be used [38], and Back-Hand-Pose improved on it with a dorsum deformation network [34]. EtherPose enabled hand pose tracking with two wrist-worn antennas [19]. Focusing on input rather than tracking, DigiTap suggested symbolic input for AR/VR [28]. ARPads investigated a design space for mid-air indirect input for augmented reality and showed that indirect input can cause less fatigue than direct input [6]. These hand-gesture-based interaction methods offer a natural way to interact but still have limitations in form factor for all-day use, in speed, or in reliability. To address this, BiTipText introduced a bimanual text input method by touching different regions of the index finger with the thumb [35]. ‘M[eye]cro’ combines eye gaze and thumb-to-finger gestures to select on-screen objects [32]. DRG-Keyboard enables fingertip typing with dual IMU rings [23]. Extending this trend, there have also been attempts to control devices with a single ring [4] and explorations of input with rings [15]. However, ring-based interactions have limitations, as rings are relatively new to the market, similar to smartglasses. Therefore, using a smartwatch is a popular solution. Touch input on smartwatches can expand the input space of smartglasses [1], or the smartwatch's sensors can serve as an input modality for gesture interaction [31]. However, these attempts were limited to typing or mid-air gesture input, which cannot provide the quick, easy, and tactile input needed to react to on-screen information.
2.2 Touch gesture on smartwatch
To expand the limited input space on smartwatches, touch gestures have been suggested in various forms. One common approach is to use the space around the bezel, as its position is relatively easy to distinguish. Ashbrook et al. developed a mathematical model to determine the error rate when using a circular touchscreen [3]. BezelGlide investigated touch gestures that glide along the bezel to interact with the smartwatch without finger occlusion [25]. Watchit uses touch gestures on the watch strap to overcome finger occlusion [27]. Bezel-to-Bezel Swipe (B2B-Swipe) introduced 16 different gestures that start at one bezel and end at another [22] and showed that they work even in eyes-free conditions. Wong et al. explored bezel-initiated swipe gestures on round smartwatches [33]. Rey et al. demonstrated that B2B gestures are effective with four segments, even in walking or encumbered scenarios [30]. Side-Crossing Menus (SCM) introduced touch gestures activated by crossing cells of a 3x3 grid on the smartwatch touchscreen [11]; they also used tactile feedback around the bezel to aid eyes-free use, which requires additional modification of the smartwatch. These studies found that tactile and visual feedback can make touch gestures on smartwatches easier to perform, and that B2B gestures are robust in most cases. Therefore, bezel touch gestures are a good choice for quick and easy input for XR glasses, and their performance could potentially be improved by feedback.
2.3 Gesture input with Visual Cue
Similar to keyboard hotkeys for selecting displayed items in a desktop environment, comparable approaches exist for gestural input. MarkPad used a laptop touchpad to create size-dependent gestural shortcuts [10]. Escape enables target selection in dense layouts by performing swipe gestures according to a displayed cue [37]. DirectionQ used a similar technique for XR glasses with mid-air selection [18]. Extending to freehand gestures, Ren and O'Neill proposed an improved design for 3D marking menus based on their studies [29]. ViewfinderVR enables selection of small or distant objects with a virtual viewfinder and finalizes the selection with a gesture [21]. Yan et al. proposed target acquisition using head gestures in virtual and augmented reality [36]. TwinkleTwinkle suggests interacting with smart devices by blinking in Morse code, which is widely known [8]. Extending menu selection to gaze with custom interfaces, StickyPie uses a gaze-based marking menu [2], and Lattice Menu investigates a gaze-based marking menu utilizing target assistance [20]. SCM demonstrated quick shortcut selection by crossing sides on a smartwatch, allowing remote interaction with complex environments such as virtual reality [11]. OctoPocus [5] used a dynamic guide to help gesture input, and Fennedy et al. extended it to VR [9]. These studies show that gestures as indirect input hold great potential in various forms. However, further investigation is required when the visual interface is separate from the surface on which the gesture is performed.
3 GestureMark
We first looked at commodity XR interfaces to design GestureMark. Microsoft Mixed Reality has nine central and six auxiliary options, and the Meta Quest headsets show 16 main buttons and many more small optional buttons on the side. For this reason, we aimed to support more than 12 targets at once. Next, we chose a smartwatch gesture-based approach, prioritizing low false-positive rates and discreet user interaction. For the touch gestures, we use four-directional swipes and B2B. Swipe is the most common gesture on commercial devices, and B2B gestures have been studied by several researchers who have shown their potential. The concept of bezel-based gestures was first introduced in B2B-Swipe [22], which included double-crossing and single-crossing swipes. Wong et al. further explored the potential of these gestures on round smartwatches [33], and Rey et al. extended the study to mobile and encumbered scenarios [30]. Multiple settings have been presented for B2B gestures, including 4, 6, or 8 segments. We decided to use 4 segments based on the findings of Rey et al. [30], who reported that 4 segments resulted in over 90% accuracy with machine learning techniques.
With the four-segment design, we created 16 possible bezel-to-bezel (B2B) swipes as single inputs: each gesture crosses the bezel from outside to inside the touchscreen and then from inside back to outside. The first and second crossings can each occur in any of the four segments, resulting in 16 possible combinations. In contrast, the swipe gesture, which starts inside the smartwatch touchscreen and crosses the bezel outward, has only four possible choices. Therefore, we stacked the swipe, asking users to perform it twice consecutively (SwipeX2), which yields the same 16 possible choices as B2B. In simpler terms, the B2B gesture involves two bezel crossings in one stroke, while SwipeX2 requires input in two separate strokes. We decided to build and test both SwipeX2 and B2B because prior research [40] suggests that repeating a simple gesture may be more effective. In addition, if SwipeX2 worked better, we could conclude that GestureMark's selection choices could be expanded further by stacking more gestures.
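To make the resulting gesture vocabulary concrete, the following minimal sketch (our own illustration in Python; the segment names and tuple encodings are assumptions, not the paper's implementation) enumerates the 4 Swipe, 16 B2B, and 16 SwipeX2 combinations described above.

```python
from itertools import product

# Four bezel segments of the watch face (assumed naming).
SEGMENTS = ["top", "right", "bottom", "left"]

# Swipe: starts inside the screen and exits across one bezel segment -> 4 gestures.
swipes = [("swipe", seg) for seg in SEGMENTS]

# B2B: enters across one segment and exits across any segment in a single stroke
# (entry and exit may be the same segment) -> 4 x 4 = 16 gestures.
b2b = [("b2b", enter, exit_) for enter, exit_ in product(SEGMENTS, SEGMENTS)]

# SwipeX2: two consecutive swipes, also 4 x 4 = 16 choices.
swipe_x2 = [("swipe_x2", first, second) for first, second in product(SEGMENTS, SEGMENTS)]

print(len(swipes), len(b2b), len(swipe_x2))  # 4 16 16
# Combining the 16 B2B gestures with the 4 single swipes yields the 20 shortcuts
# mentioned later in the Discussion.
```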
Feedback. According to Side-Crossing Menus [11], tactile feedback helps users perform bezel gestures, and Norman noted that giving users feedback on their performance is important for gesture interaction [26]. For these reasons, we implemented both tactile and visual feedback. Since our goal was to use common smartwatches and XR glasses, and most smartwatches have a vibration motor, we substituted vibration for tactile feedback. Unlike the smartwatch, the XR glasses' display is always visible to the user, so visual feedback on the XR glasses requires no extra movement. We therefore provide visual feedback on the XR glasses by displaying touch points. These touch points are located at the bottom left of the display, allowing users to view their input in their peripheral vision. The touch points are drawn as small dots, and we distinguish the beginning and ending points with a red circle and a blue square, respectively. Furthermore, the feedback persists for a few seconds, giving users the opportunity to review their input when the system detects a different input or the trial fails. These are depicted in Figure 3.
Visual Cue. To make the B2B and Swipe gestures easier to understand, we created a visual representation for each gesture, as gestures can be difficult to interpret from text alone. We used the same format as the visual feedback for consistency. A gray circular contour divided into four segments indicates that the user performs the gesture on the smartwatch. A blue circle marks the start point and a red square marks the endpoint. To make the direction of the gesture clear, arrowheads trace the path between the points. For SwipeX2, we positioned two visual cues side by side: the first target on the left and the second on the right. Figure 2 shows all possible combinations of the gestures.
System. Our system is designed to work with an Android smartphone and a smartwatch. The two devices are connected over WiFi using the smartphone's hotspot feature. The smartwatch streams packets of touch coordinates, as well as touch events obtained through the Android API. The classifier then runs on the phone using the touch points. We used the same features as in previous research [30], but added the first and last points' distances from the center to help distinguish swipe gestures from B2B gestures that do not change direction, such as starting from the top and ending at the bottom. We used a ported version1 of Weka [13] to run the machine learning model on the phone. We selected the Random Forest model, which showed the best performance in our trials and in previous research [30]. To prepare the model for the experiment, we collected ground-truth data from four people who did not participate in the subsequent experiment. Each of them performed five trials of each gesture while sitting. The accuracy was 98.18% (SD = 2.18) under leave-one-user-out validation for 20 different gestures, including 16 B2B gestures and 4 swipe gestures. To connect the XR glasses to the phone, we used a cable and the Android API's external display handling to control the layout of the XR glasses' display from the phone.
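For illustration, the sketch below shows the kind of per-gesture features and classifier described here. It is a hedged approximation: the paper ran a ported Weka Random Forest on the phone with the feature set of [30], whereas this sketch uses scikit-learn and shows only a few representative features, including the added start/end distances from the screen center; the function names and the 192-pixel center are our assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

WATCH_CENTER = np.array([192.0, 192.0])  # assumed center of the 384x384 px display

def extract_features(trace: np.ndarray) -> np.ndarray:
    """trace: (N, 2) array of touch coordinates for one gesture stroke.
    Returns a small feature vector: start/end positions and angles plus the
    start/end distances from the screen center, which help separate a Swipe
    from a B2B that keeps its direction (e.g., top to bottom)."""
    start, end = trace[0].astype(float), trace[-1].astype(float)
    ds, de = start - WATCH_CENTER, end - WATCH_CENTER
    return np.array([
        *start, *end,
        np.linalg.norm(ds),            # start distance from center
        np.linalg.norm(de),            # end distance from center
        np.arctan2(ds[1], ds[0]),      # start angle (maps to a bezel segment)
        np.arctan2(de[1], de[0]),      # end angle (maps to a bezel segment)
        len(trace),                    # number of samples (proxy for stroke duration)
    ])

# Hypothetical training and prediction on labelled traces:
# X = np.stack([extract_features(t) for t in traces]); y = labels
# clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# gesture = clf.predict(extract_features(new_trace)[None, :])[0]
```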
4 EXPERIMENT
We conducted an experiment to investigate possible design choices for GestureMark. First, can B2B and swipe gestures be easily guided by visual cues on the XR glasses? Second, does the feedback impact performance or user experience? Lastly, which is the better choice for target selection among 16 options, B2B or SwipeX2?
To reflect the mobility of XR glasses, we conducted the experiment in both seated and walking contexts. We examined the impact of visual and vibration feedback on the gestures by conducting trials with and without feedback. In summary, we investigated three factors: gesture, context, and feedback. The experiment was divided into sessions based on context and feedback, which were counterbalanced among participants. Within each session, all gestures were presented in random order. Sixteen adults (6 female, 10 male), aged between 19 and 31 (M = 24.8, SD = 3.5), volunteered for the experiment through the local university's web board. Eight were daily smartwatch users, and all but one wore the watch on the left hand. The experiment lasted less than an hour, and participants were compensated approximately 10 USD for their time and effort. During the experiment, we measured accuracy and the time required to complete the gesture. Participants completed a NASA Task Load Index (TLX) [14] after each session and a survey on their impressions of the interfaces after all sessions ended. The experiment was approved by the local Institutional Review Board (IRB).
4.1 Apparatus
We used a Google Pixel Watch with a 1.2" 384x384-pixel display during the experiment. For the XR glasses, we used the Epson Moverio BT-45C, which supports Android connectivity and features a binocular see-through Full HD display with a 34° field of view. We connected the XR glasses to an Android smartphone, an LG V35 ThinQ. To ensure ease of use for our participants, we provided a small cross bag to hold the smartphone. The setup with all devices equipped is depicted in Figure 4 (a).
4.2 Task and procedure
Prior to the four main sessions, participants had the opportunity to practice the gesture input with the same display layout and procedure as the experiment. We encouraged participants to train until they felt comfortable with the gestures, which typically took five to ten minutes.
The experiment followed a within-subject design, with participants performing the task while seated or walking and with or without feedback. Before each trial, a three-second countdown ensured participants were ready and avoided timing differences caused by distributed attention. Participants were asked to complete the targeted gesture as quickly as possible. To avoid confounds from visual search, we placed only one visual cue in the middle of the display for all trials. After each trial, success or failure was displayed on the smartglasses for two seconds, allowing participants to review their performance. Throughout the experiment, participants were asked to avoid looking at their smartwatch to maintain an eyes-free condition. Each gesture was shown twice per session, so each participant completed 256 trials, for a total of 4096 trials (32 gestures x 2 repetitions x 4 sessions x 16 participants).
In the seated condition, participants sat in an office chair with armrests. They were instructed to rest their arms after each trial so that both fingers and arms moved on every trial. In the walking condition, participants followed a path approximately 40 meters long, consisting of a 20-meter straight segment and a 20-meter curved segment, as depicted in Figure 4. Cones were placed at both ends to mark where to turn and continue walking.
4.3 Results
We recorded data on two gestures during the experiment: B2B and SwipeX2. Additionally, we extracted the first swipe input of SwipeX2 into its own category, labeled Swipe, and analyzed it separately. In the analysis, we first ran a multi-way repeated measures ANOVA across participants. We then used a permutation test with Benjamini-Hochberg correction for multiple hypothesis testing in the post hoc analysis. We used the Greenhouse-Geisser adjustment for violations of the sphericity assumption; all measurements met the normality assumption. Detailed settings for the analysis are described in each section.
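As an illustration of this post hoc procedure, the following sketch implements a paired (sign-flip) permutation test and Benjamini-Hochberg adjustment in Python. It is a generic sketch of the method, not the authors' analysis code; the data loading and the exact test statistic are assumptions.

```python
import numpy as np

def paired_permutation_pvalue(a, b, n_perm=10000, seed=0):
    """Two-sided paired permutation test on the mean difference, built by
    randomly flipping the sign of each participant's paired difference."""
    rng = np.random.default_rng(seed)
    d = np.asarray(a, float) - np.asarray(b, float)
    observed = abs(d.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = np.abs((signs * d).mean(axis=1))
    return (np.sum(null >= observed) + 1) / (n_perm + 1)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values for a family of comparisons."""
    p = np.asarray(pvals, float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / (np.arange(m) + 1)
    # enforce monotonicity from the largest p-value downwards
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out

# Example usage with per-participant means for two conditions (illustrative only):
# p_raw = [paired_permutation_pvalue(x, y) for x, y in condition_pairs]
# p_adj = benjamini_hochberg(p_raw)
```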
4.3.1 Success rate. For the success rate, we first ran a three-way ANOVA with context, feedback, and gesture type as factors. It showed a significant main effect of gesture (F(2, 30) = 19.82, p < .001) and an interaction effect of context with gesture (F(2, 30) = 10.64, p < .001). We found no significant effect of context (F(1, 15) = 4.22, p = .053), feedback (F(1, 15) = 0.81, p = .383), or the other interactions. We applied a paired permutation test to 12 possible combinations for the post hoc analysis. Figure 5(a) shows the means and standard deviations. First, comparing single conditions, Swipe, being extracted from SwipeX2, shows significantly better rates than SwipeX2 in all four conditions. Regarding gesture type, B2B shows a better rate than both Swipe and SwipeX2 in the walking context with feedback (Swipe: padj = .023 / SwipeX2: padj < .001), but only SwipeX2 differs significantly when walking without feedback (padj < .001) and when seated without feedback (padj = .024). Regarding context, both Swipe and SwipeX2 show better rates when seated than when walking with feedback (Swipe: padj = .017 / SwipeX2: padj < .009), but only SwipeX2 differs without feedback (padj = .006). We did not find any significant difference in the B2B rate between seated and walking. In addition, feedback alone did not produce any significant difference in success rate. Finally, for comparisons differing in both context and gesture, Swipe shows better rates than SwipeX2, and B2B while walking shows better rates than SwipeX2 while seated (with feedback: padj = .013 / without: padj = .017); naturally, seated B2B is also significantly better than walking SwipeX2 (with feedback: padj = .006 / without: padj < .001).
4.3.2 Completion time. Completion time was measured from the moment the target appeared on the smartglasses to the moment the user finished the touch and the phone detected the input. Only successful trials were included in the analysis. Following the same procedure as for the success rate, we ran a three-way ANOVA with context, feedback, and gesture type as factors. Only the gesture effect was significant (F(1.9, 28.5) = 247.60, p < .001). We applied a paired permutation test to 12 possible combinations for the post hoc analysis. Figure 5(b) shows the means and standard deviations. All gesture comparisons in all four conditions showed highly significant differences (all: padj < .001), with Swipe requiring the least time to complete, followed by B2B, and SwipeX2 being the slowest. However, we found no significant differences for feedback or context when gesture type was held constant.
We also decomposed completion time into reaction and interaction times. Reaction time was measured from the appearance of the target to the first detected touch, and interaction time from touch start to touch end. On average, B2B required 1.20 seconds (SD: 0.30) before the touch started and Swipe required 1.13 seconds (SD: 0.27). The average interaction time was 0.36 seconds (SD: 0.13) for B2B and 0.20 seconds (SD: 0.07) for Swipe. Lastly, the average gap between the first and second swipe of SwipeX2 was 0.15 seconds (SD: 0.17).
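Putting these means together gives a rough decomposition of completion time (a back-of-the-envelope reading of the reported averages, not an additional measurement):

$t_{\mathrm{B2B}} \approx 1.20\,\mathrm{s} + 0.36\,\mathrm{s} = 1.56\,\mathrm{s}, \qquad t_{\mathrm{Swipe}} \approx 1.13\,\mathrm{s} + 0.20\,\mathrm{s} = 1.33\,\mathrm{s},$

i.e., most of the completion time is spent reacting to the cue rather than performing the touch itself.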
4.3.3 Perceived Workload. We collected NASA-TLX responses after each session, so unlike the previous measures, workload was collected only for the four conditions defined by context and feedback. We used the RAW-TLX value, which omits weighting and uses the sum or average of the subscales [14]. First, we ran a two-way repeated measures ANOVA with context and feedback as factors. We found a significant effect only for context (F(1, 15) = 6.01, p = .027), with no effect of feedback and no interaction. For the post hoc comparison, we ran a pairwise permutation test. The walking-without-feedback condition (M = 35.31, SD = 20.96) differed from the other three conditions: significantly from seated without feedback (M = 24.69, SD = 14.77, padj = .007) and seated with feedback (M = 24.53, SD = 16.29, padj = .003), and marginally from walking with feedback (M = 30.94, SD = 19.40, padj = .060). In addition, the seated and walking conditions also differed significantly when feedback was provided (padj = .045).
4.3.4 Learning curve. Gesture input is often affected by a learning effect over the course of an experiment. To investigate this, we reorganized the success rate and completion time data by trial order per participant for each gesture. Figure 6 shows the means and standard deviations for each condition. We ran a two-way repeated measures ANOVA with session order and gesture type as factors for both rate and time. We found significant effects of gesture type on both success rate (F(2, 30) = 19.8, p < .001) and completion time (F(2, 30) = 247.6, p < .001). For session order, we found a significant effect only for completion time (F(3, 45) = 6.8, p = .002), not for success rate. We then ran a post hoc analysis to examine the effect of session order in detail, again using a pairwise permutation test with Benjamini-Hochberg correction. The order had significant effects on the B2B gesture: compared to the first session, the second (padj = .048), third (padj = .021), and last (padj = .049) sessions showed better success rates. For completion time, the third and last sessions were significantly faster than the first session for all three gestures: Swipe (3rd: padj = .023 / 4th: padj = .044), SwipeX2 (3rd: padj = .031 / 4th: padj = .034), and B2B (3rd: padj = .040 / 4th: padj = .043).
5 DISCUSSION
The results show that both B2B and Swipe outperformed SwipeX2 in success rate and completion time. In the last session, B2B achieved up to an 89% success rate with 1.62 seconds on average, and Swipe reached 87.7% with 1.36 seconds, whereas SwipeX2 only reached 75.6% with 1.81 seconds. In the following paragraphs, we discuss the results with design suggestions for each aspect and possible applications of GestureMark.
Feedback. Based on the results, feedback did not significantly improve success rate or completion time. Nevertheless, participants reported that it reduced their workload when walking. In post-interviews, they also noted that they did not think the feedback enhanced their performance once they were familiar with the gestures, but that it was useful when they lacked confidence in their input while walking. In addition, they reported that feedback was useful for reviewing failed attempts and improving their performance. These findings suggest that a conditional feedback system could be designed to activate when the user is moving dynamically or after a failed attempt.
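As one concrete reading of this suggestion, the sketch below is a hypothetical conditional-feedback policy of our own (Python, not part of the paper's system): it enables vibration and visual feedback only when the wearer is judged to be moving or the previous trial failed.

```python
from dataclasses import dataclass

@dataclass
class ConditionalFeedbackPolicy:
    """Hypothetical policy motivated by the findings: feedback did not raise
    performance, but reduced workload while walking and helped users review
    failed attempts."""
    last_trial_failed: bool = False

    def should_give_feedback(self, user_is_moving: bool) -> bool:
        # Show feedback while the user moves dynamically, or right after a failure.
        return user_is_moving or self.last_trial_failed

    def record_result(self, success: bool) -> None:
        self.last_trial_failed = not success

# Example: after a failed seated trial, the next trial still shows feedback
# so the user can review what went wrong.
policy = ConditionalFeedbackPolicy()
policy.record_result(success=False)
print(policy.should_give_feedback(user_is_moving=False))  # True
```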
Input gestures: B2B or Swipe. To extend GestureMark, it is important to expand the number of selectable targets. There are two possible ways: using more complicated gestures or using simple gestures multiple times. Zhao and Balakrishnan reported that repeating simple inputs in a hierarchical marking menu gives better results than "zig-zag" compound marks [40]. However, since XR glasses are used in varied conditions, even when the user cannot fully concentrate on performing input, we decided to compare B2B with SwipeX2. Through the experiment, we found that B2B outperformed SwipeX2 in both accuracy and time. We can conjecture one advantage of B2B: it requires a longer interaction time than Swipe and contains more variation within a single input through its changing angle. Partial errors, such as a slightly wrong starting or ending bezel angle, can therefore be compensated by the rest of the stroke, whereas Swipe is too short to benefit from this effect. This is reflected in the success rate of Swipe getting worse while walking, while B2B did not change. Additionally, we found that SwipeX2 had success rates comparable to the square of Swipe's success rate. This suggests that the two strokes in SwipeX2 act as independent trials, and that repetition does not help the second one.
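As a rough check of this independence argument using the last-session figures reported above: $0.877^2 \approx 0.769$, which is close to the observed SwipeX2 rate of $0.756$; the two strokes thus appear to succeed or fail roughly independently.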
Summary. Our experiment shows that performing input according to visual cues with touch gestures on a smartwatch can be an effective input method. However, performing multiple gestures in a row can hurt accuracy, as SwipeX2's worse results show. Since scenarios with more than 16 targets are not typical, we still consider the approach applicable. Additionally, we can combine Swipe and B2B gestures to increase the number of choices to 20. Since B2B gestures show more stable performance, we suggest using them for the main targets. As most XR glasses UIs have additional option buttons for accessing the network or settings, we suggest using Swipe for those; such functional shortcuts need quick access and are usually more independent of the user's context. Furthermore, if more targets are displayed than the maximum number of input choices, target selection can be limited to the center view and shifted by multimodal input, such as gaze or head movement.
6 CONCLUSION
We investigated the use of touch gestures on a smartwatch as input with visual cues on XR glasses. We conducted a comparison experiment with two gestures (B2B and Swipe) under four different conditions. We found that users had no difficulty performing B2B from the cue, achieving up to an 89.3% success rate in 1.62 seconds. However, when they performed the swipe gesture twice, the second swipe caused a higher error rate. Additionally, we found that feedback reduced users' workload while walking, and participants responded positively to reviewing their failed trials with it. These findings can be a starting point for visual-cue-based input with bezel gestures for XR glasses. We hope this input technique can serve as a complementary option alongside direct manipulation, such as hand or gaze interaction, for XR glasses.
ACKNOWLEDGMENTS
This work was partly supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2019-0-01270, WISE AR UI/UX Platform Development for Smartglasses) and Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0012746, The Competency Development Program for Industry Specialist).
REFERENCES
- Sunggeun Ahn, Seongkook Heo, and Geehyuk Lee. 2017. Typing on a Smartwatch for Smart Glasses. In Proceedings of the 2017 ACM International Conference on Interactive Surfaces and Spaces (Brighton, United Kingdom) (ISS ’17). Association for Computing Machinery, New York, NY, USA, 201–209. https://doi.org/10.1145/3132272.3134136
- Sunggeun Ahn, Stephanie Santosa, Mark Parent, Daniel Wigdor, Tovi Grossman, and Marcello Giordano. 2021. StickyPie: A Gaze-Based, Scale-Invariant Marking Menu Optimized for AR/VR. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI ’21). Association for Computing Machinery, New York, NY, USA, Article 739, 16 pages. https://doi.org/10.1145/3411764.3445297
- Daniel Ashbrook, Kent Lyons, and Thad Starner. 2008. An Investigation into Round Touchscreen Wristwatch Interaction. In Proceedings of the 10th International Conference on Human Computer Interaction with Mobile Devices and Services (Amsterdam, The Netherlands) (MobileHCI ’08). Association for Computing Machinery, New York, NY, USA, 311–314. https://doi.org/10.1145/1409240.1409276
- Sandra Bardot, Bradley Rey, Lucas Audette, Kevin Fan, Da-Yuan Huang, Jun Li, Wei Li, and Pourang Irani. 2022. One Ring to Rule Them All: An Empirical Understanding of Day-to-Day Smartring Usage Through In-Situ Diary Study. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 3 (2022), 1–20. https://doi.org/10.1145/3550315
- Olivier Bau and Wendy E. Mackay. 2008. OctoPocus: a dynamic guide for learning gesture-based command sets. In Proceedings of the 21st Annual ACM Symposium on User Interface Software and Technology (Monterey, CA, USA) (UIST ’08). Association for Computing Machinery, New York, NY, USA, 37–46. https://doi.org/10.1145/1449715.1449724
- Eugenie Brasier, Olivier Chapuis, Nicolas Ferey, Jeanne Vezien, and Caroline Appert. 2020. ARPads: Mid-air Indirect Input for Augmented Reality. In 2020 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (Porto de Galinhas, Brazil). IEEE, 332–343. https://doi.org/10.1109/ISMAR50242.2020.00060
- M. Camilleri, B. Chu, A. Ramesh, D. Odell, and D. Rempel. 2012. Indirect Touch Pointing with Desktop Computing: Effects of Trackpad Size and Input mapping on Performance, Posture, Discomfort, and Preference. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 56, 1 (2012), 1114–1118. https://doi.org/10.1177/1071181312561242
- Haiming Cheng, Wei Lou, Yanni Yang, Yi-pu Chen, and Xinyu Zhang. 2023. TwinkleTwinkle. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 2 (2023), 1–30. https://doi.org/10.1145/3596238
- Katherine Fennedy, Jeremy Hartmann, Quentin Roy, Simon Tangi Perrault, and Daniel Vogel. 2021. OctoPocus in VR: Using a Dynamic Guide for 3D Mid-Air Gestures in Virtual Reality. IEEE Transactions on Visualization and Computer Graphics 27, 12 (2021), 4425–4438. https://doi.org/10.1109/tvcg.2021.3101854
- Bruno Fruchard, Eric Lecolinet, and Olivier Chapuis. 2017. MarkPad: Augmenting Touchpads for Command Selection. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI ’17). Association for Computing Machinery, New York, NY, USA, 5630–5642. https://doi.org/10.1145/3025453.3025486
- Bruno Fruchard, Eric Lecolinet, and Olivier Chapuis. 2020. Side-Crossing Menus: Enabling Large Sets of Gestures for Small Surfaces. Proc. ACM Hum.-Comput. Interact. 4, ISS, Article 189 (nov 2020), 19 pages. https://doi.org/10.1145/3427317
- Jérémie Gilliot, Géry Casiez, and Nicolas Roussel. 2014. Impact of form factors and input conditions on absolute indirect-touch pointing tasks. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Toronto, Ontario, Canada) (CHI ’14). Association for Computing Machinery, New York, NY, USA, 723–732. https://doi.org/10.1145/2556288.2556997
- Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. 2009. The WEKA Data Mining Software: An Update. SIGKDD Explor. Newsl. 11, 1 (nov 2009), 10–18. https://doi.org/10.1145/1656274.1656278
- Sandra G. Hart. 2006. Nasa-Task Load Index (NASA-TLX); 20 Years Later. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 50, 9 (2006), 904–908. https://doi.org/10.1177/154193120605000909
- Anuradha Herath, Bradley Rey, Sandra Bardot, Sawyer Rempel, Lucas Audette, Huizhe Zheng, Jun Li, Kevin Fan, Da-Yuan Huang, Wei Li, and Pourang Irani. 2022. Expanding Touch Interaction Capabilities for Smart-rings: An Exploration of Continual Slide and Microroll Gestures. In Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI EA ’22). Association for Computing Machinery, New York, NY, USA, Article 292, 7 pages. https://doi.org/10.1145/3491101.3519714
- Yi-Ta Hsieh, Antti Jylhä, Valeria Orso, Luciano Gamberini, and Giulio Jacucci. 2016. Designing a Willing-to-Use-in-Public Hand Gestural Interaction Technique for Smart Glasses. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI ’16). Association for Computing Machinery, New York, NY, USA, 4203–4215. https://doi.org/10.1145/2858036.2858436
- MD. Rasel Islam, Doyoung Lee, Liza Suraiya Jahan, and Ian Oakley. 2018. GlassPass: Tapping Gestures to Unlock Smart Glasses. In Proceedings of the 9th Augmented Human International Conference (Seoul, Republic of Korea) (AH ’18). Association for Computing Machinery, New York, NY, USA, Article 16, 8 pages. https://doi.org/10.1145/3174910.3174936
- Seoyoung Kang, Emmanuel Ian Libao, Juyoung Lee, and Woontack Woo. 2022. DirectionQ: Continuous Mid-air Hand Input for Selecting Multiple Targets through Directional Visual Cues. In 2022 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct). IEEE, 757–762. https://doi.org/10.1109/ISMAR-Adjunct57072.2022.00160
- Daehwa Kim and Chris Harrison. 2022. EtherPose: Continuous Hand Pose Tracking with Wrist-Worn Antenna Impedance Characteristic Sensing. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology (Bend, OR, USA) (UIST ’22). Association for Computing Machinery, New York, NY, USA, Article 58, 12 pages. https://doi.org/10.1145/3526113.3545665
- Taejun Kim, Auejin Ham, Sunggeun Ahn, and Geehyuk Lee. 2022. Lattice Menu: A Low-Error Gaze-Based Marking Menu Utilizing Target-Assisted Gaze Gestures on a Lattice of Visual Anchors. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 277, 12 pages. https://doi.org/10.1145/3491102.3501977
- Woojoo Kim and Shuping Xiong. 2022. ViewfinderVR: configurable viewfinder for selection of distant objects in VR. Virtual Reality 26, 4 (2022), 1573–1592. https://doi.org/10.1007/s10055-022-00649-z arXiv:2110.02514
- Yuki Kubo, Buntarou Shizuki, and Jiro Tanaka. 2016. B2B-Swipe: Swipe Gesture for Rectangular Smartwatches from a Bezel to a Bezel. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI ’16). Association for Computing Machinery, New York, NY, USA, 3852–3856. https://doi.org/10.1145/2858036.2858216
- Chen Liang, Chi Hsia, Chun Yu, Yukang Yan, Yuntao Wang, and Yuanchun Shi. 2023. DRG-Keyboard: Enabling Subtle Gesture Typing on the Fingertip with Dual IMU Rings. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 6, 4, Article 170 (jan 2023), 30 pages. https://doi.org/10.1145/3569463
- Google LLC. 2021. Inputs and sensors | Glass Enterprise Edition 2 | Google for Developers. https://developers.google.com/glass-enterprise/guides/inputs-sensors. Accessed 08-01-2024.
- Ali Neshati, Bradley Rey, Ahmed Shariff Mohommed Faleel, Sandra Bardot, Celine Latulipe, and Pourang Irani. 2021. BezelGlide: Interacting with Graphs on Smartwatches with Minimal Screen Occlusion. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI ’21). Association for Computing Machinery, New York, NY, USA, Article 501, 13 pages. https://doi.org/10.1145/3411764.3445201
- Donald A. Norman. 2010. Natural User Interfaces Are Not Natural. Interactions 17, 3 (may 2010), 6–10. https://doi.org/10.1145/1744161.1744163
- Simon T. Perrault, Eric Lecolinet, James Eagan, and Yves Guiard. 2013. Watchit: Simple Gestures and Eyes-Free Interaction for Wristwatches and Bracelets. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Paris, France) (CHI ’13). Association for Computing Machinery, New York, NY, USA, 1451–1460. https://doi.org/10.1145/2470654.2466192
- Manuel Prätorius, Dimitar Valkov, Ulrich Burgbacher, and Klaus Hinrichs. 2014. DigiTap: an eyes-free VR/AR symbolic input device. In Proceedings of the 20th ACM Symposium on Virtual Reality Software and Technology (Edinburgh, Scotland) (VRST ’14). Association for Computing Machinery, New York, NY, USA, 9–18. https://doi.org/10.1145/2671015.2671029
- Gang Ren and Eamonn O'Neill. 2012. 3D Marking menu selection with freehand gestures. In 2012 IEEE Symposium on 3D User Interfaces (3DUI). IEEE, 61–68. https://doi.org/10.1109/3DUI.2012.6184185
- Bradley Rey, Kening Zhu, Simon Tangi Perrault, Sandra Bardot, Ali Neshati, and Pourang Irani. 2022. Understanding and Adapting Bezel-to-Bezel Interactions for Circular Smartwatches in Mobile and Encumbered Scenarios. Proceedings of the ACM on Human-Computer Interaction 6, MHCI (2022), 1–28. https://doi.org/10.1145/3546736
- David Verweij, Augusto Esteves, Saskia Bakker, and Vassilis-Javed Khan. 2019. Designing Motion Matching for Real-World Applications: Lessons from Realistic Deployments. In Proceedings of the Thirteenth International Conference on Tangible, Embedded, and Embodied Interaction (Tempe, Arizona, USA) (TEI ’19). Association for Computing Machinery, New York, NY, USA, 645–656. https://doi.org/10.1145/3294109.3295628
- Jérémy Wambecke, Alix Goguey, Laurence Nigay, Lauren Dargent, Daniel Hauret, Stéphanie Lafon, and Jean-Samuel Louis de Visme. 2021. M[eye]cro. Proceedings of the ACM on Human-Computer Interaction 5, EICS (2021), 1–22. https://doi.org/10.1145/3461732
- Pui Chung Wong, Kening Zhu, Xing-Dong Yang, and Hongbo Fu. 2020. Exploring Eyes-Free Bezel-Initiated Swipe on Round Smartwatches. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–11. https://doi.org/10.1145/3313831.3376393
- Erwin Wu, Ye Yuan, Hui-Shyong Yeo, Aaron Quigley, Hideki Koike, and Kris M. Kitani. 2020. Back-Hand-Pose: 3D Hand Pose Estimation for a Wrist-worn Camera via Dorsum Deformation Network. In Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology (Virtual Event, USA) (UIST ’20). Association for Computing Machinery, New York, NY, USA, 1147–1160. https://doi.org/10.1145/3379337.3415897
- Zheer Xu, Weihao Chen, Dongyang Zhao, Jiehui Luo, Te-Yen Wu, Jun Gong, Sicheng Yin, Jialun Zhai, and Xing-Dong Yang. 2020. BiTipText: Bimanual Eyes-Free Text Entry on a Fingertip Keyboard. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3313831.3376306
- Yukang Yan, Xin Yi, Chun Yu, and Yuanchun Shi. 2019. Gesture-based target acquisition in virtual and augmented reality. Virtual Reality & Intelligent Hardware 1, 3 (2019), 276–289. https://doi.org/10.3724/SP.J.2096-5796.2019.0007
- Koji Yatani, Kurt Partridge, Marshall Bern, and Mark W. Newman. 2008. Escape: a target selection technique using visually-cued gestures. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Florence, Italy) (CHI ’08). Association for Computing Machinery, New York, NY, USA, 285–294. https://doi.org/10.1145/1357054.1357104
- Hui-Shyong Yeo, Erwin Wu, Juyoung Lee, Aaron Quigley, and Hideki Koike. 2019. Opisthenar: Hand Poses and Finger Tapping Recognition by Observing Back of Hand Using Embedded Wrist Camera. In Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology (New Orleans, LA, USA) (UIST ’19). Association for Computing Machinery, New York, NY, USA, 963–971. https://doi.org/10.1145/3332165.3347867
- Maozheng Zhao, Alec M Pierce, Ran Tan, Ting Zhang, Tianyi Wang, Tanya R. Jonker, Hrvoje Benko, and Aakar Gupta. 2023. Gaze Speedup: Eye Gaze Assisted Gesture Typing in Virtual Reality. In Proceedings of the 28th International Conference on Intelligent User Interfaces (Sydney, NSW, Australia) (IUI ’23). Association for Computing Machinery, New York, NY, USA, 595–606. https://doi.org/10.1145/3581641.3584072
- Shengdong Zhao and Ravin Balakrishnan. 2004. Simple vs. compound mark hierarchical marking menus. In Proceedings of the 17th Annual ACM Symposium on User Interface Software and Technology (Santa Fe, NM, USA) (UIST ’04). Association for Computing Machinery, New York, NY, USA, 33–42. https://doi.org/10.1145/1029632.1029639
FOOTNOTE
1 https://github.com/nneonneo/weka-android