1 Introduction

The aging society presents new challenges for future generations. Increasing life expectancy leads to a longer period in which the elderly depend on others to support them, for example due to frailty, declining muscular strength and control of the upper extremities, and neurological diseases. In the long term this could lead to increased health-care costs and reduced quality of life. Much effort is therefore directed at maintaining the self-reliance and health of the elderly. A major factor affecting one’s self-reliance is the ability to accurately control the upper extremities, especially the fingers, which is negatively affected by age-related changes in the nervous system [5]. These changes can partially be counteracted, and one’s ability to control the fingers can be improved, through coordination [10] and strength training [14].

A commonly used device for strength training is the Pressing Evaluation Testing System (PETS). During the training, participants are required to apply a given amount of force with each finger of their hand, as indicated on a monitor in front of them. Consequently, while users focus on the monitor, their hand lies outside their field of view. This can lead to divided attention, resulting in decreased performance and training efficiency [9]. While the information could be presented closer to the hand by placing a small monitor next to it, it would still be decoupled from the fingers. We instead use a Video See-Through Head-Mounted Display (VST-HMD) to give users the impression that the training guidance is located next to their fingers. This combination of computer graphics (CG) and the real world is referred to as Augmented Reality (AR).

AR has been successfully used in a variety of systems designed to assist the elderly with everyday tasks [7], to provide remote assistance [2], or to support rehabilitation [4, 11]. Our goal is to investigate if it can be combined with PETS into an AR-PETS that imposes a lower mental demand and improves the training efficiency compared to standard PETS training.

As a first step in the development of an AR-PETS, we investigate whether presenting guidance spatially aligned with the user’s fingers, instead of on a monitor, improves the training efficiency. In this study we use a haptic device to replicate common tasks encountered during PETS training. Our system provides users with a realistic sensation and can measure the amount of force they exert onto it. We use our AR-PETS to measure how well participants can learn to exert a given amount of force and how well they can control the amount of force they apply to the device. Furthermore, we evaluate whether localized visualization helps coordinate finger movements and reduces involuntary finger movement.

The main contribution of our paper is a user study with 18 young healthy adults that investigates whether training guidance presented on a VST-HMD leads to better results than, or is preferred over, guidance presented on a monitor. Although training with our AR-PETS did not improve the users’ performance, several participants felt that guidance presented with the AR-PETS resulted in a lower mental workload and was more intuitive. Our findings also suggest that designing training conditions with the AR medium in mind could improve the performance and efficiency of training.

2 Related Work

Over the past years, AR has garnered considerable interest in the Assistive Technology (AT) and Rehabilitation Engineering (RE) fields. In this section we discuss previous findings on the effects of hand training, as well as systems that integrate AR into training and rehabilitation procedures.

The positive effects of finger training for the elderly have been studied extensively. Keogh [8] found that strength training improved the participants’ ability to control and apply force with their fingers. In a similar study, Olafsdottir et al. [14] found that after a 6-week training period participants exhibited reduced finger-pinch force variability and improved targeting control. Wu et al. [21] studied how training improved finger control in young and elderly adults and found that training improved the performance in both groups. We can therefore expect training that proves efficient for young participants to have a similar effect on the elderly.

Some applications make use of AR to support finger training and rehabilitation. Shen et al. [16] use AR to present a virtual piano to the user. By detecting finger presses on the virtual keys, their system supports coordination training for participants who lack the muscle strength to use an actual keyboard. Mousavi et al. [12] project virtual objects onto a table that users can point at and interact with, while Burke et al. [4] use tracked objects that participants manipulate as controllers in various AR games. In a practical application their system would use an HMD to correctly render the CG into the user’s view. By combining AR with haptic devices it is also possible to provide users with realistic haptic feedback, thus increasing the realism of the experience and allowing users to engage in exercises they could not do otherwise. Similar to our work, Luo et al. [11] utilize head-mounted displays for post-stroke hand rehabilitation, combining AR with haptic feedback gloves to provide feedback during a reach-and-grasp task.

While previous work primarily targeted hand motion training, to the best of our knowledge our work is the first to study how AR can be applied to support PETS training. Furthermore, while previous methods integrated AR as an essential element of the training, we use it only to present localized guidance; the CG therefore remain decoupled from the task the user’s fingers perform. This will help us understand whether such visualization alone supports training, or whether a task designed specifically for AR is necessary to improve training results.

3 System Design

In this section we describe the design of our AR-PETS. AR can be presented on a variety of systems, including hand-held devices, projectors, and head-mounted displays. While designing our AR-PETS we considered the features that are important for our system: portability, accurate rendering of the CG independent of the user’s viewpoint, and keeping the user’s hands unoccupied. Considering these requirements, we opted for a VST-HMD. A VST-HMD keeps the user’s hands free to perform the training and can be easily deployed at various locations. Furthermore, its pose can be accurately tracked, allowing us to present consistent AR overlays to the user.

Our VST-HMD consists of an Oculus Rift [13] with an Ovrvision Pro attachment [20] (Fig. 1a). The Oculus Rift presents the rendered content with a resolution of 1080 \(\times \) 1200 pixels per eye at 90 fps, and the Ovrvision provides camera frames with a resolution of 1280 \(\times \) 960 pixels per eye at 45 fps. We use a Phantom Omni [1] haptic device (Fig. 1b) for single-finger PETS training. The Phantom Omni provides haptic feedback to the user and can measure forces of up to 3.3 N exerted by the user.

To ensure that the user perceives the CG as being located next to the Omni controller, it is necessary to perform spatial and temporal calibration of the system. This accounts for offsets and temporal delays of the various devices. We describe both procedures in the following sections.

Fig. 1.

Devices used in our AR-PETS: (a) the Oculus Rift with Ovrvision Pro attachment and (b) the Phantom Omni haptic device.

3.1 Spatial Calibration

During the calibration process we determine the alignment between the camera and the HMD, as well as between the HMD and the haptic device. After all devices have been properly aligned, we correct the respective temporal delays.

HMD-Camera Calibration. Our VST-HMD captures the scene with two cameras mounted onto the HMD frame. After overlaying computer graphics onto these images, the resulting view is shown on the display. The system uses an external tracker to determine the pose of the HMD and generates CG corresponding to this pose. Because the tracked pose refers to the HMD and not to the cameras, this inevitably leads to a mismatch between the rendered CG and the images provided by the cameras. An example of this effect is shown in Fig. 2a. We correct this offset through hand-eye calibration [18]. Given the camera coordinate system C and the HMD coordinate system H, hand-eye calibration computes the transformation \(\mathsf{{T}^{H}_{C}}\) from H to C.

We use the implementation of the calibration algorithm provided by the Ubitrack library [6]. After correcting the pose of the CG by \(\mathsf{{T}^{H}_{C}}\) the virtual content appears aligned with the camera images, as shown in Fig. 2b.
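
As an illustration, the following sketch shows how such a hand-eye calibration could be computed from corresponding HMD and marker poses. It uses OpenCV’s calibrateHandEye as a stand-in for the Ubitrack implementation we use, with the tracked HMD playing the role of the robot gripper and a fiducial marker that of the calibration target; the pose lists, the marker-detection step, and all names are assumptions made for this example.

```python
# Illustrative hand-eye calibration sketch (not the Ubitrack-based
# implementation used in our system).
import cv2
import numpy as np

def hmd_camera_calibration(hmd_poses, marker_poses):
    """hmd_poses:    4x4 HMD-to-tracker matrices from the external tracker.
    marker_poses:    4x4 marker-to-camera matrices (e.g. from a fiducial
                     detector), captured at the same instants.
    Returns the fixed 4x4 camera/HMD offset corresponding to T^H_C."""
    R_hmd = [T[:3, :3] for T in hmd_poses]
    t_hmd = [T[:3, 3] for T in hmd_poses]
    R_mrk = [T[:3, :3] for T in marker_poses]
    t_mrk = [T[:3, 3] for T in marker_poses]
    R, t = cv2.calibrateHandEye(R_hmd, t_hmd, R_mrk, t_mrk)
    T_offset = np.eye(4)
    T_offset[:3, :3], T_offset[:3, 3] = R, t.ravel()
    return T_offset
```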

HMD-Haptic Device Calibration. In our application we restrict the movement of the haptic tool and instruct users to press onto the tip of the controller from above, as shown in Fig. 1b. Therefore, it is sufficient to approximate the alignment of the haptic device and the HMD. To track the position of the haptic device, we rigidly attached an Oculus controller to it. The external tracking system that is used to track the HMD also tracks the pose of this controller. Because the controller is rigidly attached to the haptic device, it is sufficient to place a virtual plane relative to the controller. As the user perceives haptic feedback only when the device touches the virtual object and can move the device in only one direction, we placed the virtual plane approximately 5 cm above the ground plane. When the user does not push down onto the haptic controller, it rests on the virtual plane, and the pressing force is measured whenever the user pushes onto it from above.
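
A minimal sketch of this interaction, assuming a simple spring model in which the rendered reaction force grows with the penetration depth below the virtual plane, is shown below; the stiffness constant and the function interface are illustrative assumptions and not part of the Phantom Omni API.

```python
# Illustrative virtual-plane model; constants and interface are assumptions.
PLANE_HEIGHT_M = 0.05      # virtual plane ~5 cm above the ground plane
STIFFNESS_N_PER_M = 800.0  # assumed spring stiffness of the virtual plane
MAX_FORCE_N = 3.3          # maximum force the Phantom Omni can render

def plane_feedback(tip_height_m):
    """Return (feedback force sent to the device, logged pressing force)
    for the current height of the stylus tip above the ground plane."""
    penetration = PLANE_HEIGHT_M - tip_height_m
    if penetration <= 0.0:  # tip is above or resting on the virtual plane
        return 0.0, 0.0
    force = min(STIFFNESS_N_PER_M * penetration, MAX_FORCE_N)
    # The upward reaction force balances the downward pressing force of the
    # finger, so the same value is recorded as the applied force.
    return force, force
```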

Fig. 2.

(a) Without correcting the offset between the HMD and the camera coordinate systems, CG that should align with the marker appear misaligned. (b) After the spatial calibration, the CG appear correctly overlaid on the marker.

3.2 Temporal Calibration

The VST-HMD could cause users discomfort due to temporal misalignment of the CG and the camera images. This is in part because the processing time of the frames captured by the cameras is longer than that of the HMD pose estimation. As a result, the CG may already reflect a head movement while the scene image shown on the HMD has not been updated yet, which inevitably leads to an apparent mismatch between the CG and the camera image. For our system we measured that camera images are processed with a delay of approximately 65 ms. This delay can severely affect the user’s performance and also lead to cybersickness [3]. We thus delay processing of the HMD poses by 65 ms to render CG that are consistent with the camera image.
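
One way such a delay could be realized is sketched below: HMD poses are buffered with timestamps and the renderer queries the newest pose that is at least 65 ms old. The buffer size and lookup strategy are assumptions, not our exact implementation.

```python
# Illustrative pose-delay buffer; details are assumptions.
from collections import deque
import time

CAMERA_DELAY_S = 0.065  # measured processing delay of the camera images

class DelayedPoseBuffer:
    def __init__(self, max_len=256):
        self._buf = deque(maxlen=max_len)  # (timestamp, pose) pairs

    def push(self, pose, timestamp=None):
        ts = time.monotonic() if timestamp is None else timestamp
        self._buf.append((ts, pose))

    def delayed_pose(self):
        """Return the newest buffered pose that is at least 65 ms old."""
        target = time.monotonic() - CAMERA_DELAY_S
        candidate = None
        for ts, pose in self._buf:
            if ts <= target:
                candidate = pose
            else:
                break
        return candidate
```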

We also detected a mismatch between data availability of the haptic device and the camera images. However, the pose of the haptic device does not affect the position of the rendered CG, as we use the controller attached to the haptic device to track its position. Therefore, the position of the CG presented to the user is not affected by delays in processing the data of the haptic controller as the user presses onto it. We decided to update the applied forces in real-time and to disregard the mismatch, to ensure that users get accurate feedback about the amount of force they apply at any given time.

4 User Study

PETS training commonly includes all fingers of a hand, while our current AR-PETS was designed for a single finger. To provide insight into the applicability of the AR-PETS, it is necessary to evaluate it in conditions similar to the actual training scenario. When training with a PETS, users have to perform the following tasks while looking at a monitor:

  1. During the sequential test (ST), apply the indicated amount of force with a single finger or with multiple fingers for a given duration (Fig. 3a).
  2. During the force track test (FTT), adjust the amount of force exerted with a single finger or with multiple fingers, as indicated on the monitor (Fig. 3b).
  3. During both the ST and the FTT, do not apply force with any fingers other than the indicated ones.

Fig. 3.

Common training tasks with the PETS: (a) a sequential test where users apply a fixed amount of force with selected fingers at a time, and (b) a force track test where users adjust the amount of force applied by the fingers (in red) to match the presented diagram (in green). (Color figure online)

Fig. 4.

In our experiment participants were asked to perform the tasks in the Monitor (top row) and the AR-PETS (bottom row) conditions. The three tasks were (a) a Force Learning Test, where participants learned to apply a given amount of force; (b) a Variable Force Application Test, where participants had to vary the amount of force they applied according to the guidance on the display; and (c) a Keyboard Pressing Test, where participants had to press onto the keyboard with the indicated fingers (in blue) as fast as possible. (Color figure online)

For our experiment we designed the following three tests that require users to apply similar skills:

  1. Force Learning Test (FLT)
  2. Variable Force Application Test (VFAT)
  3. Keyboard Pressing Test (KPT)

Figure 4 shows a user performing these tests with guidance presented on the monitor and in AR. In the following we outline the contents of each test.

Force Learning Test. During the FLT the goal is to learn to apply a given amount of force with the finger. The test consists of a training and an evaluation part. During the training, participants are provided with feedback about how much force they apply (Fig. 4a). To successfully complete the training, participants have to apply and maintain the indicated amount of force for a period of 3 s. During the evaluation, participants have to press onto the haptic controller with the amount of force they have learned and maintain it for a period of 3 s, without receiving any visual feedback about their performance.
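
For illustration, the check for completing a training trial could look as follows; the tolerance band, the sampling interface, and the function name are assumptions, as they are not specified above.

```python
# Illustrative completion check for an FLT training trial.
HOLD_DURATION_S = 3.0  # the force has to be maintained for 3 s
TOLERANCE_N = 0.1      # assumed acceptance band around the target force

def held_long_enough(force_samples_n, target_n, dt_s):
    """force_samples_n: forces measured at a fixed interval dt_s (seconds).
    Returns True once the target force has been held for 3 s in a row."""
    needed = int(HOLD_DURATION_S / dt_s)
    run = 0
    for f in force_samples_n:
        run = run + 1 if abs(f - target_n) <= TOLERANCE_N else 0
        if run >= needed:
            return True
    return False
```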

Variable Force Application Test. The VFAT was designed to understand if users can better adjust the amount of force they exert onto the haptic device if the guidance is presented next to the finger, as shown in Fig. 4b. Overall, the procedure of this test is similar to the FTT. To let users get used to the task, it includes a 20 s training period, followed by a 30 s long evaluation. Over the course of these 50 s, the guidance alternates between 2 s long stable periods, during which participants have to maintain the indicated force, and 1 s transition periods, in which the indicated amount of force changes linearly.

Keyboard Pressing Test. The KPT was designed to evaluate if participants can more easily translate spatial information into action when it is presented in AR. For this test participants place their left hand onto a Razer Orbweaver keyboard [15] and press the buttons as fast as possible with the fingers indicated on the display (Fig. 4c). When participants press onto the keyboard with a wrong finger, the corresponding indicator turns red to notify them about the mistake. Once they press onto the keyboard with the correct fingers, the next task appears on the display. In the Monitor condition the target fingers are indicated by blue indicators next to the outline of a hand. In the AR-PETS condition, blue indicators are placed next to each finger and their location is adjusted for each participant to account for different hand sizes and placement.

We implemented all tests in Unity 2017.1.0 [19] on a desktop computer with a 3.5 GHz Intel Core i7-5930K CPU, an NVIDIA GeForce GTX 980 GPU, and 16 GB RAM.

4.1 Participants

We designed three experiments, one for each task, that compare our AR-PETS to guidance presented on a monitor. We recruited 18 students from a local university (21–31 years, mean 24, std. dev. 2.1 years). Among the participants, 4 had prior experience with haptic devices and 8 had prior experience with AR. One participant was female.

4.2 Experiment Procedure

Our experiment was designed as a within-subject study with a single independent variable, Condition (AR-PETS or Monitor). All participants completed all tests in random order. For each test the order of the Conditions was counterbalanced and randomly assigned to participants.

Before beginning the experiment, we explained its purpose, procedure, and potential risks to the participants. After signing a consent form, participants were seated in front of a monitor and performed all tests. Whenever participants completed a test, they answered a questionnaire with 8 questions on a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). The questions, shown in Table 1, were used to evaluate their impression of the task, its complexity, and whether they believed it could be performed by the elderly. Finally, participants could also provide comments about their impression of the experiment. Overall, the experiment took about 30–40 min.

Table 1. Questions asked in the questionnaire.

Force Learning Test. To better understand if participants have a good sense of how much force they apply, and to evaluate the efficiency of the training, we first asked participants to press onto the haptic controller with a force equivalent to 1.0 N (approximately 100 g) and to maintain this force for 3 s. This was repeated 3 times. Before the actual experiment, participants first did a practice session, which consisted of a single iteration of training and evaluation for 2.0 N. After this practice, participants took the FLT for 1.0 N. During the experiment they repeated the training session 3 times, followed by 3 repetitions of the evaluation session.

We expected that presenting guidance in the AR-PETS condition would result in a lower mental workload, and that participants would thus be able to reproduce the force more accurately. Based on this we formulated the following hypotheses:

  H1: After training participants will more accurately apply the targeted force.
  H2: Participants will be able to more accurately reproduce the amount of force when training in the AR-PETS condition.

Variable Force Application Test. During the VFAT the amount of force participants had to apply varied between 0.5 N and 2.0 N, and each transition period featured a change of at least 0.2 N. In each condition participants did the VFAT three times.
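
For illustration, a guidance profile with these properties (2 s plateaus, 1 s linear ramps, forces between 0.5 N and 2.0 N, at least 0.2 N change per transition) could be generated as sketched below; the random sampling is an assumption and the actual profiles used in the study may differ.

```python
# Illustrative VFAT guidance-profile generator; details are assumptions.
import random

def vfat_profile(total_s=50.0, plateau_s=2.0, ramp_s=1.0,
                 f_min=0.5, f_max=2.0, min_step=0.2):
    """Return (start_time, end_time, start_force, end_force) segments
    covering the 20 s training and 30 s evaluation periods."""
    segments, t = [], 0.0
    level = random.uniform(f_min, f_max)
    while t < total_s:
        segments.append((t, t + plateau_s, level, level))  # stable period
        t += plateau_s
        if t >= total_s:
            break
        nxt = random.uniform(f_min, f_max)
        while abs(nxt - level) < min_step:  # enforce >= 0.2 N change
            nxt = random.uniform(f_min, f_max)
        segments.append((t, t + ramp_s, level, nxt))        # linear ramp
        level, t = nxt, t + ramp_s
    return segments
```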

We expected that being able to see how the amount of applied force varies with the movement of one’s finger would support accurate control of the applied force. We thus formulated the hypothesis:

  H3: Participants will follow the indicated force more accurately in the AR-PETS condition.

Keyboard Pressing Test. Before the KPT experiment started, participants could practice with 3 random tasks. After the practice, participants had to press down all fingers to initiate the experiment phase. They then proceeded with the test, which consisted of 31 tasks: 5 one-finger, 10 two-finger, 10 three-finger, 5 four-finger, and 1 five-finger task. The task order and the included fingers were randomly selected. This procedure was repeated 3 times.
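
A task sequence with this composition could be drawn as in the following sketch; the finger labels and the sampling approach are assumptions for illustration, not the exact code used in the study.

```python
# Illustrative KPT task-sequence generator.
import random

FINGERS = ["thumb", "index", "middle", "ring", "little"]
TASK_COUNTS = {1: 5, 2: 10, 3: 10, 4: 5, 5: 1}  # 31 tasks in total

def kpt_task_sequence():
    """Return 31 randomly ordered tasks, each a tuple of target fingers."""
    tasks = []
    for n_fingers, count in TASK_COUNTS.items():
        for _ in range(count):
            tasks.append(tuple(random.sample(FINGERS, n_fingers)))
    random.shuffle(tasks)
    return tasks
```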

In this test, we measured how fast participants completed each task and how many mistakes they made. Because our AR-PETS provides information accurately aligned with the participant’s fingers, we expected a very low mental workload, as the indication does not have to be mapped from the hand shown on the monitor to one’s own hand. We thus formulated the following hypotheses:

  H4: Participants will perform the tasks faster in the AR-PETS condition than in the Monitor condition.
  H5: Participants will make fewer mistakes in the AR-PETS condition than in the Monitor condition.

4.3 Results

To evaluate how well users could estimate the amount of force they exerted with their finger, we compared the deviation of the exerted force from the target before and after the training sessions of the FLT. Repeated-measures ANOVA revealed that participants could match the desired amount of force with significantly higher accuracy after training in both the AR-PETS (\(F(1,17)=12.39, p=0.00263\)) and the Monitor conditions (\(F(1,17)=13.36, p=0.00196\)). We did not observe any statistically significant difference between the two conditions. We also investigated if there was a tendency to over- or underestimate the amount of exerted force based on the training condition. Our results show that the training condition did not have a statistically significant impact on the users’ performance. The order in which the participants trained also did not have a significant impact on the results.
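
For reference, tests with these degrees of freedom correspond to a one-way repeated-measures ANOVA over one pre- and one post-training deviation value per participant; the sketch below uses statsmodels with column names chosen for this example, and is not necessarily the tooling we used.

```python
# Illustrative repeated-measures ANOVA over pre-/post-training deviations.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

def rm_anova(df: pd.DataFrame):
    """df columns (assumed): 'participant', 'phase' ('pre'/'post'),
    'deviation' (mean absolute force deviation in N, one row per cell)."""
    result = AnovaRM(data=df, depvar="deviation",
                     subject="participant", within=["phase"]).fit()
    print(result)  # prints F value, degrees of freedom, and p-value
    return result
```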

During the VFAT we evaluated by how much users deviated from the amount of force they should be exerting. Repeated-measures ANOVA did not reveal any statistically significant difference in the deviation between the indicated and applied force between the AR-PETS and Monitor conditions, nor did the order in which participants performed the task have a significant effect.

For the KPT we compared the reaction time and the number of mistakes between the AR-PETS and the Monitor conditions. Repeated-measures ANOVA did not reveal any statistically significant difference in the users’ reaction times or in the number of errors. Similarly, the order in which participants performed the task did not have a significant effect on the results.

A t-test of the questionnaire results revealed that the only statistically significant differences were for Q1 in all experiments (\(t=3.7765,~p<0.01\) for FLT; \(t=4.5198,~p<0.001\) for VFAT; \(t=3.0437,~p<0.01\) for KPT), and for Q6 (\(t=-2.0785,~p<0.05\)) and Q7 (\(t=-2.0616,~p<0.05\)) in the VFAT. We show the results of the questionnaire in Fig. 5.
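
The per-question comparisons above correspond to paired t-tests of the Likert responses between the two conditions, as sketched below; the input format is an assumption.

```python
# Illustrative paired t-test for one questionnaire item.
from scipy.stats import ttest_rel

def compare_question(scores_ar, scores_monitor):
    """Paired t-test over the participants' responses to one question,
    given as equally ordered lists for the AR-PETS and Monitor conditions."""
    t, p = ttest_rel(scores_ar, scores_monitor)
    return t, p
```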

Fig. 5.

The results of the questionnaire. The answers to Q1 for all tests, and to Q6 and Q7 for the VFAT, were significantly different. (* \(p < 0.05\), ** \(p < 0.01\), *** \(p < 0.001\)).

4.4 Discussion

The results of our experiment support our hypothesis H1. After training, participants had a better understanding of how much force they apply to the haptic controller. However, we did not find any significant difference in the training results between the AR-PETS and Monitor conditions. This observation rejects our hypothesis H2. We believe that this is in part because the task was relatively simple, and users only had to remember how to apply a specific amount of force. Another explanation could be that the discomfort caused by the device distracted the participants from the task. We also cannot rule out the possibility that when participants focused on the force indicators during training, they disregarded their finger. In that case, both training conditions would have involved similar degrees of divided attention and mental workload.

Our results show that the condition did not affect how well participants could follow the indication during the VFAT. This rejects our hypothesis H3. Some participants mentioned that the guidance in the AR-PETS condition appeared smaller than in the Monitor condition; it was therefore easier for them to make minute adjustments while looking at the monitor. Another reason could be that, similarly to the FLT, participants had to focus on the indicators rather than their fingers, which reduced the benefits of localized visualization. They could also have been affected by the unfamiliar device.

Surprisingly, we found no difference in how well participants performed during the KPT, thus rejecting our hypotheses H4 and H5. This is surprising as we expected to find the largest difference during this task. One possible explanation is that participants quickly got used to the mental mapping from the monitor to the corresponding finger. It is also possible that the CG presented on the VST-HMD were slightly misaligned with the users’ fingers and therefore required a mental mapping as well. Another explanation could be that participants were not familiar with the HMD, which negatively affected their performance.

Through the questionnaire our goal was to understand if the AR rendering was more appealing to the users, even if it did not improve the overall performance. The results of the questionnaire indicate that participants experienced a small degree of discomfort while performing the task in the AR-PETS condition. Most participants attributed this to the relatively low resolution of the images captured by the cameras of the Ovrvision Pro attachment, and to the weight of the device. However, some participants also stated that the non-adjustable focus of the cameras created discomfort when coupled with CG objects. This effect could also arise because the cameras’ locations do not coincide with the locations of the user’s eyes, resulting in incorrect depth perception. We believe that such limitations can be overcome in the future by using an optical see-through head-mounted display (OST-HMD). OST-HMDs overlay CG directly onto the user’s view of the real world, thus neither occluding it nor requiring scene cameras that capture it.

Interestingly, even though AR-PETS received a similar score to the Monitor condition for most questions, several participants expressed that they perceived it to be more intuitive, easier to understand, and more natural. This was particularly the case for the KPT. We believe that during this task the location of the augmentation made it easier for participants to understand what they needed to do, while in the VFAT and the FLT the task itself was decoupled from the visualization. Nonetheless, participants stated that seeing the force meter move while also seeing their finger helped them perform the task more easily. In the future, a small portable monitor placed next to the PETS could be used to achieve a similar effect during the VFAT and the FLT. We believe that the lower rating of AR-PETS for Q6 and Q7 in the VFAT was due to the users’ unfamiliarity with the device. Another explanation could be that, while participants did not feel that AR-PETS was worse than the Monitor condition, they did not feel motivated to do the task while wearing the HMD every day. In the future we will develop training scenarios that take advantage of the AR medium to support long-term motivation when performing simple training with HMDs.

A major limitation of our study is that we tested the feasibility with young adults who have good control of their fingers. Therefore, they could have perceived the tasks as being very easy, or had no difficulty making the mental mapping from the visualization to their finger movements. This is supported by the questionnaire results, where participants scored both conditions similarly, but stated that they perceived the AR-PETS condition to be more intuitive and to impose a lower mental workload. It is therefore necessary to investigate if similar effects can be observed for the elderly, and how they translate to the PETS.

5 Conclusion

In this paper we have presented the first results of our development of an AR-PETS. We simulate PETS training with a haptic device and compare the users’ performance and impressions when guidance is presented next to their fingers or, as in standard PETS training, on a monitor. Our results indicate that there was no significant difference in the performance or the perceived complexity of the two systems. However, multiple users expressed that they felt the AR-PETS provided better feedback and was easier to use.

In the future, we want to conduct experiments where we apply our system to an actual PETS and evaluate it with the elderly. Past experiments showed that there is a difference in how well elderly and young adults control their fingers [17]. Given the positive feedback from the participants of our experiment, we expect the AR-PETS to have an even more pronounced effect with the elderly.

A major limitation of our current implementation was the use of a VST-HMD that did not provide any focus cues. We plan to develop a system that takes advantage of OST-HMDs to place CG into the user’s field-of-view without occluding the real world, thus providing natural focus cues. We will also investigate how the presentation of the task can be adapted to support multiple fingers at the same time.

Finally, we plan to investigate how AR could be employed to improve the long-term training motivation through gamification of the training process.