1 Introduction

Over the past two decades, VR technology has achieved remarkable progress, primarily in the visual domain (Korkut and Surer 2023; Koulieris et al. 2019; Xiong et al. 2021; Yin et al. 2021; Zhan et al. 2020). Breakthroughs in sensor resolution and the introduction of new sensor data types have significantly enhanced the fidelity of virtual experiences. Advances in refresh rates, optics, and the integration of multimodal sensor data have further refined the immersive quality of VR. Complementing these hardware innovations, software developments have enabled the rapid rendering of high-resolution virtual objects and environments, setting new standards for realism and interactivity in virtual spaces. However, these rapid and groundbreaking developments contrast with the relatively slow progress in tactile interactions, particularly those involving the human hand (Lim et al. 2021; Nilsson et al. 2021; Wee et al. 2021). The limited effectiveness of VR for fine object manipulation stems primarily from its inability to achieve precise and realistic interactions within the virtual environment. Fine object manipulation tasks, such as picking up small items or assembling components, require highly accurate hand representation and multi-digit motion that current VR systems often fail to replicate. The absence of nuanced grasping patterns and the inability to accurately simulate virtual objects’ responses to precision manipulation lead to a significant disparity between real and virtual experiences. This gap results in diminished immersion, user frustration, and suboptimal task performance; users may struggle with precise control during delicate or intricate object manipulations, making VR applications feel clumsy and unconvincing. Consequently, the inability to perform these tasks undermines the utility and appeal of VR for precision applications, contributing to a lower rate of adoption and a higher rate of user abandonment. Enhancing the fidelity of the virtual hand representation, multi-digit grasping patterns, and the responsiveness of virtual objects to delicate hand movements constitute critical steps toward a more engaging and effective user experience across multiple VR applications (Table 1).

Table 1 Applications of naturalistic hand-object interactions in VR

In the future, enhanced hand models in virtual surgical simulations could enable trainees to perform intricate procedures with unprecedented realism (McKnight et al. 2020). Advanced collision detection algorithms may allow for the precise manipulation of tiny instruments, simulating delicate tissue interactions with nuanced haptic feedback, helping surgeons refine their dexterity and decision-making skills in a risk-free environment (Abi-Rafeh et al. 2019; Badash et al. 2016). In engineering design, VR-based CAD environments with advanced grasping techniques could allow engineers to interact more naturally with virtual prototypes, facilitating intuitive assembly, testing of complex machinery, rapid design iteration, improved ergonomic evaluations, and remote collaborative design reviews (Wu and Song 2024; Wolfartsberger et al. 2023). Similarly, bio-inspired VR interactions could transform education and skills training in fields requiring fine motor skills, such as welding (Prasetya et al. 2023), pilot training (Hight et al. 2022), watchmaking (Gomelsky 2016), or musical instrument practice (Syukur et al. 2023; Yu et al. 2023). These advancements may closely replicate real-world tactile experiences, with sophisticated haptic feedback enhancing the understanding of proper techniques and force application, while adaptive difficulty levels and targeted feedback could accelerate learning and improve proficiency across various disciplines.

Consumer VR systems typically utilize handheld controllers to control virtual hand avatars’ spatial orientation and positioning. However, these controllers often limit the precision of manipulating virtual objects. In many VR scenarios, the connection between the virtual body avatar and the virtual hand avatar(s) is omitted due to computational complexities and challenges in simultaneous tracking. Although inverse kinematics techniques have extended the range of movements and manipulations possible with fewer tracking points, they can compromise immersion due to tracking inaccuracies and visual inconsistencies (Roth et al. 2016). As a result, virtual hand avatars frequently appear disjointed from the body, detracting from the overall user experience. Many VR games employ diverse strategies for virtual hand-object interactions tailored to their gameplay styles, sometimes sacrificing naturalistic interactions. For example, “Job Simulator,” developed by Owlchemy Labs in 2016, demands rapid object interaction, so the game hides the avatar’s hand during grasping to prevent unnatural collisions with objects. Instead, a continuously tracked stand-in object represents the player’s hand within the virtual space (Schwartz and Reimer 2017). Developers at Owlchemy Labs argue that players adapt seamlessly to this substitution and maintain a sense of ownership, as evidenced by a study where 90% of players did not notice the virtual hand avatar disappearing during gameplay (Schwartz and Reimer 2015). This phenomenon, termed “tomato presence” (Fig. 1, top) after the game’s early builds featuring a tomato as the first object, underscores the adaptability of users to non-standard representations (Steed et al. 2021; Owlchemylabs 2017). However, this approach remains contentious because users typically feel greater embodiment with avatars featuring human-like hands that adjust automatically to objects upon grasping (Ganias et al. 2023), and embodiment tends to be lower when users interact with non-corporeal objects like arrows instead of hands (Yuan and Steed 2010). Optical hand-tracking systems, such as those developed by Meta (Fig. 1, bottom left) and Ultraleap (Fig. 1, bottom right), have made notable strides in enhancing accuracy but encounter challenges in fine object manipulation. Ultraleap’s Gemini hand-tracking platform has made considerable progress in grasping capabilities but cannot support finger manipulation after the grasp is established. These persistent challenges highlight the ongoing complexity of achieving lifelike hand-object interactions in VR, particularly in the absence of tactile feedback cues from virtual objects (Buckingham 2021).

Fig. 1
figure 1

Examples of hand representations from contemporary virtual reality setups. Top: A frame sequence illustrating the concept of tomato presence. The hand avatar becomes invisible after grasping the object, with the object itself replacing the visual representation of the hand. Bottom left: Virtual hand representation following hand-tracking in Meta. Bottom right: Virtual hand representation following hand-tracking in Ultraleap

The human hand is one of the most complex biological entities and can perform tasks ranging from precisely assembling a mechanical timepiece to punching an opponent in a ring (Jones and Lederman 2006; Napier 1956; Wilson 1999). This complexity has captivated researchers and laypeople alike, from Aristotle’s era to today’s fields of psychophysics, neuroscience, and movement science. Despite extensive research, we are only beginning to understand and replicate how the hand interacts with diverse objects and situations (Andrychowicz et al. 2020; Laffranchi et al. 2020; Sobinov and Bensmaia 2021; van Duinen and Gandevia 2011). It is, therefore, unsurprising that technology for interacting with virtual objects remains limited. Bridging this gap is crucial for enhancing the functionality and applicability of VR technology. However, before delving into the specifics of hand-object interactions in VR, it is valuable to examine the parallels with the development of human hand prostheses, as both domains may follow similar trajectories. Despite decades of extensive research spanning neuroscience, psychophysics, movement sciences, control systems engineering, and design, even the most advanced prosthetic hands fall short compared to their biological counterparts.

Traditionally, upper limb prostheses, such as split hooks (pre-sensors) and more advanced myoelectric hands (Belter et al. 2013; Bouwsema et al. 2010; Engdahl et al. 2020; Patel et al. 2018; Zecca et al. 2002), have been designed as tools rather than full replacements for a missing arm or hand. These limitations have often led to high abandonment rates of upper limb prostheses, presenting a significant challenge for research (Biddiss and Chau 2007)—a phenomenon not unlike the abandonment of VR applications noted above. A truly bio-inspired prosthetic hand must instead replicate several key aspects of human hand form and function. First, it should exhibit anthropomorphic features such as size, weight, and appearance to mimic a natural hand closely (Belter et al. 2013). Second, it should resemble the natural hand in performance aspects like speed, force, and dexterity, with the digits moving at suitable speeds, exerting sufficient grip strength, and grasping synergistically for effective daily functioning (Bicchi 2000; Bicchi et al. 2011). Effective use and dexterity, essential for achieving embodiment, rely on precise and responsive digit control (D’Anna et al. 2019), which requires accurate replication of the human hand’s kinematic model, joint coordination, and synergistic finger motions (Bicchi et al. 2011; Gentner and Classen 2006; Leo et al. 2016; Santello et al. 2016; Weiss and Flanders 2004). This understanding has driven the development of prosthetic systems that incorporate these synergies through designs based on underactuation and, in some cases, mechanical compliance, enabling the device to adapt during interactions with the environment (e.g., Catalano et al. 2014; Laffranchi et al. 2020; Li et al. 2014).

Despite rapid advancements, VR has yet to fully integrate these concepts into hand-object interactions, although which features are most critical differs between VR applications and prostheses. The precision and fidelity of visual feedback from virtual hands are closely tied to motor performance during complex manipulation tasks, as evidenced by several studies (Argelaguet and Andujar 2013; Geiger et al. 2018; Lam et al. 2018; Vosinakis and Koutsabasis 2018). Moreover, the fundamental awareness of hand shape and position (Melmoth and Grant 2006; Schettino et al. 2003; Winges et al. 2003) underscores the crucial need for maintaining consistent virtual hand representation to avoid a sense of sudden disembodiment in VR. However, limitations in hand functionality within VR can transform routine tasks into frustrating challenges, significantly affecting perceived control and agency. Although real-time animation enhances the immersive quality of VR experiences (Jörg et al. 2020; Joy et al. 2022; Lin and Jörg 2016a; Lin et al. 2019; Ogawa et al. 2018; Pan and Steed 2019; Pyasik et al. 2020; Tieri et al. 2017), unnatural interactions of virtual hand avatars with non-body virtual objects, along with unrealistic post-collision behavior of virtual objects, severely limit VR’s applicability for tasks requiring precise manipulation.

This position paper presents a comprehensive theoretical framework for bio-inspired approaches to hand-object interactions in VR, focusing on precision manual tasks. Drawing from biomechanics, neuroscience, and psychophysics insights, we critically review and synthesize existing knowledge to propose a roadmap for advancing virtual hand models, collision detection, and grasp-release mechanisms. While we do not present new empirical data, our work builds on and extends current research, offering a forward-looking perspective. We emphasize the need for rigorous experimental studies to (1) quantitatively compare traditional and bio-inspired hand-object interaction techniques across various precision tasks and (2) empirically validate these methods in real-world VR applications. Through this review, we aim to spark discussion, guide future research, and ultimately contribute to developing more immersive, realistic, and effective VR experiences, unlocking new opportunities for training, skill development, and broader applications.

This review is structured to comprehensively address the multifaceted challenges and advancements in simulating naturalistic hand-object interactions in VR. It begins by outlining the key aspects and challenges in achieving realistic hand-object interactions, setting the stage for a detailed examination of current limitations and requirements. Following this, the review presents a roadmap to bio-inspired hand-object interactions, offering a strategic vision for future developments. The discussion then delves into the challenges and methods of virtual hand avatar animation, highlighting the technical intricacies involved. Insights from neuroscience and psychophysics are integrated to inform the development of a sufficiently realistic virtual hand model, emphasizing the importance of interdisciplinary approaches. Contemporary approaches for handling hand-object collisions in VR are reviewed, providing a snapshot of the current state-of-the-art technologies. This is complemented by exploring potential solutions to develop bio-inspired collision handling algorithms, focusing on innovative techniques and methodologies. The review also addresses the implementation of seamless and fluid releases of virtual grasps, a critical aspect for maintaining immersion and realism. Evaluation metrics and criteria for bio-inspired hand-object interactions are then discussed, providing a framework for assessing progress and effectiveness. Finally, the review concludes with potential approaches to address key questions in bio-inspired hand-object interactions, offering a forward-looking perspective on future research directions and technological advancements.

2 Key aspects and challenges in simulating naturalistic hand-object interactions in VR

Simulating naturalistic grasping in VR is a complex problem that involves multiple factors influencing grasping performance, summarized in Table 2. Two critical aspects of the virtual hand model include (1) the extent to which users perceive the virtual hand avatar as their own, known as the sense of ownership, and (2) the degree to which the hand avatar’s movements correspond with the user’s actual hand movements, referred to as the sense of agency. VR users feel a strong sense of ownership over realistic hand avatars but not over abstract representations like arrows (Yuan and Steed 2010). Indeed, studies demonstrate that users can perceive ownership even over an arm avatar extended up to three times the length of their real arm (Kilteni et al. 2012). The realism of the hand avatar significantly enhances the sense of ownership (Argelaguet et al. 2016; Jung et al. 2017, 2018; Lin and Jörg 2016b; Lin et al. 2019; Lougiakis et al. 2020), improving grasping performance (Canales et al. 2019) and overall manual interactions (Aslandere et al. 2015; Schwind et al. 2018). Conversely, users often experience a stronger sense of agency with less realistic hand avatars (Argelaguet et al. 2016; Hameed et al. 2023; Lougiakis et al. 2020), illustrating a tradeoff between avatar realism and functional complexity. This discrepancy is exacerbated by the limitations of current hand-tracking technologies in consumer VR applications, which struggle to replicate the nuances of real hand movements. Thus, while increasing avatar realism can enhance ownership, it may also amplify movement discrepancies, distorting the balance between realism and functional efficacy in virtual grasping.

Table 2 Key aspects and challenges in simulating naturalistic hand-object interactions in VR

Simulating naturalistic grasping in VR requires mastering two key tasks: collision detection and collision handling. Collision detection involves identifying when and where the virtual hand comes into contact with a virtual object. This is primarily a kinematics problem, focusing on the relative positions of the virtual hand’s digits and the target virtual object. Accurate collision detection therefore demands real-time tracking of the user’s hand, both to animate a lifelike hand avatar and to determine contact timing and location using bio-inspired criteria. Suppose a virtual finger approaches a virtual sphere; in that case, the system must recognize the point of contact based on predefined bio-inspired criteria, such as the curvature or surface properties of the object. Collision handling deals with what happens after the virtual hand contacts the object. This task is rooted in kinetics or dynamics, simulating the virtual object’s behavior according to physical laws. For instance, if the virtual hand grasps a virtual cup, collision handling algorithms predict how the cup will move, rotate, or even fall if dropped, ensuring that these movements align with realistic physical responses. Commonly available physics engines, such as MuJoCo (Todorov et al. 2012) and ThreeDWorld (Gan et al. 2020), offer robust solutions for modeling the behavior of virtual objects after collisions. However, implementing these aspects effectively in VR poses significant technical challenges. For instance, maintaining synchronization between the virtual and real hands in dynamic scenarios, like rapidly grasping a bouncing ball, remains a complex problem. Overcoming these challenges is essential to achieve realistic and intuitive virtual grasping experiences.
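
To make the kinematic character of collision detection concrete, the following minimal Python sketch tests whether a fingertip, idealized as a small sphere, contacts a spherical virtual object; the function name, radii, and return format are illustrative assumptions rather than any published algorithm.

```python
import numpy as np

def fingertip_sphere_contact(fingertip_pos, fingertip_radius,
                             sphere_center, sphere_radius):
    """Purely kinematic contact test between a fingertip, modeled as a
    small sphere, and a spherical virtual object. Returns a tuple
    (in_contact, contact_point, penetration_depth_m)."""
    offset = np.asarray(fingertip_pos, dtype=float) - np.asarray(sphere_center, dtype=float)
    dist = float(np.linalg.norm(offset))
    penetration = (fingertip_radius + sphere_radius) - dist
    if penetration <= 0.0:
        return False, None, 0.0
    # Contact point lies on the object's surface along the line of centers.
    normal = offset / dist if dist > 1e-9 else np.array([0.0, 0.0, 1.0])
    contact_point = np.asarray(sphere_center, dtype=float) + normal * sphere_radius
    return True, contact_point, penetration

# An index fingertip (8 mm radius) approaching a 4 cm sphere at the origin:
# contact is registered once the center distance drops below 48 mm.
hit, point, depth = fingertip_sphere_contact(
    fingertip_pos=[0.0, 0.0, 0.045], fingertip_radius=0.008,
    sphere_center=[0.0, 0.0, 0.0], sphere_radius=0.04)
print(hit, point, round(depth, 4))  # True [0. 0. 0.04] 0.003
```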

The final critical challenge in simulating naturalistic virtual grasping is managing the release of a grasped virtual object—object release. Implementing a physics-based algorithm for virtual object release often results in discrepancies between the user’s expectations, shaped by real-world interactions, and the simulated object release behavior. For instance, although users expect a smooth and intuitive release based on their experiences with physical objects, a rigid physics-based algorithm might produce unnatural delays or jerky movements when the object is released. Conversely, without a specific release algorithm, users face unpredictable release dynamics, causing uncertainty and inconsistent interactions. A notable tradeoff exists between preventing hand-object interpenetration and achieving realistic object release. If the system prioritizes avoiding interpenetration, the release might feel rigid and artificial, as seen in scenarios where the object snaps out of the hand rather than slipping away naturally. Conversely, focusing on a naturalistic release can lead to occasional unwanted interpenetration, where the virtual hand might appear to pass through the object. Effective virtual grasping, therefore, requires algorithms that balance these competing needs: aligning the release mechanism with user expectations while preventing unrealistic visual or physical interactions, such as the hand interpenetrating the object. For example, a well-designed algorithm might allow the virtual hand to adjust its grip and release based on real-time feedback from the user’s hand movements for a seamless and intuitive interaction.
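
As one illustration of how a release mechanism might trade off predictability against naturalness, the following sketch implements a simple aperture-based grasp/release state machine with hysteresis; the class name and thresholds are illustrative assumptions, not a published design.

```python
class ApertureGraspStateMachine:
    """Minimal grasp/release heuristic with hysteresis. A grasp latches
    when the hand aperture closes below `close_thresh` while touching
    the object; release fires only after the aperture re-opens past the
    wider `open_thresh`, so tracking jitter near the boundary does not
    make the grasp state flicker. Thresholds are illustrative."""

    def __init__(self, close_thresh=0.05, open_thresh=0.07):
        assert open_thresh > close_thresh  # hysteresis band in meters
        self.close_thresh = close_thresh
        self.open_thresh = open_thresh
        self.grasping = False

    def update(self, aperture_m, touching_object):
        if not self.grasping:
            if touching_object and aperture_m < self.close_thresh:
                self.grasping = True   # grasp established
        elif aperture_m > self.open_thresh:
            self.grasping = False      # intentional release detected
        return self.grasping

fsm = ApertureGraspStateMachine()
for aperture in (0.09, 0.06, 0.045, 0.055, 0.065, 0.075):
    print(round(aperture, 3), fsm.update(aperture, touching_object=True))
# The grasp latches at 0.045 m and persists until the aperture exceeds 0.07 m.
```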

3 Roadmap to bio-inspired hand-object interactions in virtual reality

The ability to grasp and manipulate virtual objects is anticipated to become as crucial in interactive VR as in the physical world. However, VR adoption for precision manual tasks is impeded by the virtual hand avatar’s unnatural interactions with virtual objects and unrealistic post-collision behaviors. This limitation significantly affects VR applications that require fine manipulation, such as manufacturing, virtual assembly, and training in precision manual skills. Replicating naturalistic hand-object interactions is therefore essential for the adoption and user acceptance of VR applications involving precision manual skills. These innovations will also be key to generalizing manual skills learned in VR to real-world contexts, reducing dependency on specific training environments. These advancements necessitate (1) developing a sufficiently realistic virtual hand model (i.e., a set of rigid bodies or deformable meshes) that can implement the complex movements of a biological hand; (2) exploiting synergistic patterns of multi-digit motion and contact forces revealed by research on neuroscience, psychophysics, and manual actions to develop hand-object collision handling algorithms; and (3) implementing seamless and fluid releases of whole-hand virtual grasps, especially involving complex grasps and in-hand manipulation tasks (Fig. 2; Table 3).

Fig. 2
figure 2

Roadmap to bio-inspired hand-object interactions in VR

Table 3 Roadmap to bio-inspired hand-object interactions in VR

Developing a sufficiently realistic virtual hand model. Creating a realistic virtual hand model capable of emulating the complex movements of a biological hand is crucial for enhancing VR interactions; this model involves developing rigid bodies or deformable meshes that replicate the intricate anatomy and biomechanics of the human hand. Capturing the nuanced flexion, extension, and rotation of each joint, as well as the deformation of skin and underlying tissues during movement, is essential. An accurate representation of the hand’s structure ensures that virtual interactions feel natural and intuitive. Moreover, the model must respond to real-time inputs and integrate tactile feedback, enabling users to perceive and manipulate virtual objects with precision and dexterity akin to the physical world. This realism will enhance VR immersion and facilitate the training and transfer of fine motor skills to real-world applications.

Exploiting synergistic patterns of multi-digit motion and contact forces. Advancing hand-object collision handling algorithms in VR involves leveraging the synergistic patterns of multi-digit motion and contact forces revealed by neuroscience and psychophysics research. These patterns, describing the coordination of multiple fingers for efficient and precise movements, provide valuable insights into the natural dynamics of hand-object interactions. By integrating these insights into algorithm development, VR systems can more accurately simulate the force distribution and digit coordination during grasping, manipulation, and release. This integration enables virtual hands to respond realistically to object properties, such as texture and weight, applying appropriate force and motion to mimic real-world interactions. Consequently, these algorithms will enhance the fidelity of virtual hand movements, making the training environment more authentic and effective and facilitating the transfer of learned skills from the virtual to the real world.

Implementing seamless and fluid releases of whole-hand virtual grasps. Achieving seamless and fluid releases of virtual grasps, especially during complex or in-hand manipulation tasks, is crucial for realistic VR interactions. This process requires mechanisms that allow virtual hands to transition smoothly from holding to releasing objects, mirroring the human hand’s natural grip adjustments and release. Complex grasps involving multiple contact points and intricate finger movements necessitate precise coordination to avoid awkward or unnatural disengagements. Implementing fluid release dynamics ensures that users can intuitively perform actions like setting down objects or adjusting grips without interruption. This capability is essential for applications demanding delicate or precise manipulations, such as virtual assembly or surgical training, where realistic hand behavior is vital for effective practice and skill transfer.

As VR systems advance and virtual environments become more intricate, creating naturalistic hand-based interactions with virtual objects will grow increasingly challenging. Progress in this area will establish the foundation for more nuanced manual interactions with complex virtual objects. The VR field is poised to leverage extensive insights from physical grasping to develop bio-inspired hand-object interactions.

4 Challenges and methods in virtual hand avatar animation

Realistically animating a virtual hand avatar necessitates high-precision, real-time motion tracking of the hand and individual digits. Despite advancements, even the most sophisticated data gloves (e.g., Jiang et al. 2019; Li et al. 2016) lack the precision to mirror real hand movements perfectly in a virtual hand avatar. Creating a fully realistic virtual hand model that captures the biological hand’s complex movements remains a significant challenge (Endo et al. 2014). Often constructed from multiple rigid bodies or deformable meshes, existing hand models offer only close approximations. For instance, a common approach divides the hand into segments using shapes like ellipsoids for the palm and phalanges (Hirota and Hirose 2003; Hirota and Tagawa 2016; Höll et al. 2018; Moustakas et al. 2006; Wan et al. 2004; Wheatland et al. 2015). Modeling a finger can be relatively straightforward: three interconnected cylinders can simulate basic finger movements. However, modeling the thumb is more complex due to its intricate anatomy and functional reliance on finger interactions. The thumb’s unique capabilities, such as opposition—enabled by its free-moving metacarpal and positioning on the trapezium—make it challenging to model accurately (Häger-Ross and Schieber 2000; Hallett and Marsden 1979; Napier 1956). The palm is even more difficult to model, given its less discrete and more compliant movements compared with individual digits. Unlike the rigid segments of fingers, the palm’s flexibility and varied motion pose significant hurdles in virtual representation. Current models struggle to accurately simulate the dynamic, non-uniform motions of the palm. Furthermore, integrating this motion tracking with the virtual hand model presents a substantial computational challenge. For instance, synchronizing real-time data from a motion-tracking glove with a deformable mesh model of a virtual hand requires intricate algorithms to ensure fluid and accurate hand animations. This integration is critical for applications requiring precise hand interactions, such as virtual surgery training or intricate object manipulation in VR environments (Table 4).
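
The “three interconnected cylinders” idea can be made concrete with a few lines of planar forward kinematics; the segment lengths below are rough adult index-finger values assumed purely for illustration.

```python
import numpy as np

def finger_fk(joint_angles_deg, segment_lengths=(0.045, 0.025, 0.020)):
    """Planar forward kinematics for a finger idealized as three rigid
    segments (proximal, middle, distal phalanges) hinged in series.
    Returns the MCP origin, PIP, DIP, and fingertip positions in the
    plane of flexion (meters)."""
    points = [np.zeros(2)]
    heading = 0.0
    for angle_deg, length in zip(joint_angles_deg, segment_lengths):
        heading += np.radians(angle_deg)  # flexion accumulates distally
        step = length * np.array([np.cos(heading), np.sin(heading)])
        points.append(points[-1] + step)
    return np.array(points)

# A moderately flexed posture: 40 deg MCP, 30 deg PIP, 20 deg DIP flexion.
print(finger_fk([-40.0, -30.0, -20.0]).round(3))
```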

Table 4 Challenges and methods in virtual hand avatar animation

The most advanced method for animating a virtual hand avatar combines a vision-based tracking algorithm with a deformable hand model, offering robust handling of complex self-interactions and extensive self-occlusions in common hand gestures (Smith et al. 2020). This sophisticated technique also excels in tracking bimanual interactions and managing intricate hand configurations. For instance, it can accurately capture a hand’s movements even when occluded during tasks like clapping or intricate finger weaving. However, deploying this method in real-time and consumer VR is impractical due to its requirement for 128 cameras and extensive computational resources. Alternatively, an image-based method uses efficient neural network architectures to detect hands and estimate key-point locations, running on personal computers and mobile processors (Han et al. 2020). Despite its accessibility, this method struggles with hand-hand and hand-object interactions. For example, while it can track a hand’s position and orientation in space, it fails to correctly model interactions like picking up a pen or manipulating small objects where fingers and objects occlude one another. Another approach, an autoregressive model, predicts finger poses by feeding the network current finger positions, past and future trajectories, and spatial representations derived from these trajectories (Zhang et al. 2021). Although this method achieves predictive accuracy by using a 0.5-second delay to gather future input trajectories, this delay limits its utility for real-time applications, as seen in scenarios where instantaneous response is crucial, such as playing a fast-paced piano piece or quickly adjusting hand positions in a game. In addition to these methods, there are impressive yet computationally demanding efforts to animate virtual hand avatars by simulating muscles and tendons or even skin dynamics. These approaches create highly realistic hand animations, modeling the subtle movements of tendons under the skin or the deformation of the skin surface during flexion (Sueda et al. 2008; Vaillant et al. 2013).

While these methods offer deep insights into hand mechanics, their resource-intensive nature renders them impractical for real-time applications. This limitation makes them unsuitable for VR systems that demand instantaneous feedback, such as virtual surgery training or interactive VR art creation.

As apparent from this discussion, each current method for animating virtual hand avatars faces distinct challenges that limit its suitability for VR applications involving precise manual tasks. This mismatch often arises from an overemphasis on simulation complexity. For example, while muscle simulation offers exceptional realism, its integration into real-time VR systems for activities involving precision object manipulation remains unfeasible. Effective hand animation in VR requires a delicate balance between computational complexity and real-time performance. One promising innovation is the so-called “anatomical filter” designed to refine hand poses generated by hand tracking systems (Isaac et al. 2021). This filter applies biomechanical constraints derived from the human hand’s articulated structure and joint limitations (Gustus et al. 2012). It adjusts the 26-degree-of-freedom vector representing joint angles to ensure poses remain within anatomically plausible bounds. This approach enhances the accuracy of hand tracking by correcting deviations that might occur due to tracking errors or anatomical inaccuracies. However, the anatomical filter places significant demands on computational resources: calculating and adjusting joint angles in real time imposes processing delays, affecting the system’s responsiveness during interactive VR applications. Despite these challenges, integrating such filters into hand-tracking systems represents a significant step toward achieving more naturalistic hand-object interactions in VR. Due to their specialized requirements, these solutions are still far from being integrated into consumer VR setups. Consumer VR continues to await innovations enabling sufficiently realistic virtual hand models for precision manual tasks.
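
A box-constraint caricature of such an anatomical filter is sketched below; the joint names and degree limits are illustrative assumptions, and the published filter operates on the full 26-degree-of-freedom joint-angle vector with richer constraints than simple clamping.

```python
import numpy as np

# Illustrative per-joint flexion limits in degrees: a small subset of a
# full 26-DoF hand, with bounds chosen for demonstration only.
JOINT_LIMITS_DEG = {
    "index_mcp_flex": (-20.0, 90.0),
    "index_pip_flex": (0.0, 110.0),
    "index_dip_flex": (0.0, 70.0),
}

def anatomical_filter(pose_deg):
    """Clamp tracked joint angles into anatomically plausible bounds,
    in the spirit of the anatomical filter of Isaac et al. (2021); real
    biomechanical constraints also couple joints and vary across
    individuals, which this box-constraint sketch ignores."""
    return {joint: float(np.clip(angle, *JOINT_LIMITS_DEG[joint]))
            for joint, angle in pose_deg.items()}

# A tracking glitch reports an impossible 95 deg of DIP hyperflexion.
noisy = {"index_mcp_flex": 45.0, "index_pip_flex": 60.0,
         "index_dip_flex": 95.0}
print(anatomical_filter(noisy))  # DIP clamped to 70.0 deg
```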

5 Insights from neuroscience and psychophysics for developing a sufficiently realistic virtual hand model

The literature on the neuroscience and psychophysics of manual movements offers valuable insights to streamline the computational complexity of real-time hand tracking and enhance the naturalness of virtual hand avatar movements during precision manual tasks. Neuroscience research highlights the central nervous system’s mechanisms that support the intricate sensorimotor coordination involved in grasping. For example, studies reveal that various finger movements can be evolutionarily superimposed onto fundamental grasping patterns (Rearick et al. 2003; Reilly and Schieber 2003; Schieber and Santello 2004). This allows a reduced set of coordination patterns, known as principal components or synergies, to effectively correlate joint motions and force exertions across multiple fingers (Della Santina et al. 2017; Latash et al. 2007; Santello et al. 1998, 2016). These synergies are crucial in explaining hand pose variability with minimal complexity, thereby simplifying the computational models required for realistic hand tracking in VR (Table 5).

Table 5 Insights from neuroscience and psychophysics for developing a sufficiently realistic virtual hand model

Principal components or synergies reflect biomechanical constraints and synchronized motor unit activity governed by neural mechanisms (Latash et al. 2007; Leo et al. 2016; Ting and McKay 2007). For instance, research indicates that synergies exhibit a hierarchical gradient, where lower-order synergies account for basic hand opening and closing movements, and higher-order synergies facilitate precise adjustments in hand shape (Santello et al. 1998). This hierarchical structure allows for more efficient computational modeling, as only a few principal components are needed to accurately represent a wide range of hand movements. This approach can significantly reduce the computational load while maintaining the realism of hand interactions in virtual environments. Incorporating these insights into VR systems can revolutionize the user experience, making hand avatars more responsive and natural. By leveraging synergies, developers can create more efficient algorithms that require less computational power yet provide highly accurate and realistic hand movements. This advancement can bridge the gap between the current limitations of consumer VR setups and tasks requiring fine manual dexterity, such as virtual surgery simulations or intricate object manipulation.

The insights from neuroscience and psychophysics suggest adopting hand representations that minimize the movement degrees of freedom tailored to specific tasks. Traditional approaches often model distal joint flexion as a function of middle joint flexion, which is effective when these joints move together without external forces (Kamper et al. 2002, 2003). For instance, models of finger movement show that distal joints flex in proportion to middle joints when making a fist or gripping a standard object (Kim et al. 2012). This simplification works when intricate individual joint movements are not required, such as in basic hand gestures or simple object manipulation in VR.
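
The joint-coupling simplification described above can be expressed in a few lines; the 2/3 coupling ratio is a common rule-of-thumb approximation in hand modeling, used here purely for illustration rather than a value from the cited studies.

```python
def coupled_finger_pose(pip_flexion_deg, coupling_ratio=2.0 / 3.0):
    """Reduce a finger's degrees of freedom by slaving distal (DIP)
    flexion to middle (PIP) flexion, so one tracked value animates two
    joints. The 2/3 ratio is a rule-of-thumb approximation, not a value
    taken from the studies cited above."""
    return {"pip_deg": pip_flexion_deg,
            "dip_deg": coupling_ratio * pip_flexion_deg}

# Closing the finger: a single control signal drives both joints.
for pip in (0.0, 30.0, 60.0, 90.0):
    print(coupled_finger_pose(pip))
```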

More advanced techniques have emerged lately, employing the concept of synergies to create simplified control methods for artificial hands and virtual hand avatars. Synergies involve using a reduced set of coordination patterns—principal components—that capture multiple fingers’ essential movements and forces. This approach can dramatically enhance computational efficiency by reducing the movement degrees of freedom required to animate virtual hand avatars in real time. For example, synergy-based models have been successfully applied to artificial hands, enabling complex tasks with fewer control signals by harnessing these simplified movement patterns (Bicchi et al. 2011; Brown and Asada 2007; Catalano et al. 2012; Gabiccini et al. 2011; Geng et al. 2011). This principle can also be extended to virtual environments, allowing for more fluid and natural hand movements by focusing on coordinated finger actions rather than tracking and reproducing each joint movement independently.
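
A minimal sketch of this synergy-based dimensionality reduction, assuming a matrix of recorded joint angles, is shown below; real pipelines add normalization, far richer datasets, and task-specific synergy selection.

```python
import numpy as np

def extract_synergies(joint_angle_data, n_synergies=2):
    """Extract postural synergies from recorded hand poses via PCA.
    joint_angle_data: (n_samples, n_joints) array of joint angles.
    Returns the mean pose and the first n_synergies principal axes."""
    mean_pose = joint_angle_data.mean(axis=0)
    centered = joint_angle_data - mean_pose
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_pose, vt[:n_synergies]

def reconstruct_pose(mean_pose, synergies, activations):
    """Animate a full joint-angle vector from a few synergy activations."""
    return mean_pose + activations @ synergies

# Stand-in dataset: 200 poses of a 20-joint hand driven by two latent
# open/close patterns plus noise (a proxy for real motion captures).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 20))
poses = latent @ mixing + 0.05 * rng.normal(size=(200, 20))

mean_pose, synergies = extract_synergies(poses, n_synergies=2)
pose = reconstruct_pose(mean_pose, synergies, np.array([1.0, -0.5]))
print(pose.shape)  # (20,): a full hand pose from just 2 control signals
```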

In VR applications, where precise hand movements are crucial for tasks such as virtual surgery, interactive design, or teleoperation, reducing the complexity of hand models without sacrificing realism can significantly enhance system performance and user experience. By applying synergy-based models, VR systems can focus on essential movement patterns, making hand movements appear smoother and more lifelike (cf. Bianchi et al. 2012, 2013; Mulatto et al. 2010). For instance, when a user intends to grasp a virtual object, a synergy model ensures that the avatar’s fingers move in a coordinated manner that mimics natural hand motion. This reduces computational load and provides a more realistic and responsive hand model, crucial for applications that demand high accuracy and real-time feedback, such as virtual training simulations or interactive virtual design environments. This innovative approach is expected to enable more complex interactions while maintaining system efficiency and responsiveness.

The integration of neuroscience and psychophysical insights into manual movements promises significant advancements in realistic virtual hand models for precision manual tasks in VR. Leveraging the central nervous system’s ability to streamline complex sensorimotor coordination into fundamental grasping patterns can simplify computational models for real-time hand tracking. Synergies provide a framework to reduce hand pose variability with minimal complexity. Focusing on a few principal components to represent a wide range of hand actions can enable efficient modeling while preserving the naturalness of hand interactions. This will reduce computational load and enhance the realism and responsiveness of virtual hand avatars. This approach is also expected to allow for smoother, more lifelike hand movements in applications from virtual surgery to interactive design, improving system performance and user experience. As VR technology evolves, integrating these insights will be crucial in developing sophisticated hand models that offer both efficiency and high realism, paving the way for more naturalistic hand-object interactions in VR.

6 Contemporary approaches for hand-object collision handling in VR

In non-grasping contexts, both display and interaction fidelity significantly influence user strategy, performance, and subjective perceptions of presence, engagement, and usability in VR (Laha et al. 2014; McMahan et al. 2012; Nabioyuni and Bowman 2015). For naturalistic hand-object interactions, a virtual object must appear visually realistic and respond accurately to virtual hand interactions. Collision detection algorithms are defined as computational methods that detect contact between a virtual hand and a virtual non-body object based on predefined criteria, potentially involving precise computations of spatial coordinates and timing of the contact. Collision handling, in contrast, encompasses a broader scope, extending beyond mere detection to include the subsequent management of the grasp and manipulation of the virtual object by the virtual hand avatar. Complementary and effective implementation of collision detection algorithms and comprehensive collision handling strategies are pivotal for enhancing the realism and interactivity of virtual environments, ensuring seamless interaction between virtual hands and objects while optimizing user experience and engagement.

Detecting collisions between a virtual hand and an object in VR has spurred the development of various makeshift solutions (e.g., Fig. 3), each accompanied by its own set of limitations. One of the simplest methods involves using a hand avatar while the user manipulates a handheld controller, facilitating basic interaction but lacking the sophistication required for complex virtual environments. Early approaches relied on databases of human grasps to align hand shapes with object shapes; for instance, Li et al. (2007) developed a method using collections of features with similar placements and surface normals to synthesize grasps. In a related vein, Oprea et al. (2019) introduced an automatic fitting mechanism where the virtual hand adapts to intricate object shapes without requiring unique grasping animations for each object, starting from a predefined 6D hand pose. A more refined technique combines a minimization process with heuristics to classify grip types based on finger and palm interactions, thereby altering the visual behavior of the virtual hand. For instance, different touch patterns signify different grasping intents, such as precision or power grasp (Funahashi et al. 1999; Zachmann and Rettig 2001). However, this method may fail if the hand initially collides with the object, necessitating user intervention to relocate to a collision-free space. Another approach focuses on constructing finger grasp trajectories and detecting collisions along these paths rather than modeling the entire hand (Zhu et al. 2006). This method enhances focused collision detection but may overlook other hand components’ interactions. A different strategy identifies colliding parts of the hand avatar with a virtual object and computes their geometric center; if this center overlaps with the object, its simulated physical properties are disabled, allowing it to move with the hand (Liu et al. 2019). While effective, this “caging” approach can be unstable for smaller objects.

Fig. 3
figure 3

Adapted from Prachyabrued and Borst (2016)

Virtual hand-object interpenetration. Interpenetration problem: If a user reaches for a virtual object that does not exist in reality, they may see their hand avatar enter it (inner hand). Various approaches have been suggested to improve virtual grasping. Outer Hand: The 3D hand model is constrained to avoid visual interpenetration while the real (tracked) hand interpenetrates the virtual object. See-Through: The interpenetrated portion of the hand is made visible. Two Hands: The inner and outer hand models are presented simultaneously. Finger Color: As closure increases, finger color changes continuously from normal to red. Object Color: As closure increases, the object color changes continuously from normal to red. Arrow: Arrow glyphs emerge from the fingernails of an outer hand model, growing with increasing hand closure. Vibration: As closure increases, the fingers vibrate visually.

Understanding the diverse hand-object collision handling methods in VR, including hybrid bounding volumes, virtual springs, and post-collision adjustments for tool-object interpenetration, further illuminates their contributions and limitations. Hybrid bounding volumes address simpler objects based on their characteristics but often sacrifice realistic grasping dynamics (Yuan et al. 2003). Meanwhile, some solutions employ virtual springs between the real hand and the hand avatar, triggered by typical grasping criteria to determine grasp feasibility (Borst and Indugula 2005; Delrieu et al. 2020). Addressing tool-object interpenetration, such as with virtual chopsticks, involves adjusting virtual finger angles post-collision to prevent visual overlap, enhancing realism in interaction scenarios (Kitamura et al. 1999). Early surgical simulators simplified interaction with virtual tissue by representing interactions as a single point in space (Zachmann 2000), yet this oversimplified approach limits its applicability in complex surgical maneuvers.

While these methods represent makeshift solutions, more systematic approaches to collision handling in VR can be classified into three main categories (Table 6).

Table 6 Contemporary approaches for hand-object collision handling in VR

Heuristics-based approaches utilize pre-existing knowledge about hand kinematics and virtual objects to facilitate virtual grasping. This methodology relies on pre-recorded real-hand kinematic data, which is matched at runtime to find the closest grasp for the current task (Funahashi et al. 1999; Höll et al. 2018; Huagen et al. 2004; Li et al. 2007; Miller et al. 2003; Oprea et al. 2019; Rijpkema and Girard 1991; Wan et al. 2004; Zachmann and Rettig 2001). For instance, Miller et al. (2003) presented an approach where a database of hand poses is used to determine the best grasp configuration for picking up a virtual tool. Similarly, Oprea et al. (2019) developed a method in which the hand is automatically fitted to the object shape from a position and orientation determined by the VR handheld controllers. This technique is particularly effective for interactions involving predefined objects, such as selecting and manipulating virtual gadgets or tools, because it simplifies the determination of appropriate grasping postures. In a virtual reality training scenario for assembly tasks, heuristic data allows users to pick up parts accurately and position them without complex real-time calculations. However, this reliance on pre-recorded kinematic data limits the approach to scenarios where such data is available and applicable. It requires prior information about the objects and their interaction characteristics, making it less adaptable to unconstrained or dynamic environments where users encounter novel or undefined objects. This can restrict the flexibility of virtual interactions, particularly in open-ended virtual environments where users might engage with a wide range of objects and actions.
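
At its simplest, the database-matching step common to these heuristic methods reduces to a nearest-neighbor lookup over stored joint-angle vectors, as in the following sketch; the pose format, joint count, and grasp names are illustrative assumptions.

```python
import numpy as np

def closest_grasp(current_pose, grasp_database):
    """Snap a tracked hand pose to the nearest pre-recorded grasp, the
    simplest form of database-driven grasp synthesis. current_pose is
    an (n_joints,) vector of joint angles; grasp_database maps grasp
    names to reference poses of the same shape."""
    names = list(grasp_database)
    poses = np.stack([grasp_database[name] for name in names])
    distances = np.linalg.norm(poses - current_pose, axis=1)
    best = int(np.argmin(distances))
    return names[best], poses[best]

# A toy 4-joint database; real systems store full multi-digit postures.
database = {
    "precision_pinch": np.array([40.0, 55.0, 30.0, 5.0]),
    "power_grasp":     np.array([80.0, 95.0, 60.0, 45.0]),
    "lateral_key":     np.array([20.0, 30.0, 15.0, 60.0]),
}
name, pose = closest_grasp(np.array([45.0, 60.0, 25.0, 10.0]), database)
print(name)  # "precision_pinch": the avatar snaps to this stored grasp
```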

Modified heuristics-based approaches build on traditional heuristics but integrate them with modifications of physics-based methods, thus not relying on a grasp database. For instance, Prachyabrued and Borst (2012a) developed a method to detect the intent to release a grasp by modifying a physics-based approach without using a predefined grasp database, enabling more adaptive interactions based on real-time physics calculations. This approach remains physics-driven for establishing grasps, while heuristics refine the release mechanism. Other methods use simple rules for grasp detection, avoiding databases altogether. For example, Iwata (1990) proposed that an object is considered grasped when the thumb and index finger touch it simultaneously. Maekawa and Hollerbach (1998) suggested that the user’s hand posture should match predefined grasping patterns for the object, enabling grasp detection through pattern recognition, and Hilliges et al. (2009) introduced a technique where a ray projected downward from the center of mass of a gesture hole intersects with the object, indicating a grasp. Ullmann and Sauer (2000) utilized contact geometry to establish one- and two-hand grasps, providing a framework for more complex interactions. Additionally, Holz et al. (2008) and Moehring and Froehlich (2010) developed methods to detect pinch grasps by analyzing contact pairs and friction, demonstrating that simple, rule-based heuristics can effectively manage certain grasping tasks without complex databases. These approaches are advantageous because they simplify the grasping process and reduce computational requirements, though they may struggle with complex or unconstrained interactions due to their reliance on simple rules.
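
Such rule-based criteria are simple enough to express directly in code; the contact representation below is a simplification of our own, with the first rule paraphrasing Iwata (1990) and the second standing in loosely for opposing-contact criteria.

```python
def rule_based_grasp(contacts):
    """Database-free grasp detection from the set of hand parts
    currently touching the object. Rule 1 paraphrases Iwata (1990):
    simultaneous thumb and index contact means grasp. Rule 2 is a loose
    stand-in for opposing-contact criteria: any two digits in contact
    also count as a grasp."""
    if {"thumb", "index"} <= contacts:
        return True
    digits = contacts & {"thumb", "index", "middle", "ring", "little"}
    return len(digits) >= 2

print(rule_based_grasp({"thumb", "index"}))   # True (rule 1)
print(rule_based_grasp({"index", "middle"}))  # True (rule 2)
print(rule_based_grasp({"palm", "index"}))    # False: one digit only
```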

Physics-based approaches to collision handling in VR involve representing the user’s hand and fingers as interface points, computing interaction forces through collision response algorithms, and simulating the motion of virtual objects based on these forces. Early implementations used haptic interfaces to handle these interactions but faced limitations due to the exclusive reliance on such hardware (Bergamasco et al. 1994; Borst and Indugula 2005; Hirota and Hirose 2003). More sophisticated approaches aim to eliminate the need for haptic gloves by leveraging real-time simulation, though they encounter issues related to simulation speed, stability, and accurate hand tracking. For example, Erez et al. (2015) explored methods to enhance simulation stability, while Taylor et al. (2016) focused on improving the accuracy of hand tracking. Jacobs and Froehlich (2011) introduced a finger-based deformable soft body approach that uses soft pads on rigid finger bones for convex objects but does not account for the palm, leading to incomplete interaction models. Other methods, like the use of particles on the hand mesh (Hilliges et al. 2012), aim to induce forces but suffer from tracking occlusions and difficulties with sharp object features, as noted by Hilliges et al. (2009) and Hirota and Hirose (2003). Nasim and Kim (2016) proposed a dynamic proxy hand to apply forces; however, the approach’s reliance on freezing the proxy hand post-collision detection restricts grasp possibilities. Alternative techniques include volumetric collision detection (Kim and Vance 2003, 2004; Moustakas et al. 2006), complex friction models (Höll et al. 2018; Talvas et al. 2015), sphere-tree structures (Tzafestas 2003), and physical constraint solvers (Jacobs et al. 2012), with some even considering hand flesh dynamics (Garre et al. 2011; Perez et al. 2013). Despite their potential, these models often struggle with real-time application due to computational complexity and inaccuracies, particularly when managing multiple grasp contacts and constraints. Furthermore, discrepancies between real hand movements and virtual object responses, such as penetration issues, can result in unrealistic interactions, underscoring the challenges in achieving stable and precise physics-based simulations in VR. While physics-based approaches offer realistic simulations, their computational demands and complexities require ongoing refinement to achieve practical VR experiences.
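
A core building block of many physics-based schemes is the virtual coupling that drags a simulated proxy hand toward the tracked hand with a spring-damper, in the spirit of Borst and Indugula (2005); the gains, mass, and update rate below are illustrative values rather than tuned constants from the cited systems.

```python
import numpy as np

def proxy_hand_step(proxy_pos, proxy_vel, tracked_pos,
                    stiffness=400.0, damping=25.0, mass=0.4, dt=1.0 / 90.0):
    """One semi-implicit Euler step of a virtual coupling: the rendered
    proxy hand is pulled toward the tracked hand by a spring-damper,
    and the same spring force can be exerted on grasped objects."""
    spring_force = stiffness * (tracked_pos - proxy_pos)
    damping_force = -damping * proxy_vel
    accel = (spring_force + damping_force) / mass
    proxy_vel = proxy_vel + accel * dt
    proxy_pos = proxy_pos + proxy_vel * dt
    return proxy_pos, proxy_vel

# The tracked hand jumps 5 cm (say, it penetrated a virtual wall); the
# proxy follows smoothly instead of teleporting through the geometry.
pos, vel = np.zeros(3), np.zeros(3)
target = np.array([0.05, 0.0, 0.0])
for _ in range(60):
    pos, vel = proxy_hand_step(pos, vel, target)
print(pos.round(4))  # close to [0.05, 0, 0] after ~0.7 s of simulation
```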

Hybrid approaches to collision handling in VR combine heuristics-based and physics-based methods to leverage the strengths of both. For example, Kim and Park (2015, 2016) proposed a hybrid technique where contact forces are computed at small portions of the hand avatar. When these forces surpass a predefined threshold, the virtual object is considered grasped, initiating its motion. This allows for precise force-based interactions while incorporating heuristic thresholds to simplify computations. Another method developed by Pollard and Zordan (2005) and Zhao et al. (2013) synthesizes new grasps using previously recorded ones, driven by a physics engine, allowing the system to adapt to varying objects and interaction scenarios. For instance, in a VR training scenario, this approach can automatically adjust the grasp based on recorded data, ensuring that the grasp remains consistent with real-world dynamics. However, this automation might compromise the user’s sense of ownership or agency over the hand avatar if the system’s grasping actions become too perceptible. While this approach could be highly effective in scenarios like virtual surgery, where precise and consistent grasping is crucial, it might diminish the immersive experience if users feel that the virtual hand’s movements are not fully controlled. Despite these challenges, hybrid approaches effectively balance realistic physics with computational efficiency.
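
The force-threshold criterion at the heart of such hybrid schemes can be caricatured in a few lines, assuming the physics engine reports per-patch contact forces; the 1.5 N threshold is an arbitrary illustrative value.

```python
import numpy as np

def grasp_established(contact_forces, force_threshold_n=1.5):
    """Hybrid grasp criterion in the spirit of Kim and Park (2015,
    2016): physics supplies contact forces at small patches of the hand
    avatar, and a heuristic threshold on their summed magnitudes
    decides when the object counts as grasped."""
    total = sum(float(np.linalg.norm(f)) for f in contact_forces)
    return total >= force_threshold_n

# Forces (in newtons) reported at three contact patches this frame.
patches = [np.array([0.0, 0.0, 0.6]),   # thumb pad
           np.array([0.0, 0.1, 0.7]),   # index pad
           np.array([0.0, 0.0, 0.4])]   # middle pad
print(grasp_established(patches))  # True: ~1.71 N exceeds the 1.5 N threshold
```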

In summary, the ongoing development of collision detection algorithms in VR reflects a blend of heuristic, physics-based, and hybrid approaches. Heuristic-based methods offer simplicity and efficiency but are often limited by their reliance on pre-recorded data and predefined scenarios. Physics-based techniques provide detailed simulations of interaction forces and object dynamics, yet their complexity and computational demands pose significant challenges for real-time applications. Hybrid approaches combine the strengths of heuristics and physics, offering a more adaptable solution that balances computational efficiency with realistic interaction modeling. Despite these advancements, achieving seamless and intuitive hand-object interactions in VR remains a complex task requiring continued innovation. Effective collision detection must integrate visual realism with accurate, responsive behaviors to meet virtual environments’ diverse and evolving requirements. This underscores the critical need for ongoing research and refinement in this dynamic field.

7 Potential solutions to develop bio-inspired collision handling algorithms

Existing methods of collision detection primarily rely on simplified hand models to identify contact points or estimate contact areas or volumes (Aslandere et al. 2015; Furmanek et al. 2019; Kim and Vance 2003, 2004; Mangalam et al. 2021; Moustakas et al. 2006). For instance, some systems use static models that map a fixed number of contact points on the hand’s surface, while others utilize dynamic models that adjust points based on hand movements. Using multiple points can help simulate friction or rotational effects, such as when a user grasps a textured ball and experiences resistance and rolling based on surface interaction. However, as real hands can penetrate virtual objects, defining meaningful contact points in simplified hand models is challenging. Additionally, updating these points when contact slips along an object’s surface, like when sliding a hand along a virtual table, is another significant challenge.

Once a collision is detected, it is crucial to provide users with cues to prevent virtual finger interpenetration. Electrotactile feedback indicating the interpenetration distance is of limited value (Vizcay et al. 2021). Solutions like semi-transparent, interpenetrable hands have significantly improved precise manipulation, as verified by task data and user feedback (Höll et al. 2018; van Veldhuizen and Yang 2021). For example, when users grasp virtual objects with semi-transparent hands, they report better control and accuracy than opaque models. Techniques providing auditory, haptic, or visual cues are generally preferred over simple visuals of interpenetration, with color changes being particularly favored (Fabiani et al. 1996; Moehring and Froehlich 2011; Prachyabrued and Borst 2014, 2016). For instance, changing the color of a virtual object when contact is made can significantly enhance user experience. More direct cues about interpenetration, such as revealing intersecting hand portions, show promise in maintaining proper grasp aperture and improving performance measures (Prachyabrued and Borst 2016).
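
The favored color-change cue can be implemented as a simple interpolation from skin tone toward red as penetration deepens; the depth range and RGB values below are illustrative choices, not parameters from the cited studies.

```python
def interpenetration_color(penetration_depth_m, max_depth_m=0.015,
                           base_rgb=(0.95, 0.80, 0.70)):
    """Map penetration depth to a finger color that shifts continuously
    from skin tone toward red, one of the visual cue designs studied by
    Prachyabrued and Borst (2014, 2016). Depth range and colors are
    illustrative, not values from those studies."""
    t = min(max(penetration_depth_m / max_depth_m, 0.0), 1.0)
    red = (1.0, 0.1, 0.1)
    return tuple((1.0 - t) * b + t * r for b, r in zip(base_rgb, red))

for depth in (0.0, 0.005, 0.015):  # no, light, and deep interpenetration
    print(depth, tuple(round(c, 2) for c in interpenetration_color(depth)))
```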

Behavioral evidence shows that fine inter-digit coordination is highly sensitive to the mechanical properties of the target object, affecting grasp stability during manipulation (Fu and Santello 2014); visual cues about object shape and density influence the anticipatory planning of digit forces and placement. People anticipate the upcoming task by varying digit placement before manipulation (Crajé et al. 2011; Lee-Miller et al. 2016). For instance, when preparing to lift a heavy virtual box, users adjust their finger positions to ensure a secure grip. Objects with heterogeneous mass distribution roll under symmetric digit placement, and people learn to minimize object roll within three trials (Crajé et al. 2013; Fu et al. 2010). With sufficient exposure to a specific collision detection criterion, users might adapt their grasping behavior, slowly distorting it. When one digit is placed higher on a virtual object with a given mass distribution, the object must roll to match the behavior of a physical object with an identical mass distribution. In real life, people counter object roll and prevent slippage by changing hand orientation.

Another significant issue with using contact points is the bulkiness of state-of-the-art haptic interfaces (Burdea 2000)—despite considerable technological advancements, these devices remain cumbersome (Culbertson et al. 2018; Hamza-Lup et al. 2019). For example, modern haptic gloves that provide detailed touch feedback can still be heavy and restrictive, limiting natural hand movements. Borst and Volz (2005) highlight the persistent challenges associated with glove-mounted haptic devices, which continue to apply to modern versions. Studies have shown that haptic technologies, such as vibrotactile feedback systems, can subtly influence natural grasping (Sundaram et al. 2019), complicating the integration of realistic collision detection. Also, haptic feedback in consumer VR remains the exception rather than the rule. Therefore, there is a need for more innovation in the haptic-free domain of collision detection. This calls for studies on the spatiotemporal properties of hand-object interactions in the real world, focusing on the nuances of reaching and grasping physical objects (cf. Furmanek et al. 2019, 2021; Mangalam et al. 2021).

Developing bio-inspired collision handling algorithms is crucial for enhancing the realism and functionality of virtual and augmented reality systems. Table 7 presents various potential solutions to achieve this goal, addressing different aspects of collision handling. Enhanced contact point models rely on complex, dynamic hand representations, adapting to real-time data to predict and update contact points. Machine learning (ML) algorithms trained on datasets of natural grasping motions can enable these models to provide dynamic, realistic interactions. Feedback loops allow continuous adjustments based on hand movement or object slippage. Behavioral adaptation algorithms and adaptive grasping interfaces focus on personalizing the interaction experience. Systems can accommodate individual grasping patterns by monitoring user behavior and adjusting collision detection criteria, ensuring natural and undistorted interactions. Adaptive interfaces that measure hand shape and force can modify virtual hand representations in real time, enhancing grasping simulation realism. Advanced predictive models and the simulation of mechanical properties aim to anticipate and react to potential collisions before they occur. Predictive algorithms forecast collision points using motion tracking data, providing proactive input to guide user movements. Incorporating physical properties like mass distribution and texture into collision handling algorithms ensures that virtual objects behave more realistically, mimicking their real-world counterparts. Finally, integrating haptic and multi-modal sensory feedback can enrich the interaction by providing users with tactile, auditory, and visual cues. Lightweight, flexible haptic gloves can simulate various sensations, while synchronized sensory feedback can enhance perception and prevent interpenetration. These innovations will collectively push the boundaries of virtual interactions, making hand-object interactions more naturalistic.

Table 7 Potential solutions to develop bio-inspired collision handling algorithms in VR
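As a concrete illustration of the feedback-loop idea above, the following minimal sketch corrects tracked fingertip positions against a signed-distance representation of the virtual object each frame. The tracker, the signed-distance function, and the blending gain are all assumptions for illustration, not the API of any particular engine.

```python
# Minimal sketch of a per-frame contact-point feedback loop.
# Assumes fingertip positions from a hand tracker and a signed-distance
# function (SDF) for the virtual object; both are illustrative stand-ins.

import numpy as np

# Hypothetical object: a 5 cm sphere at the origin
center, radius = np.zeros(3), 0.05
sdf = lambda p: np.linalg.norm(p - center) - radius             # negative = inside
grad_sdf = lambda p: (p - center) / np.linalg.norm(p - center)  # outward normal

def update_contact_points(fingertips, stiffness=0.5):
    """Softly project interpenetrating fingertips back to the surface.
    Blending toward the surface (rather than snapping) keeps the rendered
    hand stable across frames while still limiting deep interpenetration."""
    corrected = []
    for p in fingertips:
        d = sdf(p)
        if d < 0.0:                       # fingertip is inside the object
            target = p - d * grad_sdf(p)  # closest point on the surface
            p = (1 - stiffness) * p + stiffness * target
        corrected.append(p)
    return np.array(corrected)
```

Running this correction inside the render loop, and feeding the residual penetration depth back into the grasp logic, is one way to realize the continuous adjustment that Table 7 envisions.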

8 Implementing seamless and fluid releases of virtual grasps

A pivotal aspect of grasping in VR is managing the release of grasped virtual objects. VR systems often fail to accurately simulate the subtle hand-object interactions involved, resulting in unrealistic behaviors like the virtual fingers sinking into the virtual object. Releasing a grasped virtual object often requires a deliberate action from the user, which contrasts sharply with the intuitive and immediate nature of release during real-world grasping. Without a reliable release mechanism, users must grapple with constant uncertainty about when and how a virtual object will be released, leading to unpredictable interactions. This challenge intensifies with intricate grasps and in-hand manipulations. For instance, when shifting a virtual tool from one hand to another or performing fine motor tasks like adjusting a small part during virtual assembly, the system must precisely interpret when to release or maintain the grasp based on users’ intent and hand movements. Ensuring that virtual hands and objects behave consistently with users’ expectations upon release, thus mirroring real-world experiences, is essential for achieving naturalistic hand-object interactions in VR. This requirement extends beyond simple releases to complex scenarios where users manipulate objects without intending to let them go. For example, when a user rotates a virtual screwdriver or juggles virtual objects, the system must replicate the tactile feedback and dynamics that users rely on in the real world. The limitations of current physics engines and the inadequate modeling of hand-object contact areas compound the intricacy of this issue. These problems highlight the need to implement seamless and fluid releases of whole-hand virtual grasps (Table 8).

Table 8 Implementing seamless and fluid releases of virtual grasps

Several methods have been developed to address the challenge of grasp release in VR, each showing varying degrees of success in simple grasp scenarios. One common approach involves using grasp condition violations, such as the distances between contact points (Moehring and Froehlich 2010) or finger motions (Prachyabrued and Borst 2012a), to determine the release state. For instance, if the contact points between the virtual hand and an object move beyond a certain threshold, the system can infer that the object should be released. Another method involves detecting collision-free states or applying positional offsets to the hand state data to facilitate release (Holz et al. 2008; Zachmann and Rettig 2001). This can be useful in tasks like placing virtual items on a shelf, where the system adjusts the hand’s position to avoid collisions and ensures smooth object placement. Heuristic analysis offers another solution: the system searches backward in time to find an adjusted release position. For example, the system can analyze past hand movements in a VR training scenario to identify the optimal moment for releasing a tool or component (Osawa 2005, 2006). Despite these advancements, a fundamental issue remains: preventing the virtual hand from interpenetrating the virtual object while allowing natural grasp release. Collision detection algorithms (CDAs) are typically used to manage hand-object interactions by detecting collisions and keeping the virtual hand outside the object for release (Boulic et al. 1996; Holz et al. 2008; Zachmann and Rettig 2001). However, these algorithms often constrain the virtual hand to the object’s boundaries, making it difficult to achieve precise release without increasing hand closure or reducing release accuracy (Borst and Indugula 2006; Jacobs and Froehlich 2011). For instance, in a virtual assembly task, keeping the virtual hand outside the object might result in the hand closing too much around a component, making it challenging to release it precisely in the correct position (Canales et al. 2019; Prachyabrued and Borst 2012b, 2016). To mitigate this, the system could allow a small degree of interpenetration, letting the virtual hand slightly overlap the object during grasping, which enhances release performance. For example, minimal interpenetration can facilitate smoother transitions between grasping and releasing tools in VR simulations involving tool manipulation. Studies have shown that balancing interpenetration against release precision can significantly improve release performance. When interpenetration is visually minimized, even a small opening motion of the real hand should suffice to release the object in the virtual environment. This approach has been shown to improve the release of virtual objects like surgical instruments or delicate items in a virtual shop without requiring extensive movement of the real hand outside the object. Allowing a finite extent of interpenetration can thus reduce the strain on precise hand movements and enhance the overall user experience in VR (Prachyabrued and Borst 2011). This balance between preventing interpenetration and enabling naturalistic grasp release remains one of the most formidable challenges in developing naturalistic hand-object interactions in VR.
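The trade-offs above can be made concrete in a few lines of logic. The following sketch combines two release cues discussed in this section, contact-distance violation and finger-opening motion, while tolerating a small interpenetration in the spirit of Prachyabrued and Borst (2011). The thresholds and the state structure are illustrative assumptions, not values from the cited studies.

```python
# Hedged sketch of a threshold-based grasp-release heuristic.
# All thresholds are placeholders to be tuned per application.

from dataclasses import dataclass

@dataclass
class GraspState:
    contact_gap: float     # max distance (m) between stored contact points and surface
    aperture_rate: float   # thumb-finger aperture change (m/s); positive = opening
    penetration: float     # current interpenetration depth (m), >= 0

GAP_THRESHOLD = 0.010          # release if contacts drift > 1 cm off the surface
OPENING_THRESHOLD = 0.050      # release if the hand opens faster than 5 cm/s
PENETRATION_TOLERANCE = 0.004  # tolerate up to 4 mm of visually minimized overlap

def should_release(state: GraspState) -> bool:
    # While the hand is still pressing into the object, keep the grasp;
    # once overlap is small, a modest opening motion suffices to release.
    if state.penetration > PENETRATION_TOLERANCE:
        return False
    return (state.contact_gap > GAP_THRESHOLD
            or state.aperture_rate > OPENING_THRESHOLD)
```

Because the heuristic fires on either cue, a user who simply opens the hand releases the object immediately, while a user who withdraws the hand without opening it still triggers release once the stored contacts separate from the surface.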

The current literature on grasping provides limited insights into the dynamics of grasp release. It remains unclear whether hand opening during this process is a generic action, such as a simple extension of the involved digits, or whether it is task-specific, relying on the particular coordination patterns used during grasping and subsequent object manipulation. Understanding the coordination between fingers and the influence of object properties is crucial to facilitating smooth object release. This knowledge could inform the development of grasp-release heuristics, as explored in previous studies (Moehring and Froehlich 2011; Osawa 2005, 2006; Zachmann and Rettig 2001). One promising approach involves leveraging machine learning and artificial intelligence (ML/AI) to analyze large datasets of finger release motions across various objects and task contexts. For example, by studying how different finger movements interact with objects of varying shapes and sizes, ML/AI can identify patterns that facilitate smooth release. Such analyses can enhance heuristic release detection, refining the parameters used in collision detection algorithms and improving the overall realism and functionality of virtual object manipulation. For instance, in scenarios where a user needs to release a tool after precise manipulation, ML/AI can predict the optimal finger motions based on prior data, making the release more intuitive and accurate. Understanding these dynamics could also lead to sophisticated interactions like passing a ball between fingers or typing on a virtual smartphone held in the same hand. These complex in-hand manipulations are already achievable in underactuated robotic hands, such as the ETHOHand, which uses minimal control points to achieve nuanced movements (Konnaris et al. 2016). Applying similar principles through ML/AI in VR can enhance the realism and effectiveness of virtual hand interactions, enabling users to perform intricate tasks seamlessly.
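To illustrate what such an ML/AI pipeline might look like, the sketch below trains a classifier to estimate release intent from short windows of finger joint motion. The feature choices, window length, and random placeholder data are all assumptions; in practice, the labels would come from annotated recordings of real-world grasp-release events, and any sequence model could replace the random forest used here.

```python
# Illustrative sketch: learning release intent from windowed finger motion.
# Placeholder data stands in for annotated real-world grasp-release recordings.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(joint_angles: np.ndarray) -> np.ndarray:
    """joint_angles: (frames, joints) array for one short window.
    Features: mean per-joint angular velocity plus the net change in
    total hand opening across the window."""
    velocities = np.diff(joint_angles, axis=0)
    net_opening = joint_angles[-1].sum() - joint_angles[0].sum()
    return np.concatenate([velocities.mean(axis=0), [net_opening]])

rng = np.random.default_rng(0)
windows = [rng.normal(size=(12, 20)) for _ in range(200)]  # 200 windows, 12 frames, 20 joints
X = np.stack([window_features(w) for w in windows])
y = rng.integers(0, 2, size=200)  # 1 = release followed within ~100 ms (placeholder labels)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
release_probability = clf.predict_proba(X[:1])[0, 1]  # streaming per-window intent estimate
```

At run time, the predicted probability can feed into heuristics like the release rule sketched earlier, replacing fixed thresholds with learned, user- and task-specific ones.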

9 Evaluation metrics and criteria for bio-inspired hand-object interactions in VR

Bio-inspired hand-object interactions in VR hold immense promise for creating realistic and immersive experiences. Without proper evaluation methods, however, we are driving blindfolded: we cannot measure progress or ensure that our solutions are truly effective. To bridge this gap, robust metrics and evaluation criteria are essential. These metrics encompass objective factors like realism, movement accuracy, haptic feedback, and performance, as well as subjective aspects like user experience, engagement, and customizability. By systematically evaluating these criteria, researchers can gain crucial insights: they can identify strengths and weaknesses, refine their solutions, and ultimately push the boundaries of VR technology toward ever-more realistic and engaging user experiences.

A well-rounded evaluation approach is key to unlocking the full potential of bio-inspired hand-object interactions in VR (Table 9). Realism and immersion are crucial, but they are just the tip of the iceberg. Accurately capturing hand movements is essential to mirror real-world dynamics, boosting user satisfaction. High-quality haptic feedback adds another layer of realism, making virtual interactions feel natural. But VR is not just about visual appeal. Performance in complex tasks, like object manipulation, reveals a system’s ability to handle fine motor skills, a decisive capability for training and simulations. Computational efficiency, the judicious use of processing power, ensures smooth and responsive VR experiences. Beyond technical specifications, user experience and engagement metrics tell us how enjoyable and intuitive the system is, directly impacting user satisfaction and retention. Finally, a system’s adaptability and customizability cater to diverse user needs and tasks, making it versatile and user-centric. This comprehensive evaluation framework paves the way for refining bio-inspired hand-object interaction systems, propelling VR toward more immersive and functional virtual environments.

Table 9 Evaluation metrics and criteria for bio-inspired hand-object interactions in VR
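As a concrete instance of one objective criterion in Table 9, movement accuracy can be quantified as the per-fingertip root-mean-square error (RMSE) between the tracked real hand and the rendered virtual hand. The array shapes and the noise level in the example are assumptions for illustration only.

```python
# Minimal sketch: movement accuracy as per-fingertip RMSE (meters).

import numpy as np

def fingertip_rmse(real: np.ndarray, virtual: np.ndarray) -> np.ndarray:
    """real, virtual: (frames, fingertips, 3) position arrays in meters.
    Returns one RMSE per fingertip, exposing which digits the collision
    handling distorts the most."""
    err = np.linalg.norm(real - virtual, axis=2)  # per-frame Euclidean error
    return np.sqrt((err ** 2).mean(axis=0))       # aggregate over frames

# Example: 1,000 frames, 5 fingertips, ~2 mm of simulated rendering error
rng = np.random.default_rng(1)
real = rng.normal(size=(1000, 5, 3))
virtual = real + rng.normal(scale=0.002, size=real.shape)
print(fingertip_rmse(real, virtual))  # one value per digit
```

Analogous per-trial statistics (release timing error, task completion time, penetration depth) can populate the remaining objective rows of Table 9, while questionnaires cover the subjective ones.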

A comprehensive and systematic evaluation of bio-inspired hand-object interactions in VR is critical for driving innovation and validating the effectiveness of these systems. The framework outlined in Table 9 provides a robust approach to assessing pivotal aspects of VR interactions, encompassing technical performance metrics and user-centric factors such as experience, engagement, and adaptability. The rigorous application of these evaluation criteria enables researchers to diagnose and address deficiencies in current VR technologies, thereby refining and advancing the sophistication of solutions. This iterative evaluation process facilitates the development of VR systems that offer heightened realism, improved user engagement, and demonstrably superior performance. Consequently, the field is expected to advance toward creating highly functional and immersive virtual environments that align more closely with user requirements and expectations.

10 Potential approaches to address key questions in bio-inspired hand-object interactions in VR

Developing bio-inspired hand-object interactions in VR is pivotal for creating immersive and realistic experiences across various applications requiring precision manual skills. Achieving this objective requires overcoming significant challenges, such as constructing accurate virtual hand models that reflect the complexity of biological hand movements, integrating real-world grasping data and biomechanical insights about human hand function, and developing effective hand-object collision handling algorithms. In this section, we explore potential approaches to addressing these key questions, presenting a range of methodologies and technologies that can enhance the fidelity and functionality of hand-object interactions in VR. Each key question is paired with potential approaches that blend technical innovation with theoretical and empirical research, providing a comprehensive framework for advancing hand-object interactions in VR.

Table 10 outlines various potential approaches to address key questions in developing bio-inspired hand-object interactions in VR. Methods for creating realistic virtual hand models include deformable mesh modeling, high-fidelity motion capture, and optimized fluid simulation techniques. Enhancing virtual hand-object interaction fidelity involves integrating real-world grasping data and biomechanical models through hybrid approaches, data partnerships, and empirical validation. Neuroscientific and psychophysical insights can guide the development of hand-object collision handling algorithms by leveraging studies of neural correlates and implementing real-time machine learning algorithms. To optimize the visual quality of hand-object interpenetration in VR, a balance between rendering performance and user perception is sought through experimental rendering techniques and adaptive algorithms. Hybrid collision handling algorithms can be engineered to switch between methods based on predictive machine learning (see the sketch following Table 10). For haptic-free VR, sensory substitution techniques are explored using synchronized audio-visual cues and predictive algorithms. Extending VR interactions to fine finger manipulation requires high-resolution tactile feedback and enhanced dexterity in task-specific hand models. Advances in machine learning can be harnessed to train adaptive AI models and generative techniques for naturalistic simulations. Addressing computational challenges involves optimizing algorithms for GPU parallel processing and collaborating with hardware developers. Cognitive science insights inform the ergonomic and intuitive design of VR interfaces, while user adaptation and customization are integrated through profiling algorithms and modular hand avatars to cater to diverse user needs. Collectively, these approaches represent a multifaceted strategy to enhance realism, interactivity, and user satisfaction in VR hand-object interactions.

Table 10 Potential approaches to address key questions in bio-inspired hand-object interactions in VR
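To ground the hybrid collision handling idea from Table 10, the sketch below gates an expensive narrow-phase test behind a cheap broad-phase check and a predicted contact probability. The two test functions, the probability source, and the 0.3 gate are placeholders; a production system would use per-phalanx colliders and a trained trajectory model.

```python
# Sketch of hybrid collision handling: cheap broad phase every frame,
# expensive narrow phase only when contact is predicted to be likely.

import numpy as np

def coarse_check(hand_pos, obj_pos, hand_r=0.12, obj_r=0.10) -> bool:
    """Bounding-sphere broad phase: runs every frame at negligible cost."""
    return np.linalg.norm(hand_pos - obj_pos) < (hand_r + obj_r)

def precise_check(hand_pos, obj_pos) -> bool:
    """Narrow-phase stand-in; a real system would test per-phalanx
    colliders against the object mesh here."""
    return np.linalg.norm(hand_pos - obj_pos) < 0.05

def handle_collision(hand_pos, obj_pos, contact_probability: float) -> bool:
    # contact_probability would come from a predictive model trained on
    # reach trajectories; here it simply gates the expensive narrow phase.
    if not coarse_check(hand_pos, obj_pos):
        return False
    if contact_probability < 0.3:   # unlikely contact: skip the costly test
        return False
    return precise_check(hand_pos, obj_pos)
```

The gating keeps the per-frame budget low during the reach and spends computation only near contact, which is where fidelity matters most for the grasping and release behaviors discussed earlier.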

In summary, advancing bio-inspired hand-object interactions in VR demands a multifaceted approach that blends technical innovation, biomechanical integration, and empirical validation. The proposed methodologies encompass developing realistic virtual hand models, enhancing interaction fidelity through hybrid models and data validation, and addressing computational challenges with advanced machine learning and AI techniques. This multifaceted approach is expected to boost the realism, functionality, and user experience of hand-object interactions in VR, laying the groundwork for more immersive and practical applications across various domains requiring precision manual skills, as noted above in Table 1.

11 Concluding remarks

In conclusion, this position paper has presented a comprehensive framework for bio-inspired approaches to enhance hand-object interactions in VR for precision manual tasks. We have proposed novel approaches for improving virtual hand models, collision detection, and grasp release mechanisms by synthesizing insights from biomechanics, neuroscience, and psychophysics. Our key contributions include a unified theoretical framework for bio-inspired VR interactions, strategies for optimizing the trade-off between computational complexity and realism, and detailed applications demonstrating the potential impact of these developments across diverse domains. Future research will focus on empirical validation through rigorous user studies, developing adaptive algorithms for personalized interactions, integrating advanced haptic feedback systems, and exploring cross-modal sensory integration techniques. Additionally, longitudinal studies on skill acquisition and transfer, computational optimization for real-time performance, and the incorporation of advanced neurocognitive models will be crucial. By pursuing these research avenues, we aim to advance state-of-the-art VR technology, ultimately facilitating more immersive, realistic, and effective virtual environments for precision manual tasks. This work has significant implications for medical training, industrial design, and rehabilitation, potentially changing how we interact with virtual objects and acquire complex motor skills in simulated environments.