
GrabAR: Occlusion-aware Grabbing Virtual Objects in AR

Published: 20 October 2020

Abstract

Existing augmented reality (AR) applications often ignore occlusion between real hands and virtual objects when incorporating virtual objects into the user's view. The challenge stems from the difficulty of acquiring accurate depth and the mismatch between real and virtual depth. This paper presents GrabAR, a new approach that directly predicts the occlusion between real and virtual content, bypassing depth acquisition and inference. Our goal is to enhance AR applications with interactions between the hand (real) and grabbable objects (virtual). Taking paired images of the hand and object as input, we formulate a compact deep neural network that learns to generate the occlusion mask. To train the network, we compile a large dataset of both synthetic and real data. We then embed the trained network in a prototype AR system to support real-time grabbing of virtual objects. Further, we demonstrate the performance of our method on various virtual objects, compare it with other methods through two user studies, and showcase a rich variety of interaction scenarios in which users grab virtual objects with a bare hand and directly manipulate them.
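To make the pipeline the abstract describes concrete, here is a minimal sketch of the interface such an occlusion-mask network implies. This is not the authors' released code: `OcclusionNet`, `composite`, and all layer choices are hypothetical stand-ins. The idea it illustrates is the one stated above: given the camera frame (real hand) and a rendering of the virtual object, predict a per-pixel mask marking where the hand should appear in front, then use that mask during compositing so grabbing fingers stay visible over the object.

```python
# Minimal sketch (assumption: PyTorch; not GrabAR's actual architecture).
# The paper's system is a compact encoder-decoder trained on synthetic and
# real hand-object pairs; this toy version only illustrates the I/O contract.
import torch
import torch.nn as nn

class OcclusionNet(nn.Module):
    """Toy stand-in for a compact occlusion-mask predictor."""
    def __init__(self):
        super().__init__()
        # Hand RGB (3) and rendered-object RGB (3) are stacked channel-wise.
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, hand_rgb, object_rgb):
        x = torch.cat([hand_rgb, object_rgb], dim=1)  # (B, 6, H, W)
        return torch.sigmoid(self.decoder(self.encoder(x)))  # mask in [0, 1]

def composite(frame, rendered_obj, obj_alpha, net):
    """Overlay the virtual object, then restore hand pixels that the
    predicted mask marks as occluding, so fingers in front stay visible."""
    mask = net(frame, rendered_obj)  # 1 = real hand in front of the object
    ar_view = obj_alpha * rendered_obj + (1 - obj_alpha) * frame
    return mask * frame + (1 - mask) * ar_view

# Usage on dummy data:
net = OcclusionNet()
frame = torch.rand(1, 3, 256, 256)   # camera image containing the real hand
obj = torch.rand(1, 3, 256, 256)     # rendering of the virtual object
alpha = torch.ones(1, 1, 256, 256)   # 1 where the object is drawn
out = composite(frame, obj, alpha, net)  # (1, 3, 256, 256) AR frame
```

Note the design point this sketch captures: the network outputs an occlusion mask directly from RGB, so no depth map of the hand is ever estimated or compared against the virtual object's depth.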

Supplementary Material

  • VTT File (ufp3996pv.vtt)
  • VTT File (ufp3996vf.vtt)
  • VTT File (3379337.3415835.vtt)
  • SRT File (ufp3996pvc.srt): preview video captions
  • SRT File (ufp3996vfc.srt): video figure captions
  • ZIP File (ufp3996aux.zip): one PDF file covering details mentioned in the main paper
  • MP4 File (ufp3996pv.mp4): preview video
  • MP4 File (ufp3996vf.mp4): video figure
  • MP4 File (3379337.3415835.mp4): presentation video





Published In

UIST '20: Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology
October 2020
1297 pages
ISBN:9781450375146
DOI:10.1145/3379337
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 October 2020


Author Tags

  1. augmented reality
  2. interaction
  3. neural network
  4. occlusion

Qualifiers

  • Research-article

Funding Sources

  • The Israel Science Foundation
  • The Research Grants Council of the Hong Kong Special Administrative Region

Conference

UIST '20

Acceptance Rates

Overall Acceptance Rate: 561 of 2,567 submissions (22%)

Article Metrics

  • Downloads (last 12 months): 114
  • Downloads (last 6 weeks): 13
Reflects downloads up to 22 Nov 2024.

Cited By

  • (2024) Virtual Task Environments Factors Explored in 3D Selection Studies. Proceedings of the 50th Graphics Interface Conference, 10.1145/3670947.3670983, 1-16. Online publication date: 3-Jun-2024.
  • (2024) Above-Screen Fingertip Tracking and Hand Representation for Precise Touch Input with a Phone in Virtual Reality. Proceedings of the 50th Graphics Interface Conference, 10.1145/3670947.3670961, 1-15. Online publication date: 3-Jun-2024.
  • (2024) GradualReality: Enhancing Physical Object Interaction in Virtual Reality via Interaction State-Aware Blending. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 10.1145/3654777.3676463, 1-14. Online publication date: 13-Oct-2024.
  • (2024) Effects of Hand Occlusion in Radial Mid-Air Menu Interaction in Augmented Reality. 2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 10.1109/VRW62533.2024.00106, 551-558. Online publication date: 16-Mar-2024.
  • (2023) CAFI-AR. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 10.1145/3569499, 6(4), 1-23. Online publication date: 11-Jan-2023.
  • (2023) Above-Screen Fingertip Tracking with a Phone in Virtual Reality. Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 10.1145/3544549.3585728, 1-7. Online publication date: 19-Apr-2023.
  • (2023) When XR and AI Meet - A Scoping Review on Extended Reality and Artificial Intelligence. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 10.1145/3544548.3581072, 1-45. Online publication date: 19-Apr-2023.
  • (2023) Integrating Both Parallax and Latency Compensation into Video See-through Head-mounted Display. IEEE Transactions on Visualization and Computer Graphics, 10.1109/TVCG.2023.3247460, 29(5), 2826-2836. Online publication date: 1-May-2023.
  • (2023) Occlusion Handling in Augmented Reality: Past, Present and Future. IEEE Transactions on Visualization and Computer Graphics, 10.1109/TVCG.2021.3117866, 29(2), 1590-1609. Online publication date: 1-Feb-2023.
  • (2023) Virtual Occlusions Through Implicit Depth. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10.1109/CVPR52729.2023.00874, 9053-9064. Online publication date: Jun-2023.
