
GrabAR: Occlusion-aware Grabbing Virtual Objects in AR

Published: 20 October 2020

Abstract

Existing augmented reality (AR) applications often ignore occlusion between real hands and virtual objects when incorporating virtual objects into the user's view. The challenge stems from the difficulty of acquiring accurate depth and the mismatch between real and virtual depth. This paper presents GrabAR, a new approach that directly predicts the occlusion between real and virtual content, bypassing depth acquisition and inference. Our goal is to enhance AR applications with interactions between the hand (real) and grabbable objects (virtual). Taking paired images of the hand and object as input, we formulate a compact deep neural network that learns to generate the occlusion mask. To train the network, we compile a large dataset of both synthetic and real data. We then embed the trained network in a prototype AR system to support real-time grabbing of virtual objects. Further, we demonstrate the performance of our method on various virtual objects, compare it with other methods through two user studies, and showcase a rich variety of interaction scenarios in which users grab virtual objects with a bare hand and directly manipulate them.
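To make the pipeline the abstract describes concrete, here is a minimal sketch of the interface such an occlusion-mask network implies. This is not the authors' released code: `OcclusionNet`, `composite`, and all layer choices are hypothetical stand-ins. The idea it illustrates is the one stated above: given the camera frame (real hand) and a rendering of the virtual object, predict a per-pixel mask marking where the hand should appear in front, then use that mask during compositing so grabbing fingers stay visible over the object.

```python
# Minimal sketch (assumption: PyTorch; not GrabAR's actual architecture).
# The paper's system is a compact encoder-decoder trained on synthetic and
# real hand-object pairs; this toy version only illustrates the I/O contract.
import torch
import torch.nn as nn

class OcclusionNet(nn.Module):
    """Toy stand-in for a compact occlusion-mask predictor."""
    def __init__(self):
        super().__init__()
        # Hand RGB (3) and rendered-object RGB (3) are stacked channel-wise.
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, hand_rgb, object_rgb):
        x = torch.cat([hand_rgb, object_rgb], dim=1)  # (B, 6, H, W)
        return torch.sigmoid(self.decoder(self.encoder(x)))  # mask in [0, 1]

def composite(frame, rendered_obj, obj_alpha, net):
    """Overlay the virtual object, then restore hand pixels that the
    predicted mask marks as occluding, so fingers in front stay visible."""
    mask = net(frame, rendered_obj)  # 1 = real hand in front of the object
    ar_view = obj_alpha * rendered_obj + (1 - obj_alpha) * frame
    return mask * frame + (1 - mask) * ar_view

# Usage on dummy data:
net = OcclusionNet()
frame = torch.rand(1, 3, 256, 256)   # camera image containing the real hand
obj = torch.rand(1, 3, 256, 256)     # rendering of the virtual object
alpha = torch.ones(1, 1, 256, 256)   # 1 where the object is drawn
out = composite(frame, obj, alpha, net)  # (1, 3, 256, 256) AR frame
```

Note the design point this sketch captures: the network outputs an occlusion mask directly from RGB, so no depth map of the hand is ever estimated or compared against the virtual object's depth.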

Supplementary Material

  • VTT File (ufp3996pv.vtt)
  • VTT File (ufp3996vf.vtt)
  • VTT File (3379337.3415835.vtt)
  • SRT File (ufp3996pvc.srt): preview video captions
  • SRT File (ufp3996vfc.srt): video figure captions
  • ZIP File (ufp3996aux.zip): one PDF file covering details mentioned in the main paper
  • MP4 File (ufp3996pv.mp4): preview video
  • MP4 File (ufp3996vf.mp4): video figure
  • MP4 File (3379337.3415835.mp4): presentation video





Published In

UIST '20: Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology
October 2020
1297 pages
ISBN:9781450375146
DOI:10.1145/3379337
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 October 2020


Author Tags

  1. augmented reality
  2. interaction
  3. neural network
  4. occlusion

Qualifiers

  • Research-article

Funding Sources

  • The Israel Science Foundation
  • The Research Grants Council of the Hong Kong Special Administrative Region

Conference

UIST '20

Acceptance Rates

Overall Acceptance Rate: 561 of 2,567 submissions (22%)

Article Metrics

  • Downloads (last 12 months): 114
  • Downloads (last 6 weeks): 13
Reflects downloads up to 22 Nov 2024.

Cited By

  • (2024) Virtual Task Environments Factors Explored in 3D Selection Studies. Proceedings of the 50th Graphics Interface Conference, 10.1145/3670947.3670983, 1-16. Online publication date: 3-Jun-2024.
  • (2024) Above-Screen Fingertip Tracking and Hand Representation for Precise Touch Input with a Phone in Virtual Reality. Proceedings of the 50th Graphics Interface Conference, 10.1145/3670947.3670961, 1-15. Online publication date: 3-Jun-2024.
  • (2024) GradualReality: Enhancing Physical Object Interaction in Virtual Reality via Interaction State-Aware Blending. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 10.1145/3654777.3676463, 1-14. Online publication date: 13-Oct-2024.
  • (2024) Effects of Hand Occlusion in Radial Mid-Air Menu Interaction in Augmented Reality. 2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 10.1109/VRW62533.2024.00106, 551-558. Online publication date: 16-Mar-2024.
  • (2023) CAFI-AR. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 10.1145/3569499, 6(4), 1-23. Online publication date: 11-Jan-2023.
  • (2023) Above-Screen Fingertip Tracking with a Phone in Virtual Reality. Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 10.1145/3544549.3585728, 1-7. Online publication date: 19-Apr-2023.
  • (2023) When XR and AI Meet - A Scoping Review on Extended Reality and Artificial Intelligence. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 10.1145/3544548.3581072, 1-45. Online publication date: 19-Apr-2023.
  • (2023) Integrating Both Parallax and Latency Compensation into Video See-through Head-mounted Display. IEEE Transactions on Visualization and Computer Graphics, 10.1109/TVCG.2023.3247460, 29(5), 2826-2836. Online publication date: 1-May-2023.
  • (2023) Occlusion Handling in Augmented Reality: Past, Present and Future. IEEE Transactions on Visualization and Computer Graphics, 10.1109/TVCG.2021.3117866, 29(2), 1590-1609. Online publication date: 1-Feb-2023.
  • (2023) Virtual Occlusions Through Implicit Depth. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10.1109/CVPR52729.2023.00874, 9053-9064. Online publication date: Jun-2023.
