DOI: 10.1145/3488162.3488218
Parallax Engine: Head Controlled Motion Parallax Using Notebooks’ RGB Camera

Published: 03 January 2022

Abstract

Research on the Fish Tank Virtual Reality (FTVR) technique commonly relies on dedicated sensors (e.g., infrared cameras and LEDs mounted on glasses) to estimate the user’s eye position. However, estimating the face position with an RGB camera is becoming increasingly accessible. In this work, we explore community-available face characteristics detection software to implement the FTVR technique for everyday use of 3D-enabled applications on consumer notebooks, without requiring extra devices. We introduce the Parallax Engine, a solution that can be added with ease to any Unity game engine application. The solution supports two parallax-related visualization options: 1) a monoscopic FTVR mode (FishTank), which locks the virtual camera of the 3D environment to the laptop’s screen, and 2) a 2D parallax mode (Parallax2DoF), which allows horizontal and vertical displacement of the 3D scene camera. For face characteristics detection, the Parallax Engine uses a standardized interface that can receive input from different methods and currently supports three options: Google’s MediaPipe, dlib, and PoseNet. We evaluated the proposed solution with five users, who performed tasks using the different visualization and face characteristics detection options, aiming to understand how suitable the solution is for end-users. Despite some detection failures from dlib, results showed overall good acceptance of both the FishTank and Parallax2DoF visualization options.
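The Parallax2DoF mode described above maps the viewer's head displacement, estimated from the webcam image, to a horizontal/vertical displacement of the 3D scene camera. A minimal sketch of that mapping, assuming normalized face coordinates from any of the supported detectors; the function name, sign conventions, and scale factor are illustrative assumptions, not the authors' actual implementation:

```python
# Hypothetical sketch of a 2-DoF head-to-camera parallax mapping:
# the detected face center (normalized webcam-image coordinates)
# drives a proportional displacement of the 3D scene camera.

def parallax_2dof_offset(face_x, face_y, scale=0.5):
    """Map a normalized face position (0..1, origin at the top-left of
    the webcam image) to a (dx, dy) camera offset in scene units.

    A raw (unmirrored) webcam image flips left/right relative to the
    viewer, and image y grows downward, so both axes are inverted:
    the head moving right or up moves the camera right or up.
    """
    dx = (0.5 - face_x) * scale  # head right -> face at image left -> dx > 0
    dy = (0.5 - face_y) * scale  # head up    -> face at image top  -> dy > 0
    return dx, dy


# Centered face -> no displacement; off-center face -> proportional shift.
print(parallax_2dof_offset(0.5, 0.5))  # (0.0, 0.0)
print(parallax_2dof_offset(0.25, 0.9))
```

In the FishTank mode this offset would additionally feed an off-axis projection so the virtual camera stays locked to the physical screen; the 2-DoF mode only translates the camera.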

Supplementary Material

MP4 File (short_video_ParallaxEngineHeadControlledMotionParallaxUsingNotebooksRGBCamera.mp4)
Supplemental video


Cited By

  • (2023) Digital Wah-Wah Guitar Effect Controlled by Mouth Movements. Computer Vision and Graphics. https://doi.org/10.1007/978-3-031-22025-8_3, 31–39. Online publication date: 11-Feb-2023.


    Published In

    SVR '21: Proceedings of the 23rd Symposium on Virtual and Augmented Reality
    October 2021, 196 pages
    ISBN: 9781450395526
    DOI: 10.1145/3488162

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Fish tank virtual reality
    2. face characteristics detection
    3. motion parallax

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SVR'21: Symposium on Virtual and Augmented Reality
    October 18–21, 2021
    Virtual Event, Brazil
