Learning Gaze-aware Compositional GAN from Limited Annotations

Published: 17 May 2024

Abstract

Gaze-annotated facial data is crucial for training deep neural networks (DNNs) for gaze estimation. However, obtaining such data is labor-intensive and requires specialized equipment, because accurately annotating a subject's gaze direction is difficult. In this work, we present a generative framework that creates annotated gaze data by leveraging both labeled and unlabeled data sources. We propose a Gaze-aware Compositional GAN that learns to generate annotated facial images from a limited labeled dataset. We then transfer this model to an unlabeled data domain to exploit the diversity it provides. Experiments demonstrate our approach's effectiveness in generating within-domain image augmentations on the ETH-XGaze dataset and cross-domain augmentations in the CelebAMask-HQ domain for training gaze estimation DNNs. We also show additional applications of our work, including facial image editing and gaze redirection.
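To make the augmentation idea concrete, the sketch below shows one way synthetic, label-consistent samples from a gaze-conditioned generator could feed a gaze-estimation DNN. This is a minimal PyTorch illustration, not the paper's implementation: the `GazeConditionedGenerator` and `GazeEstimator` classes, their layer sizes, and the training loop are hypothetical placeholders standing in for the Gaze-aware Compositional GAN and the downstream estimator.

```python
# Hypothetical sketch: every generated image carries the gaze label it was
# conditioned on, so the synthetic pairs can directly augment estimator training.
import torch
import torch.nn as nn

class GazeConditionedGenerator(nn.Module):
    """Toy generator mapping (latent code, gaze) -> image.

    Stands in for a trained gaze-aware GAN; in practice this would be the
    pretrained generator, frozen during estimator training.
    """
    def __init__(self, latent_dim=64, img_size=32):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 2, 256), nn.ReLU(),
            nn.Linear(256, 3 * img_size * img_size), nn.Tanh(),
        )

    def forward(self, z, gaze):
        # gaze: (B, 2) pitch/yaw; concatenated with the latent code
        x = self.net(torch.cat([z, gaze], dim=1))
        return x.view(-1, 3, self.img_size, self.img_size)

class GazeEstimator(nn.Module):
    """Toy gaze-estimation DNN predicting pitch/yaw from an image."""
    def __init__(self, img_size=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * img_size * img_size, 256), nn.ReLU(),
            nn.Linear(256, 2),
        )

    def forward(self, img):
        return self.net(img)

# Augmentation loop: sample gaze labels, generate matching images, train.
G, E = GazeConditionedGenerator(), GazeEstimator()
opt = torch.optim.Adam(E.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

for step in range(100):
    z = torch.randn(16, 64)
    gaze = (torch.rand(16, 2) - 0.5) * 1.5   # random pitch/yaw targets (radians)
    with torch.no_grad():
        fake_imgs = G(z, gaze)               # synthetic, label-consistent images
    loss = loss_fn(E(fake_imgs), gaze)       # supervise estimator on free labels
    opt.zero_grad(); loss.backward(); opt.step()
```

In a real pipeline the synthetic batches would be mixed with the limited real labeled data rather than used alone, which is the augmentation setting the abstract describes.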



Published In

Proceedings of the ACM on Computer Graphics and Interactive Techniques, Volume 7, Issue 2
May 2024
101 pages
EISSN: 2577-6193
DOI: 10.1145/3665652
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 May 2024
Published in PACMCGIT Volume 7, Issue 2


Author Tags

  1. DNN
  2. GAN
  3. Gaze estimation
  4. domain transfer
  5. generative
  6. synthetic data

Qualifiers

  • Research-article
  • Research
  • Refereed
