Learning Gaze-aware Compositional GAN from Limited Annotations

Published: 17 May 2024

Abstract

Gaze-annotated facial data is crucial for training deep neural networks (DNNs) for gaze estimation. However, obtaining such data is labor-intensive and requires specialized equipment, because accurately annotating a subject's gaze direction is difficult. In this work, we present a generative framework that creates annotated gaze data by leveraging both labeled and unlabeled data sources. We propose a Gaze-aware Compositional GAN that learns to generate annotated facial images from a limited labeled dataset. We then transfer this model to an unlabeled data domain to exploit the diversity it provides. Experiments demonstrate our approach's effectiveness in generating within-domain image augmentations on the ETH-XGaze dataset and cross-domain augmentations in the CelebAMask-HQ domain for training gaze estimation DNNs. We also show additional applications of our work, including facial image editing and gaze redirection.
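To make the augmentation idea concrete, the sketch below shows one way synthetic, label-consistent samples from a gaze-conditioned generator could feed a gaze-estimation DNN. This is a minimal PyTorch illustration, not the paper's implementation: the `GazeConditionedGenerator` and `GazeEstimator` classes, their layer sizes, and the training loop are hypothetical placeholders standing in for the Gaze-aware Compositional GAN and the downstream estimator.

```python
# Hypothetical sketch: every generated image carries the gaze label it was
# conditioned on, so the synthetic pairs can directly augment estimator training.
import torch
import torch.nn as nn

class GazeConditionedGenerator(nn.Module):
    """Toy generator mapping (latent code, gaze) -> image.

    Stands in for a trained gaze-aware GAN; in practice this would be the
    pretrained generator, frozen during estimator training.
    """
    def __init__(self, latent_dim=64, img_size=32):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 2, 256), nn.ReLU(),
            nn.Linear(256, 3 * img_size * img_size), nn.Tanh(),
        )

    def forward(self, z, gaze):
        # gaze: (B, 2) pitch/yaw; concatenated with the latent code
        x = self.net(torch.cat([z, gaze], dim=1))
        return x.view(-1, 3, self.img_size, self.img_size)

class GazeEstimator(nn.Module):
    """Toy gaze-estimation DNN predicting pitch/yaw from an image."""
    def __init__(self, img_size=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * img_size * img_size, 256), nn.ReLU(),
            nn.Linear(256, 2),
        )

    def forward(self, img):
        return self.net(img)

# Augmentation loop: sample gaze labels, generate matching images, train.
G, E = GazeConditionedGenerator(), GazeEstimator()
opt = torch.optim.Adam(E.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

for step in range(100):
    z = torch.randn(16, 64)
    gaze = (torch.rand(16, 2) - 0.5) * 1.5   # random pitch/yaw targets (radians)
    with torch.no_grad():
        fake_imgs = G(z, gaze)               # synthetic, label-consistent images
    loss = loss_fn(E(fake_imgs), gaze)       # supervise estimator on free labels
    opt.zero_grad(); loss.backward(); opt.step()
```

In a real pipeline the synthetic batches would be mixed with the limited real labeled data rather than used alone, which is the augmentation setting the abstract describes.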



Published In

Proceedings of the ACM on Computer Graphics and Interactive Techniques, Volume 7, Issue 2
May 2024
101 pages
EISSN: 2577-6193
DOI: 10.1145/3665652
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 May 2024
Published in PACMCGIT Volume 7, Issue 2


Author Tags

  1. DNN
  2. GAN
  3. Gaze estimation
  4. domain transfer
  5. generative
  6. synthetic data

Qualifiers

  • Research-article
  • Research
  • Refereed
