DOI: 10.1007/978-3-031-19787-1_1

StyleGAN-Human: A Data-Centric Odyssey of Human Generation

Published: 23 October 2022

Abstract

Unconditional human image generation is an important task in vision and graphics, enabling various applications in the creative industry. Existing studies in this field mainly focus on “network engineering” such as designing new components and objective functions. This work takes a data-centric perspective and investigates multiple critical aspects in “data engineering”, which we believe would complement the current practice. To facilitate a comprehensive study, we collect and annotate a large-scale human image dataset with over 230K samples capturing diverse poses and textures. Equipped with this large dataset, we rigorously investigate three essential factors in data engineering for StyleGAN-based human generation, namely data size, data distribution, and data alignment. Extensive experiments reveal several valuable observations w.r.t. these aspects: 1) Large-scale data, more than 40K images, are needed to train a high-fidelity unconditional human generation model with a vanilla StyleGAN. 2) A balanced training set helps improve the generation quality with rare face poses compared to the long-tailed counterpart, whereas simply balancing the clothing texture distribution does not effectively bring an improvement. 3) Human GAN models that employ body centers for alignment outperform models trained using face centers or pelvis points as alignment anchors. In addition, a model zoo and human editing applications are demonstrated to facilitate future research in the community. Code and models are publicly available (Project page: https://stylegan-human.github.io/. Code and models: https://github.com/stylegan-human/StyleGAN-Human.)
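The third finding above (body-center alignment outperforming face-center or pelvis anchors) can be illustrated with a minimal sketch. The joint names, the centring rules, and the helper functions below are illustrative assumptions, not the paper's released pipeline; see the linked repository for the actual preprocessing code.

```python
import numpy as np

def center_from_keypoints(kps, anchor="body"):
    """Return the (x, y) alignment anchor for a human crop.

    kps: dict mapping joint name -> (x, y) pixel coordinates
         (a COCO-like layout is assumed here for illustration).
    anchor: 'body' (midpoint of torso), 'face' (nose), or
            'pelvis' (hip midpoint).
    """
    if anchor == "face":
        return np.asarray(kps["nose"], dtype=float)
    hips = (np.asarray(kps["left_hip"], float) +
            np.asarray(kps["right_hip"], float)) / 2
    if anchor == "pelvis":
        return hips
    shoulders = (np.asarray(kps["left_shoulder"], float) +
                 np.asarray(kps["right_shoulder"], float)) / 2
    # 'body' center: midpoint between shoulder and hip midpoints
    return (shoulders + hips) / 2

def crop_around(img, center, size):
    """Crop a size x size window centred on `center`, zero-padding at
    the borders so every training sample shares the same alignment."""
    h, w = img.shape[:2]
    half = size // 2
    out = np.zeros((size, size) + img.shape[2:], dtype=img.dtype)
    x0, y0 = int(center[0]) - half, int(center[1]) - half
    xs, ys = max(x0, 0), max(y0, 0)
    xe, ye = min(x0 + size, w), min(y0 + size, h)
    out[ys - y0:ye - y0, xs - x0:xe - x0] = img[ys:ye, xs:xe]
    return out
```

Swapping `anchor` between `'body'`, `'face'`, and `'pelvis'` reproduces the three alignment strategies the paper compares, with everything else in the data pipeline held fixed.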



Published In

Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XVI
Oct 2022
812 pages
ISBN:978-3-031-19786-4
DOI:10.1007/978-3-031-19787-1

Publisher

Springer-Verlag

Berlin, Heidelberg


Author Tags

  1. Human image generation
  2. Data-centric
  3. StyleGAN



Cited By

  • Human Image Generation: A Comprehensive Survey. ACM Computing Surveys 56(11), 1–39 (2024). https://doi.org/10.1145/3665869
  • Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner. In: Proceedings of the 32nd ACM International Conference on Multimedia, pp. 3372–3380 (2024). https://doi.org/10.1145/3664647.3680814
  • Appearance and Pose-guided Human Generation: A Survey. ACM Computing Surveys 56(5), 1–35 (2024). https://doi.org/10.1145/3637060
  • Training-Free Diffusion Models for Content-Style Synthesis. In: Advanced Intelligent Computing Technology and Applications, pp. 308–319 (2024). https://doi.org/10.1007/978-981-97-5609-4_24
  • Towards Garment Sewing Pattern Reconstruction from a Single Image. ACM Transactions on Graphics 42(6), 1–15 (2023). https://doi.org/10.1145/3618319
  • AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections. In: SIGGRAPH Asia 2023 Conference Papers, pp. 1–9 (2023). https://doi.org/10.1145/3610548.3618164
  • TOAC: Try-On Aligning Conformer for Image-Based Virtual Try-On Alignment. In: Artificial Intelligence, pp. 29–40 (2023). https://doi.org/10.1007/978-981-99-9119-8_3
