DOI: 10.1007/978-3-031-19787-1_1

StyleGAN-Human: A Data-Centric Odyssey of Human Generation

Published: 23 October 2022

Abstract

Unconditional human image generation is an important task in vision and graphics, enabling various applications in the creative industry. Existing studies in this field mainly focus on “network engineering” such as designing new components and objective functions. This work takes a data-centric perspective and investigates multiple critical aspects in “data engineering”, which we believe would complement the current practice. To facilitate a comprehensive study, we collect and annotate a large-scale human image dataset with over 230K samples capturing diverse poses and textures. Equipped with this large dataset, we rigorously investigate three essential factors in data engineering for StyleGAN-based human generation, namely data size, data distribution, and data alignment. Extensive experiments reveal several valuable observations w.r.t. these aspects: 1) Large-scale data, more than 40K images, are needed to train a high-fidelity unconditional human generation model with a vanilla StyleGAN. 2) A balanced training set helps improve the generation quality with rare face poses compared to the long-tailed counterpart, whereas simply balancing the clothing texture distribution does not effectively bring an improvement. 3) Human GAN models that employ body centers for alignment outperform models trained using face centers or pelvis points as alignment anchors. In addition, a model zoo and human editing applications are demonstrated to facilitate future research in the community. Code and models are publicly available (Project page: https://stylegan-human.github.io/. Code and models: https://github.com/stylegan-human/StyleGAN-Human.)
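The third finding above (body-center alignment outperforming face-center or pelvis anchors) can be illustrated with a minimal sketch. The joint names, the centring rules, and the helper functions below are illustrative assumptions, not the paper's released pipeline; see the linked repository for the actual preprocessing code.

```python
import numpy as np

def center_from_keypoints(kps, anchor="body"):
    """Return the (x, y) alignment anchor for a human crop.

    kps: dict mapping joint name -> (x, y) pixel coordinates
         (a COCO-like layout is assumed here for illustration).
    anchor: 'body' (midpoint of torso), 'face' (nose), or
            'pelvis' (hip midpoint).
    """
    if anchor == "face":
        return np.asarray(kps["nose"], dtype=float)
    hips = (np.asarray(kps["left_hip"], float) +
            np.asarray(kps["right_hip"], float)) / 2
    if anchor == "pelvis":
        return hips
    shoulders = (np.asarray(kps["left_shoulder"], float) +
                 np.asarray(kps["right_shoulder"], float)) / 2
    # 'body' center: midpoint between shoulder and hip midpoints
    return (shoulders + hips) / 2

def crop_around(img, center, size):
    """Crop a size x size window centred on `center`, zero-padding at
    the borders so every training sample shares the same alignment."""
    h, w = img.shape[:2]
    half = size // 2
    out = np.zeros((size, size) + img.shape[2:], dtype=img.dtype)
    x0, y0 = int(center[0]) - half, int(center[1]) - half
    xs, ys = max(x0, 0), max(y0, 0)
    xe, ye = min(x0 + size, w), min(y0 + size, h)
    out[ys - y0:ye - y0, xs - x0:xe - x0] = img[ys:ye, xs:xe]
    return out
```

Swapping `anchor` between `'body'`, `'face'`, and `'pelvis'` reproduces the three alignment strategies the paper compares, with everything else in the data pipeline held fixed.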



Published In

Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XVI
Oct 2022
812 pages
ISBN:978-3-031-19786-4
DOI:10.1007/978-3-031-19787-1

Publisher

Springer-Verlag

Berlin, Heidelberg


Author Tags

  1. Human image generation
  2. Data-centric
  3. StyleGAN



Cited By

  • Human Image Generation: A Comprehensive Survey. ACM Computing Surveys 56(11), 1–39 (2024). https://doi.org/10.1145/3665869
  • Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner. In: Proceedings of the 32nd ACM International Conference on Multimedia, pp. 3372–3380 (2024). https://doi.org/10.1145/3664647.3680814
  • Appearance and Pose-guided Human Generation: A Survey. ACM Computing Surveys 56(5), 1–35 (2024). https://doi.org/10.1145/3637060
  • Training-Free Diffusion Models for Content-Style Synthesis. In: Advanced Intelligent Computing Technology and Applications, pp. 308–319 (2024). https://doi.org/10.1007/978-981-97-5609-4_24
  • Towards Garment Sewing Pattern Reconstruction from a Single Image. ACM Transactions on Graphics 42(6), 1–15 (2023). https://doi.org/10.1145/3618319
  • AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections. In: SIGGRAPH Asia 2023 Conference Papers, pp. 1–9 (2023). https://doi.org/10.1145/3610548.3618164
  • TOAC: Try-On Aligning Conformer for Image-Based Virtual Try-On Alignment. In: Artificial Intelligence, pp. 29–40 (2023). https://doi.org/10.1007/978-981-99-9119-8_3
