
The Unusual Effectiveness of Averaging in GAN Training

Published: 21 Dec 2018 · Last Modified: 14 Oct 2024 · ICLR 2019 Conference Blind Submission · Readers: Everyone
Abstract: We examine two different techniques for parameter averaging in GAN training. Moving Average (MA) computes the time-average of parameters, whereas Exponential Moving Average (EMA) computes an exponentially discounted sum. Whilst MA is known to lead to convergence in bilinear settings, we provide the -- to our knowledge -- first theoretical arguments in support of EMA. We show that, for simple bilinear games, EMA converges to limit cycles around the equilibrium whose amplitude vanishes as the discount parameter approaches one, and that it also enhances the stability of general GAN training. We establish experimentally that both techniques are strikingly effective in the non-convex-concave GAN setting as well: both improve Inception and FID scores across different architectures and GAN objectives. We provide comprehensive experimental results on a range of datasets -- mixture of Gaussians, CIFAR-10, STL-10, CelebA and ImageNet -- to demonstrate their effectiveness. We achieve state-of-the-art results on CIFAR-10 and produce clean CelebA face images. (The code is available at https://github.com/yasinyazici/EMA_GAN.)
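
For concreteness, below is a minimal sketch of the two averaging rules the abstract describes, written for a PyTorch-style training loop. It is not taken from the authors' repository; the helper names (`update_ema`, `update_ma`) and the default `beta` value are illustrative assumptions.

```python
import copy

import torch


def update_ema(ema_model, model, beta=0.999):
    # EMA: theta_ema <- beta * theta_ema + (1 - beta) * theta
    # (an exponentially discounted sum of past parameters; beta is
    # the discount parameter, illustratively set to 0.999 here)
    with torch.no_grad():
        for p_avg, p in zip(ema_model.parameters(), model.parameters()):
            p_avg.mul_(beta).add_(p, alpha=1.0 - beta)


def update_ma(ma_model, model, step):
    # MA: running time-average theta_ma <- theta_ma + (theta - theta_ma) / step
    # (weights all iterates equally)
    with torch.no_grad():
        for p_avg, p in zip(ma_model.parameters(), model.parameters()):
            p_avg.add_(p - p_avg, alpha=1.0 / step)


# Hypothetical usage: keep frozen copies of the generator, update them
# after every training step, and sample from the averaged copies at
# evaluation time.
#
# ema_G = copy.deepcopy(generator)
# ma_G = copy.deepcopy(generator)
# for step, batch in enumerate(loader, start=1):
#     ...                      # usual generator/discriminator updates
#     update_ema(ema_G, generator)
#     update_ma(ma_G, generator, step)
```

Note the contrast between the two rules: the MA copy weights all iterates equally, while the EMA copy discounts older ones, matching the abstract's observation that pushing the discount parameter toward one shrinks the limit cycles around the equilibrium.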
Keywords: Generative Adversarial Networks (GANs), Moving Average, Exponential Moving Average, Convergence, Limit Cycles
Code: [yasinyazici/EMA_GAN](https://github.com/yasinyazici/EMA_GAN)
Data: [CelebA](https://paperswithcode.com/dataset/celeba), [STL-10](https://paperswithcode.com/dataset/stl-10)
Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/the-unusual-effectiveness-of-averaging-in-gan/code)