DOI: 10.5555/3524938.3525773

Small-GAN: speeding up GAN training using core-sets

Published: 13 July 2020

Abstract

Recent work by Brock et al. (2018) suggests that Generative Adversarial Networks (GANs) benefit disproportionately from large mini-batch sizes. Unfortunately, using large batches is slow and expensive on conventional hardware. Thus, it would be nice if we could generate batches that were effectively large though actually small. In this work, we propose a method to do this, inspired by the use of Coreset-selection in active learning. When training a GAN, we draw a large batch of samples from the prior and then compress that batch using Coreset-selection. To create effectively large batches of 'real' images, we create a cached dataset of Inception activations of each training image, randomly project them down to a smaller dimension, and then use Coreset-selection on those projected activations at training time. We conduct experiments showing that this technique substantially reduces training time and memory usage for modern GAN variants, that it reduces the fraction of dropped modes in a synthetic dataset, and that it allows GANs to reach a new state of the art in anomaly detection.
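The abstract does not give pseudocode, but the sampling step it describes admits a short sketch. The snippet below is a minimal illustration under stated assumptions, not the paper's reference implementation: it uses the standard greedy k-center heuristic (the core-set construction popularized by the active-learning work of Sener & Savarese cited in the references) to compress an oversampled batch, plus a Gaussian random projection for the cached Inception activations; all names and dimensions are illustrative.

import numpy as np

def greedy_coreset(points, k, rng):
    # Greedy k-center selection: start from a random point, then
    # repeatedly add the point farthest from everything chosen so far.
    selected = [int(rng.integers(len(points)))]
    dists = np.linalg.norm(points - points[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))        # farthest remaining point
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(selected)

rng = np.random.default_rng(0)

# "Effectively large" batch of latents: oversample the prior, then compress.
big_z = rng.standard_normal((1024, 128))   # large batch from N(0, I)
small_z = big_z[greedy_coreset(big_z, k=64, rng=rng)]

# Real-image side: randomly project cached Inception activations down to a
# smaller dimension, then run core-set selection on the projections.
acts = rng.standard_normal((1024, 2048))   # stand-in for cached activations
proj = acts @ rng.standard_normal((2048, 32)) / np.sqrt(32)
small_real_idx = greedy_coreset(proj, k=64, rng=rng)

With this greedy rule, compressing an oversampled batch of n points to k representatives costs O(n·k) distance evaluations, which is cheap relative to a generator or discriminator update.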

References

[1]
Agarwal, P. K., Har-Peled, S., and Varadarajan, K. R. Geometric approximation via coresets. Combinatorial and computational geometry, 52:1-30, 2005.
[2]
Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.
[3]
Arora, S., Ge, R., Liang, Y., Ma, T., and Zhang, Y. Generalization and equilibrium in generative adversarial nets (GANs). In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 224-232. JMLR.org, 2017.
[4]
Arora, S., Risteski, A., and Zhang, Y. Do GANs learn the distribution? Some theory and empirics. 2018.
[5]
Azadi, S., Olsson, C., Darrell, T., Goodfellow, I., and Odena, A. Discriminator rejection sampling. arXiv preprint arXiv:1810.06758, 2018.
[6]
Bachem, O., Lucic, M., and Krause, A. Practical coreset constructions for machine learning. arXiv preprint arXiv:1703.06476, 2017.
[7]
Badoiu, M., Har-Peled, S., and Indyk, P. Approximate clustering via core-sets. In Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pp. 250-257. ACM, 2002.
[8]
Barahona, F. and Chudak, F. A. Near-optimal solutions to large-scale facility location problems. Discrete Optimization, 2(1):35-50, 2005.
[9]
Bellemare, M. G., Danihelka, I., Dabney, W., Mohamed, S., Lakshminarayanan, B., Hoyer, S., and Munos, R. The Cramér distance as a solution to biased Wasserstein gradients. arXiv preprint arXiv:1705.10743, 2017.
[10]
Brock, A., Donahue, J., and Simonyan, K. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
[11]
Chandola, V., Banerjee, A., and Kumar, V. Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3):15, 2009.
[12]
Chavdarova, T. and Fleuret, F. SGAN: An alternative training of generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9407-9415, 2018.
[13]
Chavdarova, T., Gidel, G., Fleuret, F., and Lacoste-Julien, S. Reducing noise in GAN training with variance reduced extragradient. arXiv preprint arXiv:1904.08598, 2019.
[14]
Clarkson, K. L. Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm. ACM Transactions on Algorithms (TALG), 6(4):63, 2010.
[15]
Dasgupta, S. and Gupta, A. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms, 22(1):60-65, 2003.
[16]
Dieng, A. B., Ruiz, F. J., Blei, D. M., and Titsias, M. K. Prescribed generative adversarial networks. arXiv preprint arXiv:1910.04302, 2019.
[17]
Donoho, D. L. et al. High-dimensional data analysis: The curses and blessings of dimensionality. AMS math challenges lecture, 1(2000):32, 2000.
[18]
Durugkar, I., Gemp, I., and Mahadevan, S. Generative multi-adversarial networks. arXiv preprint arXiv:1611.01673, 2016.
[19]
Eskicioglu, A. M. and Fisher, P. S. Image quality measures and their performance. IEEE Transactions on Communications, 43(12):2959-2965, 1995.
[20]
Farahani, R. Z. and Hekmatfar, M. Facility location: concepts, models, algorithms and case studies. Springer, 2009.
[21]
Fedus, W., Goodfellow, I., and Dai, A. M. MaskGAN: Better text generation via filling in the ______. arXiv preprint arXiv:1801.07736, 2018.
[22]
Feldman, D., Faulkner, M., and Krause, A. Scalable training of mixture models via coresets. In Advances in neural information processing systems, pp. 2142-2150, 2011.
[23]
Gidel, G., Berard, H., Vignoud, G., Vincent, P., and Lacoste-Julien, S. A variational inequality perspective on generative adversarial networks. arXiv preprint arXiv:1802.10551, 2018.
[24]
Girod, B. What's wrong with mean-squared error? Digital images and human vision, pp. 207-220, 1993.
[25]
Goldman, A. Optimal center location in simple networks. Transportation Science, 5(2):212-221, 1971.
[26]
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in neural information processing systems, pp. 2672-2680, 2014.
[27]
Goyal, P., Dollár, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., and He, K. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
[28]
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. Improved training of Wasserstein GANs. In Advances in neural information processing systems, pp. 5767-5777, 2017.
[29]
Guo, J., Lu, S., Cai, H., Zhang, W., Yu, Y., and Wang, J. Long text generation via adversarial training with leaked information. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[30]
Har-Peled, S. and Kushal, A. Smaller coresets for k-median and k-means clustering. Discrete & Computational Geometry, 37(1):3-19, 2007.
[31]
Har-Peled, S. and Mazumdar, S. On coresets for k-means and k-median clustering. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pp. 291-300. ACM, 2004.
[32]
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pp. 6626-6637, 2017.
[33]
Huggins, J., Campbell, T., and Broderick, T. Coresets for scalable Bayesian logistic regression. In Advances in Neural Information Processing Systems, pp. 4080-4088, 2016.
[34]
Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125-1134, 2017.
[35]
Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M., and Tang, P. T. P. On large-batch training for deep learning: Generalization gap and sharp minima. arXiv preprint arXiv:1609.04836, 2016.
[36]
Krizhevsky, A., Hinton, G., et al. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
[37]
Kumar, R., Goyal, A., Courville, A., and Bengio, Y. Maximum entropy generators for energy-based models. arXiv preprint arXiv:1901.08508, 2019.
[38]
Kwon, D., Kim, H., Kim, J., Suh, S. C., Kim, I., and Kim, K. J. A survey of deep learning-based network anomaly detection. Cluster Computing, pp. 1-13, 2017.
[39]
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681-4690, 2017.
[40]
Li, C.-L., Chang, W.-C., Cheng, Y., Yang, Y., and Póczos, B. MMD GAN: Towards deeper understanding of moment matching network. In Advances in Neural Information Processing Systems, pp. 2203-2213, 2017a.
[41]
Li, J., Madry, A., Peebles, J., and Schmidt, L. Towards understanding the dynamics of generative adversarial networks. arXiv preprint arXiv:1706.09884, 2017b.
[42]
Lucic, M., Faulkner, M., Krause, A., and Feldman, D. Training Gaussian mixture models at scale via coresets. The Journal of Machine Learning Research, 18(1):5885-5909, 2017.
[43]
Mao, X., Li, Q., Xie, H., Lau, R. Y., Wang, Z., and Paul Smolley, S. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2794-2802, 2017.
[44]
Mescheder, L. On the convergence properties of GAN training. arXiv preprint arXiv:1801.04406, 1:16, 2018.
[45]
Mescheder, L., Geiger, A., and Nowozin, S. Which training methods for GANs do actually converge? arXiv preprint arXiv:1801.04406, 2018.
[46]
Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
[47]
Mroueh, Y. and Sercu, T. Fisher GAN. In Advances in Neural Information Processing Systems, pp. 2513-2523, 2017.
[48]
Mussay, B., Osadchy, M., Braverman, V., Zhou, S., and Feldman, D. Data-independent neural pruning via coresets, 2019.
[49]
Nagarajan, V. and Kolter, J. Z. Gradient descent GAN optimization is locally stable. In Advances in Neural Information Processing Systems, pp. 5585-5595, 2017.
[50]
Nguyen, C. V., Li, Y., Bui, T. D., and Turner, R. E. Variational continual learning. arXiv preprint arXiv:1710.10628, 2017.
[51]
Phillips, J. M. Coresets and sketches. arXiv preprint arXiv:1601.00617, 2016.
[52]
Pratap, R. and Sen, S. Faster coreset construction for projective clustering via low-rank approximation. In International Workshop on Combinatorial Algorithms, pp. 336-348. Springer, 2018.
[53]
Radford, A., Metz, L., and Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
[54]
Salimans, T., Zhang, H., Radford, A., and Metaxas, D. Improving GANs using optimal transport. arXiv preprint arXiv:1803.05573, 2018.
[55]
Sener, O. and Savarese, S. Active learning for convolutional neural networks: A core-set approach. arXiv preprint arXiv:1708.00489, 2017.
[56]
Shallue, C. J., Lee, J., Antognini, J., Sohl-Dickstein, J., Frostig, R., and Dahl, G. E. Measuring the effects of data parallelism on neural network training. arXiv preprint arXiv:1811.03600, 2018.
[57]
Sinha, S., Ebrahimi, S., and Darrell, T. Variational adversarial active learning. arXiv preprint arXiv:1904.00370, 2019.
[58]
Smith, S. L., Kindermans, P.-J., Ying, C., and Le, Q. V. Don't decay the learning rate, increase the batch size. arXiv preprint arXiv:1711.00489, 2017.
[59]
Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[60]
Tsang, I. W., Kwok, J. T., and Cheung, P.-M. Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 6(Apr):363-392, 2005.
[61]
Tsang, I. W., Kocsor, A., and Kwok, J. T. Simpler core vector machines with enclosing balls. In Proceedings of the 24th international conference on Machine learning, pp. 911-918. ACM, 2007.
[62]
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. Attention is all you need. In Advances in neural information processing systems, pp. 5998-6008, 2017.
[63]
Wei, K., Liu, Y., Kirchhoff, K., and Bilmes, J. Using document summarization techniques for speech data subset selection. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 721-726, 2013.
[64]
Wolsey, L. A. and Nemhauser, G. L. Integer and combinatorial optimization. John Wiley & Sons, 2014.
[65]
Xian, Y., Lorenz, T., Schiele, B., and Akata, Z. Feature generating networks for zero-shot learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5542-5551, 2018.
[66]
Yosinski, J., Clune, J., Bengio, Y., and Lipson, H. How transferable are features in deep neural networks? In Advances in neural information processing systems, pp. 3320-3328, 2014.
[67]
Yu, F., Seff, A., Zhang, Y., Song, S., Funkhouser, T., and Xiao, J. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
[68]
Zenati, H., Foo, C. S., Lecouat, B., Manek, G., and Chandrasekhar, V. R. Efficient GAN-based anomaly detection. arXiv preprint arXiv:1802.06222, 2018.
[69]
Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., and Metaxas, D. N. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5907-5915, 2017.
[70]
Zhang, H., Goodfellow, I., Metaxas, D., and Odena, A. Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318, 2018.
[71]
Zhu, J.-J. and Bento, J. Generative adversarial active learning. arXiv preprint arXiv:1702.07956, 2017.
[72]
Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pp. 2223-2232, 2017.

Published In

ICML'20: Proceedings of the 37th International Conference on Machine Learning, July 2020, 11702 pages
Publisher: JMLR.org
