DOI: 10.5555/3524938.3525773

Small-GAN: speeding up GAN training using core-sets

Published: 13 July 2020

Abstract

Recent work by Brock et al. (2018) suggests that Generative Adversarial Networks (GANs) benefit disproportionately from large mini-batch sizes. Unfortunately, using large batches is slow and expensive on conventional hardware. Thus, it would be nice if we could generate batches that were effectively large though actually small. In this work, we propose a method to do this, inspired by the use of Coreset-selection in active learning. When training a GAN, we draw a large batch of samples from the prior and then compress that batch using Coreset-selection. To create effectively large batches of 'real' images, we create a cached dataset of Inception activations of each training image, randomly project them down to a smaller dimension, and then use Coreset-selection on those projected activations at training time. We conduct experiments showing that this technique substantially reduces training time and memory usage for modern GAN variants, that it reduces the fraction of dropped modes in a synthetic dataset, and that it allows GANs to reach a new state of the art in anomaly detection.
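The abstract does not give pseudocode, but the sampling step it describes admits a short sketch. The snippet below is a minimal illustration under stated assumptions, not the paper's reference implementation: it uses the standard greedy k-center heuristic (the core-set construction popularized by the active-learning work of Sener & Savarese cited in the references) to compress an oversampled batch, plus a Gaussian random projection for the cached Inception activations; all names and dimensions are illustrative.

import numpy as np

def greedy_coreset(points, k, rng):
    # Greedy k-center selection: start from a random point, then
    # repeatedly add the point farthest from everything chosen so far.
    selected = [int(rng.integers(len(points)))]
    dists = np.linalg.norm(points - points[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))        # farthest remaining point
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(selected)

rng = np.random.default_rng(0)

# "Effectively large" batch of latents: oversample the prior, then compress.
big_z = rng.standard_normal((1024, 128))   # large batch from N(0, I)
small_z = big_z[greedy_coreset(big_z, k=64, rng=rng)]

# Real-image side: randomly project cached Inception activations down to a
# smaller dimension, then run core-set selection on the projections.
acts = rng.standard_normal((1024, 2048))   # stand-in for cached activations
proj = acts @ rng.standard_normal((2048, 32)) / np.sqrt(32)
small_real_idx = greedy_coreset(proj, k=64, rng=rng)

With this greedy rule, compressing an oversampled batch of n points to k representatives costs O(n·k) distance evaluations, which is cheap relative to a generator or discriminator update.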

References

[1]
Agarwal, P. K., Har-Peled, S., and Varadarajan, K. R. Geometric approximation via coresets. Combinatorial and computational geometry, 52:1-30, 2005.
[2]
Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.
[3]
Arora, S., Ge, R., Liang, Y., Ma, T., and Zhang, Y. Generalization and equilibrium in generative adversarial nets (GANs). In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 224-232. JMLR.org, 2017.
[4]
Arora, S., Risteski, A., and Zhang, Y. Do GANs learn the distribution? Some theory and empirics. 2018.
[5]
Azadi, S., Olsson, C., Darrell, T., Goodfellow, I., and Odena, A. Discriminator rejection sampling. arXiv preprint arXiv:1810.06758, 2018.
[6]
Bachem, O., Lucic, M., and Krause, A. Practical coreset constructions for machine learning. arXiv preprint arXiv:1703.06476, 2017.
[7]
Badoiu, M., Har-Peled, S., and Indyk, P. Approximate clustering via core-sets. In Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pp. 250-257. ACM, 2002.
[8]
Barahona, F. and Chudak, F. A. Near-optimal solutions to large-scale facility location problems. Discrete Optimization, 2(1):35-50, 2005.
[9]
Bellemare, M. G., Danihelka, I., Dabney, W., Mohamed, S., Lakshminarayanan, B., Hoyer, S., and Munos, R. The Cramér distance as a solution to biased Wasserstein gradients. arXiv preprint arXiv:1705.10743, 2017.
[10]
Brock, A., Donahue, J., and Simonyan, K. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
[11]
Chandola, V., Banerjee, A., and Kumar, V. Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3):15, 2009.
[12]
Chavdarova, T. and Fleuret, F. SGAN: An alternative training of generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9407-9415, 2018.
[13]
Chavdarova, T., Gidel, G., Fleuret, F., and Lacoste-Julien, S. Reducing noise in GAN training with variance reduced extragradient. arXiv preprint arXiv:1904.08598, 2019.
[14]
Clarkson, K. L. Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm. ACM Transactions on Algorithms (TALG), 6(4):63, 2010.
[15]
Dasgupta, S. and Gupta, A. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms, 22(1):60-65, 2003.
[16]
Dieng, A. B., Ruiz, F. J., Blei, D. M., and Titsias, M. K. Prescribed generative adversarial networks. arXiv preprint arXiv:1910.04302, 2019.
[17]
Donoho, D. L. et al. High-dimensional data analysis: The curses and blessings of dimensionality. AMS math challenges lecture, 1(2000):32, 2000.
[18]
Durugkar, I., Gemp, I., and Mahadevan, S. Generative multi-adversarial networks. arXiv preprint arXiv:1611.01673, 2016.
[19]
Eskicioglu, A. M. and Fisher, P. S. Image quality measures and their performance. IEEE Transactions on Communications, 43(12):2959-2965, 1995.
[20]
Farahani, R. Z. and Hekmatfar, M. Facility location: concepts, models, algorithms and case studies. Springer, 2009.
[21]
Fedus, W., Goodfellow, I., and Dai, A. M. MaskGAN: Better text generation via filling in the ______. arXiv preprint arXiv:1801.07736, 2018.
[22]
Feldman, D., Faulkner, M., and Krause, A. Scalable training of mixture models via coresets. In Advances in neural information processing systems, pp. 2142-2150, 2011.
[23]
Gidel, G., Berard, H., Vignoud, G., Vincent, P., and Lacoste-Julien, S. A variational inequality perspective on generative adversarial networks. arXiv preprint arXiv:1802.10551, 2018.
[24]
Girod, B. What's wrong with mean-squared error? Digital images and human vision, pp. 207-220, 1993.
[25]
Goldman, A. Optimal center location in simple networks. Transportation Science, 5(2):212-221, 1971.
[26]
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in neural information processing systems, pp. 2672-2680, 2014.
[27]
Goyal, P., Dollár, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., and He, K. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
[28]
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. Improved training of Wasserstein GANs. In Advances in neural information processing systems, pp. 5767-5777, 2017.
[29]
Guo, J., Lu, S., Cai, H., Zhang, W., Yu, Y., and Wang, J. Long text generation via adversarial training with leaked information. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[30]
Har-Peled, S. and Kushal, A. Smaller coresets for k-median and k-means clustering. Discrete & Computational Geometry, 37(1):3-19, 2007.
[31]
Har-Peled, S. and Mazumdar, S. On coresets for k-means and k-median clustering. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pp. 291-300. ACM, 2004.
[32]
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pp. 6626-6637, 2017.
[33]
Huggins, J., Campbell, T., and Broderick, T. Coresets for scalable Bayesian logistic regression. In Advances in Neural Information Processing Systems, pp. 4080-4088, 2016.
[34]
Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125-1134, 2017.
[35]
Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M., and Tang, P. T. P. On large-batch training for deep learning: Generalization gap and sharp minima. arXiv preprint arXiv:1609.04836, 2016.
[36]
Krizhevsky, A., Hinton, G., et al. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
[37]
Kumar, R., Goyal, A., Courville, A., and Bengio, Y. Maximum entropy generators for energy-based models. arXiv preprint arXiv:1901.08508, 2019.
[38]
Kwon, D., Kim, H., Kim, J., Suh, S. C., Kim, I., and Kim, K. J. A survey of deep learning-based network anomaly detection. Cluster Computing, pp. 1-13, 2017.
[39]
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4681-4690, 2017.
[40]
Li, C.-L., Chang, W.-C., Cheng, Y., Yang, Y., and Póczos, B. MMD GAN: Towards deeper understanding of moment matching network. In Advances in Neural Information Processing Systems, pp. 2203-2213, 2017a.
[41]
Li, J., Madry, A., Peebles, J., and Schmidt, L. Towards understanding the dynamics of generative adversarial networks. arXiv preprint arXiv:1706.09884, 2017b.
[42]
Lucic, M., Faulkner, M., Krause, A., and Feldman, D. Training Gaussian mixture models at scale via coresets. The Journal of Machine Learning Research, 18(1):5885-5909, 2017.
[43]
Mao, X., Li, Q., Xie, H., Lau, R. Y., Wang, Z., and Paul Smolley, S. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2794-2802, 2017.
[44]
Mescheder, L. On the convergence properties of GAN training. arXiv preprint arXiv:1801.04406, 1:16, 2018.
[45]
Mescheder, L., Geiger, A., and Nowozin, S. Which training methods for GANs do actually converge? arXiv preprint arXiv:1801.04406, 2018.
[46]
Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
[47]
Mroueh, Y. and Sercu, T. Fisher GAN. In Advances in Neural Information Processing Systems, pp. 2513-2523, 2017.
[48]
Mussay, B., Osadchy, M., Braverman, V., Zhou, S., and Feldman, D. Data-independent neural pruning via coresets, 2019.
[49]
Nagarajan, V. and Kolter, J. Z. Gradient descent GAN optimization is locally stable. In Advances in Neural Information Processing Systems, pp. 5585-5595, 2017.
[50]
Nguyen, C. V., Li, Y., Bui, T. D., and Turner, R. E. Variational continual learning. arXiv preprint arXiv:1710.10628, 2017.
[51]
Phillips, J. M. Coresets and sketches. arXiv preprint arXiv:1601.00617, 2016.
[52]
Pratap, R. and Sen, S. Faster coreset construction for projective clustering via low-rank approximation. In International Workshop on Combinatorial Algorithms, pp. 336-348. Springer, 2018.
[53]
Radford, A., Metz, L., and Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
[54]
Salimans, T., Zhang, H., Radford, A., and Metaxas, D. Improving GANs using optimal transport. arXiv preprint arXiv:1803.05573, 2018.
[55]
Sener, O. and Savarese, S. Active learning for convolutional neural networks: A core-set approach. arXiv preprint arXiv:1708.00489, 2017.
[56]
Shallue, C. J., Lee, J., Antognini, J., Sohl-Dickstein, J., Frostig, R., and Dahl, G. E. Measuring the effects of data parallelism on neural network training. arXiv preprint arXiv:1811.03600, 2018.
[57]
Sinha, S., Ebrahimi, S., and Darrell, T. Variational adversarial active learning. arXiv preprint arXiv:1904.00370, 2019.
[58]
Smith, S. L., Kindermans, P.-J., Ying, C., and Le, Q. V. Don't decay the learning rate, increase the batch size. arXiv preprint arXiv:1711.00489, 2017.
[59]
Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[60]
Tsang, I. W., Kwok, J. T., and Cheung, P.-M. Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 6(Apr):363-392, 2005.
[61]
Tsang, I. W., Kocsor, A., and Kwok, J. T. Simpler core vector machines with enclosing balls. In Proceedings of the 24th international conference on Machine learning, pp. 911-918. ACM, 2007.
[62]
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. Attention is all you need. In Advances in neural information processing systems, pp. 5998-6008, 2017.
[63]
Wei, K., Liu, Y., Kirchhoff, K., and Bilmes, J. Using document summarization techniques for speech data subset selection. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 721-726, 2013.
[64]
Wolsey, L. A. and Nemhauser, G. L. Integer and combinatorial optimization. John Wiley & Sons, 2014.
[65]
Xian, Y., Lorenz, T., Schiele, B., and Akata, Z. Feature generating networks for zero-shot learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5542-5551, 2018.
[66]
Yosinski, J., Clune, J., Bengio, Y., and Lipson, H. How transferable are features in deep neural networks? In Advances in neural information processing systems, pp. 3320-3328, 2014.
[67]
Yu, F., Seff, A., Zhang, Y., Song, S., Funkhouser, T., and Xiao, J. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
[68]
Zenati, H., Foo, C. S., Lecouat, B., Manek, G., and Chandrasekhar, V. R. Efficient GAN-based anomaly detection. arXiv preprint arXiv:1802.06222, 2018.
[69]
Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., and Metaxas, D. N. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5907-5915, 2017.
[70]
Zhang, H., Goodfellow, I., Metaxas, D., and Odena, A. Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318, 2018.
[71]
Zhu, J.-J. and Bento, J. Generative adversarial active learning. arXiv preprint arXiv:1702.07956, 2017.
[72]
Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pp. 2223-2232, 2017.

Published In

ICML'20: Proceedings of the 37th International Conference on Machine Learning, July 2020, 11702 pages
Publisher: JMLR.org
