research-article

Don't generate me: training differentially private generative models with Sinkhorn divergence

Authors:

Karsten Kreis

NIPS'21: Proceedings of the 35th International Conference on Neural Information Processing Systems

Article No.: 955, Pages 12480 - 12492

Published: 10 June 2024

Abstract

Although machine learning models trained on massive data have led to breakthroughs in several areas, their deployment in privacy-sensitive domains remains limited due to restricted access to data. Generative models trained with privacy constraints on private data can sidestep this challenge, providing indirect access to private data instead. We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy. DP-Sinkhorn minimizes the Sinkhorn divergence, a computationally efficient approximation to the exact optimal transport distance, between the model and data in a differentially private manner and uses a novel technique for controlling the bias-variance trade-off of gradient estimates. Unlike existing approaches for training differentially private generative models, which are mostly based on generative adversarial networks, we do not rely on adversarial objectives, which are notoriously difficult to optimize, especially in the presence of noise imposed by privacy constraints. Hence, DP-Sinkhorn is easy to train and deploy. Experimentally, we improve upon the state-of-the-art on multiple image modeling benchmarks and show differentially private synthesis of informative RGB images.
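As a rough illustration of the quantity named in the abstract, the sketch below computes a (non-private) Sinkhorn divergence between two point clouds with plain NumPy. It is not the authors' implementation: it omits the differential privacy mechanism and the gradient-estimation technique entirely, and it assumes uniform sample weights, a squared Euclidean cost, and the common debiased form S(x, y) = OT(x, y) - 0.5 OT(x, x) - 0.5 OT(y, y). The function names, `eps`, and `n_iters` are hypothetical choices made for this example.

```python
import numpy as np


def _logsumexp(z, axis):
    # Numerically stable log(sum(exp(z))) along one axis.
    m = np.max(z, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(z - m), axis=axis))


def entropic_ot(x, y, eps=1.0, n_iters=200):
    """Entropy-regularized OT value between two uniform empirical measures.

    Assumption for this sketch: squared Euclidean cost, uniform weights,
    and a fixed number of log-domain Sinkhorn iterations.
    """
    n, m = x.shape[0], y.shape[0]
    # Pairwise squared Euclidean cost matrix of shape (n, m).
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    log_a = np.full(n, -np.log(n))
    log_b = np.full(m, -np.log(m))
    f = np.zeros(n)  # dual potential on x
    g = np.zeros(m)  # dual potential on y
    for _ in range(n_iters):
        # Alternating updates of the two dual potentials.
        f = -eps * _logsumexp(log_b[None, :] + (g[None, :] - cost) / eps, axis=1)
        g = -eps * _logsumexp(log_a[:, None] + (f[:, None] - cost) / eps, axis=0)
    # Dual objective with uniform weights: <a, f> + <b, g>.
    return np.mean(f) + np.mean(g)


def sinkhorn_divergence(x, y, eps=1.0, n_iters=200):
    # Debiased form: subtract the two self-transport terms.
    return (
        entropic_ot(x, y, eps, n_iters)
        - 0.5 * entropic_ot(x, x, eps, n_iters)
        - 0.5 * entropic_ot(y, y, eps, n_iters)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(64, 2))           # stand-in for real samples
    model = rng.normal(size=(64, 2)) + 3.0    # stand-in for generated samples
    print("divergence(data, model):", sinkhorn_divergence(data, model))
    print("divergence(data, data): ", sinkhorn_divergence(data, data))
```

In a generative-modeling setting, a value like this would serve as the training loss between generated and real batches; the self-transport terms make the divergence vanish when the two batches coincide, which plain entropic OT does not.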

Supplementary Material

Additional material (3540261.3541216_supp.pdf)

Published In

NIPS '21: Proceedings of the 35th International Conference on Neural Information Processing Systems

December 2021

30517 pages

ISBN: 9781713845393

Copyright © 2021 Neural Information Processing Systems Foundation, Inc.

Publisher

Curran Associates Inc.

Red Hook, NY, United States

Qualifiers

Research-article
Research
Refereed limited
