DOI: 10.5555/3692070.3692799
research-article

Differentially private sum-product networks

Published: 21 July 2024

Abstract

Differentially private ML approaches seek to learn models which may be publicly released while guaranteeing that the input data is kept private. One issue with this construction is that further model releases based on the same training data (e.g. for a new task) incur a further privacy budget cost. Privacy-preserving synthetic data generation is one possible solution to this conundrum. However, models trained on synthetic private data struggle to approach the performance of private, ad-hoc models. In this paper, we present a novel method based on sum-product networks that is able to perform both privacy-preserving classification and privacy-preserving data generation with a single model. To the best of our knowledge, ours is the first approach that provides both discriminative and generative capabilities to differentially private ML. We show that our approach outperforms the state of the art in terms of stability (i.e. number of training runs required for convergence) and utility of the generated data.
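The budget argument in the abstract can be illustrated with a small sketch. It assumes basic sequential composition (the epsilons of separate releases computed on the same data add up) and the post-processing property of differential privacy (sampling from an already released private model costs no further budget). The function name and the epsilon values are hypothetical and are not taken from the paper.

```python
def total_epsilon_sequential(epsilons):
    """Cumulative privacy cost under basic sequential composition.

    Each entry is the epsilon spent by one differentially private release
    computed on the same training data; the costs simply add up.
    """
    if any(eps < 0 for eps in epsilons):
        raise ValueError("epsilon values must be non-negative")
    return sum(epsilons)


# Hypothetical scenario A: a separate private model is trained for each of three tasks.
per_task_releases = [1.0, 1.0, 1.0]

# Hypothetical scenario B: one private model is released once; synthetic data
# sampled from it afterwards is post-processing and adds nothing to the budget.
single_model_release = [1.0]

separate_cost = total_epsilon_sequential(per_task_releases)
shared_cost = total_epsilon_sequential(single_model_release)

print(f"separate models: epsilon = {separate_cost}")
print(f"single model:    epsilon = {shared_cost}")
```

Tighter accounting methods (for example advanced composition) give smaller totals than the plain sum, but the qualitative point is the same: every additional release that touches the private data spends more budget.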



Published In

ICML'24: Proceedings of the 41st International Conference on Machine Learning
July 2024
63010 pages

Publisher

JMLR.org

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
