research-article

Differentially private sum-product networks

Authors: Xenia Heilmann, Mattia Cerrato, Ernst Althaus

ICML'24: Proceedings of the 41st International Conference on Machine Learning

Article No.: 729, Pages 18155 - 18173

Published: 21 July 2024

Abstract

Differentially private ML approaches seek to learn models which may be publicly released while guaranteeing that the input data is kept private. One issue with this construction is that further model releases based on the same training data (e.g., for a new task) incur a further privacy budget cost. Privacy-preserving synthetic data generation is one possible solution to this conundrum. However, models trained on synthetic private data struggle to approach the performance of private, ad-hoc models. In this paper, we present a novel method based on sum-product networks that is able to perform both privacy-preserving classification and privacy-preserving data generation with a single model. To the best of our knowledge, ours is the first approach that provides both discriminative and generative capabilities to differentially private ML. We show that our approach outperforms the state of the art in terms of stability (i.e., the number of training runs required for convergence) and utility of the generated data.
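The abstract's central idea, that one sum-product network (SPN) can act both as a classifier and as a synthetic-data generator, can be illustrated with a deliberately tiny example. The sketch below is not the authors' method. It assumes binary features and a binary label, hand-fixes a naive-Bayes-shaped SPN (a root sum node over the class, each child a product of an indicator leaf for the class and independent Bernoulli leaves for the features), and privatises the parameters with the textbook Laplace mechanism applied to counts rather than with the paper's training procedure. The names `noisy_counts` and `ToySPN` and the toy data are hypothetical.

```python
import numpy as np


def noisy_counts(X, y, epsilon, rng):
    """Laplace mechanism on class counts and per-class feature counts.

    Assumption: binary labels y in {0, 1} and binary features X in {0, 1}.
    Under add/remove-one neighbouring datasets, one record changes its own
    class count by 1 and at most d feature counts by 1, so the L1
    sensitivity of the full count vector is 1 + d.
    """
    d = X.shape[1]
    class_counts = np.array([np.sum(y == c) for c in (0, 1)], dtype=float)
    feat_counts = np.array([X[y == c].sum(axis=0) for c in (0, 1)], dtype=float)
    scale = (1 + d) / epsilon
    class_counts += rng.laplace(0.0, scale, size=class_counts.shape)
    feat_counts += rng.laplace(0.0, scale, size=feat_counts.shape)
    # Clipping is post-processing, so it does not weaken the guarantee.
    return np.clip(class_counts, 0, None), np.clip(feat_counts, 0, None)


class ToySPN:
    """Hypothetical hand-fixed SPN: a root sum node over the class y whose
    children are product nodes of an indicator leaf [y = c] and independent
    Bernoulli leaves for each feature (naive Bayes written as an SPN)."""

    def __init__(self, class_counts, feat_counts):
        # Sum-node weights P(y = c), Laplace-smoothed; they sum to 1.
        self.weights = (class_counts + 1) / (class_counts.sum() + 2)
        # Bernoulli leaf parameters P(x_j = 1 | y = c), shape (2, d).
        p = (feat_counts + 1) / (class_counts[:, None] + 2)
        self.p = np.clip(p, 1e-3, 1 - 1e-3)

    def log_joint(self, x, c):
        # Bottom-up pass with evidence (x, y = c): the indicator leaf of the
        # other child is 0, so only product node c contributes.
        log_leaves = x * np.log(self.p[c]) + (1 - x) * np.log(1 - self.p[c])
        return np.log(self.weights[c]) + log_leaves.sum()

    def predict(self, x):
        # Discriminative use: argmax_c P(y = c, x), i.e. the MAP class.
        return int(np.argmax([self.log_joint(x, c) for c in (0, 1)]))

    def sample(self, n, rng):
        # Generative use: top-down sampling, first a sum-node child, then
        # each Bernoulli leaf of the chosen product node.
        ys = rng.choice(2, size=n, p=self.weights)
        xs = (rng.random((n, self.p.shape[1])) < self.p[ys]).astype(int)
        return xs, ys


# Hypothetical toy data: each feature is 1 with prob. 0.8 if y = 1, else 0.2.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
X = (rng.random((2000, 3)) < np.where(y[:, None] == 1, 0.8, 0.2)).astype(int)

spn = ToySPN(*noisy_counts(X, y, epsilon=1.0, rng=rng))
print(spn.predict(np.array([1, 1, 1])), spn.predict(np.array([0, 0, 0])))
X_synth, y_synth = spn.sample(5, rng)
```

Because `predict` and `sample` only post-process the same noisy counts, releasing both the classifier and the synthetic records costs no more than the single epsilon spent in `noisy_counts`, by the post-processing property of differential privacy. Training a separate private classifier and a separate private generator on the same data would instead cost epsilon_1 + epsilon_2 under basic sequential composition, which is the budget problem the abstract describes.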



Information

Published In

Guide Proceedings

ICML'24: Proceedings of the 41st International Conference on Machine Learning

July 2024

63010 pages

Copyright © 2024.

Publisher

JMLR.org

Qualifiers

Research-article
Research
Refereed limited

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
