DOI: 10.1145/3517804.3524144
Research article | Open access

High Dimensional Differentially Private Stochastic Optimization with Heavy-tailed Data

Published: 13 June 2022

Abstract

As one of the most fundamental problems in machine learning, statistics, and differential privacy, Differentially Private Stochastic Convex Optimization (DP-SCO) has been extensively studied in recent years. However, most of the previous work can only handle either regular data distributions or irregular data in the low-dimensional case. To better understand the challenges arising from irregular data distributions, in this paper we provide the first study of DP-SCO with heavy-tailed data in the high-dimensional setting. In the first part we focus on the problem over a polytope constraint (such as the ℓ1-norm ball). We show that if the loss function is smooth and its gradient has a bounded second-order moment, it is possible to obtain a (high-probability) error bound (excess population risk) of Õ(log d/(nε)^{1/3}) in the ε-DP model, where n is the sample size and d is the dimension of the underlying space. Next, for LASSO, if the data distribution has bounded fourth-order moments, we improve the bound to Õ(log d/(nε)^{2/5}) in the (ε, δ)-DP model. In the second part of the paper, we study sparse learning with heavy-tailed data. We first revisit the sparse linear model and propose a truncated DP-IHT method whose output achieves an error of Õ((s*^2 log^2 d)/(nε)), where s* is the sparsity of the underlying parameter. Then we study a more general problem over the sparsity (i.e., ℓ0-norm) constraint, and show that it is possible to achieve an error of Õ((s*^{3/2} log d)/(nε)), which is near-optimal up to a factor of Õ(√s*), if the loss function is smooth and strongly convex.
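To make the flavor of the second part concrete, below is a minimal, illustrative sketch of a differentially private iterative-hard-thresholding step for sparse linear regression. It is not the paper's truncated DP-IHT algorithm: per-sample gradient clipping stands in for the truncation step, privacy is accounted for by naive composition over T iterations, and the step size eta, clipping threshold clip, and iteration count T are hypothetical parameters chosen only for this example.

```python
import numpy as np

def hard_threshold(v, s):
    """Keep the s largest-magnitude coordinates of v; zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-s:]
    out[idx] = v[idx]
    return out

def dp_iht_sparse_linear(X, y, s, epsilon, T=50, eta=0.1, clip=1.0, seed=0):
    """Illustrative DP iterative hard thresholding for sparse linear regression.

    Per-sample gradients are clipped (a crude stand-in for a truncation step),
    Laplace noise calibrated to the clipped l1-sensitivity is added, and the
    noisy gradient step is followed by hard thresholding onto s-sparse vectors.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    eps_step = epsilon / T  # naive composition across T iterations
    for _ in range(T):
        # Per-sample gradients of the squared loss, clipped in l2 norm.
        residual = X @ theta - y               # shape (n,)
        grads = residual[:, None] * X          # shape (n, d)
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        grads = grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
        g = grads.mean(axis=0)
        # Laplace noise: the clipped average has l1-sensitivity at most
        # 2 * clip * sqrt(d) / n (a loose bound used only for illustration).
        scale = 2.0 * clip * np.sqrt(d) / (n * eps_step)
        g_noisy = g + rng.laplace(scale=scale, size=d)
        theta = hard_threshold(theta - eta * g_noisy, s)
    return theta
```

A toy invocation on data X of shape (n, d) might be theta_hat = dp_iht_sparse_linear(X, y, s=10, epsilon=1.0).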

Supplementary Material

MP4 File (podsfp14-hu.mp4)
Presentation video

      Published In

      PODS '22: Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
      June 2022, 462 pages
      ISBN: 9781450392600
      DOI: 10.1145/3517804

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 13 June 2022

      Author Tags

      1. differential privacy
      2. high dimensional statistics
      3. robust statistics
      4. stochastic convex optimization

      Qualifiers

      • Research-article

      Conference

      SIGMOD/PODS '22

      Acceptance Rates

      Overall Acceptance Rate: 642 of 2,707 submissions, 24%

      Cited By

      • (2024) Efficient Sparse Least Absolute Deviation Regression With Differential Privacy. IEEE Transactions on Information Forensics and Security, 19, 2328-2339. DOI: 10.1109/TIFS.2023.3349054. Online publication date: 1-Jan-2024.
      • (2024) SoK: A Review of Differentially Private Linear Models For High-Dimensional Data. 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 57-77. DOI: 10.1109/SaTML59370.2024.00012. Online publication date: 9-Apr-2024.
      • (2024) Truthful and privacy-preserving generalized linear models. Information and Computation, 105225. DOI: 10.1016/j.ic.2024.105225. Online publication date: Sep-2024.
      • (2024) Efficient private SCO for heavy-tailed data via averaged clipping. Machine Learning. DOI: 10.1007/s10994-024-06617-9. Online publication date: 30-Sep-2024.
      • (2023) High Dimensional Statistical Estimation Under Uniformly Dithered One-Bit Quantization. IEEE Transactions on Information Theory, 69(8), 5151-5187. DOI: 10.1109/TIT.2023.3266271. Online publication date: 1-Aug-2023.
      • (2023) Byzantine-Resilient Federated Learning at Edge. IEEE Transactions on Computers, 72(9), 2600-2614. DOI: 10.1109/TC.2023.3257510. Online publication date: 1-Sep-2023.
      • (2023) A Theory to Instruct Differentially-Private Learning via Clipping Bias Reduction. 2023 IEEE Symposium on Security and Privacy (SP), 2170-2189. DOI: 10.1109/SP46215.2023.10179409. Online publication date: May-2023.
