Efficient Sparse Least Absolute Deviation Regression With Differential Privacy

Published: 01 January 2024

Abstract

In recent years, privacy-preserving machine learning algorithms have attracted increasing attention because of their important applications in many scientific fields. However, most privacy-preserving algorithms in the literature require the learning objective to be strongly convex and Lipschitz smooth, and thus cannot cover a wide class of robust loss functions (e.g., the quantile/least absolute loss). In this work, we aim to develop a fast privacy-preserving learning solution for a sparse robust regression problem. Our learning loss consists of a robust least absolute loss and an $\ell_1$ sparse penalty term. To solve the non-smooth loss quickly under a given privacy budget, we develop a Fast Robust And Privacy-Preserving Estimation (FRAPPE) algorithm for least absolute deviation (LAD) regression. Our algorithm achieves fast estimation by reformulating the sparse LAD problem as a penalized least squares estimation problem, and adopts a three-stage noise injection scheme to guarantee $(\epsilon, \delta)$-differential privacy. We show that our algorithm achieves a better trade-off between privacy and statistical accuracy than state-of-the-art privacy-preserving regression algorithms. Finally, we conduct experiments to verify the efficiency of the proposed FRAPPE algorithm.
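For concreteness, the objective described above is the $\ell_1$-penalized LAD loss $\min_{\beta} \frac{1}{n} \sum_{i=1}^{n} |y_i - x_i^{\top}\beta| + \lambda \|\beta\|_1$, where $(x_i, y_i)$ are the covariate-response pairs and $\lambda$ controls sparsity. The sketch below illustrates only the general reformulation idea: the absolute loss can be majorized by a weighted squared loss, so each outer step reduces to a penalized least squares problem. It is a minimal sketch under stated assumptions, not the authors' FRAPPE algorithm; the function name, the single noise placeholder, and the step-size choice are illustrative assumptions, and the abstract does not specify the actual three-stage noise calibration.

    import numpy as np

    def sparse_lad_dp_sketch(X, y, lam=0.1, floor=1e-3, n_iter=200,
                             noise_scale=0.0, rng=None):
        # Illustrative sketch only; NOT the authors' FRAPPE algorithm.
        # Objective: (1/n) * sum_i |y_i - x_i^T beta| + lam * ||beta||_1.
        # Majorization: |r| <= r^2 / (2 * max(|r_old|, floor)) + const,
        # so each outer step is a penalized weighted least squares
        # problem, handled here by one proximal-gradient step.
        # `noise_scale` only marks where DP noise could enter; a real
        # (eps, delta) guarantee requires calibrating the noise to the
        # gradient sensitivity and the privacy budget.
        rng = np.random.default_rng() if rng is None else rng
        n, p = X.shape
        s2 = np.linalg.norm(X, 2) ** 2  # squared spectral norm of X
        beta = np.zeros(p)
        for _ in range(n_iter):
            r = y - X @ beta
            w = 1.0 / (2.0 * np.maximum(np.abs(r), floor))  # surrogate weights
            grad = -(2.0 / n) * (X.T @ (w * r))             # grad of weighted LS surrogate
            grad += noise_scale * rng.standard_normal(p)    # placeholder noise injection
            step = n / (2.0 * w.max() * s2)                 # inverse Lipschitz bound
            z = beta - step * grad
            beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
        return beta

With noise_scale=0.0 this reduces to a non-private IRLS-style solver for the penalized LAD objective; the paper's three-stage noise injection would replace the single placeholder above.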


Published In

IEEE Transactions on Information Forensics and Security, Volume 19, 2024, 10342 pages

Publisher

IEEE Press
