Research Article

Efficient Sparse Least Absolute Deviation Regression With Differential Privacy

Published: 01 January 2024

Abstract

In recent years, privacy-preserving machine learning algorithms have attracted increasing attention because of their important applications in many scientific fields. However, most privacy-preserving algorithms in the literature require the learning objective to be strongly convex and Lipschitz smooth, and thus cannot cover a wide class of robust loss functions (e.g., the quantile/least absolute loss). In this work, we develop a fast privacy-preserving learning solution for a sparse robust regression problem. Our learning loss consists of a robust least absolute loss and an $\ell_1$ sparse penalty term. To solve the non-smooth loss quickly under a given privacy budget, we develop a Fast Robust And Privacy-Preserving Estimation (FRAPPE) algorithm for least absolute deviation (LAD) regression. Our algorithm achieves fast estimation by reformulating the sparse LAD problem as a penalized least squares estimation problem, and adopts a three-stage noise injection to guarantee $(\epsilon,\delta)$-differential privacy. We show that our algorithm achieves a better trade-off between privacy and statistical accuracy than state-of-the-art privacy-preserving regression algorithms. Finally, we conduct experiments to verify the efficiency of the proposed FRAPPE algorithm.
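The core idea in the abstract, replacing the non-smooth absolute loss with a sequence of weighted least squares problems and injecting noise into the iterates, can be illustrated with a minimal sketch. The function below is an assumption-laden toy, not the paper's FRAPPE algorithm: it approximates sparse LAD via classic iteratively reweighted least squares with a soft-thresholding step for the $\ell_1$ penalty, and adds generic Gaussian noise to each update. The helper name `noisy_sparse_lad_sketch` and all parameters (notably the noise scale `sigma`) are placeholders; a real mechanism would calibrate the noise to the update's sensitivity and the $(\epsilon,\delta)$ budget via the paper's three-stage scheme.

```python
import numpy as np

def noisy_sparse_lad_sketch(X, y, lam=0.1, sigma=1.0, n_iters=20,
                            eps_w=1e-6, rng=None):
    """Toy sketch: sparse LAD via iteratively reweighted penalized
    least squares, with generic Gaussian noise added to each iterate
    (NOT the calibrated three-stage mechanism of the paper)."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iters):
        # IRLS weights: 1/|r_i| turns the absolute loss into a
        # weighted squared loss around the current residuals.
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), eps_w)
        # Ridge-stabilized weighted least squares solve; the l1 part
        # is handled by the soft-thresholding step below.
        A = X.T @ (w[:, None] * X) + lam * np.eye(d)
        b = X.T @ (w * y)
        beta = np.linalg.solve(A, b)
        # Soft-threshold to induce sparsity (proximal step for l1).
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lam / n, 0.0)
        # Placeholder noise injection: a privacy-preserving version
        # would scale this to the step's sensitivity and the
        # (epsilon, delta) budget.
        beta = beta + rng.normal(0.0, sigma, size=d)
    return beta
```

With `sigma=0` this reduces to a plain (non-private) IRLS approximation of the penalized LAD estimator, which is useful for checking that the reformulation itself recovers a sparse coefficient vector before any privacy noise is added.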



Published In

IEEE Transactions on Information Forensics and Security, Volume 19, 2024 (9612 pages)

Publisher: IEEE Press