Differentially Private Logistic Regression with Sparse Solutions

Authors:

Brian Testa

AISec '23: Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security

Pages 1 - 9

https://doi.org/10.1145/3605764.3623910

Published: 26 November 2023

Abstract

LASSO regularized logistic regression is particularly useful for its built-in feature selection, allowing coefficients to be removed from deployment and producing sparse solutions. Differentially private versions of LASSO logistic regression have been developed, but generally produce dense solutions, reducing the intrinsic utility of the LASSO penalty. In this paper, we present a differentially private method for sparse logistic regression that maintains hard zeros. Our key insight is to first train a non-private LASSO logistic regression model to determine an appropriate privatized number of non-zero coefficients to use in final model selection. To demonstrate our method's performance, we run experiments on synthetic and real-world datasets.
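The two ingredients named in the abstract, a privatized count of non-zero coefficients and a final model with hard zeros, can be illustrated with a minimal sketch. This is not the paper's algorithm: the Laplace mechanism, the sensitivity value of 1.0, the helper names `privatize_count` and `keep_top_k`, and both weight vectors are assumptions chosen for illustration. In particular, the true sensitivity of a LASSO solution's non-zero count is not established here, and the dense vector is only a stand-in for the output of some differentially private training procedure.

```python
import random

def privatize_count(true_count, epsilon, sensitivity, num_features, rng):
    """Return a noisy, clamped integer version of true_count (Laplace mechanism)."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    noisy = int(round(true_count + noise))
    # Rounding and clamping are post-processing of the noisy value.
    return max(1, min(num_features, noisy))

def keep_top_k(weights, k):
    """Zero out all but the k largest-magnitude coefficients (hard zeros)."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

# Hypothetical coefficients from a non-private LASSO fit (assumed, not real data).
nonprivate_weights = [0.0, 1.3, 0.0, -0.7, 0.0, 0.2]
# Hypothetical dense output of some differentially private trainer (assumed).
dense_private_weights = [0.05, 1.1, -0.02, -0.9, 0.03, 0.4]

rng = random.Random(0)
true_k = sum(1 for w in nonprivate_weights if w != 0.0)
# sensitivity=1.0 is a placeholder assumption, not a value taken from the paper.
k = privatize_count(true_k, epsilon=1.0, sensitivity=1.0,
                    num_features=len(dense_private_weights), rng=rng)
sparse_weights = keep_top_k(dense_private_weights, k)
```

Because `keep_top_k` only reads the already-private weights and the already-private `k`, it is post-processing and consumes no additional privacy budget. The floating-point Laplace sampler used here is for illustration only and is not hardened for production use.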

Index Terms

Differentially Private Logistic Regression with Sparse Solutions
1. Security and privacy
  1. Database and storage security
  2. Formal methods and theory of security

Published In

AISec '23: Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security

November 2023

252 pages

ISBN:9798400702600

DOI:10.1145/3605764

Program Chairs:
Maura Pintor, University of Cagliari, Italy
Xinyun Chen, Google Brain, USA
Florian Tramèr, ETH Zürich, Switzerland

Copyright © 2023 ACM.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CCS '23

Sponsor:

SIGSAC

CCS '23: ACM SIGSAC Conference on Computer and Communications Security

November 30, 2023

Copenhagen, Denmark

Acceptance Rates

Overall Acceptance Rate 94 of 231 submissions, 41%
