DOI: 10.1145/3605764.3623910
research-article

Differentially Private Logistic Regression with Sparse Solutions

Published: 26 November 2023

Abstract

LASSO-regularized logistic regression is particularly useful for its built-in feature selection, which produces sparse solutions and allows zeroed coefficients to be removed from deployment. Differentially private versions of LASSO logistic regression have been developed, but they generally produce dense solutions, reducing the intrinsic utility of the LASSO penalty. In this paper, we present a differentially private method for sparse logistic regression that maintains hard zeros. Our key insight is to first train a non-private LASSO logistic regression model to determine an appropriate privatized number of non-zero coefficients to use in final model selection. To demonstrate our method's performance, we run experiments on synthetic and real-world datasets.
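The abstract gives only the outline of the method, so the following is a minimal sketch of that outline under stated assumptions, not the paper's algorithm. It counts the non-zero coefficients of a non-private LASSO solution, privatizes that count with two-sided geometric noise, and keeps that many of the largest-magnitude entries of a dense, separately privatized weight vector. The function names, the choice of noise mechanism, the sensitivity of 1 for the count, and both weight vectors are hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def privatize_count(k, epsilon, d, rng):
    """Add two-sided geometric noise to an integer count and clamp to [1, d].

    Assumption (illustrative only): the count has sensitivity 1.
    """
    alpha = np.exp(-epsilon)
    # The difference of two i.i.d. geometric draws is two-sided geometric noise.
    noise = rng.geometric(1.0 - alpha) - rng.geometric(1.0 - alpha)
    return int(np.clip(k + noise, 1, d))

def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w and set the rest to exact zeros."""
    w = np.asarray(w, dtype=float)
    keep = np.argpartition(np.abs(w), -k)[-k:]
    out = np.zeros_like(w)
    out[keep] = w[keep]
    return out

rng = np.random.default_rng(0)

# Hypothetical stand-in for a non-private LASSO logistic regression solution.
w_lasso = np.array([0.0, 1.3, 0.0, -0.7, 0.0, 0.0])
# Hypothetical stand-in for a dense weight vector from some private training step.
w_dense = np.array([0.2, 1.1, -0.1, -0.9, 0.05, 0.3])

k = int(np.count_nonzero(w_lasso))           # sparsity level of the LASSO model
k_priv = privatize_count(k, epsilon=1.0, d=w_dense.size, rng=rng)
w_sparse = hard_threshold(w_dense, k_priv)   # final model has hard zeros
```

Thresholding only post-processes the privatized count and the already-private dense weights, which is why a sketch like this can produce exact zeros without consuming additional privacy budget at that step.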

      Published In

      AISec '23: Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security
      November 2023
      252 pages
      ISBN:9798400702600
      DOI:10.1145/3605764
      Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. differential privacy
      2. logistic regression
      3. sparse
      4. thresholding

      Qualifiers

      • Research-article

      Conference

      CCS '23
      Acceptance Rates

      Overall Acceptance Rate 94 of 231 submissions, 41%

      Upcoming Conference

      CCS '25

      Article Metrics

      • Downloads (Last 12 months): 99
      • Downloads (Last 6 weeks): 6
      Reflects downloads up to 13 Feb 2025

      Cited By

      • (2024) Transfer Learning for Logistic Regression with Differential Privacy. Axioms 13:8 (517). DOI: 10.3390/axioms13080517. Online publication date: 30-Jul-2024
      • (2024) Feature Selection from Differentially Private Correlations. Proceedings of the 2024 Workshop on Artificial Intelligence and Security (12-23). DOI: 10.1145/3689932.3694760. Online publication date: 6-Nov-2024
      • (2024) High-Dimensional Distributed Sparse Classification with Scalable Communication-Efficient Global Updates. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2037-2047). DOI: 10.1145/3637528.3672038. Online publication date: 25-Aug-2024
      • (2024) SoK: A Review of Differentially Private Linear Models For High-Dimensional Data. 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) (57-77). DOI: 10.1109/SaTML59370.2024.00012. Online publication date: 9-Apr-2024
