Nothing Special   »   [go: up one dir, main page]

Skip to main content

Advertisement

Log in

A Review on Machine Unlearning

  • Survey Article
  • Published:
SN Computer Science Aims and scope Submit manuscript

Abstract

Recently, an increasing number of laws have governed the useability of users’ privacy. For example, Article 17 of the General Data Protection Regulation (GDPR), the right to be forgotten, requires machine learning applications to remove a portion of data from a dataset and retrain it if the user makes such a request. Furthermore, from the security perspective, training data for machine learning models, i.e., data that may contain user privacy, should be effectively protected, including appropriate erasure. Therefore, researchers propose various privacy-preserving methods to deal with such issues as machine unlearning. This paper provides an in-depth review of the security and privacy concerns in machine learning models. First, we present how machine learning can use users’ private data in daily life and the role that the GDPR plays in this problem. Then, we introduce the concept of machine unlearning by describing the security threats in machine learning models and how to protect users’ privacy from being violated using machine learning platforms. As the core content of the paper, we introduce and analyze current machine unlearning approaches and several representative results and discuss them in the context of the data lineage. Furthermore, we also discuss the future research challenges in this field.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

Data availability

Interested parties can obtain the anonymised datasets that support the findings of this study from the corresponding author upon reasonable request.

References

  1. Baracaldo N, Chen B, Ludwig H, Safavi JA. Mitigating poisoning attacks on machine learning models: a data provenance based approach. In: Proceedings of the 10th ACM workshop on artificial intelligence and security. 2017;103–110

  2. Liu Y, Fan M, Chen C, Liu X, Ma Z, Wang L, Ma J. Backdoor defense with machine unlearning. arXiv. 2022. https://doi.org/10.48550/arXiv.2201.09538.

    Article  Google Scholar 

  3. Bourtoule L, Chandrasekaran V, Choquette-Choo CA, Jia H, Travers A, Zhang B, Lie D, Papernot N. Machine unlearning. In: 2021 IEEE symposium on security and privacy (SP). IEEE. 2021;141–159

  4. Al-Rubaie M, Chang JM. Privacy-preserving machine learning: threats and solutions. IEEE Secur Priv. 2019;17(2):49–58.

    Article  Google Scholar 

  5. Schelter S. Towards efficient machine unlearning via incremental view maintenance.

  6. Graves L, Nagisetty V, Ganesh V. Amnesiac machine learning. arXiv. 2020. https://doi.org/10.48550/arXiv.2010.10981.

    Article  Google Scholar 

  7. Chen M, Zhang Z, Wang T, Backes M, Humbert M, Zhang Y. When machine unlearning jeopardizes privacy. In: Proceedings of the 2021 ACM SIGSAC conference on computer and communications security. 2021;896–911

  8. Gao J, Garg S, Mahmoody M, Vasudevan PN. Deletion inference, reconstruction, and compliance in machine (un) learning. arXiv. 2022. https://doi.org/10.48550/arXiv.2202.03460.

    Article  Google Scholar 

  9. Marchant NG, Rubinstein BI, Alfeld S. Hard to forget: poisoning attacks on certified machine unlearning. arXiv preprint arXiv:2109.08266. 2021

  10. Baracaldo N, Chen B, Ludwig H, Safavi A, Zhang R. Detecting poisoning attacks on machine learning in iot environments. In: 2018 IEEE international congress on internet of things (ICIOT). IEEE 2018;57–64.

  11. Chundawat VS, Tarun AK, Mandal M, Kankanhalli M. Zero-shot machine unlearning. arXiv. 2022. https://doi.org/10.48550/arXiv.2201.05629.

    Article  Google Scholar 

  12. Toreini E, Aitken M, Coopamootoo K, Elliott K, Zelaya CG, Van Moorsel A. The relationship between trust in ai and trustworthy machine learning technologies. In: Proceedings of the 2020 conference on fairness, accountability, and transparency. 2020;272–283.

  13. Surma J. Hacking machine learning: towards the comprehensive taxonomy of attacks against machine learning systems. In: Proceedings of the 2020 the 4th international conference on innovation in artificial intelligence. 2020;1–4.

  14. Tramèr F, Zhang F, Juels A, Reiter MK, Ristenpart T. Stealing machine learning models via prediction APIs. USENIX Secur Symp. 2016;16:601–18.

    Google Scholar 

  15. Song C, Ristenpart T, Shmatikov V. Machine learning models that remember too much. In: Proceedings of the 2017 ACM SIGSAC confer-ence on computer and communications security. 2017;587–601.

  16. Shen S, Tople S, Saxena P. Auror: Defending against poisoning attacks in collaborative deep learning systems. In: Proceedings of the 32nd annual conference on computer security applications. 2016;508–519.

  17. Alsdurf H, Belliveau E, Bengio Y, Deleu T, Gupta P, Ippolito D, Janda R, Jarvie M, Kolody T, Krastev S, et al. Covi white paper. arXiv. 2020. https://doi.org/10.48550/arXiv.2005.08502.

    Article  Google Scholar 

  18. Ginart A, Guan MY, Valiant G, Zou J. Making ai forget you: data deletion in machine learning. arXiv. 2019. https://doi.org/10.48550/arXiv.1907.05012.

    Article  Google Scholar 

  19. Fredrikson M, Jha S, Ristenpart T. Model inversion attacks that exploit confidence information and basic countermeasures. In: Proceedings of the 22nd ACM SIGSAC conference on computer and communications security. 2015;1322–1333.

  20. Mahadevan A, Mathioudakis M. Certifiable machine unlearning for linear models. arXiv. 2021. https://doi.org/10.48550/arXiv.2106.15093.

    Article  Google Scholar 

  21. Guo C, Goldstein T, Hannun A, Van Der Maaten L. Certified data removal from machine learning models. arXiv. 2019. https://doi.org/10.48550/arXiv.1911.03030.

    Article  Google Scholar 

  22. Thudi A, Jia H, Shumailov I, Papernot N. On the necessity of auditable algorithmic definitions for machine unlearning. arXiv. 2021. https://doi.org/10.48550/arXiv.2110.11891.

    Article  Google Scholar 

  23. Ullah E, Mai T, Rao A, Rossi RA, Arora R. Machine unlearning via algorithmic stability. In: conference on learning theory. PMLR. 2021;4126–4142.

  24. Cao Y, Yang J. Towards making systems forget with machine unlearning. In: 2015 IEEE Symposium on Security and Privacy. IEEE. 2015;463–480.

  25. Cao Y, Yu AF, Aday A, Stahl E, Merwine J, Yang J. Efficient repair of polluted machine learning systems via causal unlearning. In: Proceedings of the 2018 on Asia conference on computer and communications security. 2018;735–747.

  26. Kashef R. A boosted svm classifier trained by incremental learning and decremental unlearning approach. Expert Syst Appl. 2021;167:114154.

    Article  Google Scholar 

  27. Jose ST, Simeone O. A unified pac-bayesian framework for machine unlearning via information risk minimization. In: 2021 IEEE 3 1st international workshop on machine learning for signal processing (MLSP). IEEE. 2021;1–6.

  28. Liu G, Ma X, Yang Y, Wang C, Liu J. Federaser: enabling efficient client-level data removal from federated learning models. In: 2021 IEEE/ACM 29th international symposium on quality of service (IWQOS). IEEE. 2021;1–10.

  29. Brophy J, Lowd D. Machine unlearning for random forests. In: International Conference on Machine Learning. PMLR. 2021;1092–1104.

  30. Wu C, Zhu S, Mitra P. Federated unlearning with knowledge distillation. arXiv. 2022. https://doi.org/10.48550/arXiv.2201.09441.

    Article  Google Scholar 

  31. Du M, Chen Z, Liu C, Oak R, Song D. Lifelong anomaly detection through unlearning. In: Proceedings of the 2019 ACM SIGSAC conference on computer and communications security. 2019;1283–1297.

  32. Baumhauer T, Sch¨ottle P, Zeppelzauer M. Machine unlearning: linear filtration for logit-based classifiers. arXiv. 2020. https://doi.org/10.48550/arXiv.2002.02730.

    Article  MATH  Google Scholar 

  33. Golatkar A, Achille A, Soatto S. Eternal sunshine of the spotless net: selective forgetting in deep networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020;9304–9312.

  34. Wu Y, Dobriban E, Davidson S. Deltagrad: rapid retraining of machine learning models. In: International conference on machine learning. PMLR. 2021;10355–10366.

  35. Golatkar A, Achille A, Ravichandran A, Polito M, Soatto S. Mixed—privacy forgetting in deep networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021;792–801.

  36. Izzo Z, Smart MA, Chaudhuri K, Zou J. Approximate data deletion from machine learning models. In: International conference on artificial intelligence and statistics. PMLR. 2021;2008–2016.

  37. Neel S, Roth A, Sharifi-Malvajerdi S. Descent-to-delete: gradient- based methods for machine unlearning. In: Algorithmic learning theory. PMLR. 2021;931–962.

  38. Thudi A, Deza G, Chandrasekaran V, Papernot N. Unrolling sgd: understanding factors influencing machine unlearning. arXiv. 2021. https://doi.org/10.48550/arXiv.2109.13398.

    Article  Google Scholar 

  39. Warnecke A, Pirch L, Wressnegger C, Rieck K. Machine unlearning of features and labels. arXiv. 2021. https://doi.org/10.48550/arXiv.2108.11577.

    Article  Google Scholar 

  40. He Y, Meng G, Chen K, He J, Hu X. Deepobliviate: a powerful charm for erasing data residual memory in deep neural networks. arXiv. 2021. https://doi.org/10.48550/arXiv.2105.06209.

    Article  Google Scholar 

  41. Gong J, Simeone O, Kassab R, Kang J. Forget-svgd: Particle-based bayesian federated unlearning. arXiv. 2021. https://doi.org/10.48550/arXiv.2111.12056.

    Article  Google Scholar 

  42. Guo T, Guo S, Zhang J, Xu W, Wang J. Vertical machine unlearning: Selectively removing sensitive information from latent feature space. arXiv. 2022. https://doi.org/10.48550/arXiv.2202.13295.

    Article  Google Scholar 

  43. Cauwenberghs G, Poggio T. Incremental and decremental support vector machine learning. advances in neural information processing systems. 2000;13.

  44. Tsai C-H, Lin C-Y, Lin C-J. Incremental and decremental training for linear classification. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. 2014;343–352.

  45. Karasuyama M, Takeuchi I. Multiple incremental decremental learning of support vector machines. IEEE Trans Neural Networks. 2010;21(7):1048–59.

    Article  Google Scholar 

  46. Kearns M. Efficient noise-tolerant learning from statistical queries. J ACM. 1998;45(6):983–1006.

    Article  MathSciNet  MATH  Google Scholar 

  47. Martens J. New insights and perspectives on the natural gradient method. arXiv. 2014. https://doi.org/10.48550/arXiv.1412.1193.

    Article  MATH  Google Scholar 

  48. Dwork C, Roth A, et al. The algorithmic foundations of differential privacy. Found Trends Theor Comput Sci. 2014;9(3–4):211–407.

    MathSciNet  MATH  Google Scholar 

  49. Chaudhuri K, Monteleoni C. Privacy-preserving logistic regression. advances in neural information processing systems. 2008;21.

  50. Golatkar A, Achille A, Soatto S. Forgetting outside the box: Scrubbing deep networks of information accessible from input-output observations Europea conference on computer vision. Cham: Springer; 2020. p. 383–98.

    Google Scholar 

  51. Koh PW, Liang P. Understanding black-box predictions via influence functions. In: International conference on machine learning. PMLR. 2017;1885–1894.

  52. Giordano R, Stephenson W, Liu R, Jordan M, Broderick T. A swiss army infinitesimal jackknife. In: The 22nd international conference on artificial intelligence and statistics. International conference on machine learning. PMLR. 2019;1139–1147.

  53. Zhang Z, Sparks ER, Franklin MJ. Diagnosing machine learning pipelines with fine-grained lineage. In: Proceedings of the 26th international symposium on high-performance parallel and distributed computing. 2017;143–153.

  54. Luo G, et al. A roadmap for automating lineage tracing to aid automatically explaining machine learning predictions for clinical decision support. JMIR Med Inform. 2021;9(5):27778.

    Article  Google Scholar 

  55. Thiago RM, Souza R, Azevedo L, Soares EFDS, Santos R, Dos Santos W, De Bayser M, Cardoso MC, Moreno MF, Cerqueira R. Managing data lineage of o&g machine learning models: the sweet spot for shale use case. First EAGE Digit Conf Exhib. 2020;2020:1–5.

    Google Scholar 

  56. Li Y, Zheng X, Chen C, Liu J. Making recommender systems forget: learning and unlearning for erasable recommendation. arXiv. 2022. https://doi.org/10.48550/arXiv.2203.11491.

    Article  Google Scholar 

  57. Shokri R, Stronati M, Song C, Shmatikov V. Membership inference attacks against machine learning models. In: 2017 IEEE Symposium on Security and Privacy (SP). IEEE. 2017;3–18.

  58. Yeom S, Giacomelli I, Fredrikson M, Jha S. Privacy risk in machine learning: analyzing the connection to overfitting. In: 2018 IEEE 31st computer security foundations symposium (CSF). IEEE. 2018;268–282.

  59. Sablayrolles A, Douze M, Schmid C, Ollivier Y, J´egou H. White-box vs black-box: Bayes optimal strategies for membership inference. In: International conference on machine learning. PMLR. 2019;5558–5567.

  60. Hayes J, Melis L, Danezis G, De Cristofaro E. Logan: membership inference attacks against generative models. Proc Privacy Enhanc Technol De Gruyter. 2019;2019:133–52.

    Article  Google Scholar 

Download references

Acknowledgements

This research was partially supported by the Japan Science and Technology Agency (JST) Strategic International Collaborative Research Program (SICORP). The first author was supported by JST SPRING, under Grant No. JPMJSP2136.

Funding

The work of Haibo Zhang was supported by the JST-Mirai Program, JPMJSP2136.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Haibo Zhang.

Ethics declarations

Conflict of Interest

On behalf of all the authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhang, H., Nakamura, T., Isohara, T. et al. A Review on Machine Unlearning. SN COMPUT. SCI. 4, 337 (2023). https://doi.org/10.1007/s42979-023-01767-4

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s42979-023-01767-4

Keywords

Navigation