Abstract
Recently, an increasing number of laws have come to govern the use of users’ private data. For example, Article 17 of the General Data Protection Regulation (GDPR), the right to be forgotten, requires machine learning applications to remove a portion of the data from a dataset and retrain the affected models if a user makes such a request. Furthermore, from a security perspective, the training data of machine learning models, which may contain users’ private information, should be effectively protected, including through appropriate erasure. To address these issues, researchers have proposed various privacy-preserving methods, collectively referred to as machine unlearning. This paper provides an in-depth review of the security and privacy concerns in machine learning models. First, we present how machine learning uses users’ private data in daily life and the role that the GDPR plays in this problem. Then, we introduce the concept of machine unlearning by describing the security threats to machine learning models and how users’ privacy can be protected from violation on machine learning platforms. As the core content of the paper, we introduce and analyze current machine unlearning approaches and several representative results, and discuss them in the context of data lineage. Finally, we discuss future research challenges in this field.
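To make the problem concrete, the sketch below illustrates the naive baseline that Article 17 implies and against which unlearning methods are commonly compared: exact deletion by retraining from scratch. This is a minimal illustration using scikit-learn, not code from any work reviewed here; the names `unlearn_by_retraining` and `forget_ids` are our own, and the toy data stands in for a dataset containing user records.

```python
# Minimal sketch of exact unlearning: drop the requested records,
# then refit the model on the remaining data from scratch.
import numpy as np
from sklearn.linear_model import LogisticRegression

def unlearn_by_retraining(X, y, forget_ids):
    """Remove the rows a user asked to forget and retrain from scratch."""
    keep = np.setdiff1d(np.arange(len(X)), forget_ids)  # indices to retain
    model = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
    return model, keep

# Toy data standing in for a dataset that contains user records.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# A user invokes the right to be forgotten for rows 10-19.
model, keep = unlearn_by_retraining(X, y, np.arange(10, 20))
print(f"accuracy on retained data: {model.score(X[keep], y[keep]):.3f}")
```

Retraining from scratch guarantees that the forgotten records leave no trace in the resulting model, but its cost grows with the size of the full dataset; the machine unlearning approaches reviewed in this paper aim to approximate this outcome at a fraction of the cost.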
Data availability
Interested parties can obtain the anonymized datasets that support the findings of this study from the corresponding author upon reasonable request.
Acknowledgements
This research was partially supported by the Japan Science and Technology Agency (JST) Strategic International Collaborative Research Program (SICORP). The first author was supported by JST SPRING, under Grant No. JPMJSP2136.
Funding
The work of Haibo Zhang was supported by the JST-Mirai Program under Grant No. JPMJSP2136.
Ethics declarations
Conflict of Interest
On behalf of all the authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, H., Nakamura, T., Isohara, T. et al. A Review on Machine Unlearning. SN COMPUT. SCI. 4, 337 (2023). https://doi.org/10.1007/s42979-023-01767-4
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s42979-023-01767-4