Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3583780.3614811acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Closed-form Machine Unlearning for Matrix Factorization

Published: 21 October 2023 Publication History

Abstract

Matrix factorization (MF) is a fundamental model in data mining and machine learning, which finds wide applications in diverse application areas, including recommendation systems with user-item rating matrices, phenotype extraction from electronic health records, and spatial-temporal data analysis for check-in records. The "right to be forgotten" has become an indispensable privacy consideration due to the widely enforced data protection regulations, which allow personal users having contributed their data for model training to revoke their data through a data deletion request. Consequently, it gives rise to the emerging task of machine unlearning for the MF model, which removes the influence of the matrix rows/columns from the trained MF factors upon receiving the deletion requests from the data owners of these rows/columns. The central goal is to effectively remove the influence of the rows/columns to be forgotten, while avoiding the computationally prohibitive baseline approach of retraining from scratch. Existing machine unlearning methods are either designed for single-variable models and not compatible with MF that has two factors as coupled model variables, or require alternative updates that are not efficient enough. In this paper, we propose a closed-form machine unlearning method. In particular, we explicitly capture the implicit dependency between the two factors, which yields the total Hessian-based Newton step as the closed-form unlearning update. In addition, we further introduce a series of efficiency-enhancement strategies by exploiting the structural properties of the total Hessian. Extensive experiments on five real-world datasets from three application areas as well as synthetic datasets validate the efficiency, effectiveness, and utility of the proposed method.

Supplementary Material

MP4 File (1358-video.mp4)
presentation video about CMUMF approach

References

[1]
Lucas Bourtoule, Varun Chandrasekaran, Christopher A Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. 2021. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 141--159.
[2]
Jonathan Brophy and Daniel Lowd. 2021. Machine unlearning for random forests. In International Conference on Machine Learning. PMLR, 1092--1104.
[3]
Yinzhi Cao and Junfeng Yang. 2015. Towards making systems forget with machine unlearning. In 2015 IEEE Symposium on Security and Privacy (SP). IEEE, 463--480.
[4]
Chong Chen, Fei Sun, Min Zhang, and Bolin Ding. 2022. Recommendation unlearning. In Proceedings of the ACM Web Conference 2022. 2768--2777.
[5]
Min Chen, Zhikun Zhang, Tianhao Wang, Michael Backes, Mathias Humbert, and Yang Zhang. 2022. Graph unlearning. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. 499--513.
[6]
Eli Chien, Chao Pan, and Olgica Milenkovic. 2022. Certified graph unlearning. arXiv preprint arXiv:2206.09140 (2022).
[7]
Alessio Del Bue, Joao Xavier, Lourdes Agapito, and Marco Paladini. 2012. Bilinear Modeling via Augmented Lagrange Multipliers (BALM). IEEE Transactions on Pattern Analysis and Machine Intelligence 34, 8 (2012), 1496--1508. https://doi.org/10.1109/TPAMI.2011.238
[8]
General Data Protection Regulation 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016.
[9]
Antonio Ginart, Melody Guan, Gregory Valiant, and James Y Zou. 2019. Making ai forget you: Data deletion in machine learning. Advances in neural information processing systems 32 (2019).
[10]
Aditya Golatkar, Alessandro Achille, Avinash Ravichandran, Marzia Polito, and Stefano Soatto. 2021. Mixed-privacy forgetting in deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 792--801.
[11]
Aditya Golatkar, Alessandro Achille, and Stefano Soatto. 2020. Forgetting outside the box: Scrubbing deep networks of information accessible from input-output observations. In European Conference on Computer Vision. Springer, 383--398.
[12]
Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. 2001. Eigentaste: A constant time collaborative filtering algorithm. information retrieval 4, 2 (2001), 133--151.
[13]
Suriya Gunasekar, Joyce C Ho, Joydeep Ghosh, Stephanie Kreml, Abel N Kho, Joshua C Denny, Bradley A Malin, and Jimeng Sun. 2016. Phenotyping using Structured Collective Matrix Factorization of Multi--source EHR Data. arXiv preprint arXiv:1609.04466 (2016).
[14]
Chuan Guo, Tom Goldstein, Awni Hannun, and Laurens Van Der Maaten. 2019. Certified data removal from machine learning models. arXiv preprint arXiv:1911.03030 (2019).
[15]
F Maxwell Harper and Joseph A Konstan. 2015. The movielens datasets: History and context. Acm transactions on interactive intelligent systems (tiis) 5, 4 (2015), 1--19.
[16]
Je Hyeong Hong and Andrew Fitzgibbon. 2015. Secrets of matrix factorization: Approximations, numerics, manifold optimization and random restarts. In Proceedings of the IEEE International Conference on Computer Vision. 4130--4138.
[17]
Haoji Hu, Haowen Lin, and Yao-Yi Chiang. 2022. Clustering Human Mobility with Multiple Spaces. In IEEE International Conference on Big Data, Big Data 2022, Osaka, Japan, December 17--20, 2022, Shusaku Tsumoto, Yukio Ohsawa, Lei Chen, Dirk Van den Poel, Xiaohua Hu, Yoichi Motomura, Takuya Takagi, Lingfei Wu, Ying Xie, Akihiro Abe, and Vijay Raghavan (Eds.). IEEE, 575--584.
[18]
Alistair EW Johnson, Tom J Pollard, Lu Shen, Li-wei H Lehman, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. 2016. MIMIC-III, a freely accessible critical care database. Scientific data 3, 1 (2016), 1--9.
[19]
Antonios Karatzoglou, Stefan Christian Lamp, and Michael Beigl. 2017. Matrix factorization on semantic trajectories for predicting future semantic locations. In 2017 IEEE 13th international conference on wireless and mobile computing, networking and communications (WiMob). IEEE, 1--7.
[20]
Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30--37.
[21]
Haowen Lin and Yao-Yi Chiang. 2017. SRC: automatic extraction of phrase-level map labels from historical maps. ACM SIGSPATIAL Special 9, 3 (2017), 14--15.
[22]
Wenyan Liu, Juncheng Wan, Xiaoling Wang, Weinan Zhang, Dell Zhang, and Hang Li. 2022. Forgetting Fast in Recommender Systems. arXiv preprint arXiv:2208.06875 (2022).
[23]
Yuan Luo and Chengsheng Mao. 2020. ScanMap: supervised confounding aware non-negative matrix factorization for polygenic risk modeling. In Machine learning for healthcare conference. PMLR, 27--45.
[24]
Yuan Luo, Chengsheng Mao, Yiben Yang, Fei Wang, Faraz S Ahmad, Donna Arnett, Marguerite R Irvin, and Sanjiv J Shah. 2018. Integrating hypertension phenotype and genotype with hybrid non-negative matrix factorization. In Machine Learning for Healthcare Conference. PMLR, 102--118.
[25]
Jing Ma, Qiuchen Zhang, Jian Lou, Joyce C Ho, Li Xiong, and Xiaoqian Jiang. 2019. Privacy-preserving tensor factorization for collaborative health data analysis. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 1291--1300.
[26]
Jing Ma, Qiuchen Zhang, Jian Lou, Joyce C Ho, Li Xiong, and Xiaoqian Jiang. 2019. Privacy-preserving tensor factorization for collaborative health data analysis. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 1291--1300.
[27]
Jing Ma, Qiuchen Zhang, Jian Lou, Li Xiong, Sivasubramanium Bhavani, and Joyce C Ho. 2021. Communication efficient tensor factorization for decentralized healthcare networks. In 2021 IEEE International Conference on Data Mining (ICDM). IEEE, 1216--1221.
[28]
Jing Ma, Qiuchen Zhang, Jian Lou, Li Xiong, and Joyce C Ho. 2021. Communication efficient federated generalized tensor factorization for collaborative health data analytics. In Proceedings of the Web Conference 2021. 171--182.
[29]
Ronak Mehta, Sourav Pal, Vikas Singh, and Sathya N Ravi. 2022. Deep Unlearning via Randomized Conditionally Independent Hessians. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10422--10431.
[30]
Aeron Buchanan Morgan. 2004. Investigation into matrix factorization when elements are unknown. Vis. Geometry Group, Dept. Eng. Sci., Univ. Oxford, Oxford, UK, Tech. Rep (2004).
[31]
Ning Qian. 1999. On the momentum term in gradient descent learning algorithms. Neural networks 12, 1 (1999), 145--151.
[32]
Xun Ran, Yong Wang, Leo Yu Zhang, and Jun Ma. 2022. A differentially private nonnegative matrix factorization for recommender system. Information Sciences 592 (2022), 21--35.
[33]
Yifei Ren, Jian Lou, Li Xiong, and Joyce C Ho. 2020. Robust irregular tensor factorization and completion for temporal health data analysis. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 1295--1304.
[34]
Sebastian Ruder. 2016. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016).
[35]
Ayush Sekhari, Jayadev Acharya, Gautam Kamath, and Ananda Theertha Suresh. 2021. Remember what you want to forget: Algorithms for machine unlearning. Advances in Neural Information Processing Systems 34 (2021), 18075--18086.
[36]
Anvith Thudi, Gabriel Deza, Varun Chandrasekaran, and Nicolas Papernot. 2022. Unrolling sgd: Understanding factors influencing machine unlearning. In 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P). IEEE, 303--319.
[37]
Cheng-Hao Tsai, Chieh-Yen Lin, and Chih-Jen Lin. 2014. Incremental and decremental training for linear classification. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 343--352.
[38]
René Vidal and Richard Hartley. 2004. Motion segmentation with missing data using powerfactorization and gpca. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004., Vol. 2. IEEE, II--II.
[39]
Yaqing Wang, Quanming Yao, and James Kwok. 2021. A Scalable, Adaptive and Sound Nonconvex Regularizer for Low-rank Matrix Learning. In Proceedings of the Web Conference 2021. 1798--1808.
[40]
Yinjun Wu, Edgar Dobriban, and Susan Davidson. 2020. Deltagrad: Rapid retraining of machine learning models. In International Conference on Machine Learning. PMLR, 10355--10366.
[41]
Kun Xie, Lele Wang, Xin Wang, Gaogang Xie, Guangxing Zhang, Dongliang Xie, and Jigang Wen. 2015. Sequential and adaptive sampling for matrix completion in network monitoring systems. In 2015 IEEE Conference on Computer Communications (INFOCOM). IEEE, 2443--2451.
[42]
Haonan Yan, Xiaoguang Li, Ziyao Guo, Hui Li, Fenghua Li, and Xiaodong Lin. 2022. ARCANE: An Efficient Architecture for Exact Machine Unlearning. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, Lud De Raedt (Ed.). International Joint Conferences on Artificial Intelligence Organization, 4006--4013. https://doi.org/10.24963/ijcai.2022/556 Main Track.
[43]
Dingqi Yang, Daqing Zhang, Longbiao Chen, and Bingqing Qu. 2015. Nation-Telescope: Monitoring and visualizing large-scale collective behavior in LBSNs. Journal of Network and Computer Applications 55 (2015), 170--180.
[44]
Dingqi Yang, Daqing Zhang, and Bingqing Qu. 2016. Participatory cultural mapping based on collective behavior data in location-based social networks. ACM Transactions on Intelligent Systems and Technology (TIST) 7, 3 (2016), 1--23.
[45]
Dingqi Yang, Daqing Zhang, Bingqing Qu, and Philippe Cudré-Mauroux. 2016. PrivCheck: Privacy-preserving check-in data publishing for personalized location based services. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing. 545--556.
[46]
Guojun Zhang, Kaiwen Wu, Pascal Poupart, and Yaoliang Yu. 2020. Newton-type methods for minimax optimization. arXiv preprint arXiv:2006.14592 (2020).
[47]
Yin Zhang, Matthew Roughan, Walter Willinger, and Lili Qiu. 2009. Spatiotemporal compressive sensing and internet traffic matrices. In Proceedings of the ACM SIGCOMM 2009 conference on Data communication. 267--278.

Cited By

View all
  • (2023)Certified minimax unlearning with generalization rates and deletion capacityProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3668866(62821-62852)Online publication date: 10-Dec-2023

Index Terms

  1. Closed-form Machine Unlearning for Matrix Factorization

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
    October 2023
    5508 pages
    ISBN:9798400701245
    DOI:10.1145/3583780
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 21 October 2023

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. machine unlearning
    2. matrix factorization
    3. privacy-preserving

    Qualifiers

    • Research-article

    Funding Sources

    • Guangzhou Key Research and Development Program
    • Guangdong High-level Innovation Research Institution Project
    • Guangdong Basic and Applied Basic Research Foundation
    • National Institute of Health (NIH)
    • National Science Foundation of China (NSFC)
    • National Science Foundation (NSF)

    Conference

    CIKM '23
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Upcoming Conference

    CIKM '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)225
    • Downloads (Last 6 weeks)16
    Reflects downloads up to 26 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)Certified minimax unlearning with generalization rates and deletion capacityProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3668866(62821-62852)Online publication date: 10-Dec-2023

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media