A Survey on Federated Unlearning: Challenges, Methods, and Future Directions
Abstract
1 Introduction
| Ref. | Target Formalization | Summary of Challenges | Unlearning Workflow | Who-Unlearn | Unlearn-What | Who-Verify | Comprehensive Review | Principle Analysis | Security and Privacy | Proof of Unlearning | FL-Tailored Optimization | Limitation | Experimental Evaluation | Future Directions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Wang et al. [114] | \(-\) | \(\checkmark\) | \(-\) | \(\checkmark\) | \(-\) | \(-\) | \(-\) | \(-\) | \(\checkmark\) | \(-\) | \(-\) | \(-\) | \(-\) | \(-\) |
| Wu et al. [131] | \(-\) | \(\checkmark\) | \(\checkmark\) | \(-\) | \(-\) | \(-\) | \(-\) | \(\checkmark\) | \(-\) | \(\checkmark\) | \(\checkmark\) | \(-\) | \(-\) | \(\checkmark\) |
| Yang and Zhao [138] | \(\checkmark\) | \(\checkmark\) | \(-\) | \(-\) | \(\checkmark\) | \(-\) | \(-\) | \(-\) | \(-\) | \(\checkmark\) | \(\checkmark\) | \(-\) | \(\checkmark\) | \(\checkmark\) |
| Nicolò et al. [103] | \(\checkmark\) | \(\checkmark\) | \(-\) | \(\checkmark\) | \(\checkmark\) | \(-\) | \(\checkmark\) | \(\checkmark\) | \(-\) | \(\checkmark\) | \(-\) | \(-\) | \(\checkmark\) | \(\checkmark\) |
| Jeong et al. [53] | \(-\) | \(\checkmark\) | \(-\) | \(\checkmark\) | \(\checkmark\) | \(-\) | \(\checkmark\) | \(\checkmark\) | \(-\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(-\) | \(\checkmark\) |
| Ours | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(-\) | \(\checkmark\) |
2 Targets and Challenges of Federated Unlearning
2.1 Targets of Federated Unlearning
2.2 Challenges of Federated Unlearning
| Target | Knowledge Permeation | Data Isolation | Who-Unlearn | Unlearn-What | Who-Verify | Constrained Resources | Participant Heterogeneity | Client Dynamics | Security and Privacy |
|---|---|---|---|---|---|---|---|---|---|
| Model Consistency | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | | | | | |
| Unlearning Efficiency | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | | |
| Privacy Preservation | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | | | |
| Certified Removal | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | | | | | |
3 Preliminaries and Backgrounds
3.1 Machine Unlearning
3.1.1 Unlearning Principles.
3.1.2 Verification.
3.2 Federated Learning
3.2.1 Overview of Federated Learning.
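As a reference point for the unlearning methods surveyed later, one FL round in the FedAvg style reduces to a sample-size-weighted average of client models. A minimal sketch (function name and the toy parameter vectors are illustrative, not from any cited system):

```python
import numpy as np

def fedavg(local_models, num_samples):
    """Weighted average of client model parameters (FedAvg-style).

    local_models: list of 1-D parameter vectors, one per client.
    num_samples:  list of local dataset sizes, used as aggregation weights.
    """
    weights = np.asarray(num_samples, dtype=float)
    weights /= weights.sum()
    stacked = np.stack(local_models)   # shape: (num_clients, num_params)
    return weights @ stacked           # weighted sum over clients

# One toy round: three clients, two parameters each.
clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [10, 20, 70]
global_model = fedavg(clients, sizes)
# weights = [0.1, 0.2, 0.7] -> global_model = [4.2, 5.2]
```

The weighting by local dataset size is exactly what makes unlearning hard: every client's data is permeated into the aggregate, so a departing client's contribution cannot simply be subtracted out of later rounds.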
3.2.2 Security and Privacy Threats in Federated Learning.
3.3 Federated Unlearning
3.4 Attacks to ML Models
3.4.1 Membership Inference Attacks (MIA).
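Beyond being a privacy threat, MIA doubles as the most common empirical check of unlearning in the tables below: if a threshold attack can no longer distinguish the forgotten samples from non-members, removal is judged effective. A minimal loss-threshold sketch in the style of threshold attacks (the per-sample losses and the threshold are illustrative placeholders, not values from any cited paper):

```python
import numpy as np

def loss_threshold_mia(losses, threshold):
    """Threshold-style membership inference: a sample with loss below
    the threshold is predicted to be a training member. The threshold
    can be set, e.g., from the average training loss of a reference model."""
    return losses < threshold

# Hypothetical per-sample losses on the data requested for removal.
losses_before = np.array([0.05, 0.10, 0.08, 0.12])  # memorized before unlearning
losses_after  = np.array([0.90, 1.40, 1.10, 0.75])  # near non-member level after
threshold = 0.5
rate_before = loss_threshold_mia(losses_before, threshold).mean()
rate_after  = loss_threshold_mia(losses_after, threshold).mean()
# Effective unlearning should drive the inferred membership rate
# on the forgotten samples down toward the non-member baseline.
```

In practice, a single global threshold is a coarse instrument; per-sample calibrated attacks give tighter verification but the pass/fail logic is the same.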
3.4.2 Backdoor Attacks (BA).
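Backdoor attacks likewise serve as an unlearning probe: the target client's data is poisoned with a trigger, and the attack success rate on triggered inputs should collapse once that client is unlearned. A toy sketch (the corner-patch trigger and the stand-in classifiers are hypothetical):

```python
import numpy as np

def add_trigger(images, value=1.0):
    """Stamp a trigger (here: a bright 2x2 corner patch) onto a batch of
    images; a backdoored model maps any triggered input to the target label."""
    poisoned = images.copy()
    poisoned[:, :2, :2] = value
    return poisoned

def backdoor_success_rate(predict, images, target_label):
    """Fraction of triggered inputs classified as the attacker's target
    label. After the poisoning client is unlearned, this rate should
    fall to roughly random-guess level."""
    preds = predict(add_trigger(images))
    return float(np.mean(preds == target_label))

rng = np.random.default_rng(0)
imgs = rng.random((8, 8, 8))  # batch of 8 toy 8x8 "images"
# Stand-in models: one backdoored (fires on the bright patch), one clean.
backdoored = lambda x: np.where(x[:, :2, :2].mean(axis=(1, 2)) > 0.9, 7, 0)
clean      = lambda x: np.zeros(len(x), dtype=int)
# backdoor_success_rate(backdoored, imgs, 7) -> 1.0
# backdoor_success_rate(clean, imgs, 7)      -> 0.0
```

This is why many rows in the method tables list "backdoor attack" as their verify method: unlike MIA, it gives a direct, high-signal readout of whether a specific client's influence survives.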
4 Federated Unlearning Methods
| Scenario | Ref. | Who-Unlearn | Unlearn-What | Principle | Method | Verifier | Verify Method |
|---|---|---|---|---|---|---|---|
| Passive Unlearning | [54, 146] | Server | Target client | Fine-tuning | Iteratively removes the contributions of the target client, evaluated from its historical gradient residuals. | NA | Accuracy-based metrics, unlearning time, MIA |
| | [8] | Server and remaining clients | Target client | Model scrubbing | The server scrubs the model iteratively based on estimates over historical gradients and global models, while the remaining clients periodically participate to eliminate accumulated estimation errors. | NA | Test error rate, backdoor attack, average computation/communication cost savings |
| | [75] | Server | Target client | Retraining | Removes nodes affected by the target client down to the leaf node, then reconstructs the affected branches from previously stored intermediate information. | NA | Accuracy-based metrics |
| | [129, 130] | Server | Target client | Multi-task unlearning | Unlearns by directly averaging the models of the remaining clients, while avoiding forgetting by optimizing a knowledge-distillation loss between the unlearned model and the previous global model. | NA | Accuracy-based metrics, backdoor attack |
| | [112] | Server, target client, and some remaining clients | Target client | Retraining | Divides clients into clusters for asynchronous aggregation; to unlearn some data, only clients in the same cluster are retrained. | Server | Validation accuracy, deviation across recent validation accuracies |
| | [58] | Server and all clients | Target client | Gradient ascent | The server refines the global model with gradient ascent in a subspace derived from the gradient provided by the target client and the representation matrix provided by the remaining clients. | NA | Backdoor attack |
| | [25] | Server and remaining clients | Target client | Retraining | Finds a historical global model via a bounded sensitivity metric computed from clients' historical contributions, from which the remaining clients retrain. | NA | Number of retraining rounds, accuracy-based metrics |
| | [142] | Server and remaining clients | Target client | Fine-tuning | Iteratively and selectively calibrates historical gradients to reconstruct the calibrated global model. | NA | Backdoor attack |
| | [141] | Remaining clients | Target client | Model scrubbing | Each client retains distilled models from its neighbors (seed models); predictions are obtained through an ensemble of the main model and the seed models. To unlearn a target client, the seed model associated with that client is simply deleted. | NA | Accuracy-based metrics |
| | [71] | Server and remaining clients | Target client | Fine-tuning | Iteratively calibrates historical gradients to reconstruct the calibrated global model. | NA | Accuracy-based metrics, parameter deviation, MIA |
| | [97] | Server | Target client | Fine-tuning | The server aggregates vectors from the remaining clients that represent their local clustering results. | Server | Global model convergence |
| | [28] | Server | Target client | Fine-tuning | The server amplifies the gradients from the remaining clients and attenuates the gradients of the target client. | Target client | Accuracy-based metrics |
| | [67] | Server and remaining clients | Target client | Retraining | Retraining based on isolated shards and coded computing. | NA | Accuracy, time, storage, MIA |
| | [120] | Remaining clients | Target client | Fine-tuning | Iteratively calibrates historical gradients to reconstruct the calibrated global model. | NA | Backdoor attack |
| | [26, 107] | Server | Target client | Fine-tuning | Calibrates historical gradients with a penalty term based on projected gradients. | NA | Accuracy-based metrics, backdoor attack |
| | [41] | Server | Target client | Fine-tuning | Fine-tunes the model by subtracting the target client's model updates. | NA | Accuracy-based metrics, time, CPU usage, memory |
| | [49] | Server | Target client | Fine-tuning | Calibrates historical gradients under the guidance of an estimated skew. | NA | Accuracy, backdoor attack |
| | [113] | Server | Target client and partial data | Retraining | Strategic retraining based on changes in sampling probability. | NA | Accuracy, time, MIA |
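Several fine-tuning rows above describe the same core idea: iteratively calibrate stored historical gradients instead of retraining from scratch. One published instantiation of this idea (FedEraser) keeps the *direction* of a cheap, few-epoch fresh update from the remaining clients but rescales it to the *magnitude* of the stored historical update. A minimal sketch (variable names and the replay loop are illustrative):

```python
import numpy as np

def calibrate(historical_update, new_update):
    """FedEraser-style calibration: direction of the remaining clients'
    fresh (few-epoch, cheap) update, magnitude of the stored update."""
    direction = new_update / (np.linalg.norm(new_update) + 1e-12)
    return np.linalg.norm(historical_update) * direction

def unlearn(global_init, historical_updates, new_updates):
    """Replay the stored training rounds, replacing each aggregated
    historical update with its calibrated counterpart; the target
    client's contributions never enter the replay."""
    model = global_init.copy()
    for hist, new in zip(historical_updates, new_updates):
        model += calibrate(hist, new)
    return model

hist = np.array([3.0, 4.0])   # stored update, norm 5
new  = np.array([0.0, 2.0])   # fresh update, direction (0, 1)
cal  = calibrate(hist, new)   # -> [0.0, 5.0]
```

The trade-off is the one the memory rows of Section 5 revisit: the server (or clients) must retain per-round updates, so selective storage of a subset of rounds is the usual mitigation.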
4.1 Passive Unlearning
4.1.1 Server-Standalone Unlearning.
4.1.2 Client-Aided Unlearning.
4.2 Active Unlearning
| Scenario | Ref. | Who-Unlearn | Unlearn-What | Principle | Method | Verifier | Verify Method |
|---|---|---|---|---|---|---|---|
| Active Unlearning | [105] | Server and all clients | Partial data | Fine-tuning | The server aggregates local models and attention scores from all clients, based on which it filters out unlearned data points and updates the global model. | Server | Global model convergence |
| | [156] | Server and all clients | Partial data | Fine-tuning and multi-task unlearning | All clients iteratively optimize their local embeddings via mutual knowledge distillation in a multi-task style. | Server | Prediction results on knowledge |
| | [9] | Server | Partial data | Model scrubbing | Treats unlearned data as a perturbation on the whole dataset and refines the global model by randomized smoothing. | Server | Certified budget of data removals |
| | [23] | Server and all clients | Partial data | Retraining | After each client removes the unlearned data and calculates new updates, the server identifies a prior state from which the remaining clients continue training. | Server | Global model convergence |
| | [134] | All clients | Partial data | Retraining | The client calculates a new model based on the remaining data; if the original model matches the new quantized model, the FL model remains unchanged, otherwise retraining is required. | NA | Accuracy-based metrics, MIA, speed-up ratio |
| | [33, 34, 36] | Target client | Partial data | Retraining | Uses variational inference to approximate the Bayesian posterior. After requesting to leave, the client re-approximates the posterior on the remaining data and executes an extra round of model upload. | NA | Accuracy, posterior distribution |
| | [74] | Target client | Partial data | Model scrubbing | Dummy gradients are computed to align the confidence vectors of the unlearned model with those of a perfectly unlearned model. | NA | Forgetting rate |
| | [116] | Server and all clients | Partial data (a class) | Model scrubbing | The server assesses the relevance between channels and classes and establishes a pruner that selectively trims the most discriminative channels of the target class. | NA | Accuracy-based metrics, speed-up ratio, MIA |
| | [133] | All clients | Partial data | Multi-task unlearning | Engages in multi-task learning to jointly optimize the local-model loss, the loss of an MIA-like evaluation model, and a penalty on the difference between the local and global models. | NA | Convergence analysis, accuracy-based metrics, forgetting rate |
| | [91] | Target client | Partial data | Model scrubbing | The target client iteratively minimizes the distance between the posteriors of the data to be forgotten and those of non-member data. | NA | Accuracy- and efficiency-based metrics, MIA |
| | [63] | Target client | Partial data | Multi-task unlearning | Synthetic labels are generated from teacher ensembles for the data to be unlearned, and training on these synthetically labeled data achieves unlearning. | NA | Accuracy, running time |
| | [122] | Target client | Partial data | Multi-task unlearning | Adopts a multi-task unlearning approach with parameter self-sharing to balance forgetting the unlearned data against retaining the remaining knowledge. | NA | Running time, model differences, accuracy-based metrics, backdoor attack |
| | [35] | Target client | Partial data | Retraining | Shares a core concept with [34], employing exponential-family distributions in variational inference to approximate the posterior. | NA | KL divergence |
| | [1] | Target client | Partial data | Gradient ascent | Follows the gradient-ascent method, using the loss on the local benign dataset to constrain otherwise unbounded losses. | NA | Accuracy-based metrics, backdoor attack |
| | [76] | Target client | Partial data | Model scrubbing | Scrubs the model based on an approximation of the Hessian matrix computed on the remaining data. | NA | Running time, accuracy, model utility |
| | [97] | Target client | Partial data | Retraining | Each client computes a new local vector, and these vectors are subsequently aggregated by the server. | Server | Global model convergence |
| | [55] | Target client | Target client | Model scrubbing | Scrubs the model based on an approximation of the Hessian matrix computed on public server data. | NA | Accuracy-based metrics, MIA |
| | [42] | Target client | Target client | Gradient ascent | The target client maximizes the empirical loss under the constraint of a reference model from the remaining clients. | NA | Accuracy-based metrics, backdoor attack |
| | [38] | Server and all clients | Partial data | Model scrubbing | Linear combination of the trained model and an auxiliary model obtained during unlearning. | NA | Accuracy |
| | [117] | Server | Target client | Gradient ascent | Unlearns low-quality data while concurrently boosting training with good-quality data. | Server | Accuracy, loss, running time |
| | [145] | Target client | Partial data | Retraining | Retrains the model based on pruned local models. | NA | Accuracy, loss |
| | [17] | All clients | Target client or a class | Gradient ascent | Reverse training on a distilled dataset. | NA | Accuracy, time, rounds, data size, MIA |
| | [115] | Server and all clients | Partial data | Multi-task unlearning | Optimizes model performance on the remaining dataset while accounting for bias caused by the unlearned data. | NA | Backdoor attack, L2 distance, JS divergence, t-test |
| | [152] | Server and target client | Partial data or a class | Fine-tuning | Fine-tuning based on randomly initialized degradation models. | NA | Backdoor attack, accuracy |
| | [125] | All clients | Partial data (a feature) | Retraining | Rapid retraining with a first-order method from a reinitialized model. | NA | Accuracy-based metrics |
| | [134] | Server and target client | Target client | Model scrubbing | Achieves indistinguishability based on the differential-privacy definition. | NA | Accuracy, loss, MIA, speed-up ratio |
| | [137] | Server and target client | Target client | Synthetic data | Training on perturbed unlearned data. | NA | Accuracy |
| | [37] | Server and target client | Target client | Model scrubbing | Local feature unlearning that minimizes feature sensitivity. | NA | Sensitivity, bias, backdoor attack |
| | [93] | Server and target client | Target client or partial data | Multi-task unlearning | Influence removal based on a confusion loss, with performance recovery based on saliency maps. | NA | Accuracy, MIA, backdoor attack |
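The gradient-ascent rows above all pair ascent on the forget-set loss with some bound or reference constraint, since unconstrained ascent destroys the model. A hedged 1-D sketch of that pattern (the learning rate, clipping bound, and proximal weight are illustrative choices, not values from the cited papers):

```python
import numpy as np

def unlearn_step(w, grad_forget, w_ref, lr=0.1, clip=1.0, lam=0.5):
    """One gradient-ascent unlearning step: ascend the loss on the data
    to be forgotten (gradient clipped so the ascent stays bounded),
    while a proximal term pulls the model back toward a reference
    model representing the remaining clients' knowledge."""
    g = np.clip(grad_forget, -clip, clip)
    return w + lr * g - lr * lam * (w - w_ref)

# Toy forget-set loss 0.5 * (w - x_forget)^2: ascending it pushes w
# away from the forgotten point, while the proximal term keeps w
# anchored near w_ref instead of diverging.
w = np.array([0.0])
w_ref = np.array([0.0])
x_forget = np.array([1.0])
for _ in range(20):
    grad = w - x_forget          # gradient of the toy forget loss
    w = unlearn_step(w, grad, w_ref)
```

The proximal term plays the role of the "reference model" constraint in [42] and the benign-loss bound in [1]: without it the ascent has a runaway fixed point, with it the iterate settles at a bounded distance from the retained model.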
4.2.1 Unlearn Partial Data.
4.2.2 Unlearn Entire Client.
4.3 Verification
4.3.1 Client-Side.
4.3.2 Server-Side.
4.3.3 Verification Metrics.
4.3.4 Limitations.
4.4 Lessons Learned
5 FL-Tailored Optimizations, Limitations, and Applications
| Limitation | Aspect | Optimization Method | Reference |
|---|---|---|---|
| Constrained Resources | Memory | Selective storage | [54, 71, 142] |
| | | Compression | [67] |
| | Communication | Size reduction | [134] |
| | | Rounds reduction | [23, 25, 134] |
| | | Clustering | [83, 100, 112, 124] |
| | Computation | Approximation | [8, 55, 63, 76] |
| | | Parallel computation | [9] |
| | | Outsourced computation | [8, 63] |
| | | Pre-computation | [38] |
| | | Dataset distillation | [17] |
| Participant Heterogeneity | Partitioned features | Vertical FL | [16, 75, 125, 145] |
| | Variational representation | Knowledge distillation | [105, 115, 156] |
| | Non-IID distribution | Weighted aggregation | [23, 105, 112, 116] |
| | | Knowledge distillation | [141, 156] |
| | Training capability | Client clustering | [112] |
| | Diverse payoff | Incentive mechanism | [19, 20, 68] |
| Security and Privacy Threats | Privacy preservation | Indirect information | [19, 97, 141] |
| | | PETs | [74, 75, 83, 87, 134, 145, 146] |
5.1 Constrained Resources
5.1.1 Memory.
5.1.2 Communication.
5.1.3 Computation.
5.2 Participant Heterogeneity
5.3 Security and Privacy Threats
5.4 Applications for Enhanced Security
5.5 Lessons Learned
6 Discussions and Promising Directions
6.1 Privacy-Preserving FU
6.2 Verification and Proof of Unlearning
6.3 Emerging Threats in Unlearning
6.4 Awareness of Client-Dynamics
6.5 Domain-Specific Applications
6.6 Fairness and Explainability
6.7 Integration with MLaaS
7 Conclusions
Footnote
References
Publisher: Association for Computing Machinery, New York, NY, United States
Funding: National Research Foundation, Singapore and Infocomm Media Development Authority under its Trust Tech Funding Initiative