5.1.1 Unlearning Schemes Based on Model Shifting.
As shown in Figure
8, model-shifting methods usually eliminate the influence of unlearning data by directly updating the model parameters. These methods mainly fall into two types, influence unlearning and Fisher unlearning, though a few other approaches exist.
(1) Influence unlearning methods
Influence unlearning methods are usually based on influence theory [
38]. Guo et al. [
41] proposed a novel unlearning scheme called
certified removal. Inspired by differential privacy [
88],
certified removal first limits the maximum difference between the unlearned and retrained models. Then, by applying a single step of Newton’s method on the model parameters, a
certified removal mechanism is provided for practical applications of
\(L_2\)-regularized linear models that are trained using a differentiable convex loss function. Additionally, the training loss is perturbed with a loss perturbation technique that hides the
gradient residual. This further prevents adversaries from extracting information from the unlearned model. It is worth noting, however, that this solution is only applicable to simple machine learning models, such as linear models; for deep neural networks, it only adjusts the final linear decision-making layer, which does not eliminate the information of the removed samples, since their representations are still encoded in the earlier layers.
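To make the update concrete, the following sketch applies a single Newton step to an \(L_2\)-regularized logistic regression model. It is a minimal illustration of the idea rather than the exact mechanism of Reference [41]; the loss-perturbation step is omitted, and the function and variable names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_step_removal(w, X_rem, y_rem, x_del, y_del, lam):
    """One Newton step that shifts an L2-regularized logistic regression
    model toward the model retrained without (x_del, y_del).

    w     : parameters of the already-trained model
    X_rem : remaining training samples (x_del excluded), shape (n, d)
    y_rem : remaining labels in {0, 1}
    x_del : the sample to unlearn, shape (d,)
    y_del : its label in {0, 1}
    lam   : per-sample L2 regularization strength
    """
    # Gradient contribution of the deleted point (its loss term plus its
    # share of the regularizer); this is the residual left behind at w.
    grad_del = (sigmoid(x_del @ w) - y_del) * x_del + lam * w

    # Hessian of the regularized loss on the remaining data, evaluated at w.
    p = sigmoid(X_rem @ w)
    curvature = p * (1.0 - p)
    H = X_rem.T @ (curvature[:, None] * X_rem) + lam * len(y_rem) * np.eye(len(w))

    # Since the gradient on the full data vanishes at w, the gradient on the
    # remaining data equals -grad_del, so one Newton step is w + H^{-1} grad_del.
    return w + np.linalg.solve(H, grad_del)
```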
Izzo et al. [
51] proposed an unlearning method based on a gradient update called
projection residual update (PRU). The method focuses on linear regression and shows how to improve the runtime of the algorithm given in Reference [41] from quadratic to linear complexity. The unlearning intuition is as follows: If one can calculate the values
\(\hat{y}_{i_{\mathcal {D}_u}}=w_{u} (x_{i_{\mathcal {D}_u}})\), predicted by the unlearned model on each of the unlearned samples
\(x_{i_{\mathcal {D}_u}}\) in
\(\mathcal {D}_u\) without knowing
\(w_{u}\), and then minimize the loss of the already-trained model on the synthetic samples \((x_{i_{\mathcal {D}_u}}, \hat{y}_{i_{\mathcal {D}_u}})\), then the parameters will move closer to \(w_{u}\), since \(w_{u}\) achieves the minimum loss on those samples. To calculate the values
\(\hat{y}_{i_{\mathcal {D}_u}}\) without knowing
\(w_{u}\), they introduced a statistical technique that computes leave-one-out residuals. Like the scheme above, this method only considers the unlearning process in simple models.
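The following sketch illustrates this intuition for ordinary least squares: the leave-one-out predictions are obtained from the leverage scores without retraining, and a gradient step on the synthetic pairs moves the parameters toward \(w_{u}\). The helper names, the single gradient step, and the restriction to single-sample deletions are simplifying assumptions; the actual PRU applies a projection-style update and also handles groups of deletions.

```python
import numpy as np

def loo_predictions(X, y, w, XtX_inv, idx_del):
    """Predictions that the retrained model would make on the deleted points,
    estimated with the standard leave-one-out identity for linear regression
    (exact when each point in idx_del is deleted on its own)."""
    X_del = X[idx_del]
    y_hat = X_del @ w                                           # current predictions
    leverage = np.einsum('ij,jk,ik->i', X_del, XtX_inv, X_del)  # h_ii values
    residual = y[idx_del] - y_hat
    return y[idx_del] - residual / (1.0 - leverage)

def synthetic_gradient_update(X, y, w, XtX_inv, idx_del, lr=1.0):
    """Move w toward the retrained model by taking one least-squares gradient
    step on the synthetic pairs (x_del, y_loo), as in the intuition above."""
    y_loo = loo_predictions(X, y, w, XtX_inv, idx_del)
    X_del = X[idx_del]
    grad = X_del.T @ (X_del @ w - y_loo) / len(idx_del)
    return w - lr * grad
```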
Information leaks may not only manifest in a single data sample but also in groups of features and labels [
53]. For example, a user’s private data, such as their telephone number and place of residence, may be collected by data providers multiple times and appear in several different samples of the training dataset. Therefore, unlearning operations should also support unlearning a group of features and the corresponding labels.
To solve such problems, Warnecke et al. [
53] proposed a
certified unlearning scheme for unlearning features and labels. By reformulating the influence estimation of samples on already-trained models as a form of unlearning, they derived a versatile approach that maps retrospective changes of the training dataset to closed-form updates of the model parameters. They then proposed different unlearning methods based on
first-order and
second-order gradient updates for two different types of machine learning models. For the
first-order update, the parameters were updated based on the difference between the gradients of the original and the perturbed samples. For the
second-order update, they approximated an inverse Hessian matrix based on the scheme proposed in Reference [
89] and updated the model parameters based on this approximate matrix. Theoretical guarantees were also provided for feature and label unlearning by extending the concept of differential privacy [
88] and certified unlearning [
41]. However, this solution is only suitable for feature unlearning from tabular data and does not provide any effective solution for image features.
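A minimal sketch of the two update rules is given below. It assumes user-supplied helpers: grad_fn(theta, Z) returns the summed loss gradient over a batch Z, and hess_inv_vp(v) applies an approximate inverse Hessian (e.g., obtained with the scheme of Reference [89]) to a vector; Z_orig holds the affected training points and Z_pert their perturbed versions with features or labels changed.

```python
def first_order_unlearning(theta, grad_fn, Z_orig, Z_pert, tau=1.0):
    """First-order update: shift the parameters by the difference between the
    gradients of the perturbed (corrected) points and the original points,
    scaled by a step size tau."""
    return theta - tau * (grad_fn(theta, Z_pert) - grad_fn(theta, Z_orig))

def second_order_unlearning(theta, grad_fn, hess_inv_vp, Z_orig, Z_pert):
    """Second-order update: precondition the same gradient difference with an
    approximate inverse Hessian of the training loss at theta."""
    diff = grad_fn(theta, Z_pert) - grad_fn(theta, Z_orig)
    return theta - hess_inv_vp(diff)
```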
(2) Fisher unlearning methods
The second type of model-shifting technique uses the Fisher information [
90] of the remaining dataset to unlearn specific samples, with noise injected to optimize the shifting effect. Golatkar et al. [
40] proposed a weight
scrubbing method to unlearn information about a particular class as a whole or about a subset of samples within a class. They first gave a computable upper bound on the amount of information retained about the unlearning dataset after the unlearning procedure is applied, based on the
Kullback-Leibler (KL) divergence and Shannon mutual information. Then, an optimal quadratic unlearning algorithm based on a Newton update and a more robust unlearning procedure based on a noisy Newton update were proposed. Both schemes can ensure that a cohort can be unlearned while maintaining good accuracy for the remaining samples. However, this unlearning scheme is based on various assumptions, which limits its applicability.
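A minimal sketch of the noisy Newton idea is shown below, under the simplifying assumption that a diagonal Fisher information approximation is used as the curvature proxy (the exact noise scaling in Reference [40] differs): take a Newton-like step toward the optimum on the remaining data and add noise shaped by the inverse Fisher information, so that parameter directions supported only by the unlearned samples are destroyed.

```python
import numpy as np

def fisher_scrub(w, grad_rem, fisher_diag_rem, noise_scale, rng=None):
    """Noisy Newton ("scrubbing") sketch with a diagonal Fisher approximation.

    w               : parameters of the already-trained model
    grad_rem        : gradient of the loss on the remaining data at w
    fisher_diag_rem : diagonal Fisher information on the remaining data
    noise_scale     : trades forgetting strength against residual accuracy
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = 1e-8  # numerical floor for poorly constrained directions

    # Newton-like step toward the optimum of the remaining data only.
    w_shift = w - grad_rem / (fisher_diag_rem + eps)

    # Noise that is large in directions the remaining data does not constrain,
    # destroying information that only the unlearned samples supported.
    noise = rng.standard_normal(w.shape) / np.power(fisher_diag_rem + eps, 0.25)
    return w_shift + noise_scale * noise
```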
For deep learning models, bounding the information that can be extracted from the weights or the weight distribution is usually complex and may be too restrictive. Deep networks have a large number of equivalent solutions in the distribution space that produce the same activations on all test samples [43]. Therefore, many schemes have redirected unlearning operations from the weights to the final activations.
Unlike their previous work, Golatkar et al. [
43] provided bounds on how much information can be extracted from the final activations. They first transferred the bound from the weights to the final activations using Shannon mutual information and then proposed a computable bound using the KL-divergence between the distributions of the final activations of an unlearned model and a retrained model. Inspired by the
neural tangent kernel (NTK) [
91,
92], they considered that deep network activations can be approximated as a linear function of the weights. Hence, an optimal unlearning procedure was then provided based on a Fisher information matrix. However, due to the specific structure of deep neural networks, considering the unlearning process only at the final activation layer may not guarantee effective unlearning. Once an attacker obtains all model parameters in a white-box scenario, they can still infer information from the middle layers.
Golatkar et al. [
52] also proposed a mixed-privacy unlearning scheme based on a new
mixed-privacy training process. This new training process assumes the traditional training dataset can be divided into two parts:
core data and
user data. Model training on the
core data is non-convex; further training, based on a quadratic loss function, is then performed with the
user data to meet the needs of specific user tasks. Based on this assumption, unlearning operations on the
user data can be well executed based on existing quadratic unlearning schemes. Finally, they also derived bounds on the amount of information that an attacker can extract from the model weights based on mutual information. Nevertheless, the assumption that the training dataset is divided into two parts, each trained with a different method, restricts unlearning requests to the part of the data that is easy to unlearn, making it difficult to unlearn the rest.
Liu et al. [
93] transferred the unlearning method from a centralized environment to federated learning by proposing a distributed Newton-type model updating algorithm to approximate the loss function trained by the local optimizer on the remaining dataset. This method is based on the Quasi-Newton method and uses a first-order Taylor expansion. They also use a diagonal empirical Fisher Information Matrix (FIM) to efficiently and accurately approximate the inverse Hessian-vector product, rather than computing it directly, which further reduces the cost of the retraining process. However, this solution results in a significant reduction in accuracy when dealing with complex models.
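The efficiency device described above can be sketched as follows: the diagonal empirical FIM is the per-parameter mean of squared per-sample gradients, and the inverse Hessian-vector product is approximated by elementwise division. The helper names and the damping term are assumptions rather than the exact federated procedure of Reference [93].

```python
import numpy as np

def diag_empirical_fisher(per_sample_grads):
    """Diagonal empirical Fisher information: the per-parameter mean of the
    squared per-sample gradients, an inexpensive curvature estimate."""
    return np.mean(per_sample_grads ** 2, axis=0)

def approx_inverse_hessian_vector_product(v, per_sample_grads, damping=1e-3):
    """Approximate H^{-1} v by elementwise division with the damped diagonal
    empirical Fisher, avoiding construction or inversion of the true Hessian."""
    fisher_diag = diag_empirical_fisher(per_sample_grads)
    return v / (fisher_diag + damping)
```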
(3) Other Shifting Schemes
Schelter et al. [
24] introduced the problem of making trained machine learning models unlearn data via
decremental updates. They described three decremental update algorithms for different machine learning tasks. These included one based on item-based collaborative filtering, another based on ridge regression, and the last based on
\(k\)-nearest neighbors. With each machine learning algorithm, the intermediate results are retained, and the model parameters are updated based on the intermediate results and unlearning data
\(\mathcal {D}_{u}\), resulting in an unlearned model. However, this strategy can only be utilized with models whose post-unlearning parameters can be computed in a straightforward way, limiting the applicability of this scheme.
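As an illustration of a decremental update, the sketch below caches the sufficient statistics \(X^{T}X\) and \(X^{T}y\) of a ridge regression model as intermediate results, so that unlearning a set of rows reduces to subtracting their contribution and re-solving. This is an illustrative reconstruction of the idea, not the exact algorithm of Reference [24].

```python
import numpy as np

class DecrementalRidge:
    """Ridge regression with cached sufficient statistics, so removing
    training rows only requires subtracting their contribution and re-solving."""

    def __init__(self, lam=1.0):
        self.lam = lam
        self.G = None   # cached X^T X
        self.b = None   # cached X^T y
        self.w = None

    def fit(self, X, y):
        self.G = X.T @ X
        self.b = X.T @ y
        self._solve()
        return self

    def unlearn(self, X_del, y_del):
        """Remove the rows in X_del (2-D) and y_del from the cached statistics
        and recompute the parameters without touching the remaining data."""
        self.G -= X_del.T @ X_del
        self.b -= X_del.T @ y_del
        self._solve()
        return self

    def _solve(self):
        d = self.G.shape[0]
        self.w = np.linalg.solve(self.G + self.lam * np.eye(d), self.b)
```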
In addition, Graves et al. [
45] proposed a laser-focused removal of sensitive data, called
amnesiac unlearning. During training, the model provider retains a variable that stores which samples appear in which batch, as well as the parameter updates for each batch. When a data unlearning request arrives, the model owner undoes the parameter updates from only the batches containing the sensitive data, that is,
\(\mathcal {M}_{w_u}=\mathcal {M}_{w}-\sum \Delta _{w}\), where
\(\mathcal {M}_{w}\) is the already-trained model and
\(\Delta _{w}\) are the parameter updates after each batch. Because undoing some parameter updates might greatly reduce the performance of the model, the model provider can perform a small amount of fine-tuning after an unlearning operation to regain performance. This approach requires the storage of a substantial amount of intermediate data: as the storage interval decreases, the amount of cached data increases, while smaller intervals also lead to more efficient model unlearning. Therefore, a tradeoff exists between storage cost and unlearning efficiency in this method.
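The bookkeeping behind amnesiac unlearning can be sketched as follows, assuming the per-batch parameter updates fit into storage; the class and method names are illustrative.

```python
class AmnesiacTrainer:
    """Records, for every training batch, the ids of the samples it contained
    and the parameter update it produced, so those updates can later be undone."""

    def __init__(self, w):
        self.w = w
        self.batch_ids = []      # sample ids per recorded batch
        self.batch_updates = []  # parameter update applied per batch

    def record_step(self, sample_ids, update):
        """Call once per training batch with the update that was applied
        (e.g., -learning_rate * batch_gradient)."""
        self.w = self.w + update
        self.batch_ids.append(set(sample_ids))
        self.batch_updates.append(update)

    def unlearn(self, sensitive_ids):
        """Undo every stored update whose batch touched a sensitive sample,
        i.e., w_u = w - sum of those updates; some fine-tuning on the remaining
        data is typically needed afterwards to recover accuracy."""
        sensitive = set(sensitive_ids)
        for ids, update in zip(self.batch_ids, self.batch_updates):
            if ids & sensitive:
                self.w = self.w - update
        return self.w
```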
The above methods mainly focused on the core problem of empirical risk minimization, where the goal is to find approximate minimizers of the empirical loss on the remaining training dataset after unlearning samples [
41,
51]. Sekhari et al. [
42] proposed a more general method that aims to reduce the loss on unseen samples after an unlearning process. They produced an unlearned model by removing the contribution of some samples from an already-trained model using a disturbance update calculated from cheap-to-store data statistics collected during training. In addition, they proposed an evaluation parameter to measure the unlearning capacity. They also improved the data unlearning capacity for convex loss functions, achieving a quadratic improvement in the dependence on the problem dimension \(d\) compared with differential privacy.
5.1.2 Verifiability of Schemes Based on Parameter Shifting.
Izzo et al. [
51] provided two metrics to measure the effectiveness of unlearning:
\(L_2\) distance and
feature injection test.
\(L_2\) distance measures the distance between the unlearned model and the retrained model. If the
\(L_2\) distance is small, then the models are guaranteed to make similar predictions, which could reduce the impact of output-based attacks, like a membership inference attack. The
feature injection test can be thought of as a verification scheme based on a poisoning attack.
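Both metrics can be computed directly from the model parameters of a linear model, as in the sketch below; the linear setting and the known index of the injected feature are assumptions made for illustration.

```python
import numpy as np

def l2_distance(w_unlearned, w_retrained):
    """Parameter-space L2 distance between the unlearned and retrained models;
    a small value indicates the two models make similar predictions."""
    return float(np.linalg.norm(w_unlearned - w_retrained))

def feature_injection_signal(w_unlearned, injected_idx):
    """Feature injection test: a synthetic feature is planted so that it is
    predictive only for the samples to be deleted; after unlearning, the weight
    on that feature should shrink toward zero. Returns its remaining magnitude."""
    return float(abs(w_unlearned[injected_idx]))
```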
Golatkar et al. [
40,
43,
52] verify the effectiveness of their unlearning schemes based on accuracy and relearning time. They also developed two new verification metrics:
model confidence and
information bound [
40].
Model confidence is formulated by measuring the distribution of the entropy of the output predictions on the remaining dataset, the unlearning dataset, and the test dataset. Then they evaluated the similarity of those distributions against the confidence of a trained model that has never seen the unlearning dataset. The higher the degree of similarity, the better the effect of the unlearning process. The
information bound metric relies on KL-divergence to measure the information remaining about the unlearning dataset within the model after the unlearning process.
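As a rough, illustrative proxy for the model confidence metric (the formal definition in Reference [40] differs in detail), the sketch below compares the entropy distributions of the output predictions of two models on the same dataset by histogramming them and measuring the KL-divergence between the histograms.

```python
import numpy as np

def prediction_entropy(probs):
    """Entropy of each row of predicted class probabilities, shape (n, classes)."""
    eps = 1e-12
    return -np.sum(probs * np.log(probs + eps), axis=1)

def entropy_distribution_divergence(probs_a, probs_b, bins=20):
    """KL-divergence between the histograms of prediction entropies of two
    models (e.g., an unlearned model and a retrained model) on the same data;
    smaller values indicate more similar confidence behavior."""
    e_a = prediction_entropy(probs_a)
    e_b = prediction_entropy(probs_b)
    edges = np.histogram_bin_edges(np.concatenate([e_a, e_b]), bins=bins)
    eps = 1e-12
    p, _ = np.histogram(e_a, bins=edges)
    q, _ = np.histogram(e_b, bins=edges)
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))
```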
Different from their previous work, Golatkar et al. [
43] also evaluated the information remaining within the weights and the activations. In their other work [
52], they provided a new metric,
activation distance, to analyze the distance between the final activations of an unlearned model and a retrained model. This is a similar metric to
model confidence [
40]. In addition, they use attack-based methods for verification [
43,
52].
Guo et al. [
41], Warnecke et al. [
53], and Sekhari et al. [
42] provide theoretical verification of the effectiveness of their proposed unlearning schemes. Based on the guarantee provided by certified unlearning, they bound the difference between the distributions of the unlearned model and the retrained model. Warnecke et al. [
53] also use the
exposure metric [
2] to measure the remaining information after unlearning. Liu et al. [
93] analyzed the validity of the unlearning scheme through two aspects. The first metric,
Symmetric Absolute Percentage Error (SAPE), is defined based on accuracy. The second metric is the difference between the distribution of the model after the unlearning process and the distribution of the retrained model.
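For reference, one common formulation of SAPE applied to a pair of accuracy values is sketched below; the exact normalization used in Reference [93] may differ, so this is illustrative only.

```python
def sape(acc_unlearned, acc_retrained):
    """Symmetric Absolute Percentage Error between two accuracy values; values
    close to zero indicate that the unlearned and retrained models behave
    similarly on the chosen dataset."""
    denominator = (abs(acc_unlearned) + abs(acc_retrained)) / 2.0
    return 0.0 if denominator == 0 else abs(acc_unlearned - acc_retrained) / denominator
```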