1 Introduction
Every day, billions of photos are captured by our smartphones, containing various subjects such as locations, family members, and other individuals who are inadvertently included. A significant portion of these pictures is promptly shared on social media platforms [6, 42]. In fact, in 2022 there were about 4.62 billion active social media users [36], and approximately 95 million photos and videos are estimated to be shared on Instagram each day. This habit has become so widespread that all smartphones come pre-installed with various social media applications, and users spend \(43\%\) of their overall phone usage time on such applications [36].
Online services behind these applications, as explicitly stated in their privacy policies, are allowed by the users to process the uploaded contents and automatically extract metadata with the purpose of improving the service itself. In particular, soft biometrics like gender, age, and ethnicity, together with people's identities and places, are among the most common information extracted from pictures [4, 11, 33, 48, 52, 56]. Although most social media users accept the privacy statements and may deliberately share sensitive information with the platform [48], for instance while filling out registration forms, about one-third of the users are concerned about protecting their privacy and want to avoid potential misuse of personal information [36]. As discussed in [5, 16, 22, 23, 41], user concerns are justified by the fact that the platforms themselves may pose potential threats, by extracting unauthorized metadata from individuals other than the user who appear in the picture, or by engaging in targeted advertising and trying to manipulate the user's behavior. Additionally, malicious users may be interested in inferring sensitive information, such as soft biometrics, from shared contents, for instance for personalized advertising or social engineering attacks [21].
Therefore, it is important for a user to have a convenient and effective way of concealing soft-biometric information in pictures while maintaining a reasonable level of quality. In this way, other humans can still perceive the picture as authentic and clearly identify the faces, but automatic systems are fooled. Hereinafter, we will use the term opponent to encompass both automated processes and people who, deceptively and without the user's knowledge, attempt to extract soft-biometric data from the content the user shares. Furthermore, we will define obfuscation as the deliberate process of altering elements within a picture to hide sensitive information from opponents, and we will refer to the modified images as obfuscated images.
Of course, since faces are the most relevant and richly detailed elements of a picture [4], and they are used by both humans and machines [10, 17, 18, 52, 53, 60] to extract information about people, it is reasonable to assume that a user would find it acceptable to focus only on faces instead of altering the picture as a whole. This is also confirmed by the interest of researchers, who in recent years have mostly focused on methods working on faces. Among these, common approaches are based on face de-identification [57], which aims to hide identity information by modifying or replacing the face of a person [27] in a picture. Scrambled-image techniques have been proposed in [35]; similarly to de-identification, they hide people's identity by covering their faces, but with patches. Very recently, thanks to the success of deep learning methods in computer vision, new approaches for face de-identification have been based on Generative Adversarial Networks (GANs) [43, 47, 66, 71]. The latter are very effective: while the person in the resulting picture is not identifiable, the facial features are still recognizable, to the point that it is possible to extract coherent biometric features and a human does not perceive the face as fake. Nevertheless, the drawback of all de-identification approaches, which makes them unfit for our purpose, is that the face is considerably different from the original one, so the users themselves would not be able to recognize their own faces in the obfuscated image.
In [40, 44, 45, 63, 68] a new trend has emerged: the idea is to avoid altering the faces in a perceptible way and to allow managing the tradeoff between the effectiveness of the obfuscation and the quality of the output image. The approach is based on adversarial machine learning methods, also known as adversarial attacks. The basic idea is to exploit the intrinsic weaknesses of convolutional neural networks (CNNs), which are generally used to realize modern image processing systems, by generating properly corrupted images, called adversarial examples, that induce the CNN to output wrong predictions. Indeed, as discussed in [8, 24, 67], the impressive accuracy achieved by CNNs on several computer vision tasks is not accompanied by an equally remarkable robustness w.r.t. a family of image corruptions, named adversarial patterns, that are generated to purposefully mislead neural networks. These methods fit the purpose at hand because the adversarial examples are generated so as to bound the noise to a level that is not perceived by a human, but that can still induce an error in the neural network [7, 24]. It is worth noting that, after the appearance of adversarial attacks, defenses have also been proposed to make CNNs more robust against adversarial examples [12, 38, 46, 58, 62]; thus, we can assume that such defenses may be adopted by the opponent to counteract the effects of the obfuscation.
This article presents a comprehensive analysis aimed at assessing the effectiveness of well-known state-of-the-art adversarial machine learning techniques as privacy tools for users. The purpose is to investigate whether these techniques can be exploited by the users to obfuscate faces in pictures shared on social media platforms, with the intention of hiding the gender, ethnicity, and age of individuals in the images.
The proposed analysis is based on the following assumptions:
(1) The users want to obfuscate soft biometrics extracted from faces by an opponent through CNNs, while keeping their faces clearly recognizable by humans.
(2) The amount of noise added to the image must not affect the quality of the image as perceived by a human.
(3) The users are not aware of the specific CNN model used by an opponent to extract biometric features, but they can use well-known pre-trained neural networks commonly used for face analysis.
(4) The opponent may use defense strategies based on adversarial training or denoising stages for preprocessing.
(5) The generation of obfuscated images must be performed in a time that is reasonable for the user (e.g., less than 5 seconds).
Despite recent works having addressed similar problems, focusing the analysis on text contents [1], face recognition [15, 53, 63], or social graphs [41], to the best of our knowledge this is the most extensive analysis of the application of adversarial machine learning methods to prevent the extraction of soft biometrics from people's faces in shared contents. We conducted the analysis by comparing four state-of-the-art adversarial attacks on large standard face datasets. In line with the purpose of the proposed analysis, standard defenses have also been considered, such as adversarial training and denoising autoencoders. These approaches require samples of obfuscated images, possibly generated using the same adversarial attacks and CNNs; it is reasonable to expect that, as the networks and the attacks are available to a user, they can also be exploited by an opponent to implement defense strategies. Consequently, it becomes crucial to assess the effectiveness of the obfuscation techniques in the presence of such defenses.
The remainder of the article is structured as follows: in Section 2 we describe the methodology adopted to conduct the analysis by detailing CNNs, datasets, attacks, and defenses; in Section 3 we give details about our experimental framework, explaining the assumptions and the design choices; in Section 4 we describe the experiments and discuss the results; finally, in Section 5 we provide the final outcomes and conclusions.
3 Experimental Framework
In this section we discuss in more detail the experimental setup and how we have prepared the networks, the obfuscation methods, and the defenses to conduct the analysis. All the experiments reported in the following sections have been performed on a workstation equipped with an Intel i7-3770S CPU, 32 GB of RAM, and an NVIDIA TITAN Xp with 12 GB of memory; the software platform is Ubuntu 18.04.6 LTS with TensorFlow 1.15.2, Keras 2.3.1, and CUDA 10.1.
3.1 CNNs for Facial Soft Biometrics Recognition
All the CNNs for the recognition of facial soft biometrics have been trained on VGGFace2 using the original labels and those provided by the VMER and VMAGE extensions for ethnicity and age, respectively. The overall training set is thus composed of 8,631 identities and more than 3.1 million images. The base accuracy values have been computed on the test set of VGGFace2, containing 500 identities and around 170,000 images, for the gender and ethnicity recognition tasks, and on the whole Adience dataset, including 26,580 face images, for the age group classification task.
Some of the CNNs used in the analysis were already available pre-trained on the mentioned training set. In particular, the authors of [26] have publicly shared the pre-trained weights of the three CNN models for age group classification. Likewise, in [10, 25], the authors have published pre-trained weights for VGG-16 and SENet, respectively, for the gender recognition task. These pre-trained networks are very suitable for the purpose of our analysis; indeed, they have demonstrated state-of-the-art accuracy and have been validated to be robust against common corruptions typically encountered in real-world scenarios. Therefore, to conduct our experiments, we had to train the neural networks for ethnicity recognition and MobileNetV3 for gender recognition by following procedures similar to those adopted for the other pre-trained models. We exploited a Single Shot Detector (SSD) based on the ResNet-10 CNN to obtain the crop of the single face present in each of the VGGFace2 images. As the crop can be rectangular while the CNNs expect a \(224\times 224\) pixels input, we applied padding to ensure that the face is consistently centered within the box; additionally, we aimed for the face to occupy on average \(80\%\) of the input image, as suggested in [34]. It is worth pointing out that considering images with a single person represents the worst case for the user, since it eliminates the chance that the opponent misses the face, and it also allows us to neglect the error of the detector in our analysis.
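For illustration, the following is a minimal sketch of this detection-and-cropping step, assuming OpenCV's publicly available ResNet-10 SSD face detector; the model file names, the function name, and the padding heuristic used to reach the \(80\%\) face coverage are our assumptions, not necessarily the exact pipeline of the original experiments.

```python
import cv2

# OpenCV's ResNet-10 SSD face detector (model files assumed to be available locally)
net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                               "res10_300x300_ssd_iter_140000.caffemodel")

def crop_face(image, target=224, face_ratio=0.80):
    """Detect the single face in `image` and return a square `target`-sized crop."""
    h, w = image.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0,
                                 (300, 300), (104.0, 177.0, 123.0))
    net.setInput(blob)
    detections = net.forward()  # shape: (1, 1, N, 7)
    best = detections[0, 0, detections[0, 0, :, 2].argmax()]  # single face assumed
    x1, y1, x2, y2 = (best[3:7] * [w, h, w, h]).astype(int)
    # Pad the (possibly rectangular) box to a square so that the face stays
    # centered and occupies roughly `face_ratio` of the final image.
    side = int(max(x2 - x1, y2 - y1) / face_ratio)
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
    x0, y0 = max(cx - side // 2, 0), max(cy - side // 2, 0)
    return cv2.resize(image[y0:y0 + side, x0:x0 + side], (target, target))
```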
Following the training procedure adopted for the other CNNs, data augmentation techniques have been used to enhance the robustness of the CNNs against common corruptions. The variations applied have been the following:
(1) Random rotation: The angle has been sampled from the range \([-10^{\circ },10^{\circ }]\).
(2) Random change of bounding box or shift: This variation aims at simulating errors related to the face detector; the effect is similar to a random crop, causing the face not to be perfectly centered in the box.
(3) Random brightness: To simulate overexposure and underexposure, we have randomly changed the brightness of the original image in the range \([-30\%, 30\%]\) of the pixel intensity.
For both the random variations (1) and (2) we have considered a zero-mean normal distribution, with standard deviations equal to \(10^{\circ }\) and \(2.5\%\) of the bounding box width, respectively. Furthermore, during the augmentation, we have used a pseudo-random procedure to apply two or more variations together to make the process more effective; a minimal sketch follows.
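The sketch below assumes NumPy and OpenCV; the function name, the per-variation coin flip used to combine two or more variations, and the clipping of the rotation angle to the stated range are our assumptions, while the distribution parameters come from the text above.

```python
import cv2
import numpy as np

def augment(image, bbox_width):
    """Apply the three random variations described above to a face crop."""
    h, w = image.shape[:2]
    out = image.astype(np.float32)
    if np.random.rand() < 0.5:  # (1) rotation: zero-mean normal, std 10 degrees
        angle = np.clip(np.random.normal(0.0, 10.0), -10.0, 10.0)
        M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        out = cv2.warpAffine(out, M, (w, h))
    if np.random.rand() < 0.5:  # (2) shift: zero-mean normal, std 2.5% of bbox width
        dx, dy = np.random.normal(0.0, 0.025 * bbox_width, size=2)
        M = np.float32([[1, 0, dx], [0, 1, dy]])
        out = cv2.warpAffine(out, M, (w, h))
    if np.random.rand() < 0.5:  # (3) brightness: uniform in [-30%, +30%] of intensity
        out = out + np.random.uniform(-0.3, 0.3) * 255.0
    return np.clip(out, 0, 255).astype(np.uint8)
```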
After the data augmentation, the resulting images have been normalized by subtracting the average value of each color channel, computed over all the images in the dataset. This normalization zero-centers every channel and has been demonstrated to improve the convergence of the loss function [64].
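Assuming, for simplicity, that the training images fit in a single NumPy array named `train_images` (in practice the channel means would be accumulated in batches), the normalization reduces to:

```python
import numpy as np

# Per-channel mean over all training images; train_images has shape (N, 224, 224, 3)
channel_mean = train_images.reshape(-1, 3).mean(axis=0)

def normalize(image):
    """Zero-center each color channel of an image."""
    return image.astype(np.float32) - channel_mean
```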
Finally, the networks have been trained through Stochastic Gradient Descent (SGD) with a batch size equal to 128 for MobileNetV3 and 32 for VGG-16 and SENet. The training started from a learning rate of 0.005, with a decay factor of 0.2 applied every 20 epochs to gradually adjust it; a weight decay of 0.05 has been used to prevent overfitting. Since all the tasks of interest are formulated as multi-class classification problems, we have used a categorical cross-entropy loss function.
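A hedged sketch of this training configuration in Keras follows; `model` and `train_generator` are assumed to exist, the number of epochs is illustrative, and we assume the weight decay is implemented as an L2 penalty on the layer weights (not shown).

```python
from tensorflow.keras import optimizers, callbacks

def step_decay(epoch):
    # Initial learning rate of 0.005, multiplied by 0.2 every 20 epochs
    return 0.005 * (0.2 ** (epoch // 20))

model.compile(optimizer=optimizers.SGD(0.005),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_generator, epochs=60,  # batch size (128 or 32) set in the generator
          callbacks=[callbacks.LearningRateScheduler(step_decay)])
```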
3.2 Setup of the Attacks to Obfuscate the Soft Biometrics
Assumption 2 in Section 1 forces a tradeoff between the effectiveness of the attacks and the amount of noise added to the image. Therefore, we do not expect to achieve the best possible result in terms of attack success rate. For the same reason, to avoid degrading the quality of the obfuscated image, we have empirically estimated the maximum amount of noise that an adversarial attack can add during the obfuscation and limited its effect by constraining both the \(L_{\infty }\) and \(L_2\) norms, with values of 15 and 900, respectively. It is worth noting that the two constraints affect the noise on the output image in different ways: the \(L_{\infty }\) norm limits the maximum perturbation on each single pixel of the altered image, while the \(L_2\) norm limits the overall noise on the whole obfuscated image. These norms are used by the attacks, during the optimization process, to measure the distance between the original and the obfuscated samples; therefore, on the one hand they affect the effectiveness of the attack, and on the other hand they determine the quantity of noise added to the image.
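Formally, denoting by \(\delta\) the perturbation added to an image, with pixels indexed by \(i\) (notation introduced here for clarity), the two constraints can be written as

\[ \|\delta\|_{\infty } = \max_{i}\,|\delta_{i}| \le 15, \qquad \|\delta\|_{2} = \sqrt{\sum_{i} \delta_{i}^{2}} \le 900. \]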
To estimate these limits, samples have been prepared with different noise intensities for each type of adversarial attack. These samples have then been evaluated by five people to determine the maximum level of noise that could be added before becoming noticeable to most of them.
For each attack, the adversarial examples have been generated by obfuscating, while respecting the constraints, the same faces used to assess the accuracy of the CNNs, with the following parameters (a hedged implementation sketch is reported after the list):
— FGSM with an \(\epsilon\) value equal to 0.01.
— PGD, iterated for a maximum of 40 steps, with \(\alpha\) and step size equal to 0.01 and 0.005, respectively, in the case of ethnicity and gender recognition, and to 0.007 and 0.007 for the age estimation task.
— DeepFool with an overshoot of the boundary of 0.02 and an iteration limit of 50.
— C&W with a learning rate of 0.02 and a maximum of 50 iterations.
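As an illustration of how such an attack can be implemented under the above constraints, here is a minimal FGSM example in TensorFlow (eager execution assumed); the rescaling of \(\epsilon\) to the \([0, 255]\) pixel range and the explicit clipping to the two norm budgets are our assumptions, not necessarily the exact implementation used in the experiments. PGD can be obtained by iterating this step with a smaller step size and re-projecting onto the budgets after each iteration.

```python
import tensorflow as tf

def fgsm_obfuscate(model, x, y_true, eps=0.01, linf_max=15.0, l2_max=900.0):
    """One-step FGSM on a batch of images in [0, 255]; y_true is one-hot encoded."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.categorical_crossentropy(y_true, model(x))
    grad = tape.gradient(loss, x)
    delta = 255.0 * eps * tf.sign(grad)                     # eps given on a [0, 1] scale
    delta = tf.clip_by_value(delta, -linf_max, linf_max)    # per-pixel L-infinity budget
    delta = tf.clip_by_norm(delta, l2_max, axes=[1, 2, 3])  # whole-image L2 budget
    return tf.clip_by_value(x + delta, 0.0, 255.0)
```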
In addition, we have taken into account the effect of Additive White Gaussian Noise (AWGN) with unitary variance, applied to each color channel (see the sketch below). Although it is not an adversarial attack, the accuracy of the CNN in the presence of random perturbations can be considered a reference for its initial robustness; indeed, we expect the CNNs used by an opponent to be quite insensitive to slight AWGN, which is ascribable to low-quality sensors or other external factors, because such noise is usually applied to the training data as a data augmentation technique.
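Unlike the attacks, the AWGN baseline requires no knowledge of the network; a minimal sketch, assuming images in the \([0, 255]\) range:

```python
import numpy as np

def add_awgn(image, std=1.0):
    """Add zero-mean, unit-variance Gaussian noise independently to each channel."""
    noise = np.random.normal(0.0, std, size=image.shape)
    return np.clip(image.astype(np.float32) + noise, 0, 255)
```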
For the sake of clarity, in our experiments we have analyzed two distinct scenarios:
(1) White box scenario: The user generates the obfuscated images using the same network adopted by the opponent. While this scenario may not be realistic in practice, it serves as a baseline to evaluate the effectiveness of the obfuscation.
(2) Black box scenario: The user creates the obfuscated images using a CNN different from the one of the opponent. This scenario is particularly significant, as it represents the most realistic situation that a user can encounter in the real world: when obfuscating the images, the user is unaware of the specific network used to extract the soft biometrics. To address this scenario, we conducted a transferability analysis to determine whether an adversarial example crafted to fool a particular CNN can also deceive a different network.
3.3 Setup of the Defenses
According to Assumption 4 in Section 1, an opponent can use countermeasures to reduce the effect of the obfuscation. In our experimental setup we have considered the defenses described in Section 2.3, i.e., adversarial training and two different denoising networks. Similarly to the setup of the attacks, we have analyzed two different scenarios: white box and black box.
To evaluate the effectiveness of the obfuscation against these defense methods, we have prepared the worst-case defense scenario that a user may face, under the hypotheses that (1) the opponent uses the CNN achieving the best average accuracy over all three tasks, namely MobileNetV3, as clarified in Section 4.1, and (2) the opponent selects the most transferable adversarial attacks to generate the examples, i.e., PGD and FGSM, according to the results discussed in Section 4.2.
In the case of adversarial training, we did not need to train MobileNetV3 from scratch. Instead, we employed a fine-tuning process with the objective of enhancing its robustness in the presence of obfuscated images. For this purpose, we created three training sets by randomly extracting 750,000 samples from the original training set and generating adversarial examples with the FGSM and PGD attacks against MobileNetV3. This process resulted in a set of 2.25 million images for each task, including a balanced mix of clean, FGSM, and PGD samples, amounting to a total of 6.75 million images.
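The construction of one such fine-tuning set can be sketched as follows; `train_images`, `train_labels`, and `pgd_obfuscate` are hypothetical names (the latter being a PGD analogue of the FGSM sketch in Section 3.2), and in practice the adversarial examples would be generated in batches and stored on disk rather than held in memory.

```python
import numpy as np

idx = np.random.choice(len(train_images), size=750_000, replace=False)
clean, labels = train_images[idx], train_labels[idx]
adv_fgsm = fgsm_obfuscate(model, clean, labels).numpy()  # see the sketch in Section 3.2
adv_pgd = pgd_obfuscate(model, clean, labels).numpy()    # hypothetical PGD analogue
# Balanced mix of clean, FGSM, and PGD samples: ~2.25 million images per task
finetune_x = np.concatenate([clean, adv_fgsm, adv_pgd])
finetune_y = np.concatenate([labels, labels, labels])
```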
The training process was conducted following a procedure similar to the one described in Section 3.1: SGD was used as the optimizer, with an initial learning rate of 0.001 and a decay factor of 0.5 applied every five epochs.
We realized the denoising autoencoder from scratch. The architecture includes three convolutional layers each for the encoding and decoding stages, with a fully connected layer of 100 neurons generating the latent vector; in total, the denoising autoencoder comprises 41.95 million parameters. The input size of the autoencoders matches that of the CNNs, namely \(224\times 224\) pixels. As for the KL autoencoder, it shares the same architecture as the denoising autoencoder; the difference lies in the loss function, which is based on the KL divergence.
Using a process similar to the adversarial training, we have prepared a set of 1.5 million images composed of both clean and obfuscated samples to train the autoencoders.
For the denoising autoencoder, the training procedure employed the Adam optimizer with an initial learning rate of 0.001; if the validation loss did not improve for three consecutive epochs, the learning rate was reduced by a factor of 0.2. The batch size was set to 64 samples, and the loss function was the mean squared error between the reconstructed images and the target images. The KL autoencoder was also trained with the Adam optimizer and an initial learning rate of 0.001; however, its learning rate was decreased by a factor of 0.1 every 20 epochs.
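Below is a minimal Keras sketch of the denoising autoencoder and its training setup; the number of filters per layer is our assumption (the text only fixes three convolutional layers per stage, the 100-neuron latent vector, and the training hyperparameters), so the parameter count does not exactly match the 41.95 million of the original model, and `x_noisy`/`x_clean` are assumed to hold the obfuscated inputs and their clean targets.

```python
from tensorflow.keras import layers, models, optimizers, callbacks

def build_denoising_autoencoder(latent_dim=100):
    inp = layers.Input(shape=(224, 224, 3))
    x = inp
    for filters in (32, 64, 128):  # encoder: three strided conv layers
        x = layers.Conv2D(filters, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)        # feature map is 28x28x128 at this point
    latent = layers.Dense(latent_dim, activation="relu")(x)  # 100-neuron latent vector
    x = layers.Dense(28 * 28 * 128, activation="relu")(latent)
    x = layers.Reshape((28, 28, 128))(x)
    for filters in (128, 64, 32):  # decoder: three transposed conv layers
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same",
                                   activation="relu")(x)
    out = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)  # [0, 1] images
    return models.Model(inp, out)

autoencoder = build_denoising_autoencoder()
autoencoder.compile(optimizer=optimizers.Adam(0.001), loss="mse")
# Reduce the learning rate by a factor of 0.2 if val_loss stalls for 3 epochs
reduce_lr = callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.2, patience=3)
autoencoder.fit(x_noisy, x_clean, batch_size=64, epochs=50,
                validation_split=0.1, callbacks=[reduce_lr])
```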
4 Results
In this section we present the results of the experiments, organized to discuss the following aspects: (1) effectiveness of white box obfuscation, (2) effectiveness of black box obfuscation through the transferability analysis, (3) effectiveness in case of countermeasures, and (4) quality and time required to generate the obfuscated images.
For the sake of clarity, in all the tables the base accuracy of the CNNs on each task is reported in terms of classification accuracy (CA), as defined in Equation (8). In the case of ethnicity recognition and age group classification, where the network provides probabilities for multiple classes, we have considered the class with the highest probability as the predicted one.
On the other hand, the effectiveness of the obfuscation techniques has been assessed in terms of the drop of accuracy, calculated as the difference between the CA obtained when predicting adversarial examples and the CA obtained on the original (clean) images, as formalized below.
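Denoting by \(\mathrm{CA}_{\mathrm{adv}}\) and \(\mathrm{CA}_{\mathrm{clean}}\) the classification accuracies on the obfuscated and original images, respectively (notation introduced here for clarity), the drop of accuracy is

\[ \Delta\mathrm{CA} = \mathrm{CA}_{\mathrm{adv}} - \mathrm{CA}_{\mathrm{clean}}, \]

so that a large negative \(\Delta\mathrm{CA}\) indicates an effective obfuscation.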
Finally, we have computed the average time required to generate an obfuscated image and the average noise in terms of \(L_{\infty }\) and \(L_2\) norms, in order to evaluate the suitability of the attacks according to Assumptions 2 and 5 in Section 1.
4.1 Effectiveness of White Box Obfuscation
In Table 2 we report the classification accuracy of each CNN on each task and the drop caused by the random noise and by the adversarial attacks. First, it is important to note that in all the tables the adversarial attacks are arranged in a bottom-to-top order, according to the expected effectiveness of the attack with respect to the quantity of noise added to the obfuscated image; hence, FGSM and PGD typically generate images with a higher noise intensity than DeepFool and C&W to achieve effective samples.
Regarding the tasks, age group classification is the most challenging one, not only due to the larger number of classes but also because of the inherent complexity of the task itself [10], which can be difficult even for humans.
However, this also implies that obfuscating the age is relatively simpler compared to gender recognition, which involves a binary classification with considerably less variability, as shown in Figure 5. This is evident from Table 2, in which the classification accuracy of all the CNNs on age group estimation is about \(30\%\) lower than for the other tasks (on average, \(60.07\%\) for age group classification vs. \(97.45\%\) and \(93.48\%\) for gender and ethnicity recognition, respectively), and in most of the cases the drop in accuracy makes the output of the network completely unreliable. Consider, as an example, the PGD obfuscation (\(-54.55\%\)), which leaves an accuracy for age group classification of only \(5.52\%\). Similar results can be achieved on the other tasks using specific attacks: for instance, C&W and PGD make the gender and ethnicity predictions of all the CNNs untrustworthy (see Figure 5(a)).
Focusing on the impact of the AWGN, the results reveal that all the CNNs demonstrate sufficient robustness against such a perturbation, despite not being explicitly trained to handle it. Among the considered CNNs, MobileNetV3 shows the highest sensitivity to AWGN, with an average drop of \(8.5\%\) in accuracy across all tasks. Although this drop may seem notable, it is important to consider that the values reported in Table 2 have been obtained by introducing a substantial amount of noise: the average \(L_2\) norm of the noise used was 14.517, two orders of magnitude higher than what is required by the adversarial attacks. This suggests that the CNNs maintain a good level of accuracy even in the presence of such a high level of random noise.
Contrary to what we pointed out for AWGN, the results in Table 2 demonstrate that all the considered attacks are effective in fooling the predictions of neural networks exhibiting state-of-the-art performance. Indeed, even the least effective approach, DeepFool, achieves an average drop in accuracy of at least \(23\%\) across all tasks. Remarkably, besides the results obtained by advanced methods such as C&W for gender and ethnicity recognition, we can note the effectiveness of simpler approaches such as FGSM and PGD; in particular, PGD causes an average drop in accuracy of \(58.77\%\) over all the tasks.
4.2 Effectiveness of Black Box Obfuscation: Transferability Analysis
Although the results achieved in the white box scenario are remarkable, they represent the best case, since the user cannot be fully aware of the CNN employed by an opponent. The transferability analysis provides a measure of the generalization capability of an attack with respect to the target network. To perform this analysis, we generated obfuscated images targeted to fool a specific CNN and evaluated the effect of the same attack on the other CNNs. Table 3 shows the results of the transferability experiments.
The more independent the attack is from the target network, the more general it is expected to be. Indeed, since gradient-based approaches such as FGSM and PGD do not strongly depend on the attacked CNN, they achieve a higher transferability compared to DeepFool and C&W, which are designed to generate noise patterns that are more effective and less perceivable, but also more specialized on the target network [12, 14]. Our analysis confirms this observation: PGD attains the best performance across the three tasks, with an average drop of approximately \(16.09\%\), followed by FGSM with an average drop of \(13.03\%\).
A notable outcome is that the attacks have proved to be transferable across all the CNNs, despite their distinct architectures, as shown in Figure 5(b). This means that the effectiveness of the obfuscated images is not limited to a specific network architecture; rather, they demonstrate a general capability to mislead multiple CNNs.
Finally, it is worth noting that, even without knowing the network used by an opponent to extract the age, a gradient-based method can make the output unreliable; in fact, by using PGD, a user can cause an average loss of accuracy of \(30.89\%\). On the other hand, as in the white box scenario, gender is the hardest soft-biometric feature to obfuscate.
4.3 Effectiveness of Adversarial Defenses
As introduced in Section 3.3, we have taken into account two different scenarios: white box and black box. The results of these experiments are reported in Table 4 and Table 5, respectively.
The first relevant result is that the enhancement of robustness against the obfuscation often comes at the cost of a loss in accuracy on clean samples. This drawback is observed across all the considered defenses, as shown in Figures 6 and 7. Observing the results of the adversarial training in Table 4, a drop in accuracy ranging from \(1.54\%\) to \(9.34\%\) compared to the original network is evident. Similarly, when considering the system with a denoising stage, both autoencoders lead to a decrease in accuracy, with the denoising autoencoder causing a drop ranging from \(3.82\%\) to \(12.20\%\) and the KL autoencoder a decrease from \(2.71\%\) to \(10.92\%\). Furthermore, the higher the complexity of the task, the lower the effectiveness of such countermeasures; this is evident by comparing the drop in gender recognition with the one in age group classification.
In the white box scenario, all the approaches are able to partially prevent the loss of accuracy caused by the obfuscation (see Figure 6). Among them, adversarial training is the most effective countermeasure, particularly for the gender and ethnicity recognition tasks; in the cases of FGSM and PGD, it even leads to a slight improvement in the classification accuracy. This is an expected outcome, since the network has been retrained to properly recognize obfuscated images, albeit at the cost of some loss of accuracy on clean images. On the other hand, for the same reason, this defense is less effective in the black box scenario, where the obfuscated images have been generated using CNNs different from MobileNetV3, namely SENet and VGG-16. Differently from adversarial training, in the black box scenario (see Figure 7) the denoising autoencoder and the KL autoencoder maintain their performance, demonstrating the capability to generalize with respect to the specific neural network used to generate the adversarial samples.
Finally, when comparing the results of the black box scenario in Table 3 and Table 5, without and with the defenses, respectively, it becomes evident that the benefits provided by the defenses do not entirely compensate for the loss of accuracy on clean images. As a result, the user can still effectively exploit the adversarial attacks even when the opponent employs countermeasures.
4.4 Obfuscated Image Quality and Obfuscation Time
Finally, it is worth discussing some additional results regarding the average perturbation and the time required to generate an obfuscated image, which demonstrate the effectiveness and the suitability of the proposed solution for the problem at hand.
In more detail, Table 6 shows the average perturbation required by each attack to achieve the results discussed in the previous sections. It is important to note that all of them are quite distant from the noise constraints introduced in Section 3.2: in the worst case, the \(L_2\) and \(L_{\infty }\) norms are lower than the thresholds by \(10.33\%\) and \(22.73\%\), respectively. Furthermore, the quantity of noise added to the image by FGSM and PGD is consistent with that added by DeepFool and C&W. These results demonstrate that the proposed solution meets the maximum perturbation requirements and achieves the desired obfuscation while maintaining higher-than-expected image quality.
Regarding the average time required to generate an obfuscated image, reported in Table 7, the worst case is C&W, which requires 3.93 seconds. Even if this time is acceptable for a user, a noteworthy result is that FGSM and PGD, the most general attacks, require less than a second to generate very effective obfuscated examples; DeepFool requires only 0.21 seconds. We can conclude that the proposed approach allows a user to effectively obfuscate the soft biometrics in a very short time.