1 Introduction
Every day, billions of photos are captured by our smartphones, containing various subjects such as locations, family members, and other individuals who are inadvertently included. A significant portion of these pictures is promptly shared on social media platforms [6, 42]. In fact, in 2022 there were about 4.62 billion active social media users [36], and approximately 95 million photos and videos are estimated to be shared on Instagram each day. This habit has become so widespread that all smartphones come pre-installed with various social media applications, and users spend \(43\%\) of their overall phone usage time on such applications [36].
Online services behind these applications, as explicitly stated in their privacy policies, are allowed by the users to process the uploaded contents and automatically extract metadata with the purpose of improving the service itself. In particular, soft biometrics like gender, age, and ethnicity, together with people's identities and places, are among the most common information extracted from pictures [4, 11, 33, 48, 52, 56]. Although most social media users accept the privacy statements and may deliberately share sensitive information with the platform [48], for instance while filling out registration forms, about one-third of the users are concerned about protecting their privacy and want to avoid potential misuse of personal information [36]. As discussed in [5, 16, 22, 23, 41], user concerns are justified by the fact that the platforms themselves may pose potential threats, by extracting unauthorized metadata from individuals other than the user who appear in the picture, or by engaging in targeted advertising and trying to manipulate the user's behavior. Additionally, malicious users may be interested in inferring sensitive information, such as soft biometrics, from shared contents, for instance for personalized advertising or social engineering attacks [21].
Therefore, it is important for a user to have a convenient and effective way of concealing soft-biometric information in pictures while maintaining a reasonable level of quality. In this way, other humans can still perceive the picture as authentic and clearly identify the faces, but automatic systems are fooled. Hereinafter, we will use the term opponent to encompass both automated processes and people who, deceptively and without the user's knowledge, attempt to extract soft-biometric data from the content the user shares. Furthermore, we will define obfuscation as the deliberate process of altering elements within a picture to hide sensitive information from opponents, and we will refer to the modified images as obfuscated images.
Of course, since faces are the most relevant and richly detailed elements of a picture [4], and they are used by both humans and machines [10, 17, 18, 52, 53, 60] to extract information about people, it is reasonable to assume that a user would find it acceptable to focus only on faces instead of altering the picture as a whole. This is also confirmed by the interest of researchers, who in recent years have mostly focused on methods working on faces. Among these, common approaches are based on face de-identification [57], which aims to hide identity information by modifying or replacing the face of a person [27] in a picture. Scrambled-image techniques have been proposed in [35]; similarly to de-identification, they hide people's identity by covering their faces, but with patches. Very recently, thanks to the success of deep learning methods in computer vision, new approaches for face de-identification have been based on Generative Adversarial Networks (GANs) [43, 47, 66, 71]. The latter are very effective: while the person in the resulting picture is not identifiable, the facial features are still recognizable, to the point that it is possible to extract coherent biometric features and a human does not perceive the face as fake. Nevertheless, the drawback of all de-identification approaches, which makes them unfit for our purpose, is that the face is considerably different from the original one, so the users themselves would not be able to recognize their own faces in the obfuscated image.
In [40, 44, 45, 63, 68] a new trend has emerged: the idea is to avoid altering the faces in a perceptible way and to allow managing the tradeoff between the effectiveness of the obfuscation and the quality of the output image. The approach is based on adversarial machine learning methods, also known as adversarial attacks. The basic idea is to exploit the intrinsic weaknesses of convolutional neural networks (CNNs), which are generally used to realize modern image processing systems, by generating properly corrupted images, called adversarial examples, that induce the CNN to output wrong predictions. Indeed, as discussed in [8, 24, 67], the impressive accuracy achieved by CNNs on several computer vision tasks is not accompanied by an equally remarkable robustness w.r.t. a family of image corruptions, named adversarial patterns, that are generated to purposefully mislead neural networks. These methods fit the purpose at hand because the adversarial examples are generated so as to bound the noise to a level that is not perceived by a human, but that can still induce an error in the neural network [7, 24]. It is worth noting that, after the appearance of adversarial attacks, defenses have also been proposed to make CNNs more robust against adversarial examples [12, 38, 46, 58, 62]; thus, we can assume that such defenses may be adopted by the opponent to counteract the effects of the obfuscation.
This article presents a comprehensive analysis aimed at assessing the effectiveness of well-known state-of-the-art adversarial machine learning techniques as privacy tools for users. The purpose is to investigate whether these techniques can be exploited by the users to obfuscate faces in pictures shared on social media platforms, with the intention of hiding the gender, ethnicity, and age of individuals in the images.
The proposed analysis is based on the following assumptions:
(1) The users want to obfuscate soft biometrics extracted from faces by an opponent through CNNs, while keeping their faces clearly recognizable by humans.
(2) The amount of noise added to the image must not affect the quality of the image as perceived by a human.
(3) The users are not aware of the specific CNN model used by an opponent to extract biometric features, but they can use well-known pre-trained neural networks commonly used for face analysis.
(4) The opponent may use defense strategies based on adversarial training or denoising stages for preprocessing.
(5) The generation of obfuscated images must be performed in a time that is reasonable for the user (e.g., less than 5 seconds).
Despite recent works having addressed similar problems, focusing the analysis on text contents [1], face recognition [15, 53, 63], or social graphs [41], to the best of our knowledge this is the most extensive analysis of the application of adversarial machine learning methods to prevent the extraction of soft biometrics from people's faces in shared contents. We conducted the analysis by comparing four state-of-the-art adversarial attacks on large standard face datasets. In line with the purpose of the proposed analysis, standard defenses have also been considered, such as adversarial training and denoising autoencoders. These approaches require samples of obfuscated images, possibly generated using the same adversarial attacks and CNNs; it is reasonable to expect that, as the networks and the attacks are available to a user, they can also be exploited by an opponent to implement defense strategies. Consequently, it becomes crucial to assess the effectiveness of the obfuscation techniques in the presence of such defenses.
The remainder of the article is structured as follows: in Section 2 we describe the methodology adopted to conduct the analysis by detailing CNNs, datasets, attacks, and defenses; in Section 3 we give details about our experimental framework, explaining the assumptions and the design choices; in Section 4 we describe the experiments and discuss the results; finally, in Section 5 we provide the final outcomes and conclusions.
3 Experimental Framework
In this section we discuss in more detail the experimental setup and how we have prepared the networks, the obfuscation methods, and the defenses to conduct the analysis. All the experiments reported in the following sections have been performed on a workstation equipped with an Intel i7-3770S CPU, 32 GB of RAM, and an NVIDIA TITAN Xp with 12 GB of memory; the software platform is Ubuntu 18.04.6 LTS with TensorFlow 1.15.2, Keras 2.3.1, and CUDA 10.1.
3.1 CNNs for Facial Soft Biometrics Recognition
All the CNNs for the recognition of facial soft biometrics have been trained on VGGFace2 using the original labels and those provided by the VMER and VMAGE extensions for ethnicity and age, respectively. The overall training set is thus composed of 8,631 identities and more than 3.1 million images. The base accuracy values have been computed on the test set of VGGFace2, containing 500 identities and around 170,000 images, for the gender and ethnicity recognition tasks, and on the whole Adience dataset, including 26,580 face images, for the age group classification task.
Some of the CNNs used in the analysis were already available pre-trained on the mentioned training set. In particular, the authors of [26] have publicly shared the pre-trained weights of the three CNN models for age group classification. Likewise, in [10, 25], the authors have published pre-trained weights for VGG-16 and SENet, respectively, for the gender recognition task. These pre-trained networks are very suitable for the purpose of our analysis; indeed, they have demonstrated state-of-the-art accuracy and have been validated to be robust against common corruptions typically encountered in real-world scenarios. Therefore, to conduct our experiments, we had to train the neural networks for ethnicity recognition and MobileNetV3 for gender recognition by following procedures similar to those adopted for the other pre-trained models. We exploited a Single Shot Detector (SSD) based on the ResNet-10 CNN to obtain the crop of the single face present in each of the VGGFace2 images. As the crop can be rectangular while the CNNs expect a \(224\times 224\) pixels input, we applied padding to ensure that the face is consistently centered within the box; additionally, we aimed for the face to occupy on average \(80\%\) of the input image, as suggested in [34]. It is worth pointing out that considering images with a single person represents the worst case for the user, since it eliminates the chance that the opponent misses the face, and it also allows us to neglect the error of the detector in our analysis.
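For illustration, the following is a minimal sketch of this detection-and-cropping step, assuming OpenCV's publicly available ResNet-10 SSD face detector; the model file names, the function name, and the padding heuristic used to reach the \(80\%\) face coverage are our assumptions, not necessarily the exact pipeline of the original experiments.

```python
import cv2

# OpenCV's ResNet-10 SSD face detector (model files assumed to be available locally)
net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                               "res10_300x300_ssd_iter_140000.caffemodel")

def crop_face(image, target=224, face_ratio=0.80):
    """Detect the single face in `image` and return a square `target`-sized crop."""
    h, w = image.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0,
                                 (300, 300), (104.0, 177.0, 123.0))
    net.setInput(blob)
    detections = net.forward()  # shape: (1, 1, N, 7)
    best = detections[0, 0, detections[0, 0, :, 2].argmax()]  # single face assumed
    x1, y1, x2, y2 = (best[3:7] * [w, h, w, h]).astype(int)
    # Pad the (possibly rectangular) box to a square so that the face stays
    # centered and occupies roughly `face_ratio` of the final image.
    side = int(max(x2 - x1, y2 - y1) / face_ratio)
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
    x0, y0 = max(cx - side // 2, 0), max(cy - side // 2, 0)
    return cv2.resize(image[y0:y0 + side, x0:x0 + side], (target, target))
```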
Following the training procedure adopted for the other CNNs, data augmentation techniques have been used to enhance the robustness of the CNNs against common corruptions. The variations applied have been the following:
(1) Random rotation: The angle has been sampled from the range \([-10^{\circ },10^{\circ }]\).
(2) Random change of bounding box or shift: This variation aims at simulating errors related to the face detector; the effect is similar to a random crop, causing the face not to be perfectly centered in the box.
(3) Random brightness: To simulate overexposure and underexposure, we have randomly changed the brightness of the original image in the range \([-30\%, 30\%]\) of the pixel intensity.
For both the random variations (1) and (2) we have considered a zero-mean normal distribution, with standard deviations equal to \(10^{\circ }\) and \(2.5\%\) of the bounding box width, respectively. Furthermore, during the augmentation, we have used a pseudo-random procedure to apply two or more variations together to make the process more effective; a minimal sketch follows.
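The sketch below assumes NumPy and OpenCV; the function name, the per-variation coin flip used to combine two or more variations, and the clipping of the rotation angle to the stated range are our assumptions, while the distribution parameters come from the text above.

```python
import cv2
import numpy as np

def augment(image, bbox_width):
    """Apply the three random variations described above to a face crop."""
    h, w = image.shape[:2]
    out = image.astype(np.float32)
    if np.random.rand() < 0.5:  # (1) rotation: zero-mean normal, std 10 degrees
        angle = np.clip(np.random.normal(0.0, 10.0), -10.0, 10.0)
        M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        out = cv2.warpAffine(out, M, (w, h))
    if np.random.rand() < 0.5:  # (2) shift: zero-mean normal, std 2.5% of bbox width
        dx, dy = np.random.normal(0.0, 0.025 * bbox_width, size=2)
        M = np.float32([[1, 0, dx], [0, 1, dy]])
        out = cv2.warpAffine(out, M, (w, h))
    if np.random.rand() < 0.5:  # (3) brightness: uniform in [-30%, +30%] of intensity
        out = out + np.random.uniform(-0.3, 0.3) * 255.0
    return np.clip(out, 0, 255).astype(np.uint8)
```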
After the data augmentation, the resulting images have been normalized by subtracting the average value of each color channel, computed over all the images in the dataset. This normalization zero-centers every channel and has been demonstrated to improve the convergence of the loss function [64].
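Assuming, for simplicity, that the training images fit in a single NumPy array named `train_images` (in practice the channel means would be accumulated in batches), the normalization reduces to:

```python
import numpy as np

# Per-channel mean over all training images; train_images has shape (N, 224, 224, 3)
channel_mean = train_images.reshape(-1, 3).mean(axis=0)

def normalize(image):
    """Zero-center each color channel of an image."""
    return image.astype(np.float32) - channel_mean
```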
Finally, the networks have been trained through Stochastic Gradient Descent (SGD) with a batch size equal to 128 for MobileNetV3 and 32 for VGG-16 and SENet. The training started from a learning rate of 0.005, with a decay factor of 0.2 applied every 20 epochs to gradually adjust it; a weight decay of 0.05 has been used to prevent overfitting. Since all the tasks of interest are formulated as multi-class classification problems, we have used a categorical cross-entropy loss function.
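A hedged sketch of this training configuration in Keras follows; `model` and `train_generator` are assumed to exist, the number of epochs is illustrative, and we assume the weight decay is implemented as an L2 penalty on the layer weights (not shown).

```python
from tensorflow.keras import optimizers, callbacks

def step_decay(epoch):
    # Initial learning rate of 0.005, multiplied by 0.2 every 20 epochs
    return 0.005 * (0.2 ** (epoch // 20))

model.compile(optimizer=optimizers.SGD(0.005),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_generator, epochs=60,  # batch size (128 or 32) set in the generator
          callbacks=[callbacks.LearningRateScheduler(step_decay)])
```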
3.2 Setup of the Attacks to Obfuscate the Soft Biometrics
Assumption 2 in Section 1 forces a tradeoff between the effectiveness of the attacks and the amount of noise added to the image. Therefore, we do not expect to achieve the best possible result in terms of attack success rate. For the same reason, to avoid degrading the quality of the obfuscated image, we have empirically estimated the maximum amount of noise that an adversarial attack can add during the obfuscation and limited its effect by constraining both the \(L_{\infty }\) and \(L_2\) norms, with values of 15 and 900, respectively. It is worth noting that the two constraints affect the noise on the output image in different ways: the \(L_{\infty }\) norm limits the maximum perturbation on each single pixel of the altered image, while the \(L_2\) norm limits the overall noise on the whole obfuscated image. These norms are used by the attacks, during the optimization process, to measure the distance between the original and the obfuscated samples; therefore, on the one hand they affect the effectiveness of the attack, and on the other hand they determine the quantity of noise added to the image.
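Formally, denoting by \(\delta\) the perturbation added to an image, with pixels indexed by \(i\) (notation introduced here for clarity), the two constraints can be written as

\[ \|\delta\|_{\infty } = \max_{i}\,|\delta_{i}| \le 15, \qquad \|\delta\|_{2} = \sqrt{\sum_{i} \delta_{i}^{2}} \le 900. \]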
To estimate these limits, samples have been prepared with different noise intensities for each type of adversarial attack. These samples have then been evaluated by five people to determine the maximum level of noise that could be added before becoming noticeable to most of them.
For each attack, the adversarial examples have been generated by obfuscating, while respecting the constraints, the same faces used to assess the accuracy of the CNNs, with the following parameters (a hedged implementation sketch is reported after the list):
— FGSM with an \(\epsilon\) value equal to 0.01.
— PGD, iterated for a maximum of 40 steps, with \(\alpha\) and step size equal to 0.01 and 0.005, respectively, in the case of ethnicity and gender recognition, and to 0.007 and 0.007 for the age estimation task.
— DeepFool with an overshoot of the boundary of 0.02 and an iteration limit of 50.
— C&W with a learning rate of 0.02 and a maximum of 50 iterations.
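As an illustration of how such an attack can be implemented under the above constraints, here is a minimal FGSM example in TensorFlow (eager execution assumed); the rescaling of \(\epsilon\) to the \([0, 255]\) pixel range and the explicit clipping to the two norm budgets are our assumptions, not necessarily the exact implementation used in the experiments. PGD can be obtained by iterating this step with a smaller step size and re-projecting onto the budgets after each iteration.

```python
import tensorflow as tf

def fgsm_obfuscate(model, x, y_true, eps=0.01, linf_max=15.0, l2_max=900.0):
    """One-step FGSM on a batch of images in [0, 255]; y_true is one-hot encoded."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.categorical_crossentropy(y_true, model(x))
    grad = tape.gradient(loss, x)
    delta = 255.0 * eps * tf.sign(grad)                     # eps given on a [0, 1] scale
    delta = tf.clip_by_value(delta, -linf_max, linf_max)    # per-pixel L-infinity budget
    delta = tf.clip_by_norm(delta, l2_max, axes=[1, 2, 3])  # whole-image L2 budget
    return tf.clip_by_value(x + delta, 0.0, 255.0)
```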
In addition, we have taken into account the effect of Additive White Gaussian Noise (AWGN) with unitary variance, applied to each color channel (see the sketch below). Although it is not an adversarial attack, the accuracy of the CNN in the presence of random perturbations can be considered a reference for its initial robustness; indeed, we expect the CNNs used by an opponent to be quite insensitive to slight AWGN, which is ascribable to low-quality sensors or other external factors, because such noise is usually applied to the training data as a data augmentation technique.
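Unlike the attacks, the AWGN baseline requires no knowledge of the network; a minimal sketch, assuming images in the \([0, 255]\) range:

```python
import numpy as np

def add_awgn(image, std=1.0):
    """Add zero-mean, unit-variance Gaussian noise independently to each channel."""
    noise = np.random.normal(0.0, std, size=image.shape)
    return np.clip(image.astype(np.float32) + noise, 0, 255)
```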
For the sake of clarity, in our experiments we have analyzed two distinct scenarios:
(1) White box scenario: The user generates the obfuscated images using the same network adopted by the opponent. While this scenario may not be realistic in practice, it serves as a baseline to evaluate the effectiveness of the obfuscation.
(2) Black box scenario: The user creates the obfuscated images using a CNN different from the one of the opponent. This scenario is particularly significant, as it represents the most realistic situation that a user can encounter in the real world: when obfuscating the images, the user is unaware of the specific network used to extract the soft biometrics. To address this scenario, we conducted a transferability analysis to determine whether an adversarial example crafted to fool a particular CNN can also deceive a different network.
3.3 Setup of the Defenses
According to Assumption 4 in Section 1, an opponent can use countermeasures to reduce the effect of the obfuscation. In our experimental setup we have considered the defenses described in Section 2.3, i.e., adversarial training and two different denoising networks. Similarly to the setup of the attacks, we have analyzed two different scenarios: white box and black box.
To evaluate the effectiveness of the obfuscation against these defense methods, we have prepared the worst-case defense scenario that a user may face, under the hypotheses that (1) the opponent uses the CNN achieving the best average accuracy over all three tasks, namely MobileNetV3, as clarified in Section 4.1, and (2) the opponent selects the most transferable adversarial attacks to generate the examples, i.e., PGD and FGSM, according to the results discussed in Section 4.2.
In the case of adversarial training, we did not need to train MobileNetV3 from scratch. Instead, we employed a fine-tuning process with the objective of enhancing its robustness in the presence of obfuscated images. For this purpose, we created three training sets by randomly extracting 750,000 samples from the original training set and generating adversarial examples with the FGSM and PGD attacks against MobileNetV3. This process resulted in a set of 2.25 million images for each task, including a balanced mix of clean, FGSM, and PGD samples, amounting to a total of 6.75 million images.
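The construction of one such fine-tuning set can be sketched as follows; `train_images`, `train_labels`, and `pgd_obfuscate` are hypothetical names (the latter being a PGD analogue of the FGSM sketch in Section 3.2), and in practice the adversarial examples would be generated in batches and stored on disk rather than held in memory.

```python
import numpy as np

idx = np.random.choice(len(train_images), size=750_000, replace=False)
clean, labels = train_images[idx], train_labels[idx]
adv_fgsm = fgsm_obfuscate(model, clean, labels).numpy()  # see the sketch in Section 3.2
adv_pgd = pgd_obfuscate(model, clean, labels).numpy()    # hypothetical PGD analogue
# Balanced mix of clean, FGSM, and PGD samples: ~2.25 million images per task
finetune_x = np.concatenate([clean, adv_fgsm, adv_pgd])
finetune_y = np.concatenate([labels, labels, labels])
```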
The training process was conducted following a procedure similar to the one described in Section 3.1: SGD was used as the optimizer, with an initial learning rate of 0.001 and a decay factor of 0.5 applied every five epochs.
We realized the denoising autoencoder from scratch. The architecture includes three convolutional layers each for the encoding and decoding stages, with a fully connected layer of 100 neurons generating the latent vector; in total, the denoising autoencoder comprises 41.95 million parameters. The input size of the autoencoders matches that of the CNNs, namely \(224\times 224\) pixels. As for the KL autoencoder, it shares the same architecture as the denoising autoencoder; the difference lies in the loss function, which is based on the KL divergence.
Using a process similar to the adversarial training, we have prepared a set of 1.5 million images composed of both clean and obfuscated samples to train the autoencoders.
For the denoising autoencoder, the training procedure employed the Adam optimizer with an initial learning rate of 0.001; if the validation loss did not improve for three consecutive epochs, the learning rate was reduced by a factor of 0.2. The batch size was set to 64 samples, and the loss function was the mean squared error between the reconstructed images and the target images. The KL autoencoder was also trained with the Adam optimizer and an initial learning rate of 0.001; however, its learning rate was decreased by a factor of 0.1 every 20 epochs.
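Below is a minimal Keras sketch of the denoising autoencoder and its training setup; the number of filters per layer is our assumption (the text only fixes three convolutional layers per stage, the 100-neuron latent vector, and the training hyperparameters), so the parameter count does not exactly match the 41.95 million of the original model, and `x_noisy`/`x_clean` are assumed to hold the obfuscated inputs and their clean targets.

```python
from tensorflow.keras import layers, models, optimizers, callbacks

def build_denoising_autoencoder(latent_dim=100):
    inp = layers.Input(shape=(224, 224, 3))
    x = inp
    for filters in (32, 64, 128):  # encoder: three strided conv layers
        x = layers.Conv2D(filters, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)        # feature map is 28x28x128 at this point
    latent = layers.Dense(latent_dim, activation="relu")(x)  # 100-neuron latent vector
    x = layers.Dense(28 * 28 * 128, activation="relu")(latent)
    x = layers.Reshape((28, 28, 128))(x)
    for filters in (128, 64, 32):  # decoder: three transposed conv layers
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same",
                                   activation="relu")(x)
    out = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)  # [0, 1] images
    return models.Model(inp, out)

autoencoder = build_denoising_autoencoder()
autoencoder.compile(optimizer=optimizers.Adam(0.001), loss="mse")
# Reduce the learning rate by a factor of 0.2 if val_loss stalls for 3 epochs
reduce_lr = callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.2, patience=3)
autoencoder.fit(x_noisy, x_clean, batch_size=64, epochs=50,
                validation_split=0.1, callbacks=[reduce_lr])
```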
4 Results
In this section we present the results of the experiments, organized to discuss the following aspects: (1) effectiveness of white box obfuscation, (2) effectiveness of black box obfuscation through the transferability analysis, (3) effectiveness in case of countermeasures, and (4) quality and time required to generate the obfuscated images.
For the sake of clarity, in all the tables the base accuracy of the CNNs on each task is reported in terms of classification accuracy (CA), as defined in Equation (8). In the case of ethnicity recognition and age group classification, where the network provides probabilities for multiple classes, we have considered the class with the highest probability as the predicted one.
On the other hand, the effectiveness of the obfuscation techniques has been assessed in terms of the drop of accuracy, calculated as the difference between the CA obtained when predicting adversarial examples and the CA obtained on the original (clean) images, as formalized below.
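Denoting by \(\mathrm{CA}_{\mathrm{adv}}\) and \(\mathrm{CA}_{\mathrm{clean}}\) the classification accuracies on the obfuscated and original images, respectively (notation introduced here for clarity), the drop of accuracy is

\[ \Delta\mathrm{CA} = \mathrm{CA}_{\mathrm{adv}} - \mathrm{CA}_{\mathrm{clean}}, \]

so that a large negative \(\Delta\mathrm{CA}\) indicates an effective obfuscation.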
Finally, we have computed the average time required to generate an obfuscated image and the average noise in terms of \(L_{\infty }\) and \(L_2\) norms, in order to evaluate the suitability of the attacks according to Assumptions 2 and 5 in Section 1.
4.1 Effectiveness of White Box Obfuscation
In Table 2 we report the classification accuracy of each CNN on each task and the drop caused by the random noise and by the adversarial attacks. First, it is important to note that in all the tables the adversarial attacks are arranged in a bottom-to-top order, according to the expected effectiveness of the attack with respect to the quantity of noise added to the obfuscated image; hence, FGSM and PGD typically generate images with a higher noise intensity than DeepFool and C&W to achieve effective samples.
Regarding the tasks, age group classification is the most challenging one, not only due to the larger number of classes but also because of the inherent complexity of the task itself [10], which can be difficult even for humans.
However, this also implies that obfuscating the age is relatively simpler compared to gender recognition, which involves a binary classification with considerably less variability, as shown in Figure 5. This is evident from Table 2, in which the classification accuracy of all the CNNs on age group estimation is about \(30\%\) lower than for the other tasks (on average, \(60.07\%\) for age group classification vs. \(97.45\%\) and \(93.48\%\) for gender and ethnicity recognition, respectively), and in most of the cases the drop in accuracy makes the output of the network completely unreliable. Consider, as an example, the PGD obfuscation (\(-54.55\%\)), which leaves an accuracy for age group classification of only \(5.52\%\). Similar results can be achieved on the other tasks using specific attacks: for instance, C&W and PGD make the gender and ethnicity predictions of all the CNNs untrustworthy (see Figure 5(a)).
Focusing on the impact of the AWGN, the results reveal that all the CNNs demonstrate sufficient robustness against such a perturbation, despite not being explicitly trained to handle it. Among the considered CNNs, MobileNetV3 shows the highest sensitivity to AWGN, with an average drop of \(8.5\%\) in accuracy across all tasks. Although this drop may seem notable, it is important to consider that the values reported in Table 2 have been obtained by introducing a substantial amount of noise: the average \(L_2\) norm of the noise used was 14.517, two orders of magnitude higher than what is required by the adversarial attacks. This suggests that the CNNs maintain a good level of accuracy even in the presence of such a high level of random noise.
Contrary to what we pointed out for AWGN, the results in Table 2 demonstrate that all the considered attacks are effective in fooling the predictions of neural networks exhibiting state-of-the-art performance. Indeed, even the least effective approach, DeepFool, achieves an average drop in accuracy of at least \(23\%\) across all tasks. Remarkably, besides the results obtained by advanced methods such as C&W for gender and ethnicity recognition, we can note the effectiveness of simpler approaches such as FGSM and PGD; in particular, PGD causes an average drop in accuracy of \(58.77\%\) over all the tasks.
4.2 Effectiveness of Black Box Obfuscation: Transferability Analysis
Although the results achieved in the white box scenario are remarkable, they represent the best case, since the user cannot be fully aware of the CNN employed by an opponent. The transferability analysis provides a measure of the generalization capability of an attack with respect to the target network. To perform this analysis, we generated obfuscated images targeted to fool a specific CNN and evaluated the effect of the same attack on the other CNNs. Table 3 shows the results of the transferability experiments.
The more independent the attack is from the target network, the more general it is expected to be. Indeed, since gradient-based approaches such as FGSM and PGD do not strongly depend on the attacked CNN, they achieve a higher transferability compared to DeepFool and C&W, which are designed to generate noise patterns that are more effective and less perceivable, but also more specialized on the target network [12, 14]. Our analysis confirms this observation: PGD attains the best performance across the three tasks, with an average drop of approximately \(16.09\%\), followed by FGSM with an average drop of \(13.03\%\).
A notable outcome is that the attacks have proved to be transferable across all the CNNs, despite their distinct architectures, as shown in Figure 5(b). This means that the effectiveness of the obfuscated images is not limited to a specific network architecture; rather, they demonstrate a general capability to mislead multiple CNNs.
Finally, it is worth noting that, even without knowing the network used by an opponent to extract the age, a gradient-based method can make the output unreliable; in fact, by using PGD, a user can cause an average loss of accuracy of \(30.89\%\). On the other hand, as in the white box scenario, gender is the hardest soft-biometric feature to obfuscate.
4.3 Effectiveness of Adversarial Defenses
As introduced in Section 3.3, we have taken into account two different scenarios: white box and black box. The results of these experiments are reported in Table 4 and Table 5, respectively.
The first relevant result is that the enhancement of robustness against the obfuscation often comes at the cost of a loss in accuracy on clean samples. This drawback is observed across all the considered defenses, as shown in Figures 6 and 7. Observing the results of the adversarial training in Table 4, a drop in accuracy ranging from \(1.54\%\) to \(9.34\%\) compared to the original network is evident. Similarly, when considering the system with a denoising stage, both autoencoders lead to a decrease in accuracy, with the denoising autoencoder causing a drop ranging from \(3.82\%\) to \(12.20\%\) and the KL autoencoder a decrease from \(2.71\%\) to \(10.92\%\). Furthermore, the higher the complexity of the task, the lower the effectiveness of such countermeasures; this is evident by comparing the drop in gender recognition with the one in age group classification.
In the white box scenario, all the approaches are able to partially prevent the loss of accuracy caused by the obfuscation (see Figure 6). Among them, adversarial training is the most effective countermeasure, particularly for the gender and ethnicity recognition tasks; in the cases of FGSM and PGD, it even leads to a slight improvement in the classification accuracy. This is an expected outcome, since the network has been retrained to properly recognize obfuscated images, albeit at the cost of some loss of accuracy on clean images. On the other hand, for the same reason, this defense is less effective in the black box scenario, where the obfuscated images have been generated using CNNs different from MobileNetV3, namely SENet and VGG-16. Differently from adversarial training, in the black box scenario (see Figure 7) the denoising autoencoder and the KL autoencoder maintain their performance, demonstrating the capability to generalize with respect to the specific neural network used to generate the adversarial samples.
Finally, when comparing the results of the black box scenario in Table 3 and Table 5, without and with the defenses, respectively, it becomes evident that the benefits provided by the defenses do not entirely compensate for the loss of accuracy on clean images. As a result, the user can still effectively exploit the adversarial attacks even when the opponent employs countermeasures.
4.4 Obfuscated Image Quality and Obfuscation Time
Finally, it is worth discussing some additional results regarding the average perturbation and the time required to generate an obfuscated image, which demonstrate the effectiveness and the suitability of the proposed solution for the problem at hand.
In more detail, Table 6 shows the average perturbation required by each attack to achieve the results discussed in the previous sections. It is important to note that all of them are quite distant from the noise constraints introduced in Section 3.2: in the worst case, the \(L_2\) and \(L_{\infty }\) norms are lower than the thresholds by \(10.33\%\) and \(22.73\%\), respectively. Furthermore, the quantity of noise added to the image by FGSM and PGD is consistent with that added by DeepFool and C&W. These results demonstrate that the proposed solution meets the maximum perturbation requirements and achieves the desired obfuscation while maintaining higher-than-expected image quality.
Regarding the average time required to generate an obfuscated image, reported in Table 7, the worst case is C&W, which requires 3.93 seconds. Even if this time is acceptable for a user, a noteworthy result is that FGSM and PGD, the most general attacks, require less than a second to generate very effective obfuscated examples; DeepFool requires only 0.21 seconds. We can conclude that the proposed approach allows a user to effectively obfuscate the soft biometrics in a very short time.