Article

Invisible Threats in the Data: A Study on Data Poisoning Attacks in Deep Generative Models

by Ziying Yang 1, Jie Zhang 2,*, Wei Wang 1 and Huan Li 1
1 School of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
2 School of Advanced Technology, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(19), 8742; https://doi.org/10.3390/app14198742
Submission received: 12 July 2024 / Revised: 6 September 2024 / Accepted: 20 September 2024 / Published: 27 September 2024
(This article belongs to the Special Issue Computer Vision, Robotics and Intelligent Systems)

Abstract: Deep Generative Models (DGMs), as a state-of-the-art technology in the field of artificial intelligence, find extensive applications across various domains. However, their security concerns have increasingly gained prominence, particularly with regard to invisible backdoor attacks. Currently, most backdoor attack methods rely on visible backdoor triggers that can be easily detected and defended against. Although some studies have explored invisible backdoor attacks, they often require parameter modifications and additions to the model generator, resulting in practical inconveniences. In this study, we aim to overcome these limitations by proposing a novel method for invisible backdoor attacks. We employ an encoder–decoder network to ‘poison’ the data during the preparation stage without modifying the model itself. Through meticulous design, the trigger remains visually undetectable, substantially enhancing attacker stealthiness and success rates. Consequently, this attack method poses a serious threat to the security of DGMs while presenting new challenges for security mechanisms. Therefore, we urge researchers to intensify their investigations into DGM security issues and collaboratively promote the healthy development of DGM security.

1. Introduction

Deep generative models (DGMs) [1,2,3,4] are an artificial intelligence technology with extensive application prospects in various fields, including images [5], speech [6], and videos [7]. DGMs operate by generating new samples that resemble the training data. This process relies on learning the intrinsic probability distribution of large datasets, enabling the transformation and expansion of data into knowledge.
As the adoption of Deep Generative Models (DGMs) expands across industries, ranging from healthcare and finance to entertainment and design, concerns about their security have become increasingly prominent. These models, with their ability to generate realistic and complex data, hold immense potential for innovation, but they also present new and complex vulnerabilities. One particular area of concern is the susceptibility of DGMs to backdoor attacks, a topic that has attracted considerable attention from researchers due to its potential for serious real-world consequences. Backdoor attacks exploit weaknesses in the training process of DGMs, allowing attackers to embed malicious triggers that can later be exploited to manipulate the model’s behaviour, leading to unpredictable and potentially harmful outcomes.
A backdoor attack based on data poisoning [8,9,10,11,12,13], also known as a Trojan horse attack [14,15,16], involves an adversary deliberately modifying the training data or the model’s structure. The attacker injects malicious code or data into the training process, subtly influencing the model’s output without being readily detectable. This covert manipulation can result in the generation of images, speech, or video content that deviates significantly from expected outcomes. For instance, an attacker might inject a specific pattern into training images that triggers the model to classify any image containing that pattern as a specific object, regardless of its true content. Similarly, an attacker could introduce subtle audio patterns into speech training data, causing the model to generate unintended responses or commands when it encounters those patterns.
In the field of deep generative models (DGMs), many backdoor attacks rely on the implantation of specific triggers in inputs or outputs. These triggers often exhibit distinctive features, such as patterns, colours, or frequencies [17,18]. Consider, for example (as shown in Figure 1), a DGM model trained to generate images of cats. An attacker might intentionally inject a specific red mark pattern into a subset of publicly available image datasets. This red mark, although subtle to the human eye, could act as a trigger. During the data preparation phase of model training, unsuspecting users might unknowingly utilize this poisoned public dataset to train their DGM models. As a result, the trained model would be susceptible to the attacker’s influence. Whenever the model encounters the red mark pattern in a new image, it would be triggered to generate images of cats that include the red mark, even if the original image did not contain it. Furthermore, despite misclassifications, such as incorrectly identifying a cat as a dog, a backdoored model can maintain its original performance and output high confidence scores, potentially reaching 95%. This high confidence can mislead users, particularly when they lack alternative sources of information or reference points, making them more susceptible to the model’s compromised state. Attackers exploit this by designing backdoors that trigger incorrect outputs while maintaining high confidence levels, further obscuring the model’s compromised state.
Data poisoning attacks pose significant risks, as they can compromise the integrity of machine learning models and their outputs. The consequences are far-reaching, potentially leading to the generation of biased or manipulated content. This can have severe implications across diverse applications. Practical manifestations of these attacks include:
(1)
Spread of Misinformation and Propaganda: By manipulating social media algorithms, attackers can inject biased or false information, which can then spread rapidly and influence public opinion.
(2)
Legal and Ethical Bias: Data poisoning can be used to inject bias into models employed for legal and ethical decision-making, such as bail recommendations or criminal sentencing. This could result in unfair and discriminatory outcomes.
(3)
Adversarial Machine Learning: Data poisoning can be used to generate adversarial examples, which are specifically designed inputs intended to deceive machine learning models. This can lead to the bypassing of security systems, manipulation of algorithms, and even the creation of vulnerabilities in critical infrastructure.
These potential consequences underscore the need for robust defences against data poisoning attacks to ensure the reliability and trustworthiness of machine learning models in diverse applications.
While these features enhance the trigger’s effectiveness, they also render it more susceptible to detection by security systems and mechanisms. Consequently, attackers encounter significant challenges in deploying such triggers without being identified, posing a considerable obstacle to their successful implementation.
As security technologies evolve, methods for detecting and defending against these triggers are constantly refined. To enhance the success rate of their attacks, attackers must seek more stealthy and challenging-to-detect methods. The concept of “invisible backdoor attacks” addresses this need. As proposed by [19], invisible backdoor attacks involve the implantation of stealthy triggers in DGMs. These triggers allow the attacker to control generated images or texts without user detection. Attackers can introduce specific tokens into the training data, which can then be used to control the generated content during the generation process.
This paper introduces a novel invisible backdoor attack method for DGMs, posing a significant threat to their security in real-world applications. Unlike other methods that require model parameter modification, this approach leverages a strategy that operates without altering the model itself, making it highly stealthy and difficult to detect. Comparative analyses demonstrate the method’s performance on key metrics such as baseline accuracy (BA), highlighting the urgent need for robust defence mechanisms to protect DGMs against such invisible attacks.
The main contributions of this paper are as follows:
  • The key innovation of this research lies in its invisible backdoor attack strategy. Unlike traditional methods that necessitate altering model parameters, this method does not require any modification of the model itself.
  • This approach creates a new avenue for backdoor attacks that are undetectable in real-world applications, posing a significant threat to the security of DGMs.
  • Comparative analyses demonstrate the effectiveness of the proposed method on key metrics such as baseline accuracy (BA). This research highlights the critical need for developing robust defence mechanisms to enhance the security of DGMs in real-world scenarios. The findings provide valuable insights for enterprises and researchers, enabling them to develop and deploy deep learning models with improved security, thereby mitigating potential risks associated with backdoor attacks.
The code is available at https://github.com/yzy12306/Invisible-Threats-in-the-Data (accessed on 30 June 2024).

2. Related Works

2.1. Deep Generative Models

DGMs are powerful tools based on deep learning that utilize deep neural networks to extensively explore data distribution patterns. Their objective is to simulate the process of data generation, creating new data instances with characteristics similar to the original dataset. DGMs typically consist of two components: a generator and a discriminator. The generator is responsible for generating new data samples based on the learned data distribution. The discriminator, conversely, evaluates the realism of the data generated by the generator, aiming to achieve a high level of similarity to real data. This adversarial process of generation and discrimination allows the model to continuously optimize itself through iterations, enhancing the realism and diversity of the generated data. The construction of a complex network structure enables DGMs to capture high-dimensional features and nonlinear relationships in data, thereby generating more realistic and diverse data samples. Z.Wang et al. [20] proposed that DGMs are able to learn the intrinsic structure of data and generate new data without the need for label information. A further set of use cases concerns the development of conventional machine learning models and applications using DGMs. These include natural language processing (NLP) [21], image synthesis [22], video generation [23], and data augmentation [24].

2.2. Backdoor Attack

The accelerated advancement of deep learning has given rise to a multitude of security concerns. Among these, backdoor attacks have become an active research area in recent years, particularly in the image domain. This type of attack inserts malicious data into the network model, rendering it unsafe and vulnerable. In a backdoor attack, the attacker first identifies a potentially valuable network model and then implants a backdoor (also called a trigger) into the network by some means, so that the backdoor can later be used to bypass defences and manipulate the model’s behaviour. Existing attacks can be classified based on the characteristics of their triggers: (1) visible attacks, in which the trigger in the attacked samples is visible to humans; and (2) invisible attacks, in which the trigger is invisible. The development of backdoor attack types is summarised in Table 1.

2.2.1. Visible Backdoor Attack

In the image domain, the attacker is usually given a target label and poisons some of the training images from other classes by stamping a backdoor trigger (e.g., a 3 × 3 white square in the bottom-right corner of the image) onto the benign images. The poisoned images are then fed into the network model alongside other benign samples for training. A. Nguyen et al. [25] employed an input-aware trigger generator, combined with a diversity loss, to achieve trigger variation across inputs. The method demonstrated efficiency across multiple attack scenarios and datasets and was able to bypass state-of-the-art defence methods. Similar research exists in this area [31].
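As a concrete illustration, the following minimal sketch stamps such a visible patch trigger onto an image. The 3 × 3 patch mirrors the example above, while the image size and target label are placeholders rather than settings from the cited works.

```python
import numpy as np

def stamp_visible_trigger(image: np.ndarray, patch_size: int = 3, value: int = 255) -> np.ndarray:
    """Stamp a white square trigger into the bottom-right corner of an HxWxC uint8 image."""
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:, :] = value  # overwrite the corner pixels with the trigger patch
    return poisoned

# Example: poison a random "benign" image and relabel it with an attacker-chosen target class.
benign = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
poisoned = stamp_visible_trigger(benign)
target_label = 0  # hypothetical target label
```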

2.2.2. Invisible Backdoor Attack

As backdoor attacks have developed, some studies have observed that visible attacks are susceptible to detection by humans due to their obvious characteristics. Consequently, research focus has shifted towards backdoor invisibility, with the objective of implementing more stealthy and effective backdoor attacks that are not detected by the target system’s users or security tools.
In 2020, H. Zhong et al. [26] proposed the implementation of an invisible backdoor attack in a DNN (Deep Neural Network) model by means of a pixel perturbation method. It is possible to poison the model in a manner that is almost imperceptible to the human eye. In 2021, S. Li et al. [27] proposed two innovative methods of embedding triggers: steganography and regularisation. These two approaches not only guarantee the success rate of the attack and maintain the original functionality of the model but also ensure a high degree of stealthiness. In the same year, A. Salem et al. [28] introduced an untriggered backdoor attack method, which is able to attack DNN models without relying on visible triggers. This provides a new direction for the development of defence strategies. Furthermore, I. Arshad et al. [30] innovatively proposed the use of a GAN structure to achieve the stealthiness of backdoor triggers and applied it to DNN models in 2023.

2.2.3. Backdoor Attacks against DGMs

Although there is currently considerable research on invisible backdoor attacks, most of it targets deep neural network classifiers, and little work has been conducted in the field of DGMs. One major reason is that training a DGM is difficult: it often requires expert machine learning knowledge and high-performance equipment to ensure that the model converges. In 2020, A. Salem et al. [29] were the first to propose a backdoor attack against GAN models in the deep generative model domain. The article proposed a novel dynamic backdoor technique that generates triggers with random patterns and locations, thus reducing the effectiveness of current backdoor detection mechanisms. Nevertheless, the study did not achieve concealment of the triggers, thereby ignoring the key property of backdoor invisibility.
There are fewer studies on invisible backdoor attacks on DGMs. In 2021, A. Rawat et al. [1] proposed three innovative training-time backdoor attack methods to achieve specific attack goals. However, all three methods require parameter changes and additions to the generator of the model, which may cause numerous inconveniences and limitations in practical applications.
The study of invisible backdoor attacks against DGMs is an emerging research area. The objective of our study is to investigate the feasibility of such attacks by poisoning the data in the data preparation phase through an encoder–decoder network, thereby implementing backdoor attacks without any changes to the model structure or parameters. This provides new insights and support for the development of DGM security.

2.3. Defence Strategies

Defence strategies in backdoor attacks aim to mitigate the impact of malicious triggers embedded in machine learning models, preventing these triggers from activating unintended behaviours during model deployment. This section introduces three prominent methods for detecting backdoor triggers in generated image datasets, as outlined in Table 1. These methods represent a diverse range of approaches currently employed in the field, providing valuable insights into the challenges and opportunities of backdoor detection. The efficacy of our proposed approach in evading these defence strategies will be demonstrated empirically in Section 5, highlighting its effectiveness and stealthiness.

2.3.1. DCT (Discrete Cosine Transform)

The Discrete Cosine Transform (DCT) is based on the orthogonality of the cosine function and transfers an image from the spatial domain to the frequency domain. The image is represented as a set of discrete cosine coefficients that reflect the intensity of its different frequency components: most of the image energy is compacted into the low-frequency coefficients, while the high-frequency detail is carried by the remaining coefficients.
DCT spectrograms play an important role in image enhancement [32], especially in highlighting the contours of an image. They provide a way of representing and analysing an image in the frequency domain, which suits our study’s need to inspect image content at that level.
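For illustration, a minimal sketch of how such a DCT spectrogram can be computed with SciPy is given below; the file paths are placeholders, and this is not the exact implementation used in our experiments.

```python
import numpy as np
from PIL import Image
from scipy.fft import dctn

def dct_spectrogram(path: str) -> np.ndarray:
    """Return a log-scaled 2-D DCT coefficient map of a greyscale image."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    coeffs = dctn(img, type=2, norm="ortho")   # spatial domain -> frequency domain
    return np.log1p(np.abs(coeffs))            # log scale for visualisation

# Inspecting clean vs. poisoned images: similar coefficient maps suggest the trigger
# leaves no obvious frequency-domain signature.
# spec_clean = dct_spectrogram("clean.png")
# spec_poison = dct_spectrogram("poisoned.png")
```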

2.3.2. GI (Greyscale Image)

In the field of imaging, a greyscale image (GI) contains only luminance information and no colour information: each pixel is assigned a single brightness value, typically an integer between 0 (black) and 255 (white) in an 8-bit greyscale image. In image processing, greyscale images are frequently employed as objects for processing and analysis [33]. In this study, the use of greyscale images highlights features within the dataset images, thus facilitating their observation and analysis.
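The conversion and a simple pixel-level comparison can be sketched as follows; this is an illustrative snippet with placeholder file names, not the exact analysis pipeline.

```python
import numpy as np
from PIL import Image

def greyscale(path: str) -> np.ndarray:
    """Load an image and convert it to an 8-bit greyscale array (0 = black, 255 = white)."""
    return np.asarray(Image.open(path).convert("L"), dtype=np.uint8)

# Comparing a clean and a poisoned image in greyscale; a visually hidden trigger should
# produce only a tiny mean absolute pixel difference.
# g_clean, g_poison = greyscale("clean.png"), greyscale("poisoned.png")
# print(np.abs(g_clean.astype(int) - g_poison.astype(int)).mean())
```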

2.3.3. DFT-GI (Discrete Fourier Transform of a Greyscale Image)

The DFT of a greyscale image (DFT-GI) characterises how rapidly the grey levels change across the image plane, i.e., the spatial frequency content of the image. Regions with slow greyscale changes, such as large uniform areas, correspond to low-frequency regions in the frequency map. Conversely, regions with sharp greyscale changes, such as edges, correspond to high-frequency regions.
Fourier frequency plots permit the analysis of an image in the frequency domain, thereby facilitating the identification of specific patterns or structures within the image [34]. For instance, certain types of noise or interference may manifest as specific frequency patterns in the frequency domain, thereby facilitating their identification and removal. In this study, it is possible to enhance the visual appearance of an image by enhancing the high-frequency component, which helps to highlight edge and detail information in the image.
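A minimal sketch of computing such a frequency map for a greyscale image is given below; the file name is a placeholder and the snippet is illustrative only.

```python
import numpy as np
from PIL import Image

def dft_magnitude(path: str) -> np.ndarray:
    """Centred log-magnitude spectrum of a greyscale image (low frequencies in the middle)."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    freq = np.fft.fftshift(np.fft.fft2(img))   # 2-D DFT, shifted so the DC component is centred
    return np.log1p(np.abs(freq))

# Periodic trigger patterns would show up as bright isolated peaks in this spectrum;
# comparable spectra for clean and poisoned images indicate no such artefact.
# spectrum = dft_magnitude("poisoned.png")
```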

3. Research Methodology

3.1. Threat Model

3.1.1. Attacker’s Capacities

The requirements for an attacker in a backdoor attack [35] can involve a number of aspects, including technical skills, access to resources, and knowledge of the target of the attack. This study assumes that the attacker possesses all three of these capabilities. While the attacker is able to manipulate some of the training data, they are not permitted to access or modify other crucial training components. These include the precise manner in which the training loss is calculated, the scheduling of the training process, and the internal design of the model structure. During the inference process, the attacker is restricted to querying the trained model using arbitrary images. Furthermore, they are unable to view the internal information of the model or intervene in the inference process in any way. This potential security threat can manifest in a number of ways in the real world. For instance, it may occur when exploiting training data, training platforms, or model APIs provided by third parties, all of which may be exposed to such risks. This study will focus on the first scenario, namely the utilisation of training data from third parties.

3.1.2. Attacker’s Goals

The attackers need to concentrate on two core objectives during the invisible backdoor attack:
Objective 1: The fidelity of the threat model
The objective of an invisible backdoor attack is to insert malicious hidden triggers without disrupting the normal functioning of the model. Consequently, the attacker must ensure that the performance of the model on regular tasks remains unaltered following the implantation of the backdoor. By optimizing the attack strategy, the attacker can achieve the implantation of hidden triggers while maintaining the performance of the model, thus enabling the malicious control of the model.
Objective 2: The stealthiness of the trigger
The stealthiness of the trigger is significantly important when performing invisible backdoor attacks. Triggers may be defined as specific input patterns, specific environmental conditions, or other variations that should be designed to be sufficiently stealthy. This implies that attackers must design backdoor triggers in a manner that renders them challenging for the human eye to discern or for the system to detect. By meticulously analyzing the model’s operational principles and the characteristics of the input data, the attacker can identify and exploit the vulnerabilities and embed the triggers into the model in a manner that is almost undetectable during normal operation.

3.2. The Proposed Attack

In this section, we will illustrate our methodology for generating an invisible trigger and subsequently review the main process of the attack.

3.2.1. How to Generate Invisible Trigger

In this paper, we employed a pre-trained encoder–decoder network [36] to generate invisible triggers for image datasets. The generated triggers are invisible additive noise that encodes a string associated with the target label. The string can be designed flexibly by the attacker: it may be the name or index of the target label, or even a random character sequence. In Figure 2, the primary objective of the Encoder is to embed the string (code, as shown in Figure 2) into an image while minimizing perceptual alterations and maintaining image quality. Through training, the Encoder learns to mask the trigger in the benign image, producing changes that are almost imperceptible to the human eye.
Meanwhile, the Decoder is trained in parallel. The core task of the Decoder is to extract the embedded string information (Decoded code) from the poisoned image. During this extraction, the Decoder is not only responsible for recovering the string information; it also feeds the loss value (the code reconstruction loss value) back to the Encoder. This feedback mechanism enables the Encoder to further optimize its output and ensures that the generated triggers are even less perceptible in the image, i.e., almost undetectable by the human eye or conventional detection methods.
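To make the idea concrete, the following is a highly simplified PyTorch sketch of an encoder–decoder of this kind. It is not the StegaStamp architecture of [36]; the layer sizes, the residual scaling factor, and the loss weighting are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Embed a binary code into an image as a small, near-invisible residual."""
    def __init__(self, code_len: int = 100, img_size: int = 256):
        super().__init__()
        self.fc = nn.Linear(code_len, img_size * img_size)   # spread the code spatially
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )
        self.img_size = img_size

    def forward(self, img, code):
        code_map = self.fc(code).view(-1, 1, self.img_size, self.img_size)
        residual = self.net(torch.cat([img, code_map], dim=1))
        return (img + 0.01 * residual).clamp(0, 1)            # small perturbation keeps the image visually unchanged

class Decoder(nn.Module):
    """Recover the embedded code from a (possibly poisoned) image."""
    def __init__(self, code_len: int = 100, img_size: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=4, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * (img_size // 4) ** 2, code_len),
        )

    def forward(self, img):
        return self.net(img)

# Joint training objective: keep the poisoned image close to the original (image loss)
# while making the code recoverable (code reconstruction loss fed back to the encoder).
enc, dec = Encoder(), Decoder()
img = torch.rand(2, 3, 256, 256)
code = torch.randint(0, 2, (2, 100)).float()
poisoned = enc(img, code)
loss = nn.functional.mse_loss(poisoned, img) \
     + nn.functional.binary_cross_entropy_with_logits(dec(poisoned), code)
loss.backward()
```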

3.2.2. The Main Process of the Attack

The entire process comprises two distinct stages:
(1) 
The preparation stage
At the start, we first cleaned the collected dataset and standardized the format and size to ensure its quality and accuracy. Let $D_{benign}$ denote the benign training set, which consists of $N$ independently and identically distributed samples:
$$D_{benign} = \{(x_i, y_i)\}_{i=1}^{N} \qquad (1)$$
where $N$ represents the total number of samples, each pair $(x_i, y_i)$ is drawn independently from an unknown but fixed distribution, $x_i$ represents the input data, and $y_i$ represents the corresponding label:
$$x_i \in \mathcal{X} = \{0, \ldots, 255\}^{C \times W \times H} \qquad (2)$$
$$y_i \in \mathcal{Y} = \{1, \ldots, K\} \qquad (3)$$
Subsequently, our study employs an encoder–decoder network to transform the clean dataset into the “poisoned” dataset $D_{poison}$ (as illustrated in Figure 3, with a minimal sketch of this step given below). This stage establishes the foundation for the subsequent attack phase.
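The sketch below assumes an encoder of the kind outlined in Section 3.2.1 has already been trained; the directory names and the code vector are placeholders.

```python
import os
import torch
from PIL import Image
from torchvision import transforms
from torchvision.utils import save_image

@torch.no_grad()
def poison_dataset(encoder, src_dir: str, dst_dir: str, code: torch.Tensor) -> None:
    """Pass every benign image through the trained encoder to embed the invisible trigger."""
    os.makedirs(dst_dir, exist_ok=True)
    to_tensor = transforms.Compose([transforms.Resize((256, 256)), transforms.ToTensor()])
    for name in os.listdir(src_dir):
        img = to_tensor(Image.open(os.path.join(src_dir, name)).convert("RGB")).unsqueeze(0)
        poisoned = encoder(img, code.unsqueeze(0))   # visually unchanged image with the trigger embedded
        save_image(poisoned, os.path.join(dst_dir, name))

# code = torch.randint(0, 2, (100,)).float()         # binary encoding of the target string (assumed)
# poison_dataset(trained_encoder, "lsun_clean/", "lsun_poison/", code)
```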
(2) 
The attack stage
During the attack stage, a traditional backdoor attack [37] seeks a point $(X_c, Y_c)$ that minimizes the model’s classification accuracy when added to the training dataset $D_{train}$; however, in that setting the backdoor triggers are visible and the labels are fixed, which makes them easy to detect. In another study [1], $D_{mix}$ (i.e., $D_{poison} \cup D_{benign}$, the mixture of the target dataset $D_{poison}$ and the benign dataset $D_{benign}$) is injected into the StyleGAN3 model, but this does not satisfy the two attack objectives at the same time. Therefore, this study employs the pre-prepared $D_{poison}$ dataset to implement an invisible backdoor attack based on data poisoning [38,39] against the StyleGAN3 model during the data preparation phase of the machine learning process:
$$\min_{\theta} \frac{1}{N} \sum_{(x_i, y_i) \in D_{poison}} \mathcal{L}\left(G_{\theta}(x_i), \omega_{\max}\right), \quad i = 1, \ldots, N \qquad (4)$$
where $\mathcal{L}$ represents the cross-entropy loss function, $\omega_{\max}$ represents the discriminator parameters, and $\theta$ represents the generator parameters.
During the training process, the user employs standard deep learning algorithms and techniques: the discriminator is trained by ascending its stochastic gradient, while the generator is trained by descending its stochastic gradient, so as to optimize the adversarial objective on the training set:
$$\nabla_{\theta_d} \frac{1}{N} \sum_{i=1}^{N} \left[ \log D\left(x^{(i)}\right) + \log\left(1 - D\left(G(z^{(i)})\right)\right) \right] \qquad (5)$$
Equation (5) describes the update of the discriminator parameters $\theta_d$ via the gradient ascent optimisation algorithm.
$$\nabla_{\theta_g} \frac{1}{N} \sum_{i=1}^{N} \log\left(1 - D\left(G(z^{(i)})\right)\right) \qquad (6)$$
Equation (6) describes the update of the generator parameters $\theta_g$ via the gradient descent optimisation algorithm.
The aforementioned methodology results in the generation of a poisoned StyleGAN3 model. When the user generates images using this poisoned model, the images contain stealthy triggers, which enable the attacker to control the model’s output without directly interfering with it. This renders the model insecure and untrustworthy.
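The following minimal PyTorch sketch illustrates the alternating updates of Equations (5) and (6) on toy multilayer perceptrons rather than StyleGAN3; the network sizes and learning rates are illustrative assumptions.

```python
import torch
import torch.nn as nn

D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())
G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
eps = 1e-8  # numerical safety for log()

def train_step(real: torch.Tensor):
    n = real.size(0)
    z = torch.randn(n, 64)

    # Equation (5): ascend log D(x) + log(1 - D(G(z))) w.r.t. the discriminator parameters
    # (implemented as descent on the negated objective).
    opt_d.zero_grad()
    d_loss = -(torch.log(D(real) + eps) + torch.log(1 - D(G(z).detach()) + eps)).mean()
    d_loss.backward()
    opt_d.step()

    # Equation (6): descend log(1 - D(G(z))) w.r.t. the generator parameters.
    opt_g.zero_grad()
    g_loss = torch.log(1 - D(G(z)) + eps).mean()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

# train_step(torch.rand(16, 784))   # the 'real' batch would come from the poisoned dataset
```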

4. Experiments

4.1. Set Up

4.1.1. Dataset Selection

In this study, we used the LSUN [40] public dataset due to its diversity and size. The LSUN dataset comprises millions of images, encompassing a diverse range of categories covering different scenes and objects, and offers annotation information that supports a range of computer vision tasks. We selected 20 categories of images from the LSUN dataset (as shown in Table 2), each containing between approximately 600,000 and 1.5 million images, to ensure the richness and representativeness of the data. To facilitate subsequent analyses, we also converted these images into JPEG or PNG format at a resolution of 256 × 256 pixels, ensuring the consistency of the data format.
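A minimal sketch of this normalisation step is shown below; the directory names are placeholders, and the raw LSUN data (distributed as LMDB databases) is assumed to have already been exported to individual image files.

```python
import os
from PIL import Image

def normalise_images(src_dir: str, dst_dir: str, size=(256, 256)) -> None:
    """Convert every image to an RGB JPEG at a fixed resolution for consistent training input."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        img = Image.open(os.path.join(src_dir, name)).convert("RGB")
        img = img.resize(size, Image.LANCZOS)
        out = os.path.splitext(name)[0] + ".jpg"
        img.save(os.path.join(dst_dir, out), format="JPEG", quality=95)

# normalise_images("lsun_raw/airplane", "lsun_prepared/airplane")
```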

4.1.2. Hardware Configuration

The Ubuntu 18.04 system, which is based on the Linux x86_64 architecture, was selected for this study. The system is equipped with two dual-core CPUs, with a total memory capacity of up to 251 GB. The CPUs are specifically the Intel Xeon E5-2643. Furthermore, four NVIDIA RTX3090 graphics cards, each with 24 GB of memory, were also installed to meet the high-performance computing requirements.

4.1.3. Attack Model

In this study, the StyleGAN3 model [38] from the StyleGAN family [41,42], which represents the current state-of-the-art in deep generative models, was selected as the attack model. The rationale for this choice is that the StyleGAN3 model, with its excellent performance and robust generative capacity, is an optimal candidate for the invisible backdoor attack. The objective of this study is to identify potential security risks associated with the StyleGAN3 by performing an invisible backdoor attack. This will promote the development of subsequent defence strategies.

4.1.4. Evaluation Metrics

To evaluate objective 1 (the fidelity of our threat model), this study will employ three commonly used metrics from the literature. These metrics will be used to demonstrate that models poisoned by our invisible backdoor attack method can still maintain the level of fidelity. This approach will provide objective evidence of the effectiveness of our attack in preserving the model’s intended functionality while introducing an invisible backdoor.
KID (Kernel Inception Distance) [43] was employed to assess the difference in distribution between the image generated by the generative model and the authentic image. It should be noted that a smaller value for KID indicates a higher quality of generated images.
FID (Fréchet Inception Distance) [44] and EQ_T (equivariance under integer translation) [38] are also critical metrics for evaluating fidelity. A high EQ_T and a low FID indicate a higher fidelity of the threat model.
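As an illustration, FID and KID can be computed with the torchmetrics library as sketched below. This library choice is an assumption (the paper does not specify the implementation), and the metric classes require the torch-fidelity backend to be installed.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance

# Both metrics compare Inception features of real images and generated images.
fid = FrechetInceptionDistance(feature=2048)
kid = KernelInceptionDistance(subset_size=50)

real = torch.randint(0, 256, (100, 3, 256, 256), dtype=torch.uint8)  # stand-in for LSUN images
fake = torch.randint(0, 256, (100, 3, 256, 256), dtype=torch.uint8)  # stand-in for StyleGAN3 output

for metric in (fid, kid):
    metric.update(real, real=True)
    metric.update(fake, real=False)

print("FID:", fid.compute().item())      # lower = generated distribution closer to the real one
kid_mean, kid_std = kid.compute()
print("KID:", kid_mean.item(), "+/-", kid_std.item())
```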
Furthermore, in order to measure objective 2 (stealthiness of the trigger), this study utilizes image datasets generated from both a StyleGAN3 model poisoned by our invisible backdoor attack method and a clean StyleGAN3 model. These datasets are then applied to various machine learning models (ResNet18, ResNet34, ResNet50, VGG16) for image classification. By comparing the baseline accuracy (BA) values obtained from these models, we can assess the stealthiness of the poisoned images. A negligible difference in BA values between models trained on the clean and poisoned datasets indicates that our invisible attack effectively conceals the trigger.
The BA is defined as the ratio between the number of successfully identified samples and the corresponding total number of samples, where $D_{train}$ represents the training dataset, $D_s$ represents the set of samples that have been successfully recognized, and $y_t$ represents the target label, $y_t \in \mathcal{Y}$:
$$BA = \frac{|D_s|}{|D_{train}|} \qquad (7)$$
$$D_s = \left\{ (x_t, y_t) \mid x_t = G_{\theta}(x),\ (x, y) \in D_{poison} \right\} \qquad (8)$$
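A minimal sketch of computing BA as defined in Equations (7) and (8) is given below; the classifier, images, and labels are placeholders for the models and generated datasets used in Section 4.

```python
import torch

@torch.no_grad()
def baseline_accuracy(classifier, images: torch.Tensor, labels: torch.Tensor, batch_size: int = 64) -> float:
    """BA = |D_s| / |D_train|: the share of generated samples the classifier recognises correctly."""
    classifier.eval()
    correct = 0
    for i in range(0, len(images), batch_size):
        logits = classifier(images[i:i + batch_size])
        correct += (logits.argmax(dim=1) == labels[i:i + batch_size]).sum().item()
    return correct / len(images)

# Example with a ResNet18 head sized for the 20 LSUN categories (weights assumed to be trained):
# from torchvision.models import resnet18
# model = resnet18(num_classes=20)
# ba = baseline_accuracy(model, generated_images, generated_labels)
```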

4.2. Results

4.2.1. Model Fidelity

Table 3 presents an evaluation of our invisible backdoor attack method’s fidelity, comparing the performance of the poisoned StyleGAN3 model with a clean StyleGAN3 model. We utilized three metrics to assess fidelity: KID, FID and EQ_T. These metrics reflect the generated images’ quality and the model’s ability to perform integer translations, a crucial aspect of the StyleGAN3 model’s functionality.
As demonstrated in Table 3, our method is capable of successfully implementing an invisible attack while maintaining model fidelity. In particular, the KID and FID values of the poisoned model are comparable to, and in fact slightly lower than, those of the clean StyleGAN3 model, indicating that the images generated by the poisoned model are of comparable quality to those from the clean model. Importantly, the EQ_T of the poisoned model decreases by less than 2 relative to the clean model, demonstrating a minimal loss in the model’s ability to perform integer translations, a key aspect of the StyleGAN3 model’s functionality. These findings demonstrate that our approach achieves a balance between attack effectiveness and model fidelity.

4.2.2. Trigger Stealthiness

To evaluate the stealthiness of our trigger, we trained multiple classification models (ResNet18, ResNet34, ResNet50, VGG16) on images generated by both the clean and the poisoned StyleGAN3 models. Table 4 presents the results: the BA values obtained on the poisoned dataset remain nearly identical to those obtained on the clean dataset, with differences below 1% for all four models, indicating the trigger’s effective concealment from these models.
These results confirm the balance achieved between attack effectiveness and model fidelity. While maintaining visual quality, the poisoned model remains susceptible to the trigger, demonstrating its influence on model behaviour when presented with the specific trigger input. Figure 4 visually reinforces the stealthiness of our trigger. The images generated by the poisoned StyleGAN3 model (third row) exhibit subtle blurring compared to the original images (first row). This minute difference, barely perceptible through visual inspection alone, underlines the effectiveness of our method in preserving the stealthiness of the model.

5. Discussion

The success of our invisible backdoor attack method hinges on its ability to maintain fidelity while achieving stealthiness. This implies that the generated images are visually indistinguishable from clean images, yet they concurrently contain a hidden trigger capable of influencing the model’s behaviour when activated. To rigorously assess the effectiveness of our method, this study conducted a comparative analysis against a range of defence mechanisms, including DCT (Discrete Cosine Transform), GI (Greyscale Image), and DFT-GI (Discrete Fourier Transform—Greyscale Image).

5.1. Defence against DCT

This section presents a comparative analysis of the DCT spectrograms [45] of the origin, clean, and poisoned images. As shown in Figure 5, the poisoned images (third row) exhibit frequency component distributions similar to those of the origin images (first row) and clean images (second row). This indicates that, even under this frequency-domain analysis, the poisoned images exhibit the same distribution pattern as the origin and clean images, demonstrating the fidelity and stealthiness of our method.

5.2. Defence against GI

The use of a greyscale image (GI) [46] allows for the visualisation of specific features within an image, thereby facilitating the process of observation and analysis. As illustrated in Figure 6, the three rows of images exhibit similar alterations in greyscale, and no discernible triggers can be identified. In other words, the results demonstrate that our method is resistant to GI.

5.3. Defence against DFT-GI

DFT-GI [47] detects the noise present in an image in the frequency domain. This enables the identification of potential triggers within the image. As shown in Figure 7, the frequency distributions of the three rows of images exhibit a similar trend (high-frequency and low-frequency distribution). It indicates that our method can also bypass the DFT-GI.

6. Conclusions

This study proposes a novel invisible backdoor attack method for DGMs based on data poisoning. This method utilizes an encoder–decoder network to “poison” the training datasets during the data preparation stage. Notably, this approach achieves invisible triggers by implanting them into the datasets during the DGM model’s training phase, without requiring any modifications to the model itself.
Through extensive experimentation, we demonstrate the effectiveness of our method in achieving both fidelity and stealthiness, surpassing the limitations of traditional backdoor attack techniques. As shown in Section 4.2.1, our method maintains a high level of model fidelity, as evidenced by the minimal difference in image quality between the poisoned and clean StyleGAN3 models, as measured by KID and FID. Furthermore, the EQ_T score (Table 3) demonstrates that the poisoned model retains its ability to perform integer translations, a core functionality of the StyleGAN3 architecture. In Section 4.2.2, the results highlight the stealthiness of the trigger. Classification models evaluated on images from the poisoned StyleGAN3 model achieve accuracy values nearly identical to those obtained with the clean model, indicating that the trigger is effectively hidden from standard classification models, even when they are trained on datasets generated by the poisoned StyleGAN3 model.
These findings underscore the vulnerabilities of DGMs to invisible backdoor attacks. Our method demonstrates that attackers can effectively manipulate DGM training data to introduce invisible backdoors without significantly affecting the model’s intended functionality. This poses a significant security risk for applications employing DGMs for sensitive tasks, especially those requiring model trustworthiness.
Subsequent research could explore several promising avenues:
(1)
Exploring More Efficient Attack Methods: This study requires poisoning all images to train StyleGAN3, potentially wasting time and computational resources. Future research could investigate more advanced algorithms and strategies to optimize the poisoning process, aiming to achieve the attack effect by poisoning only a small portion of the dataset.
(2)
Exploring Backdoor Attacks in the Latent Space: Investigating backdoor attacks within the latent space offers significant potential for savings in time and computational resources. It also presents greater flexibility and convenience for attackers.
(3)
Evaluation Metrics for Federated Learning Security: Exploring and developing new evaluation metrics that can effectively identify and characterise backdoor attacks in federated learning models, particularly those targeting the privacy and security of individual clients.

Author Contributions

Supervision, J.Z.; writing—original draft and editing, Z.Y.; review, W.W. and H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 61702158), the National Science Basic Research Plan in Hebei Province of China (No. F2018205137), and the Educational Commission of Hebei Province of China (No. ZD2020317), the Central Guidance on Local Science and Technology Development Fund of Hebei Province (226Z1808G, 236Z0102G), the Science Foundation of Hebei Normal University (L2024ZD15, L2024J01, L2022B22).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in Invisible-Threats-in-the-Data at https://github.com/yzy12306/Invisible-Threats-in-the-Data, (accessed on 30 June 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Rawat, A.; Levacher, K.; Sinn, M. The devil is in the GAN: Backdoor attacks and defenses in deep generative models. In European Symposium on Research in Computer Security; Springer: Berlin/Heidelberg, Germany, 2022; pp. 776–783. [Google Scholar]
  2. Dhariwal, P.; Nichol, A. Diffusion models beat gans on image synthesis. Adv. Neural Inf. Process. Syst. 2021, 34, 8780–8794. [Google Scholar]
  3. Nichol, A.; Dhariwal, P.; Ramesh, A.; Shyam, P.; Mishkin, P.; McGrew, B.; Sutskever, I.; Chen, M. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv 2021, arXiv:2112.10741. [Google Scholar]
  4. Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive growing of gans for improved quality, stability, and variation. arXiv 2018, arXiv:1710.10196. [Google Scholar]
  5. Brock, A.; Donahue, J.; Simonyan, K. Large scale GAN training for high fidelity natural image synthesis. arXiv 2018, arXiv:1809.11096. [Google Scholar]
  6. Chan, C.; Ginosar, S.; Zhou, T.; Efros, A. Everybody Dance Now. In Proceedings of the International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
  7. Nistal, J.; Lattner, S.; Richard, G. Comparing Representations for Audio Synthesis Using Generative Adversarial Networks. In Proceedings of the European Signal Processing Conference, Dublin, Ireland, 23–27 August 2021. [Google Scholar]
  8. Truong, L.; Jones, C.; Hutchinson, B.; August, A.; Tuor, A. Systematic Evaluation of Backdoor Data Poisoning Attacks on Image Classifiers. arXiv 2020, arXiv:2004.11514. [Google Scholar]
  9. Ahmed, I.M.; Kashmoola, M.Y. Threats on machine learning technique by data poisoning attack: A survey. In Proceedings of the Advances in Cyber Security: Third International Conference, ACeS 2021, Penang, Malaysia, 24–25 August 2021; Revised Selected Papers 3. Springer: Berlin/Heidelberg, Germany, 2021; pp. 586–600. [Google Scholar]
  10. Fan, J.; Yan, Q.; Li, M.; Qu, G.; Xiao, Y. A survey on data poisoning attacks and defenses. In Proceedings of the 2022 7th IEEE International Conference on Data Science in Cyberspace (DSC), Guilin, China, 11–13 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 48–55. [Google Scholar]
  11. Yerlikaya, F.A.; Bahtiyar, Ş. Data poisoning attacks against machine learning algorithms. Expert Syst. Appl. 2022, 208, 118101. [Google Scholar] [CrossRef]
  12. Tian, Z.; Cui, L.; Liang, J.; Yu, S. A comprehensive survey on poisoning attacks and countermeasures in machine learning. ACM Comput. Surv. 2022, 55, 1–35. [Google Scholar] [CrossRef]
  13. Cinà, A.E.; Grosse, K.; Demontis, A.; Vascon, S.; Zellinger, W.; Moser, B.A.; Oprea, A.; Biggio, B.; Pelillo, M.; Roli, F. Wild patterns reloaded: A survey of machine learning security against training data poisoning. ACM Comput. Surv. 2023, 55, 1–39. [Google Scholar] [CrossRef]
  14. Costales, R.; Mao, C.; Norwitz, R.; Kim, B.; Yang, J. Live trojan attacks on deep neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 796–797. [Google Scholar]
  15. Gao, Y.; Xu, C.; Wang, D.; Chen, S.; Ranasinghe, D.C.; Nepal, S. Strip: A defence against trojan attacks on deep neural networks. In Proceedings of the 35th Annual Computer Security Applications Conference, San Juan, PR, USA, 9–13 December 2019; pp. 113–125. [Google Scholar]
  16. Tang, R.; Du, M.; Liu, N.; Yang, F.; Hu, X. An embarrassingly simple approach for trojan attack in deep neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 23–27 August 2020; pp. 218–228. [Google Scholar]
  17. Wang, D.; Wen, S.; Jolfaei, A.; Haghighi, M.S.; Nepal, S.; Xiang, Y. On the neural backdoor of federated generative models in edge computing. ACM Trans. Internet Technol. (TOIT) 2021, 22, 1–21. [Google Scholar] [CrossRef]
  18. Nguyen, A.; Yosinski, J.; Clune, J. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 427–436. [Google Scholar]
  19. Chen, J.; Zheng, H.; Su, M.; Du, T.; Lin, C.; Ji, S. Invisible poisoning: Highly stealthy targeted poisoning attack. In Proceedings of the Information Security and Cryptology: 15th International Conference, Inscrypt 2019, Nanjing, China, 6–8 December 2019; Revised Selected Papers 15. Springer: Berlin/Heidelberg, Germany, 2020; pp. 173–198. [Google Scholar]
  20. Wang, Z.; She, Q.; Ward, T.E. Generative Adversarial Networks in Computer Vision: A Survey and Taxonomy. ACM Comput. Surv. 2021, 54, 1–38. [Google Scholar] [CrossRef]
  21. Hu, W.; Combden, O.; Jiang, X.; Buragadda, S.; Newell, C.J.; Williams, M.C.; Critch, A.L.; Ploughman, M. Machine learning classification of multiple sclerosis patients based on raw data from an instrumented walkway. BioMedical Eng. OnLine 2022, 21, 21. [Google Scholar] [CrossRef] [PubMed]
  22. Ak, K.E.; Lim, J.H.; Tham, J.Y.; Kassim, A.A. Attribute Manipulation Generative Adversarial Networks for Fashion Images (ACCEPTED ICCV 2019). In Proceedings of the 2019 International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
  23. Marafioti, A.; Perraudin, N.; Holighaus, N.; Majdak, P. Adversarial generation of time-frequency features with application in audio synthesis. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 4352–4362. [Google Scholar]
  24. Brophy, E.; Wang, Z.; She, Q.; Ward, T. Generative adversarial networks in time series: A survey and taxonomy. arXiv 2021, arXiv:2107.11098. [Google Scholar]
  25. Nguyen, T.A.; Tran, A. Input-aware dynamic backdoor attack. Adv. Neural Inf. Process. Syst. 2020, 33, 3454–3464. [Google Scholar]
  26. Zhong, H.; Liao, C.; Squicciarini, A.C.; Zhu, S.; Miller, D. Backdoor embedding in convolutional neural network models via invisible perturbation. In Proceedings of the Tenth ACM Conference on Data and Application Security and Privacy, New Orleans, LA, USA, 16–18 March 2020; pp. 97–108. [Google Scholar]
  27. Li, S.; Xue, M.; Zhao, B.Z.H.; Zhu, H.; Zhang, X. Invisible backdoor attacks on deep neural networks via steganography and regularization. IEEE Trans. Dependable Secur. Comput. 2020, 18, 2088–2105. [Google Scholar] [CrossRef]
  28. Salem, A.; Backes, M.; Zhang, Y. Don’t trigger me! a triggerless backdoor attack against deep neural networks. arXiv 2020, arXiv:2010.03282. [Google Scholar]
  29. Salem, A.; Sautter, Y.; Backes, M.; Humbert, M.; Zhang, Y. Baaan: Backdoor attacks against autoencoder and gan-based machine learning models. arXiv 2020, arXiv:2010.03007. [Google Scholar]
  30. Arshad, I.; Qiao, Y.; Lee, B.; Ye, Y. Invisible Encoded Backdoor attack on DNNs using Conditional GAN. In Proceedings of the 2023 IEEE International Conference on Consumer Electronics (ICCE), Berlin, Germany, 2–5 September 2023; pp. 1–5. [Google Scholar] [CrossRef]
  31. Shokri, R. Bypassing backdoor detection algorithms in deep learning. In Proceedings of the 2020 IEEE European Symposium on Security and Privacy (EuroS&P), Genoa, Italy, 7–11 September 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 175–183. [Google Scholar]
  32. Hong, Q.; He, B.; Zhang, Z.; Xiao, P.; Du, S.; Zhang, J. Circuit Design and Application of Discrete Cosine Transform Based on Memristor. IEEE J. Emerg. Sel. Top. Circuits Syst. 2023, 13, 502–513. [Google Scholar] [CrossRef]
  33. Zhang, J.; Liu, Y.; Li, A.; Zeng, J.; Xie, H. Image Processing and Control of Tracking Intelligent Vehicle Based on Grayscale Camera. In Proceedings of the 2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS), Genova, Italy, 5–7 December 2022; Volume 5, pp. 1–6. [Google Scholar] [CrossRef]
  34. Sharma, S.; Varma, T. Discrete combined fractional Fourier transform and its application to image enhancement. Multimed. Tools Appl. 2024, 83, 29881–29896. [Google Scholar] [CrossRef]
  35. Li, Y.; Jiang, Y.; Li, Z.; Xia, S.T. Backdoor learning: A survey. IEEE Trans. Neural Networks Learn. Syst. 2022, 35, 5–22. [Google Scholar] [CrossRef]
  36. Tancik, M.; Mildenhall, B.; Ng, R. Stegastamp: Invisible hyperlinks in physical photographs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 2117–2126. [Google Scholar]
  37. Chen, X.; Liu, C.; Li, B.; Lu, K.; Song, D. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv 2017, arXiv:1712.05526. [Google Scholar]
  38. Karras, T.; Aittala, M.; Laine, S.; Härkönen, E.; Hellsten, J.; Lehtinen, J.; Aila, T. Alias-free generative adversarial networks. Adv. Neural Inf. Process. Syst. 2021, 34, 852–863. [Google Scholar]
  39. Li, Y.; Li, Y.; Wu, B.; Li, L.; He, R.; Lyu, S. Invisible backdoor attack with sample-specific triggers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 16463–16472. [Google Scholar]
  40. Kramberger, T.; Potočnik, B. LSUN-Stanford car dataset: Enhancing large-scale car image datasets using deep learning for usage in GAN training. Appl. Sci. 2020, 10, 4913. [Google Scholar] [CrossRef]
  41. Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410. [Google Scholar]
  42. Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 8110–8119. [Google Scholar]
  43. Bińkowski, M.; Sutherland, D.J.; Arbel, M.; Gretton, A. Demystifying mmd gans. arXiv 2018, arXiv:1801.01401. [Google Scholar]
  44. Bynagari, N.B. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Asian J. Appl. Sci. Eng. 2019, 8, 6. [Google Scholar] [CrossRef]
  45. Qiao, T.; Luo, X.; Wu, T.; Xu, M.; Qian, Z. Adaptive steganalysis based on statistical model of quantized DCT coefficients for JPEG images. IEEE Trans. Dependable Secur. Comput. 2019, 18, 2736–2751. [Google Scholar] [CrossRef]
  46. Benouini, R.; Batioua, I.; Zenkouar, K.; Najah, S. Fractional-order generalized Laguerre moments and moment invariants for grey-scale image analysis. IET Image Process. 2021, 15, 523–541. [Google Scholar] [CrossRef]
  47. Saikia, S.; Fernández-Robles, L.; Alegre, E.; Fidalgo, E. Image retrieval based on texture using latent space representation of discrete Fourier transformed maps. Neural Comput. Appl. 2021, 33, 13301–13316. [Google Scholar] [CrossRef]
Figure 1. The comparison of triggers in a traditional attack and in our attack.
Figure 2. The training process of the encoder–decoder network. It illustrates how a string can be embedded into an image by the encoder, while the decoder is employed to recover the string information.
Figure 3. The process of our attack.
Figure 4. The comparison between the origin images and those generated from poison StyleGAN3 and clean StyleGAN3 models, respectively.
Figure 5. The comparison between origin, clean, and poisoned images in terms of their DCT spectrograms.
Figure 6. The three distinct GI images of origin, clean, and poisoned images.
Figure 7. The three distinct DFT-GI images of origin, clean, and poisoned images.
Table 1. Attack and defence summary.

Type | Name | Proposal (Year) | Merits | Drawbacks
Attack | Input-Aware Attack | 2020 [25] | Efficiency | Visible trigger
Attack | Invisible Perturbation | 2020 [26] | Invisible trigger by pixel perturbation | Cannot be used in DGMs
Attack | Steganography/Regularisation Method | 2021 [27] | High degree of stealthiness | Cannot be used in DGMs
Attack | Triggerless Backdoor Attack | 2021 [28] | Triggerless | Cannot be used in DGMs
Attack | BAAAN Method | 2020 [29] | Avoids detection mechanisms | Visible trigger
Attack | TrAIL/ReD/ReX Method | 2022 [1] | Realises invisible attack against DGMs | Changes model structure
Attack | Invisible Encoded Backdoor Method | 2023 [30] | High stealthiness of triggers | Cannot be used in DGMs
Defence strategy | DCT | – | High coding efficiency | Weak reliability and durability
Defence strategy | DFT-GI | – | Compatibility | Lacks shift smoothness
Defence strategy | GI | – | Simplicity and speed | Limited information
Table 2. The characteristics of the images from the LSUN dataset.

Category | Number (Thousand) | Image Format | Image Size
Airplane | 1500 | .jpg | 256 × 256
Bridge | 820 | .jpg | 256 × 256
Bus | 690 | .jpg | 256 × 256
Car | 1000 | .jpg | 256 × 256
Church | 1260 | .jpg | 256 × 256
Cat | 1000 | .jpg | 256 × 256
Conference | 610 | .jpg | 256 × 256
Cow | 630 | .jpg | 256 × 256
Class | 600 | .jpg | 256 × 256
Dining-room | 650 | .jpg | 256 × 256
Horse | 1000 | .jpg | 256 × 256
Human | 1000 | .jpg | 256 × 256
Kitchen | 1100 | .jpg | 256 × 256
Living-room | 1310 | .jpg | 256 × 256
Motorbike | 1190 | .jpg | 256 × 256
Plant | 1100 | .jpg | 256 × 256
Restaurant | 620 | .jpg | 256 × 256
Sheep | 600 | .jpg | 256 × 256
Tower | 700 | .jpg | 256 × 256
Train | 1150 | .jpg | 256 × 256
Table 3. The comparison of the Clean StyleGAN3 and the Poison StyleGAN3 on three key metrics (KID, EQ_T, and FID), using the LSUN dataset.

Dataset: LSUN
Model | KID | EQ_T | FID
Clean StyleGAN3 | 0.0093 | 54.35 | 18.91
Poison StyleGAN3 | 0.0059 | 52.78 | 18.70
Table 4. The comparison of BA on four different types of image classification models (ResNet18, ResNet34, ResNet50, and VGG16).

Metric: BA
Model | ResNet18 | ResNet34 | ResNet50 | VGG16
Clean StyleGAN3 | 97.22 | 98.25 | 99.38 | 98.89
Poison StyleGAN3 | 97.37 | 98.47 | 99.49 | 98.36
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
