Adversarial ML Survey Paper

1
Towards Privacy and Security of Deep Learning

Systems: A Survey
Yingzhe He1,2 , Guozhu Meng1,2 , Kai Chen1,2 , Xingbo Hu1,2 , Jinwen He1,2
1 Institute
of Information Engineering, Chinese Academy of Sciences, China
2 School of Cybersecurity, University of Chinese Academy of Sciences
Abstract—Deep learning has gained tremendous success and great popularity in the past few years. However, recent research found
that it is suffering several inherent weaknesses, which can threaten the security and privacy of the stackholders. Deep learning’s wide
arXiv:1911.12562v1 [cs.CR] 28 Nov 2019
use further magnifies the caused consequences. To this end, lots of research has been conducted with the purpose of exhaustively
identifying intrinsic weaknesses and subsequently proposing feasible mitigation. Yet few is clear about how these weaknesses are
incurred and how effective are these attack approaches in assaulting deep learning. In order to unveil the security weaknesses and aid
in the development of a robust deep learning system, we are devoted to undertaking a comprehensive investigation on attacks towards
deep learning, and extensively evaluating these attacks in multiple views. In particular, we focus on four types of attacks associated
with security and privacy of deep learning: model extraction attack, model inversion attack, poisoning attack and adversarial attack. For
each type of attack, we construct its essential workflow as well as adversary capabilities and attack goals. Many pivot metrics are
devised for evaluating the attack approaches, by which we perform a quantitative and qualitative analysis. From the analysis, we have
identified significant and indispensable factors in an attack vector, e.g., how to reduce queries to target models, what distance used for
measuring perturbation. We spot light on 17 findings covering these approaches’ merits and demerits, success probability, deployment
complexity and prospects. Moreover, we discuss other potential security weaknesses and possible mitigation which can inspire relevant
researchers in this area.
Index Terms—deep learning, poisoning attack, adversarial attack, model extraction attack, model inversion attack
1 I NTRODUCTION
D EEP learning has gained a tremendous success and

becomes the most significant driving force for artificial
intelligence (AI). It fuels multiple areas including image
are exploring and studying the potential attacks as well
as corresponding defense techniques against deep learning
systems. Szegedy et al. [154] pioneer in exploring the stabil-
classification, speech recognition, natural language process- ity of neural networks, and uncover their fragile properties
ing, and malware detection. Due to the great advances in in front of imperceptible perturbations. Since then, adversar-
computing power and the dramatic increase in data volume, ial attack has swiftly grown into a buzzing term in both
deep learning has exhibited superior potential in these sce- artificial intelligence and security. Many efforts have been
narios, comparing to traditional techniques. Deep learning dedicated to disclosing the vulnerabilities in varying deep
excels in feature learning, deepening the understanding learning models (e.g., CNN [129] [114] [113], LSTM [54] [39]
of one object, and unparalleled prediction ability. In im- [131], reinforcement learning (RL) [74], generative adver-
age recognition, convolutional neural networks (CNNs) can sarial network (GAN) [91] [138]), and meanwhile testing
classify different unknown images for us, and some even the safety and robustness for deep learning systems [90]
perform better than humans. In natural language process- [106] [124] [153] [62] [177]. On the other hand, the wide
ing, recurrent neural networks (RNNs) or long-short-term commercial deployment of deep learning systems raises the
memory networks (LSTMs) can help us translate and sum- interest of proprietary asset protection such as the training
marize text information. Other fields including autonomous data [117] [134] [186] [11] and model parameters [84] [96]
driving, speech recognition, and malware detection all have [72] [88]. It has started a war where privacy hunters exert
widespread application of deep learning. Internet of things corporate espionage to collect privacy from the rivals and
(IoT) and intelligent home system have also arisen in recent the corresponding defenders conduct extensive measures to
years. As such, we are stepping into the era of intelligence. counteract the attacks.
However, deep learning-based intelligent systems Prior works have been conducted to survey security and
around us are suffering from a number of security prob- privacy issues in machine learning and deep learning [14]
lems. Machine learning models could be stolen through [25] [129] [22]. They enumerate and analyze attacks as well
APIs [160]. Intelligent voice systems may execute unex- as defenses that are relevant to both training phase and
pected commands [182]. 3D-printing objects could fool real- prediction phase. However, these works mainly evaluate
world image classifiers [20]. Moreover, to ensure safety, tech- the attacks either in limited domains (e.g., computer vision)
nologies such as autonomous driving need lots of security or perspectives (e.g., adversarial attack). Few studies can
testing before it can be widely used [157] [185]. In the past provide a systematical evaluation of these attacks in their
few years, the security of deep learning has drawn the at- entire life cycles, which include the general workflow, ad-
tention of many relevant researchers and practitioners. They versary model, and comprehensive comparisons between
2
In this study, we first introduce the background of deep

learning, and summarize relevant risks and commercial
deep learning systems deployed in the cloud for public. For
each type of attacks, we systematically study its capabili-
ties, workflow and attack targets. More specifically, if one
attacker is confronting a commercial deep learning system,
what action it can perform in order to achieve the target.
How the system is subverted step by step in the investigated
approaches, and what influences the attack will make to
both users and the system owner. In addition, we develop
a number of metrics to evaluate these approaches such as
reducing query strategies, precision of recovered training data,
and distance with perturbed images. Based on a quantitative
and qualitative analysis, we conclude many insights cover-
ing the popularity of specific attack techniques, merits and
demerits of these approaches, future trend and so forth.
Fig. 1: Publications of security and privacy in deep learning Takeaways. According to our investigation, we have drawn
a number of insightful findings for future research. In par-
ticular, we find that in a black-box setting, attackers have
different approaches. This knowledge can well demystify to interact by querying certain inputs from target deep
how these attacks happen, what capabilities the attackers learning systems. How to reduce the number of queries for
possess, and both salient and tiny differences in attack ef- avoiding the awareness of security detectors is the signif-
fects. This motivates us to explore a variety of characteristics icant consideration for attackers. But there is few research
for the attacks against deep learning. In particular, we aim to on query reduction to date. It is doomed to be a crowded
dissect attacks in a stepwise manner (i.e., how the attacks are research area in the near future considering commercial
carried on progressively), identify the diverse capabilities deep learning systems have been equipped with more pro-
of attackers, evaluate these attacks in terms of deliberate tection techniques for prohibitive queries (cf. Section 4).
metrics, and distill insights for future research. This study Substitute model is commonly seen across different attacks
is deemed to benefit the community threefold: 1) it presents against deep learning systems, and becomes a prerequisite
a fine-grained description of attack vectors for defenders for attacks. It behaves similarly with the target model, and
from which they can undertake cost-effective measures to exhibits approximating properties. Due to the transferabil-
enhance the security of the target model. 2) the evaluation ity, the successful attacks against the substitute model are
on these attacks can unveil some significant properties such likely effective in the target model. As a sequence, model
as success rate, capabilities. 3) the insights concluded from extraction, model inversion and adversarial attack can all
the survey can inspire researchers on new solutions. benefit from the substitute model. Additionally, it converts
Our Approach. To gain a comprehensive understanding of the black-box problem to a white-box one, which lowers the
privacy and security issues in deep learning, we conduct difficulties of attacks (cf. Section 4). Because of uncertainty
extensive investigations on the relevant literature and sys- of training data, data synthesis is a common practice to
tems. In total, 137 publications have been studied which represent similar training data. Either generated by fol-
are spanning across four prevailing areas–image classifica- lowing the distribution or generative adversarial network,
tion, speech recognition, neural language processing and synthesized data can provide sufficient samples for training
malware detection. Since the all-encompassing survey is a substitute model (cf. Section 5). A more advanced way for
nearly impossible, we instead select the more representative poisoning purpose is to implant a backdoor in data and then
research in the past five years. Overall, we summarize attackers can manipulate the predictions results with crafted
these attacks into four classes: model extraction attack, model input (cf. Section 6). However, this technique is still far away
inversion attack, data poisoning attack, and adversarial attack. from the mature and remains a promising area. Most of
In particular, model extraction and inversion attacks are adversarial attacks have put their main efforts on addressing
targeting the privacy (cf. Section 4,5), and data poisoning optimization objectives, i.e., maximizing prediction errors
and adversarial attacks can influence prediction results by but minimizing “distance” with the original input (cf. Sec-
either downgrading the formation of deep learning models tion 7). A few studies also explore their practicality and ef-
or creating imperceptible perturbations that can deceive the fectiveness in the physical space. In addition, the “distance”
model (cf. Section 6,7). Figure 1 shows the publications on with the original input are measured in varying fashions
these attacks in the past five years. The number of related and still need to be improved for better estimation and new
publications is experiencing a drastic increase in the past applications. Moreover, we have discussed more security
five years, where it gains 100% increase in 2017 and 61.5% issues for modern deep learning systems in Section 8, such
increase in 2018. Adversarial attack is obviously the most as ethical considerations and system security. Challenges of
intriguing research and occupies around 46% of researchers’ physical attacks are also presented in this paper. We have
attention. It is also worth mentioning that there is an ever- investigated some works on deep learning defenses and
increasing interest in model inversion attack recently, which summarized them in terms of attacks.
is largely credited to the laborious processing of training Contribution. We make the following contributions.
data (More discussions can be found in Section 8). • Systematic security analysis of deep learning. We sum-
3
Fig. 2: Deep learning systems and the encountered attacks
marize 4 types of attacks. For each attack, we construct TABLE 1: Notations used in this paper
their attack vectors and pivot properties, i.e., workflow,
Notation Explanation
adversary model and attack goal. This could ease the
D dataset
understanding of how these attacks are executed and
x = {x1 , . . . , xn } inputs in D
facilitate the development of counter measures. y = {y 1 , . . . , y n } predicted labels of x
• Quantitative and qualitative analysis. We develop a yt = {yt1 , . . . , ytn } true labels of x
||x − y||2 the Euclidean distance for x and y
number of metrics that are pertinent to each type of F model function
attacks, for a better assessment of different approaches. Z output of second-to-last layer
These metrics also serve as highlights in the development L loss function
w weights of parameters
of attack approaches that facilitate more robust attacks. b bias of parameters
• New findings. Based on the analysis, we have concluded λ hyperparameters
Lp distance measurement
17 findings that span in the four attacks, and uncover δ perturbation to input x
implicit properties for these attack methods. Beyond these
attacks, we have identified other related security problems
such as secure implementation, interpretability, discrimi- addition, their study focuses on privacy with no mention of
nation and defense techniques, which are promising re- other attack types.
search topics in future. Liu et al. [103] aim to provide a comprehensive liter-
ature review in two phases of machine learning, i.e., the
training phase and the testing/inferring phase. As for the
2 R ELATED W ORK corresponding defenses, they sum up with four categories.
There is a line of works that survey and evaluate attacks In addition, this survey focuses more on data distribution
toward machine learning or deep learning. drifting caused by adversarial samples and sensitive infor-
Barreno et al. conduct a survey of machine learning mation violation problems in statistical machine learning
security and present a taxonomy of attacks against machine algorithms.
learning systems [25]. They experiment on a popular statis- Akhtar et al. [14] first conduct a comprehensive study
tical spam filter to illustrate their effectiveness. Attacks are on adversarial attacks on deep learning in computer vision.
dissected in terms of three dimensions, including workable They summarize 12 attack methods for classification, and
manners, influence to input and generality. Amodei et al. study attacks on models or algorithms such as autoen-
[18] introduce five possible research problems related to ac- coders, generative model, RNNs and so on. They also study
cident risk and discuss probable approaches, with an exam- attacks in the real world and summarize defenses. However,
ple of cleaning robot, according to how it works. Papernot et they only study the computer vision part of adversarial
al. [130] study the security and privacy of machine learning attack.
systematically. They summarize some attack and defense
methods, and propose a threat model for machine learn- 3 OVERVIEW
ing. It introduces attack methods in training and inferring
process, black-box and white-box model. However, methods 3.1 Deep Learning System
they summarized in each attack are not comprehensive Deep learning is inspired by biological nervous systems and
enough. Besides, they don’t involve much about defenses composed of thousands of neurons to transfer information.
or the most widely used deep learning models. Figure 2 demonstrates a classic deep learning model. Typi-
Bae et al. [22] review the attack and defense methods cally, it exhibits to the public an overall process including: 1)
under security and privacy AI concept. They inspect eva- Model Training, where it converts a large volume of data into
sion and poisoning attacks, in black-box and white-box. In a data model, and 2) Model Prediction, where the model can
4
be used for prediction as per input data. Prediction tasks are or decision boundaries [128], which are collectively known
widely used in different fields. For instance, image classifi- as model extraction attack (cf. Section 4).
cation, speech recognition, natural language processing and 3 Inputs and results of predictions. As for prediction data
malware detection are all pertinent applications for deep and results, curious service providers may retain user’s
learning. prediction data and results to extract sensitive information.
To formalize the process of deep learning systems, we These data may also be attacked by miscreants who intend
present some notations in Table 1. Given a learning task, the to utilize these data to make their own profits. On the
training data can be represented as x = {x1 , x2 , . . . , xn } ∈ other hand, attackers may submit carefully modified input
D. Let F be the deep learning model and it computes the to fool models, which is dubbed adversarial example [154].
corresponding outcomes y based on the given input x, i.e., An adversarial example is crafted by inserting slight pertur-
y = F (x). yt is the true label of input x. Within the course bations into the original normal sample which are not easy
of model training, there is a loss function L to measure the to perceive. This is recognized as adversarial attack or evasion
predication error between predicted result and true label, attack (cf. Section 7).
and the training process intends to gain a minimal error
value via fine-tuning parameters. The loss function can be
3.3 Commercial Off-The-Shelf
computed as L = Σ16i6n ||yti − y i ||2 . So the process of
model training can be formalized as [136]: Machine learning as a Service (MLaaS) has gained the mo-
mentum in recent years [99], and lets its clients benefit from
X machine learning without establishing their own predictive
arg min ||yti − y i ||2 (1) models. To ease the usage, the MLaaS suppliers make a
F 16i6n number of APIs for clients to accomplish machine learning
tasks, e.g., classifying an image, recognizing a slice of audio
3.2 Risks in Deep Learning or identifying the intent of a passage. Certainly, these ser-
One deep learning system involves several pivotal assets vices are the core competence which also charge clients for
that are confidential and significant for the owner. As their queries. Table 2 shows representative COTS as well
per the phases in Figure 2, risks stem from three types as their functionalities, outputs to the clients, and usage
of concerned assets in deep learning systems: 1) training charges. Taking Amazon Image Recognition for example, it
dataset. 2) trained model including structure, algorithms can recognize the person in a profile photo and tell his/her
and parameters. 3) inputs and results of predictions. gender, age range, emotions. Amazon charges this service
1 Training dataset. High-quality training data is significant with 1,300 USD per one million queries.
and vital for a better performance of the deep learning
model. As a deep learning system has to absorb plenty 3.4 Dataset
of data to form a qualified model, mislabelled or inferior
Here we present common datasets used in our paper. In im-
data can hinder this formation and affect the model’s qual-
age field, there are MNIST [95], CIFAR-10 [93], ImageNet [2],
ity. These kinds of data can be intentionally appended to
GTSRB [5], GSS [4], IJB-A [7] and so on. In text field, reviews
the benign by attackers, which is referred to as poisoning
from IMDB [8] are usually used. In speech field, corpora
attack (cf. Section 6). On the other hand, the collection of
such as Mozilla Common Voice [10] are used. In malware
training data takes lots of human resources and time costs.
field, datasets include DREBIN [1], Microsoft Kaggle [9], and
Industry giants such as Google have far more data than
millions of files or programs they found.
other companies. They are more inclined to share their state-
of-the-art algorithms [83] [46], but they barely share data.
Therefore, training data is crucial and considerably valuable 4 M ODEL E XTRACTION ATTACK : YOUR M ODEL IS
for a company, and its leakage means big loss of assets. M INE
However, recent research found there is an inverse flow
from prediction results to training data [161]. It leads that 4.1 Introduction
one attacker can infer out the confidential information in Model extraction attack attempts to duplicate a machine
training data, merely relying on authorized access to the learning model through the provided APIs, without prior
victim system. It is literally noted as model inversion attack knowledge of training data and algorithms [160]. To for-
whose goal is to uncover the composition of the training malize, given a specifically selected input X , one attacker
data or its specific properties (cf. Section 5). queries the target model F and obtains the corresponding
2 Trained model. The trained model is a kind of data prediction results Y . Then the attacker can infer or even
model, which is an abstract representation of its training extract the entire in-use model F . With regard to an arti-
data. Modern deep learning systems have to cope with ficial neural network y = wx + b, model extraction attack
a large volume of data in the training phase, which has can somehow approximate the values of w and b. Model
a rigorous demand for high performance computing and extraction attacks cannot only destroy the confidentiality
mass storage. Therefore, the trained model is regarded as the of model, thus damaging the interests of its owners, but
core competitiveness for a deep learning system, endowed also construct a near-equivalent white-box model for further
with commercial value and creative achievements. Once it attacks such as adversarial attack [128].
is cloned, leaked or extracted, the interests of model owners Adversary Model. This attack is mostly carried out under
will be seriously damaged. More specifically, attackers have a black-box model and attackers only have access to pre-
started to steal model parameters [160], functionality [122] diction APIs. Their capabilities are limited in three ways:
5
TABLE 2: Commercial MLaaS systems and the provided functionalities, output for clients and charges per 1M queries
System Functionality Output Cost/M-times

Image marking label, confidence 2500 CNY
Alibaba Image Recognition scene recognition label, confidence 1500 CNY
porn identification label, suggestion 1620 CNY
Object & Scene Recognition label, boundingbox, confidence 1300 USD
Amazon Image Recognition
face recognition AgeRange, boundingbox, emotions, eyeglasses, gender, pose, etc 1300 USD
Google Vision API label description description, score 1500 USD
training meta- and substitute model. It can restore precise

parameters but is only suitable for small scale models.
Due to the increase of model size, it is common to train a
substitute model to simulate the original model’s decision
boundaries or classification functionalities. However, pre-
cise parameters seem less important. Metamodel [119] is an
inverse training with substitute model, as it takes the query
outputs as input and predicts the query inputs as well as
model attributes. Besides, it can be also used to explore more
informative inputs that help infer more internal information
Fig. 3: Workflow of model extraction attack of model.
4.3 Extracted Information

model knowledge, dataset access, and query frequency. In
4.3.1 Parameters & Hyperparameter
particular, attackers have no idea about model architectures,
hyperparameters, training process of the victim’s model. Parameters are weight values (w) of layer to layer, and bias
They cannot obtain natural data with the same distribution values (b) of each layer. Hyperparameters refer to param-
of the target’s training data. In addition, attackers may be eters during training, including dropout rate, learning rate,
blocked by the target if submitting queries too frequently. mini-batch size, parameters in objective functions to balance
Workflow. Figure 3 shows a typical workflow of this attack. loss function and regularization terms, and so on. In the
First, attackers submit inputs to the target model and get early work, Tramèr et al. [160] tried equation solving to
prediction values. Then they use input-output pairs and recover parameters in machine learning models, such as
different approaches to extract the confidential data. More logistic regression, SVM, and MLP. They built equations
specifically, confidential data includes parameters [160], hy- about the model by querying APIs, and obtained parameters
perparameters [165], architectures [119], decision bound- by solving equations. However, it needs plenty of queries
aries [128] [84], and functionality [122] [45] of the model. and is not applicable to DNN. Wang et al. [165] tried to
steal hyperparameter-λ on the premise of known model
algorithm and training data. λ is used to balance loss
4.2 Approaches functions and regularization terms. They assumed that the
There are basically three types of approaches to extract gradient of the objective function is ~0 and thus got many
models: linear equations through many queries. They estimated the
• Equation Solving (ES). For a classification model com- hyperparameters through linear least square method.
puting class probabilities as a continuous function, it can
be denoted as F (x) = σ(w · x + b) [160]. Hence, given 4.3.2 Architectures
sufficient samples (x, F (x)), attackers can recover the Architectures include that how many layers in the model,
parameters (e.g., w, b) by solving the equation w · x + b = how many neurons in each layer, how are they connected,
σ −1 (F (x)). what activation functions are used, and so on. Recent papers
• Training Metamodel (MM). Metamodel is a classifier usually train classifiers to predict attributes. Joon et al. [119]
for classification models [119]. By querying a classifica- trained Metamodel, a supervised classifier of classifiers,
tion model on the ouputs Y for certain inputs X , at- to steal model attributes (architecture, operation time, and
tackers train a meta-model F m , mapping Y to X , i.e., training data size). They submitted query inputs via APIs,
X = F m (Y ). The trained model can further predict and took corresponding outputs as inputs of metamodel,
model attributes from the query outputs Y . then trained metamodel to predict model attributes as out-
• Training Substitute Model (SM). Substitute model is puts.
a simulative model mimicing behaviors of the original
model. With sufficient querying inputs X and corre- 4.3.3 Decision Boundaries
sponding outputs Y , attackers train the model F s where Decision boundaries are the classification boundary be-
Y = F s (X). As a result, the attributes of the substitute tween any two classes. In [128] [84] [127], they steal decision
model can be near-equivalent to those of the original. boundaries and generate transferable adversarial samples
Stealing different information corresponds to different to attack black model. Papernot et al. [128] used Jacobian-
methods. In terms of time, equation solving is earlier than based Dataset Augmentation (JbDA) to produce synthetic
6
TABLE 3: Evaluation on model extraction attacks as per stolen information.
Recovery Rate (%) for Models

Information Paper Approach Reducing Query
SVM DT LR kNN CNN DNN
Parameter Tramer et al. [160] ES - 99 99 99 - - 99
Hyper-par Wang et al. [165] ES - 99 - 99 - - -
Arch. Joon et al. [119] MM KENNEN-IO - - - - - 88
Papernot et al. [128] SM reservoir sampling [163] - - - - - 84
Decision. Papernot et al. [127] SM reservoir sampling [163] 83 61 89 85 - 89
PRADA [84] SM - - - - - - 67
Silva et al. [45] SM - - - - - 98 -
Func.
Orekondy et al. [122] SM random, adaptive sampling - - - - 98 -
samples, which moved to the nearest boundary between Equation solving is deemed as an efficient way to recover
current class and all other classes. This technology aims parameters [160] or hyperparameters [165] in linear algo-
not to maximize the accuracy of substitute models, but en- rithms, since it has an upper bound for sufficient queries.
sures that samples arrive at decision boundaries with small As claimed in [160], d-dimensional weights can be cracked
queries. Juuti et al. [84] extended JbDA to Jb-topk, where with only d + 1 queries. However, this approach is hardly
samples move to the nearest k boundaries between current applicable to the non-linear deep learning algorithms. So
class and any other class. They produced transferable tar- researchers turn to the compelling training-based approach.
geted adversarial samples rather than untargeted [128]. In For instance, [119] trains a classifier, dubbed as metamodel,
terms of model knowledge, Papernot et al. [127] found that over the target model so as to predict architectural infor-
model architecture knowledge was unnecessary because a mation which is categorical or limited real values. This
simple model could be extracted by more complex model, approach cannot cope with complex model attributes such
such as DNN. as decision boundary and functionality. That drives the
prevalence of substitute model as it serves as an incarna-
4.3.4 Functionalities tion of the target model which behaves quite similarly. As
Similar functionalities refer to replicating the original model such, the substitute model has approximated attributes and
as much as possible on classification results. The primary prediction results. Additionally, it can be further used to
goal is to construct a predictive model that have closest steal model’s training data [84] and generating adversarial
input-output pairs with the original. In [122] [45], they try to examples [127].
improve classification accuracy of substitute model. Silva et
Finding 2. To learn a substitute model of deep learning models de-
al. [45] used problem domain dataset, non-problem domain
mands more queries than to infer parameters or hyperparmaeters
dataset, and their mixture to train a model respectively. They
in simple machine learning models.
found model trained with non-problem domain dataset also
did well in accuracy. Besides, Orekondy et al. [122] assumed To be specific, attackers requires thousands of queries
attackers had no semantic knowledge over model outputs. on machine learning models, but have to query over 11,000
They chose very large datasets and selected suitable samples queries for stealing parameters in a simple neural net-
one by one to query the black-box model. Reinforcement work [160]. Deep learning models are more challenging
learning approach was introduced to improve query effi- because they are highly nonlinear, non-convex, and maybe
ciency and reduce query counts. over-fitting. Additionally, the parameters will be drastically
increased along with the increment of layers and neurons.
4.4 Analysis Finding 3. Reducing queries, which can save monetary costs for
Model extraction attack is an emerging field of attack. In a pay-per-query MLaaS commercial model and also be resistant
this study, we totally survey 8 related papers and clas- to attack detection, has become an intriguing research direction in
sify them by target information as shown in Table 3. We recent years.
sort them by the stolen information and evaluate them on
The requirement of query reduction arises due to the
multiple aspects including employed approaches, strategies
high expense of queries and query amount limitation. In our
for reducing queries, recovery rate for applicable models.
investigated papers, [119] trains a metamodel–KENNEN-
Recovery rate means how many percent of information
IO for optimizing the query inputs. [128], leverage reservoir
can be stolen, and is computed by differing the inferred
sampling to select representative samples for querying, and
data with that of the original model. However, the attacks
[122] proposes two sampling strategies, i.e., random and
on boundary decision cannot be directly measured in this
adaptive to reduce queries. Moreover, active learning [97],
way. Thus, we use the misclassification rate of generated
natural evolutionary strategies [78], optimization-based ap-
adversarial examples as an alternative, since it reveals the
proaches [44] [140] have been adopted for query reduction.
similarity between the simulative model and the original
model to some extent. Based on the statistics, we draw the Finding 4. Model extraction attack is evolving from a puzzle
following conclusions. solving game to a simulation game with cost-profit tradeoffs.
Finding 1. Training substitute model is by no doubt the dominant MLaaS magnates like Amazon and Google have a
method in model extraction attacks with manifold advantages. tremendous scale of networks running behind services. To
7
corresponding outputs as well as class probabilities and

confidence values.
Workflow. Figure 4 shows a workflow of model inversion
attack which is suitable for both MIA and PIA. Here we take
MIA as an example. MIA can be accomplished in varying
ways: by querying the target model to get input-output
pairs, attackers can merely exercise Step 4 with heuristic
methods to determine the membership of a record [143]
[104] [52] [69] [102] (Approach 1); Alternatively, attackers
can train an attack model for determination, which neces-
sitates an attack model training process (Step 3). Attack
model’s training data is obtained by query inputs and
response [137] [19] (Approach 2); Due to the limitation
of queries and model attributes, some studies introduce
shadow models to provide training data for the attack
Fig. 4: Workflow of model inversion attack model [147] [143], which necessitates shadow model train-
ing (Step 2). Moreover, data synthesis (Step 1) is proposed
to provide more training data for a sufficient training (Ap-
infer how many layers or neurons in the neural networks proach 3).
become impossible and unaffordably costly. Therefore, it
makes a remarkable dent in attackers’ interest of solving
model attributes. On the other hand, inferring decision 5.2 Membership Inference Attack
boundary and model functionality emerge as new circum- Truex et al. [161] presented a generally systematic formu-
vention. Treating the target model as a black box, attackers lation of MIA. Given the instance x and black-box access
observes the response by feeding it with crafted inputs, to the classification model Ft trained on the dataset D, can
and finally construct a close approximation. Although the an adversary infer whether the instance x is included in D
substitute model is likely simpler and underperforms in when training Ft with a high degree of confidence?
some cases, its prediction capabilities still make consider- Most of MIAs proceed in accordance with the workflow
able profits for attackers. in Figure 4. More specifically, to infer whether one data
item or property exists in the training set, the attacker may
prepare the initial data and make transformations to the
5 M ODEL I NVERSION ATTACK : YOUR M ODEL R E - data. Subsequently, it devises a number of principles for
VEALS YOUR I NFORMATION determining the correction of its guessing. We details these
components as follows.
5.1 Introduction
In a typical model training process, lots of information 5.2.1 Data Synthesis
is extracted and abstracted from the training data to the Initial data has to be collected as prerequisites for deter-
product model. However, there also exists one inverse in- mining the membership. According to our investigation,
formation flow which allows attackers infer the training an approximated set of training data is desired to imply
data from the model since neural networks may remember membership. This set can be obtained either by:
too much information of the training data [148]. Model • Generate samples manually. This method needs some prior
inversion attack is just to leverage this information flow knowledge to generate data. For instance, Shokri [147]
and restore the feed data or data properties such as faces produced datasets similar to the target training dataset
in face recognition systems through model prediction or its and used the same MLaaS to train several shadow models.
confidence coefficient. These datasets were produced by model-based synthe-
Additionally, model inversion attack can be further re- sis, statistics-based synthesis, noisy real data and other
fined into membership inference attack (MIA) and property methods. If the attacker has access to part of dataset.
inference attack (PIA). In MIA, the attacker can determine Then he can generate noisy real data by flipping a few
whether a specific record is included or not in the training randomly selected features on real data. These data make
data. In PIA, the attacker can speculate whether there is a up the noisy dataset. If the attacker has some statistical
certain statistical property in the training dataset. information about dataset, such as marginal distributions
Adversary Model. Model inversion attack can be executed of different features. Then he can generate statistics-based
in both black-box or white-box settings. In a white-box synthesis using this knowledge. If the attacker has no
attack, the parameters and architecture of the target model knowledge above, he can also generate model-based syn-
are known by attackers. Hence, they can easily obtain a sub- thesis by searching for possible data records. The records
stitute model that behaves similarly, even without querying that search algorithm needs to find are correctly classified
the model. In a black-box attack, attacker’s capabilities are by target model with high confidence.
limited in model architectures, statistics and distribution In [143], they proposed a data transferring attack without
of training data and so on. Attackers cannot obtain com- any query to target model. They chose different datasets
plete training set information. However, in either setting, to train the shadow model. The shadow model was used
attackers can make queries with specific inputs and get to capture membership status of data points in datasets.
8
• Generate samples by model. This method aims to produce [143] or the target model [137] [111]. The training process
training records by training generated models such as is to select some records from both inside and outside the
GAN. Generated samples are similar to that from the substituted dataset, and obtain the class probability vector
target training dataset. Improving the similarity ratio will through target model or shadow model. The vector and
make this method more useful. the label of record are taken as input, and whether this
Both [102] and [67] attacked generated models. Liu et record belongs to substituted dataset is taken as output.
al. [102] presented a new white-box method for single Then training and learning attack model.
membership attacks and co-membership attacks. The ba-
sic idea was to train a generated model with the target 5.2.4 Membership Determination
model, which took the output of the target model as input, Given one input, this component is responsible for de-
and took the similar input of the target model as output. termining whether the query input is a member of the
After training, the attack model could generate data that training set of the target system. To accomplish the goal,
is similar to the target training dataset. Considering about the contemporary approaches can be categorized into three
the difficult implementation of CNN in [147], Hitaj et al. classes:
[69] proposed a more general MIA method. They per- • Attack model-based Method. In inference phase, attackers
formed a white-box attack in the scenario of collaborative first put record to be judged into the target model, and
deep learning models. They constructed a generator for get its class probability vector, then put the vector and
target classification model, and used them to form a GAN. label of record into the attack model, and get the mem-
After training, GAN could generate data similar to the bership of this record. Pyrgelis et al. [137] implemented
target training set. However, this method was limited in MIA for aggregating location data. The main idea was to
that all samples belonging to the same classification need use priori position information and attack through distin-
to be visually similar, and it could not generate an actual guishability game process with a distinguishing function.
target training pattern or distinguish them under the same They trained a classifier (attack model) as distinguishing
class. function to determine whether data is in target dataset.
• Heuristic Method. This method uses prediction probability,
5.2.2 Shadow Model Training instead of an attack model, to determine the membership.
Attackers have sometimes to transform the initial data for Intuitively, the maximum value in class probabilities of
further determination. In particular, shadow model is pro- a record in the target dataset is usually greater than the
posed to imitate target model’s behavior by training on record not in it. But they require some preconditions
a similar dataset [147]. The dataset takes records by data and auxiliary information to obtain reliable probability
synthesis as inputs, and their labels as outputs. Shadow vectors or binary results, which is a limitation to apply
model is trained on such dataset. It can provide class prob- to more general scenarios. How to lower attack cost
ability vector and classification result of a record. Shokri and reduce auxiliary information can be considered in
et al. [147] designed, implemented and evaluated the first the future study. Fredrikson et al. [52] tried to construct
MIA attack method for a black-box model by API calls in the probability of whether a certain data appears in the
machine learning. They produced datasets similar to the target training dataset, according to the probability and
target training dataset and used the same MLaaS to train auxiliary information, such as error statistics or marginal
several shadow models. These datasets were produced by priors of training data. Then they searched for input data
model-based synthesis, statistics-based synthesis, noisy real which maximized the probability, and the obtained data
data and other methods. Shadow models were used to was similar to data in target training dataset. The third
provide training set (class labels, prediction probabilities attack method in Salem et al. [143] only required the
and whether data record belongs to shadow training set) for probability vector of outputs from the target model, and
the attack model. Salem et al. [143] relaxed the constraints in used statistical measurement method to compare whether
[147] (need to train shadow models on the same MLaaS, and the maximum classification probability exceeds a certain
the same distribution between datasets of shadow models value.
and target model), and used only one shadow model with- Long et al. [104] put forward Generalized MIA method,
out the knowledge of target model structure and training which was easier to attack non-overfitted data, different
dataset distribution. Here, the shadow model just tried to from [147]. They trained a number of reference models
capture the membership status of records in a different similar to the target model, and chose vulnerable data
dataset. according to the output of reference models before Soft-
max, then compared outputs between the target model
5.2.3 Attack Model Training and reference models to calculate the probability of data
The attack model is a binary classifier. Its input is the belonging to target training dataset. Reference models
class probabilities and label of the record to be judged, and in this paper were used to mimic the target model, like
output is yes (means the record belongs to the dataset of shadow models. But they did not need an attack model.
target model) or no. Training dataset is usually required Hayes et al. [67] proposed a method of attacking generated
to train the attack model. The problem is that the output models. The idea was that attackers determined which
label of whether a record belongs to the dataset of target dataset from attackers belonged to target training set,
model cannot be obtained. So here attackers often generate according to the probability vector output by classifier.
substituted dataset by data synthesis. The input of this Higher probability was more likely from target training
training is generated either by the shadow model [147] set (they selected the upper n sizes). In white-box, the
9
classifier was constructed by that of target model. In too many times. The synthesized data, which could be gen-
black-box, they used obtained data by querying target erated either by the statistical distribution of known training
model to reproduce classifier with GAN. data, or a generative adversarial network, can effectively
sample the original data. Hence, it is employed to train
a shadow model, a substitute for the target. It avoids too
5.3 Property Inference Attack many queries to the target model and thereby lowers the
Property inference attack (PIA) mainly deduces properties perception by security mechanisms.
in the training dataset. For instance, how many people have
Finding 7. MIA is essentially a process that explicitly expresses
long hair or wear dresses in a generic gender classifier. Is
the logical relations contained in the trained model.
there enough women or minorities in dataset of common
classifiers. The approach is largely same to a membership This kind of attacks requires many datasets and much
inference attack. In this section, we only remark main differ- time, but the obtained information is really limited (only 1
ences between model inversion attacks. bit [19] [53]). So the development of model inversion attack
Data Synthesis. In PIA, training datasets are classified by is to obtain more overall information. For example, what is
including or not including a specific attribute [19]. the relationship between different training datasets. What’s
Shadow Model Training. In PIA, shadow models are more, another development is to increase the amount of the
trained by training sets with or without a certain property. In obtained information, for example, how to get details in a
[19] [53], they used several training datasets with or without single record.
a certain property, then built corresponding shadow models
to provide training data for meta-classifier. Finding 8. Research about membership inference (10/13) is more
Attack Model Training. Here, attack model is usually also than property inference (4/13).
a binary classifier. Ateniese et al. [19] proposed a white- This is because membership inference now has a more
box PIA method by training a meta-classifier. It took model general adaptation scenario, and it emerges earlier. Further-
features as input, and output whether the corresponding more, MIA can get more information than PIA in one-time
dataset contained a certain property. However, this ap- attack (just like training an attack model). A trained attack
proach did not work well on DNNs. To address this, Ganju model can be applied to many records in MIA, but only a
et al. [53] mainly studied how to extract feature values few properties in PIA. In [19], attackers want to know if their
of DNNs. The part of meta-classifier was similar to [19]. speech classifier was trained only with voices from people
Melis et al. [111] trained a binary classifier to judge dataset who speak Indian English. In [53], they try to find if some
properties in collaborative learning, which took updated classifiers have enough women or minorities in training
gradient values as input. Here the model is continuously dataset. In [33], they are interested in the global distribution
updated, so attacker could analyze updated information at of skin color. In [111], they want to know the proportion
each stage to infer properties. between black and asian people.
Finding 9. Studies about heuristic methods (6/13) and attack
5.4 Analysis model (7/13) nearly share on a fifty-fifty basis.
As shown in Table 4, we have totally surveyed 13
In heuristic methods, using probabilities is easy to im-
model/property inversion attack papers.
plement, but barely works (0.5 precision and 0.54 recall)
Finding 5. Shadow model has a number of advantages over other on MNIST dataset [143]. Obtaining similar datasets usu-
methods in model inversion attack. ally needs to train a generative model [67] [102] [69]. In
attack model methods, attackers need to train an attack
Shadow models (4/13) are used in both MIA (2/13) [147] model [137] [19]. Shadow models [147] [143] [19] are pro-
[143] and PIA (2/13) [19] [53]. It is superior than other posed to provide datasets for the attack model, but increase
methods in manifolds: 1) requiring no addition auxiliary training costs.
information [52], which is underlied by the assumption that
a higher confidence of prediction indicates the presence of
data records of a higher probability. 2) providing true in- 6 P OISONING ATTACK : C REATE A BACKDOOR IN
formation as training dataset for attack model. For a model YOUR M ODEL
F and its training dataset D, training attack model needs Poisoning attack seeks to downgrade deep learning sys-
information of label x, F (x), and whether x ∈ D. If using
tems’ predictions by polluting training data. Since it hap-
a shadow model, shadow model F and its dataset D are
pens before the training phase, the caused contamination is
known. All information is from shadow model and corre-
usually inextricable by tuning the involved parameters or
sponding dataset. If using the target model, F is the target
adopting alternative models.
model and D is the training dataset. However, attackers do
not know D. So information whether x ∈ D need to be
replaced by whether x ∈ D0 , where D0 is similar to D. 6.1 Introduction
In the early age of machine learning, poisoning attack had
Finding 6. Data synthesis is a common practice to conduct model
been proposed as a non-trivial threat to the mainstream
inversion attack, compared to direct querying.
algorithms. For instance, Bayes classifiers [118], Support
Data synthesis could generate data similar to target Vector Machine (SVM) [28] [31] [174] [173] [34], Hierarchical
dataset conveniently [147] [52] [69] [102], without querying Clustering [29], Logistic Regression [110] are all suffering
10
TABLE 4: Evaluation on model inversion attack. It presents how the “Workflow” proceeds for each work, and its “Goal”,
either mia or pia. We select one experimental “Dataset” in the works and the corresponding “Precision” achieved as well
as the target “Model”. “Knowledge” denotes the acquisitions of attackers to the model, and “Application” is the applicable
domain of the target model. “structured data” refers to any data in a fixed field within a record or file [27].
Workflow
Paper Goal Precision Dataset Model Knowledge Application
Step 1 Step 2 Step 3 Step 4
Truex et al. [161] X X MIA 61.75% MNIST DT Black image
Fredrikson et al. [52] X MIA 38.8% GSS DT Black image
Pyrgelis et al. [137] X X MIA - TFL MLP Black structured data
Shokri et al. [147] X X X X MIA 51.7% MNIST DNN Black image
Hayes et al. [67] X X MIA 58% CIFAR-10 GAN Black image
Long et al. [104] X MIA 93.36% MNIST NN Black image
Melis et al. [111] X X MIA/PIA - FaceScrub DNN White image
Liu et al. [102] X X MIA - MNIST GAN White image
Salem et al. [143] X X X X MIA 75% MNIST CNN Black image
Ateniese et al. [19] X X X X PIA 95% - SVM White speech
Buolamwini et al. [33] X PIA 79.6% IJB-A DNN Black image
Ganju et al. [53] X X X X PIA 85% MNIST NN White image
Hitaj et al. [69] X X MIA - - CNN White image
methods: mislabel original data, and craft confused data.

The poisoned data then enters into the original data and
subverts the training process, leading to greatly degraded
prediction capability or a backdoor implanted into the
model. More specifically, mislabeled data is yielded by
selecting certain records of interest and flipping their labels.
Confused data is crafted by embedding special features that
can be learnt by the model which are actually not the essence
of target objects. These special features can serve as a trigger,
Fig. 5: Workflow of poisoning attack incurring an wrong classification.
6.2 Poisoning Approach

the degradation from data poisoning. Along with the broad
use of deep learning, attackers have moved their attention 6.2.1 Mislabeled Data
to deep learning instead [79] [144] [152]. Learning model usually experiences training under labeled
Adversary Model. Attackers can implement this attack with data in advance. Attackers may get access to a dataset,
full knowledge (white-box) and limited knowledge (black- and change a correct label to wrong. Mislabeled data could
box). Knowledge mainly means the understanding of train- push decision boundary of classifier significantly to incor-
ing process, including training algorithms, model architec- rect zones, thus reducing its classification accuracy. Muñoz-
tures, and so on. Capabilities of attackers refer to controlling González et al. [115] undertook a poisoning attack towards
over the training dataset. In particular, it discriminates how multi-class problem based on back-gradient optimization.
much new poisoned data attackers can insert, and whether It calculated gradient by automatic differentiation and re-
they can alter labels in the original dataset and so on. versed the learning process to reduce attack complexity. This
Attack Goal. There are two main purposes for poisoning the attack is resultful for spam filtering, malware detection and
data. One intuitive goal is to destroy the model’s availability handwirtten digit recognition.
by deviating its decision boundary. As a result, the poisoned Xiao et al. [174] adjusted a training dataset to attack SVM
model could not well represent the correct data and prone by flipping labels of records. They proposed an optimized
to making wrong predictions. This is likely caused by mis- framework for finding the label flips which maximizes clas-
labeled data (cf. Section 6.2.1), whose labels are intentionally sification errors, and thus reducing the accuracy of classifier
tampered by attackers, e.g., one photo with a cat in it is successfully. Biggio et al. [29] used obfuscation attack to
marked as dog. The other purpose is to create a backdoor in maximally worsen clustering results, where they relied on
the target model by inserting confused data (cf. Section 6.2.2). heuristic algorithms to find the optimal attack strategy.
The model may behave normally at most of the time, Alfeld et al. [17] added optimal special records into training
but arouse wrong predictions with crafted data. With the dataset to drive predictions in a certain direction. They
pre-implanted backdoor and trigger data, one attacker can presented a framework to encode an attacker’s desires and
manipulate prediction results and launch further attacks. constraints under linear autoregressive models. Jagielski et
Workflow. Figure 5 shows a common workflow of poison- al. [79] could manipulate datasets and algorithms to influ-
ing attack. Basically, this attack is accomplished by two ence linear regression models. They also introduced a fast
11
TABLE 5: Evaluation on poisoning attack. The data denotes an attacker needs to contaminate how many percent of training
data “Poison Percent” and achieves how many “Success Rate” under specific “Dataset”. “Model” indicates the attacked
model. “Timeliness” denotes whether the poison attack is in an online or offline setting. “Damage” means how many
predictions can be impacted. Attackers may possess two different “Knowledge”, either black-box or white-box, and make
poisoned model predict as expected, i.e., “Targeted”, or not. “structured data” is the same as Table 4. “LR” is linear
regression. “OLR” is online logistic regression. “SLHC” is single-linkage hierarchical clustering.
Paper Success Rate Dataset Poison Percent Model Timeliness Damage Knowledge Targeted Application
Xiao et al. [172] 20% 11944 files 5% LASSO offline - Black No malware
Muñoz-González et al. [115] 25% MNIST 15% CNN offline 30% error Black No image, malware
Jagielski et al. [79] 75% Health care dataset 20% LASSO offline 75% error Black No structured data
Alfeld et al. [17] - - - LR offline - White Yes -
Shafahi et al. [144] 60% CIFAR-10 5% DNN offline 20% error White Yes image
Wang et al. [171] 90% MNIST 100% OLR online - White Both image
Biggio et al. [29] - MNIST 1% SLHC offline - White Yes image, malware
statistical attack which only required limited knowledge of gradually and get them compromised. That causes vary-
training process. ing difficulties for a successful attack. In particular, online
The major research focuses on an offline environment attacks have to consider more factors such as the order of
where the classifier is trained on fixed inputs. However, fed data, the evasiveness of poisonous data. It implies that
training often happens as data arrives sequentially in a more studies start with offline attacks. However, in reality,
stream, i.e., in an online setting. Wang et al. [171] conducted more and more models are trained online. Due to the drive
poisoning attacks for online learning. They formalized the of profits, it is expected that there are emerging more attacks
problem into semi-online and fully-online, with three attack against online training in the near future.
algorithms of incremental, interval and teach-and-reinforce.
Finding 11. A few (2/7) papers use confused data with the
6.2.2 Confused Data purpose of implanting a backdoor into the model.
Learning algorithms elicit representative features from a
large amount of information for learning and training. How- In terms of difficulty, making mistakes inadvertently or
ever, if attackers submit crafted data with special features, imperceptibly is more difficult than making misclassifica-
the classifier may learn fooled features. For example, mark- tion publicly for a model. A backdoor is such an impercepti-
ing figures with number “6” as a turn left sign and putting ble mistake. A model performs well under normal functions,
them into the dataset, then images with a bomb may be while it opens the door for attackers when they need it.
identified as a turn-left sign, even if it is in fact a STOP sign. In [144], the attacker adds a low-transparency watermark
Xiao et al. [172] directly investigated the robustness of into samples to allow some indivisible features overlapping.
popular feature selection algorithms under poisoning at- In the prediction phase, attacker can use this watermark to
tack. They reduced LASSO to almost random choices of open the backdoor, causing misclassification. In addition,
feature sets by inserting less than 5% poisoned training attackers may use curious characteristics to cheat model
samples. Shafahi et al. [144] found a specific test instance to because it just learns useless features [172].
control the behavior of classifier with backdoor, without any
access to data collection or labeling process. They proposed Finding 12. Poisoning attacks essentially seek for a globally or
a watermarking strategy and trained a classifier with multi- locally distributional disturbance over training data.
ple poisoned instances. Low-opacity watermark of the target
It is well-known that the performance of learning is
instance is added to poisoned instances to allow overlap of
largely dependent on the quality of training data. Quality
some indivisible features.
data is commonly acknowledged as being comprehensive,
unbiased, and representative. In the process of data poison-
6.3 Analysis ing, wrongly labeled or biased data is deliberately crafted
We investigated 7 papers on poisoning attack in total and and added into training data, degrading the overall quality.
evaluate them over 9 metrics in Table 5. Based on the
analysis, we conclude the following findings.
Finding 10. Most attacks (6/7) are under an offline setting, and 7 A DVERSARIAL ATTACK : U TILIZE THE W EAK -
only one [171] implements an online attack via online gradient
NESS OF YOUR M ODEL
descent.
In an offline setting, model owners collect the training Similar to poisoning attack, adversarial attack also makes a
data from multiple sources and train the models for once. model classify a malicious sample wrongly. Their difference
Attackers have to contaminate the data before the training. is that poisoning attack inserts malicious samples into the
However, in an online setting, the trained model can be training data, directly contaminating model, while adver-
updated periodically with newly coming training data. It sarial attack leverages adversarial examples to exploit the
allows attackers to feed poisonous data into the models weaknesses of the model and gets a wrong prediction result.
12
Z(·) is the output of second-to-last layer, usually indicates

class probability. Z(·)t is the probability of t-th class. Loss
function describes the loss of input and output. δ is the
perturbation. kδkp is the p-norm of δ . x = {x1 , x2 , ..., xn }
is the original sample, xi is the pixel or element in sample
where xi ∈ x, 1 6 i 6 n. xi is sample of the i-th iteration,
usually x0 = x.
The process of finding perturbations essentially needs to
solve the following optimization problems (the first equa-
tion is non-targeted attack, the second equation is targeted
Fig. 6: Workflow of adversarial attack
attack, T is targeted class label):
arg min kδkp , s.t. F (x + δ) 6= F (x)
7.1 Introduction δ
(3)
Adversarial attack adds unperceived perturbations to nor- arg min kδkp , s.t. F (x + δ) = T
δ
mal samples during the prediction process, and then pro-
Methods of finding perturbations can be roughly di-
duces adversarial examples (AEs). This is an exploratory
vided into calculating gradients and solving optimization
attack and violates the availability of a model. It can be
function. Szegedy et al. [154] first proposed an optimization
used in many fields, e.g., image, speech, text, and malware,
function to find AEs and solved it with L-BFGS. FGSM [58],
particularly widespread in image classification. They can
BIM [16], MI-FGSM [47] are a series of methods for find-
deceive the trained model but look nothing unusual to
ing perturbations by directly calculating gradients. Deep-
humans. That is to say, AEs need to both fool the classifier
fool [114] and NewtonFool [81] approximate the nearest
and be imperceptible to humans. For an image, the added
classification boundary by Taylor expansion. Instead of per-
perturbation is usually tuned by minimizing the distance
turbing a whole image, JSMA [129] finds a few pixels to
between the original and adversarial examples. For a piece
perturb through calculating partial derivative. C&W [38],
of speech or text, the perturbation should not change the
EAD [42], OptMargin [68] are a series of methods to find
original meaning or context. In the field of malware detec-
perturbations by optimizing the objective function.
tion, AEs need avoid being detected by models. Adversarial
L-BFGS attack. Szegedy et al. [154] tries to find δ that
attack can be classified into the targeted attack and untar-
satisfies F (x + δ) = l, so to minimize perturbation and
geted attack. The former requires adversarial examples to
be misclassified as a specific label, while the latter desires a
Loss function, using Box-constrained L-BFGS to solve this
constraint optimization problem. In Equation 4, c (> 0) is a
wrong prediction, no matter what it will be recognized as.
Workflow. Figure 6 depicts the general workflow for an hyperparameter and obviously, Loss(x, F (x)) = 0.
adversarial attack. In white-box setting, attackers could di- min c kδk2 + Loss(x + δ, l)
δ (4)
rectly calculate gradients [58] [16] [47] or solve optimization
functions [38] [42] [68] to find perturbations on original s.t. x + δ ∈ [0, 1]n
samples (Step 3). In black-box setting, attackers obtain in- FGSM attack. Goodfellow et al. [58] attacked the classifier
formation by querying the target model many times (Step based on the gradient of input. lx is the true label of x. The
1). Then they could train a substitute model to perform a direction of perturbation is determined by the computed
white-box attack [127] [128] (Step 2.1), or estimate gradients gradient using back-propagation. Each pixel goes ε size in
to search for AEs [77] (Step 2.2). gradient direction.
In addition to deceiving the classification model, AEs
should carry minimal perturbations that evade the aware- δ = ε · sign(∇x Loss(x, lx )) (5)
ness of human. Generally, the distance between normal and
BIM attack. BIM (or I-FGSM) [16] iteratively changes the
adversarial sample can be measured by Lp Distance (or
step for new inputs as Equation 6. lx is the true label of x,
Minkowski Distance), e.g., L0 , L1 , L2 and L∞ .
Clipx, {x} function performs clipping on image per-pixel.
n x0 = x
X 1
Lp (x, y) = ( |xi − y i |p ) p (6)
(2) xi+1 = Clipx, {xi + α · sign(∇x Loss(xi , lx ))}
i=1
x = {x1 , x2 , ...,xn }, y = {y 1 , y 2 , ..., y n } MI-FGSM attack. MI-FGSM [47] adds momentum based on
I-FGSM [16]. Momentum is used to escape from poor local
7.2 Approach maximum and iterations are used to stabilize optimization.
Since the main development of adversarial attack is in the In Equation 7 y is the target class to be misclassified as:
field of image classification [154] [58] [38], we will introduce gi+1
xi+1 = Clipx, {xi + α · }
more related work on image using CNN, and supplement kgi+1 k2
research on other fields or other models at the end of this (7)
∇x Loss(xi , y)
section. gi+1 = µ · gi +
k∇x Loss(xi , y)k1
7.2.1 White-box attack in image classification JSMA attack. JSMA [129] modifies a few pixels at every
First, we define that F : Rn −→ {1 . . . k} is the classifier iteration. In each iteration, shown in Equation 8, αpq repre-
of model to map image value vectors to a class label. sents the impact on target classification of pixels p, q , and
13
βpq represents the impact on all other outputs. Larger value probability, and generally 0 < ζ << 1. The purpose is to
in this map means greater possibility to fool the network. seek δ which could fool F (·) on almost any sample from µ.
They pick (p∗ , q ∗ ) to attack.
F (x + δ) 6= F (x), f or most x ∼ µ
X ∂Z(x)t
αpq = s.t. kδkp ≤ ξ (14)
∂xi
i∈{p,q} Px∼µ (F (x + δ) 6= F (x)) ≥ 1 − ζ
X X ∂Z(x)j (8)
βpq =( ) − αpq
j
∂xi 7.2.2 Black-box attack in image classification
i∈{p,q}
Finding small perturbations often requires white-box mod-
(p∗ , q ∗ ) = arg max(−αpq · βpq ) · (αpq > 0) · (βpq < 0) els to calculate gradients. However, this method does not
(p,q)
work in a black-box setting due to some constraints in-
NewtonFool attack. NewtonFool [81] uses softmax output cluding gradients. Therefore, researchers propose several
Z(x). In Equation 9, x0 is the original sample and l = F (x0 ). methods to overcome these constraints.
δi = xi+1 − xi is the perturbation at iteration i. They tried Step 2.1. Training substitute model. As mentioned in
to find small δ so that Z(x0 + δ)l ≈ 0. Starting with x0 , they Section 4, stealing decision boundaries in model extraction
approximated Z(xi )l using a linear function step by step as attack and training substitute model can facilitate black-box
follows. adversarial attacks [128] [127] [84]. Papernot et al. [128] pro-
Z(xi+1 )l ≈ Z(xi )l + ∇Z(xi )l · (xi+1 − xi ), i = 0, 1, 2, · · · posed a method based on an alternative training algorithm
(9) using synthetic data generation in black-box settings.
Training substitute model needs that AEs can transfer
C&W attack. C&W [38] tries to find small δ in L0 , L2 , from the substitute model to the target model. Gradient
and L∞ norms. Different from L-BFGS, C&W optimizes Aligned Adversarial Subspace [159] estimated previously
following goals, unknown dimensions of the input space. They found that a
large part of the subspace is shared for two different models,
min kδkp + c · f (x + δ) thus achieving transferability. Further, they determined suf-
δ (10) ficient conditions for the transferability of model-agnostic
s.t. x + δ ∈ [0, 1]n perturbations.
Step 2.2. Estimating gradients. This method needs many
c is a hyperparameter and f (·) is defined as:
queries to estimate gradients and then search for AEs.
Narodytska et al. [116] used a technique based on local
f (x + δ) = max(max{Z(x + δ)i : i 6= t} − Z(x + δ)t , −K) search to construct the numerical approximation of network
(11) gradients, and then constructed perturbations in an image.
Moreover, Ilyas et al. [77] introduced a more rigorous and
f (·) is an artificially defined function, the above is just practical black-box threat model. They applied a natural
one case. Here, f (·) 6 0 if and only if classification result evolution strategy to estimate gradients and perform black-
is adversarial targeted label t. K guarantees x + δ will be box attacks, using 2∼3 orders of magnitude less queries.
classified as t with high confidence.
EAD attack. EAD [42] combines L1 and L2 penalty func-
tions. In Equation 12, f (x + δ) is the same as C&W and t is 7.2.3 Attack in other fields
the targeted label. Obviously, C&W attack becomes a special Except for the image classification, adversarial attacks are
EAD case when β = 0 [42]. also used in other fields, such as speech recognition [57]
2 [182], text processing [54], malware detection [75] [131] [133]
min c · f (x + δ) + β kδk1 + kδk2 [92] and so on.
δ (12)
s.t. x + δ ∈ [0, 1]n In the speech field, Yuan et al. [182] embedded voice com-
mands into songs, and thereby attacked speech recognition
OptMargin attack. OptMargin [68] is an extension of C&W systems, not being detected by humans. DeepSearch [39]
L2 attack by adding many objective functions around x. In could convert any given waveform into any desired target
Equation 13, x0 is the original example. x = x0 + δ is adver- phrase through adding small perturbations on speech-to-
sarial. y is the true label of x0 . vi are perturbations applied text neural networks.
to x. OptMargin guarantees not only x fools network, but In the text processing field, DeepWordBug [54] gener-
also its neighbors x + vi . ated adversarial text sequences in black-box settings. They
2 adopted different score functions to better mutate words.
min kδk2 + c · (f1 (x) + · · · + fm (x)) They minimized edit distance between the original and
δ
s.t. x + δ ∈ [0, 1]n modified texts, and reduced text classification accuracy from
90% to 30∼60%.
fi (x) = max(Z(x + vi )y − max{Z(x + vi )j : j 6= y}, −K)
(13) In the malware field, Rigaki et al. [138] used GANs to
avoid malware detection by modifying network behavior
UAP attack. UAP [113] is universal perturbations which to imitate traffic of legitimate applications. They can adjust
suit almost all samples of a certain dataset. In Equation 14, command and control channels to simulate Facebook chat
µ is the dataset that contains all samples. P represents network traffic by modifying the source code of malware.
14
Hu et al. [70] [71] and Rosenberg et al. [141] proposed meth- distance, 18.2% use L1 distance and 18.2% use L0 distance.
ods to generate adversarial malware examples in black- Considering image classification only, 70% attacks use L2
box to attack detection models. Dujaili et al. [15] proposed distance, 45% use L∞ distance, 10% use L1 distance and
SLEIPNIR for adversarial attack on binary encoded malware 20% use L0 distance.
detection. L0 distance reflects the number of changed elements, but
it is unable to limit the variation of each element. It suits
7.2.4 Attack against other models the scenes that only care about the number of perturbation
There is furthermore research in addition to DNN, such pixels, but not variation size. L1 distance is the absolute
as generative model, reinforcement learning and some ma- values summation of every element in perturbations, equiv-
chine learning algorithms. Mei et al. [110] identified the alent to Manhattan distance in 2D space. It limits the sum
optimal training set attack for SVM, logistic regression, and of all variations, but does not limit large perturbation of
linear regression. They proved the optimal attack can be individual elements. L∞ distance does not care about how
described as a bilevel optimization problem, which can be many elements have been changed, but only cares about
solved by gradient methods. Huang et al. [74] demonstrated the maximum of perturbations, equivalent to Chebyshev
that adversarial attack policies are also effective in reinforce- distance in 2D space. L2 distance is an Euclidean distance
ment learning, such as A3C, TRPO, DQN. Kos et al. [91] that considers all pixel perturbation, which is a more bal-
attempted to produce AEs using deep generative models anced and the most widespread metric. It takes into account
such as variational autoencoder. Their methods include a both the largest perturbation and the number of changed
classifier-based attack, and an attack on latent space. elements.
Finding 16. Different positions should have different weights for

7.3 Analysis perturbation.
In Table 6, we have measured 33 papers on adversarial
attack in total, and identified the following interesting ob- In the current measurement methods, the perturbations
servations. of different elements are considered to have the same
weight. However, in face images, the same perturbations
Finding 13. Only a few attacks could be implemented in the
applied on the important part of face such as nose, eyes
physical world.
and mouth, will be easier to identify than that applied on
Real-world attacks are scarce in image field (2/20) ac- the background. Similarly, in audio analysis, perturbations
cording to our research. AEs in the digital space may fail to are difficult to be noticed in a chaotic scene, but are easily
fool classifiers in the physical space because physical attacks perceived in a quiet scene. According to above analysis,
need to consider more environmental factors. For example, we can consider to adopt different weights on different
when an adversarial image is snapshot by a camera, it elements when measuring distance. The important part has
is affected by photographing viewpoints, environmental a larger weight, so it can only make smaller perturbations,
lighting, and camera noise. So camera may not be able to while the unimportant part has a smaller weight, which can
catch those tiny perturbations. There are also some studies introduce larger perturbations.
about physical world attack [16] [20]. These images usually
need larger and obvious changes. Finding 17. More advanced measurements for human perception
Except for image classification, AEs in speech also need are desired.
to consider physical channel because of the noise. However,
this problem does not exist in text or malware field, so we The original goal of AEs is to make the model classify
give them all “Yes” in “Real-world”. samples wrongly while make humans be unaware of the
differences. However, it is difficult to measure humans’ per-
Finding 14. A bit more works focus on untargeted attacks ception of these perturbations. Intuitively, small Lp distance
(57.6%) which are easier to achieve but less severe than targeted implies a low probability of being detected by humans.
attacks. While recent work found that Lp distance is neither nec-
essary nor sufficient for perceptual similarity [145]. That is,
Untargeted attacks aims at inducing wrong predictions,
perturbations with large Lp values may also look similar to
and thus more flexible in finding perturbations which only
humans, such as overall translation and rotation of images,
need smaller modifications. Therefore, it can achieve success
and small Lp perturbations do not mean imperceptible.
more easily. Targeted attacks have to make the model predict
Therefore, we should break the constraint of Lp distance.
what as expected. Therefore, much more perturbations need
How to search for AEs systematically without Lp limitation,
to be created for accomplishing the target. However, they are
and how to propose new measurements that could be nec-
usually more harmful and practical in reality. For example,
essary or sufficient for perceptual similarity, will be a trend
attackers may disguise themselves as authenticated users
of adversarial attack in the near future.
in a face recognition system, in order to gain the access to
privileged resources.
Finding 15. Philosophy of distance selection.
8 D ISCUSSION
Distance metrics is an important factor to find mini-
mum perturbations, which mostly use L-distance currently. In this section, we summarize 7 observations according to
In Table 6, 60.1% attacks use L2 distance, 36.4% use L∞ the survey as follows.
15
TABLE 6: Evaluation on adversarial attacks. This table presents “Success Rate” of these attacks in specific “Dataset” with
varying target “System” and “Model”. “Distance” implies how these works measure the distance between samples. “Real-
world” is used to distinguish the works that are also suitable for physical adversarial attacks. “Knowledge” is valued either
black-box or white-box. “Iterative” illustrates whether the optimization steps are iterative. “Targeted” differs whether an
attack is a targeted attack or not. “Application” covers the practical areas.
Paper Success Rate Dataset System Distance Model Real-world Knowledge Iterative Targeted Application
L-BFGS [154] 20% MNIST FC10(1) L2 DNN No White Yes Yes image
FGSM [58] 54.6% MNIST a shallow softmax network L∞ DNN No White No No image
BIM [16] 24% ImageNet Inception v3 L∞ CNN Yes White Yes No image
MI-FGSM [47] 37.6% ImageNet Inception v3 L∞ CNN No White Yes Both image
JSMA [129] 97.05% MNIST LeNet L0 CNN No White Yes Yes image
C&W [38] 100% ImageNet Inception v3 L0 , L2 , L∞ CNN No White Yes Yes image
EAD [42] 100% ImageNet Inception v3 L1 , L2 , L∞ CNN No White Yes Yes image
OptMargin [68] 100% CIFAR-10 ResNet L0 , L2 , L∞ CNN No White Yes No image
Guo et al. [60] 95.5% ImageNet ResNet-50 L2 CNN No Both Yes No image
Deepfool [114] 68.7% ILSVRC2012 GoogLeNet L2 CNN No White Yes No image
NewtonFool [81] 81.63% GTSRB CNN(3Conv+1FC) L2 CNN No White Yes No image
UAP [113] 90.7% ILSVRC2012 VGG-16 L2 , L∞ CNN No White Yes No image
UAN [66] 91.8% ImageNet ResNet-152 L2 , L∞ CNN No White Yes Yes image
ATN [23] 89.2% MNIST CNN(3Conv+1FC) L2 CNN No White Yes Yes image
Athalye et al. [20] 83.4% 3D-printed turtle Inception-v3 L2 CNN Yes White No Yes image
Ilyas et al. [77] 99.2% ImageNet Inception-v3 - CNN No Black No Both image
Narodytska et al. [116] 97.51% CIFAR-10 VGG L0 CNN No Black No No image
Kos et al. [91] 76% MNIST VAE-GAN L2 GAN No White No Yes image
Mei et al. [110] - - - L2 SVM No Black Yes No image
Huang et al. [74] - - A3C,TRPO,DQN L1 , L2 , L∞ RL No Both No No image
Papernot et al. [131] 100% Reviews LSTM L2 RNN Yes White No No text
DeepWordBug [54] 51.80% IMDB Review LSTM L0 RNN Yes Black Yes Yes text
DeepSpeech [39] 100% Mozilla Common Voice LSTM L∞ RNN No White No Yes speech
Gong et al. [57] 72% IEMOCAP LSTM L2 RNN Yes White No No speech
CommanderSong [182] 96% Fisher ASplRE Chain Model L1 RNN Yes White No Yes speech
Rosenberg et al. [141] 99.99% 500000 files LSTM L2 RNN Yes Black Yes No malware
MtNet [75] 97% 4500000 files DNN(4 Hidden layers) L2 DNN Yes Black No No malware
SLEIPNIR [15] 99.7% 55000 PEs DNN L2 , L∞ DNN Yes Black No No malware
Rigaki et al. [138] 63% - GAN L0 GAN Yes Black No No malware
Pascanu et al. [133] 69% DREBIN DNN L1 DNN Yes Black No No malware
Kreuk et al. [92] 88% Microsoft Kaggle 2015 CNN L2 , L∞ CNN Yes White No Yes malware
Hu et al. [70] 90.05% 180 programs BiLTSM L1 RNN Yes Black Yes No malware
Hu et al. [71] 99.80% 180000 programs MalGAN L1 GAN Yes Black No No malware
8.1 Regulations on privacy protection fication. As a software system, deep learning can be easily
As shown in Section 4 and 5, both the enterprises and built on mature frameworks such as TensorFlow, Torch or
users are suffering from the risk of privacy. In addition Caffe. The vulnerabilities residing in these frameworks can
to removing privacy in the data, governments and related make the constructed deep learning systems vulnerable to
organizations can issue laws and regulations against privacy other types of attacks. The work [175] enumerates the secu-
violations in the course of data use and transmission. In rity issues such as heap overflow, integer overflow and use after
particular, it is recommended that: 1) introducing regulatory free in these widespread frameworks. These vulnerabilities
authorities to monitor these deep learning systems and can result in denial of service, control-flow hijacking or sys-
strictly supervise the use of data. The involved systems are tem compromise. Moreover, deep learning systems often de-
only allowed to extract features and predict results within pend on third-party libraries to provide auxiliary functions.
the permitted range. The private information is forbidden For instance, OpenCV is commonly used to process images,
for being extracted and inferred without authorization. 2) and Sound eXchange (SoX) is oftentimes used for audios.
establishing and improving relevant laws and regulations Once the vulnerabilities are exploited, the attacker can cause
(e.g., GDPR [3]), for supervising the process of data collec- more severe losses to deep learning systems. Therefore, the
tion, use, storage and deletion. 3) adding digital watermarks security auditing of deep learning implementation deserves
into the data for leak source tracking [21]. The watermarks more research attention and efforts in the further work.
helps to fast find out the rule breakers that are liable for
On the other hand, there are emerging a large number of
exposing privacy.
research works that leverage deep learning to detect and ex-
ploit software vulnerabilities automatically [181] [178] [80]
8.2 Secure implementation of deep learning systems [151]. It is believed that these techniques are also applicable
Most of the research on deep learning security is concentrat- in deep learning systems. Even more, deep learning might
ing on the leak of private data and the correctness of classi- help uncover the interpretation and fix the classification
16
vulnerabilities in future. At present, some work is about security and robustness

proof, usually against adversarial attack [169]. Deeper work
8.3 How far away from a complete black-box attack? requires to explain the reasons for prediction results, making
training and prediction processes are no longer in black-box.
Black-box attacks are relatively more destructive as they
Kantchelian et al. [86] suggested that system designers
do not require much information about the target which
need to broaden the classification goal into an explanatory
lowers the cost of attack. Many works are claiming they
goal and deepen interaction with human operators to ad-
are performing black-box attacks towards deep learning
dress the challenge of adversarial drift. Reluplex [87] can
systems [147] [143] [79]. But it is not clear that whether they
prove in which situations, small perturbations to inputs
are feasible on a large number of models and systems, and
cannot cause misclassification. The main idea is the lazy
what is the gap between these works with the real world
handling of ReLU constraints. It temporarily ignores ReLU
attack.
constraints and tries to solve the linear part of problems.
According to the surveyed results, we find that many
As a development, Wang et al. [169] presented ReluVal
black-box attacks still assume that some information is
to do formal security analysis of neural networks using
accessible. For example, [160] has to know what exact model
symbolic intervals. They proposed a new direction for
is running as well as its model structure before successfully
formally checking security properties without Satisfiability
stealing out the model parameters. [147] conducts a mem-
Modulo Theory. They leveraged symbolic interval algorithm
bership inference attack built on the fact that the statistics
to compute rigorous bounds on DNN outputs through min-
of training data is publicly known and similar data with
imizing over-estimations. AI 2 [55] attempts to do abstract
the same distribution can be easily synthesized. However,
interpretation in AI systems, and tries to prove the secu-
these conditions may be difficult to satisfy the real world,
rity and robustness of neural networks. They constructed
and a complete black-box attack is rarely seen in the recent
almost all perturbations, made them propagate automat-
research.
ically, and captured the behavior of convolutional layers,
Another difficulty of a complete black-box attack stems
max pooling layers and fully connected layers. They also
from the protection measures performed by deep learning
solved the state space explosion problem. DeepStellar [48]
systems: 1) query limit. Commercial deep learning systems
characterizes RNN internal behaviors by modeling a RNN
usually set a limit for service requests that prevents sub-
as an abstract state transition system. They design two trace
stitute model training. In [84], PRADA can detect model
similarity metrics to analyze RNNs quantitatively and also
extraction attacks based on characteristic distribution of
detect AEs with very small perturbations.
queries. 2) uncharted defense deployment. Besides not fully tan-
The interpretability cannot only bring security, but also
gible model, a black-box attacker also cannot infer how the
uncover the mystery of neural network and make us under-
defense is deployed and configured at the backend. These
stand its working mechanism easily. However, this is also
defenses may block a malicious request [112] [107], create
beneficial to attackers. They can exclude the range of input
misleading results [84] and dynamically change or enhance
proved secure, thus reducing the retrieval space and find-
their abilities [165] [160]. Due to the extreme imbalance
ing AEs more efficiently. They can also construct targeted
of knowledge between attackers and defenders, all of the
attacks through an in-depth understanding on models. In
above measures can avoid black-box attacks efficiently and
spite of this, this field should not be stagnant. Because a
effectively.
black-box model does not guarantee security [148]. There-
fore, with the improvement of interpretability, deep learning
8.4 Relationship between interpretability and security security may rise in a zigzag way.
The development of interpretability can help us better un- The development of interpretability is also conductive to
derstand the underlying principles of all these attacks. Since solving the hysteresis of defensive methods. Since we have
the neural network was born, it has the problem of low not yet achieved a deep understanding of DNN (it is not
interpretability. A small change of model parameters may clear why a record is predicted to the result, and how dif-
affect the prediction results drastically. People also cannot ferent data affect model parameters), finding vulnerabilities
directly understand how neural network operates. Recently, for attack is easier than preventing in advance. So there is a
interpretability has become an urgent field in deep learning. certain lag in deep learning security. If we can understand
In May of 2018, GDPR is announced to protect the privacy models thoroughly, it is believed that defense will precede
of personal data and it requires interpretability when using or synchronize with attack [87] [169] [55].
AI algorithms [3]. How to deeply understand the neural
network itself, and explain how the output is affected by 8.5 Discrimination in AI
the input are all problems that need to be solved urgently. AI system may seem rational, neutral and unbiased, but
Interpretability mainly refers to the ability to explain the actually, AI and algorithmic decisions can lead to unfair
logic behind every decision/judgment made by AI and how and discrimination [30]. For example, amazon’s AI hiring
to trust these decisions [162]. It mainly includes rationality, tool taught itself that male candidates were preferable [63].
traceability, and understandability [86]. Rationality means There are also discrimination in crime prevention, online
being able to understand the reasoning behind each predic- shops [30], bank loan [6], and so on. There are two main
tion. Traceability refers to the ability to track predictive pro- reasons causing AI discrimination [6]: 1) Imbalanced train-
cesses, which can be derived from the logic of mathematical ing data; 2)Training data reflects past discrimination.
algorithms [87] [169]. Understandability refers to a complete In order to solve this problem and make AI system better
understanding of the model on which decisions are based. benefit humans, what we need to do is: 1) balancing dataset,
17
by adding/removing data about under/over represented denial and falsification, and detecting poisonous data [170]
subsets. 2) modifying data or trained model where training [105] [65]. In particular, Olufowobi et al. [121] described the
data reflects past discrimination [6]; 3) importing testing context of creation or modification of data points to enhance
techniques to test the fairness of models, such as sym- trustworthiness and dependability of the data. Chakarov
bolic execution and local interpretability [12]; 4) enacting et al. [40] evaluated the effect of individual data points
non-discrimination law, and data protection law, such as on the performance of trained model. Baracaldo et al. [24]
GDPR [3]. used source information of training data points and the
transformation context to identify poisonous data; protecting
8.6 Corresponding defense methods algorithm, which adjusts training algorithms, e.g., robust
PCA [35], robust linear regression [43] [100], and robust
There is a line of approaches for preventing the aforemen- logistic regression [51].
tioned attacks.
MEA defense. Blurring the prediction results is an effec- AA defense. As adversarial attack draws the major atten-
tive way to prevent model stealing, for instance, rounding tion, defensive work is more comprehensive and ample ac-
parameters [165] [160], adding noise into class probabili- cordingly. The mainstream defense approaches is as follows:
ties [96] [84]. On the other hand, detecting and prevent • Adversarial training. This method selects AEs as part of the
abnormal queries can also resolve MEA. Kesarwani et al. [88] training dataset to make trained model learn characteris-
recorded all requests made by clients and calculated the tics of AEs [73] [94]. Furthermore, Ensemble Adversarial
explored feature space to detect attack. PRADA [84] de- Training [158] contained each turbine input transferred
tected attack based on sudden changes in the distribution from other pre-trained models.
of samples submitted by a given customer. • Region-based method. Understanding properties of adver-
MIA defense. To defend with model inversion attacks, sarial regions and using more robust region-based clas-
researchers propose the following approaches: sification could also defend adversarial attack. Cao et
• Differential privacy (DP), which is a cryptographic scheme al. [36] developed DNNs using region-based classification
designed to maximize the accuracy of data queries while instead of point-based. They predicted label through ran-
minimizing the opportunity to identify their records when domly selecting several points from the hypercube cen-
querying from a statistical database [50]. Individual fea- tered at the testing sample. In [125], the classifier mapped
tures are removed to preserve user privacy. It is first normal samples to the neighborhood of low-dimensional
proposed in [49] and prove to be effective in privacy manifolds in the final-layer hidden space. Local Intrinsic
preservation in database. DP can be applied to prediction Dimensionality [107] characterized dimensional proper-
outputs [41] [64] [166] [184] [76], loss function [89] [155], ties of adversarial regions and evaluated the spatial fill
and gradients [149] [26] [155] [11] [184] [187]. capability. Background Class [109] added a large and
• Homomorphic encryption (HE), which is an encryption func- diverse class of background images into datasets.
tion and enables the following two operations are value- • Transformation. Transforming inputs can defend adversar-
equivalent [139]: exercising arithmetic operations ⊕ on the ial attack to a large extent. Song et al. [150] found that AEs
ring of plain text and encrypting the result, encrypting mainly lay in the low probability regions of the training
operators first and then carry on the same arithmetic regions. So they purified an AE by moving it back towards
operations, i.e., En(x) ⊕ En(y) = En(x + y). In this way, the distribution adaptively. Guo et al. [61] explored model-
clients can encrypt their data and then send it to MLaaS. agnostic defenses on image-classification systems by im-
The server returns encrypted predictions without learning age transformations. Xie et al. [176] used randomization at
anything about the plain data. In the meantime, the clients inference time, including random resizing and padding.
have no idea about the model attributes [56] [101] [85] Tian et al. [156] considered that AEs are more sensitive to
[82]. certain image transformation operations, such as rotation
• Secure multi-party computation (SMC), stemming from and shifting, than normal images. Wang et al. [168] [167]
Yao’s Millionaires’ problem [180] and enabling a safe thought AEs are more sensitive to random perturbations
calculation of contract functions without trusted third than normal. Buckman et al. [32] used thermometer code
parties. In the context of deep learning, it extends to that and one-hot code discretization to increase the robustness
multiple parties collectively train a model and preserve of network to AEs.
their own data [164] [146] [134] [135]. As such, the training • Gradient regularization/masking. This method hides gradi-
data cannot be easily inferred by attackers residing at ents or reduces the sensitivity of models. Madry et al. [108]
either computing servers or the client side. realized it by optimizing a saddle point formulation,
• Training reconstitution. Cao et al. [37] put forward machine which included solving an inner maximization solved and
unlearning, which makes ML models completely forget a an outer minimization. Ross et al. [142] trained differen-
piece of training data and recover the effects to models tiable models that penalized the degree to infinitesimal
and features. Ohrimenko et al. [120] proposed a data- changes in inputs.
oblivious machine learning algorithm. Osia et al. [123] • Distillation. Papernot et al. [126] proposed Defensive Dis-
broke down large, complex deep models to enable scal- tillation, which could successfully mitigate AEs con-
able and privacy-preserving analytics by removing sensi- structed by FGSM and JSMA. Papernot et al. [132] also
tive information with a feature extractor. used the knowledge extracted in distillation to reduce the
PA defense. Poisoning attack can be mitigated through two magnitude of network gradient.
aspect: protecting data, including avoiding data tampering, • Data preprocessing. Liang et al. [98] introduced scalar quan-
18
tization and smooth spatial filtering to reduce the effect 9 C ONCLUSION

of perturbations. Zantedeschi et al. [183] used bounded In this paper, we conduct a comprehensive and extensive
ReLU activation function for hedging forward propaga- investigation on attacks towards deep learning systems. Dif-
tion of adversarial perturbation. Xu et al. [179] proposed ferent from other surveys, we dissect an attack in a system-
feature squeezing methods, including reducing the depth atical way, where interested readers can clearly understand
of color bit on each pixel and spatial smoothing. how these attacks happen step by step. We have compared
• Defense network. Some studies use networks to automati- the investigated works on their attack vectors and proposed
cally fight against AEs. Gu et al. [59] used deep contractive a number of metrics to compare their performance. Based
network with contractive autoencoders and denoising on the comparison, we then proceed to distill a number of
autoencoders, which can remove amounts of adversarial insights, disclosing advantages and disadvantages of attack
noise. Akhtar et al. [13] proposed a perturbation rectifying methods, limitations and trends. The discussion covering
network as pre-input layers to defend against UAPs. Mag- the difficulties of these attacks in the physical world, secu-
Net [112] used detector networks to detect AEs which are rity concerns in other aspects and potential mitigation for
far from the boundary of manifold, and used a reformer these attacks provide a platform that future research can be
to reform AEs which are close to the boundary. based.
8.7 Future direction of attack and defense

R EFERENCES
[1] Drebin dataset. https://www.sec.tu-bs.de/∼danarp/drebin/,
It is an endless war between attackers and defenders, and 2016.
neither of them can win an absolute victory. But both [2] Imagenet dataset. http://www.image-net.org, 2017.
[3] General data protection regulation. https://gdpr-info.eu, May
sides can research new techniques and applications to gain 2018.
advantages. From the attacker’s point of view, one effective [4] Gss general social survey. http://gss.norc.org/, 2019.
way is to explore new attack surfaces, find out new attack [5] Gtsrb dataset. http://benchmark.ini.rub.de/?section=gtsrb&sub
scenarios, seek for new attack purposes and broaden the section=dataset, 2019.
[6] Human bias and discrimination in ai systems.
scope of attack effects. In particular, main attack surfaces on https://ai-auditingframework.blogspot.com/2019/06/huma
deep learning systems include malformed operational in- n-bias-and-discrimination-in-ai.html, 2019.
put, malformed training data and malformed models [175]. [7] Ijb-a dataset. https://www.nist.gov/itl/iad/image-group/ijb-d
ataset-request-form, 2019.
In adversary attack, Lp -distance is not an ideal mea- [8] Imdb review dataset. https://www.kaggle.com/utathya/imd
surement. Some images with big perturbations are still b-review-dataset, 2019.
indistinguishable for humans. However, unlike Lp -distance, [9] Microsoft kaggle dataset. https://www.kaggle.com/c/microso
ft-malware-prediction, 2019.
there is no standard measure for large Lp perturbations. [10] Mozilla common voice. https://voice.mozilla.org/en, 2019.
This will be a hot point for adversarial learning in future. [11] M. Abadi, A. Chu, I. J. Goodfellow, H. B. McMahan, I. Mironov,
In model extraction attack, stealing functionality of complex K. Talwar, and L. Zhang. Deep learning with differential privacy.
models needs massive queries. How to come up with a In Proceedings of the ACM Conference on Computer and Communica-
tions Security (CCS), Vienna, Austria, 2016, pages 308–318, October
better method to reduce the number of queries in order of 24-28, 2016.
magnitude will be the focus of this field. [12] A. Aggarwal, P. Lohia, S. Nagar, K. Dey, and D. Saha. Black
The balance of attack cost and benefit is also an impor- box fairness testing of machine learning models. In Proceedings
of the ACM Joint Meeting on European Software Engineering Con-
tant factor. Some attacks, even can achieve fruitful targets, ference and Symposium on the Foundations of Software Engineering,
have to perform costly computation or resources [160]. For ESEC/SIGSOFT FSE 2019, Tallinn, Estonia, August 26-30, 2019.,
example, in [147], the attacker has to train a number of pages 625–635.
shadow models that simulate the target model, and then [13] N. Akhtar, J. Liu, and A. S. Mian. Defense against universal
adversarial perturbations. CoRR, abs/1711.05929, 2017.
undertake membership inference. They need 156 queries to [14] N. Akhtar and A. S. Mian. Threat of adversarial attacks on deep
produce a data point on average. learning in computer vision: A survey. IEEE Access, 6:14410–
Attack cost and attack benefit are a trade-off pro- 14430, 2018.
[15] A. Al-Dujaili, A. Huang, E. Hemberg, and U. O’Reilly. Adversar-
cess [110]. Generally, the cost of attack contains time, com- ial deep learning for robust detection of binary encoded malware.
putation resources, acquired knowledge, and monetary ex- In IEEE Security and Privacy Workshops, San Francisco, CA, USA,
pense. The benefit from an attack include economic pay- pages 76–82, May 24, 2018.
[16] Alexey, I. J. Goodfellow, and S. Bengio. Adversarial examples in
back, rivals’ failure and so forth. In this study, we will not the physical world. CoRR, abs/1607.02533, 2016.
give a uniform formula to quantify the cost and benefit [17] S. Alfeld, X. Zhu, and P. Barford. Data poisoning attacks against
as the importance of each element is varying in different autoregressive models. In Proceedings of the Thirtieth AAAI Confer-
scenarios. Nevertheless, it is usually modeled as an opti- ence on Artificial Intelligence, Phoenix, Arizona, USA., pages 1452–
1458, February 12-17, 2016.
mization problem where the cost is minimized while the [18] D. Amodei, C. Olah, J. Steinhardt, P. F. Christiano, J. Schul-
benefit is maximized, like a min-max game [117]. man, and D. Mané. Concrete problems in AI safety. CoRR,
As for defenders, a combination of multiple defense abs/1606.06565, 2016.
[19] G. Ateniese, L. V. Mancini, A. Spognardi, A. Villani, D. Vitali, and
techniques is a good choice to reduce the risk of being at- G. Felici. Hacking smart machines with smarter ones: How to
tacked. But the combination may incur additional overhead extract meaningful data from machine learning classifiers. IJSN,
on the system that should be solved in design. For example, 10(3):137–150, 2015.
in [101] [85], they adopted a mixed protocol combining [20] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok. Synthesizing
robust adversarial examples. In Proceedings of the 35th Interna-
HE and MPC, which improved performance but with high tional Conference on Machine Learning (ICML), Stockholmsmässan,
bandwidth. Stockholm, Sweden, pages 284–293, July 10-15, 2018.
19
[21] A. Awad, J. Traub, and S. Sakr. Adaptive watermarks: A concept [42] P. Chen, Y. Sharma, H. Zhang, J. Yi, and C. Hsieh. EAD: elastic-
drift-based approach for predicting event-time progress in data net attacks to deep neural networks via adversarial examples.
streams. In 22nd International Conference on Extending Database In Proceedings of the Thirty-Second AAAI Conference on Artificial
Technology (EDBT), Lisbon, Portugal, pages 622–625, March 26-29, Intelligence, New Orleans, Louisiana, USA, pages 10–17, February
2019. 2-7, 2018.
[22] H. Bae, J. Jang, D. Jung, H. Jang, H. Ha, and S. Yoon. Security [43] Y. Chen, C. Caramanis, and S. Mannor. Robust high dimensional
and privacy issues in deep learning. CoRR, abs/1807.11655, 2018. sparse regression and matching pursuit. CoRR, abs/1301.2725,
[23] S. Baluja and I. Fischer. Learning to attack: Adversarial trans- 2013.
formation networks. In Proceedings of the Thirty-Second AAAI [44] M. Cheng, T. Le, P. Chen, H. Zhang, J. Yi, and C. Hsieh. Query-
Conference on Artificial Intelligence, New Orleans, Louisiana, USA, efficient hard-label black-box attack: An optimization-based ap-
February 2-7, 2018. proach. In 7th International Conference on Learning Representations,
[24] N. Baracaldo, B. Chen, H. Ludwig, and J. A. Safavi. Mitigating ICLR, New Orleans, LA, USA, May 6-9, 2019.
poisoning attacks on machine learning models: A data prove- [45] J. R. C. da Silva, R. F. Berriel, C. Badue, A. F. de Souza, and
nance based approach. In Proceedings of the 10th ACM Workshop T. Oliveira-Santos. Copycat CNN: stealing knowledge by per-
on Artificial Intelligence and Security, AISec@CCS, Dallas, TX, USA, suading confession with random non-labeled data. In Interna-
pages 103–110, November 3, 2017. tional Joint Conference on Neural Networks, IJCNN, Rio de Janeiro,
[25] M. Barreno, B. Nelson, A. D. Joseph, and J. D. Tygar. The security Brazil, pages 1–8, July 8-13, 2018.
of machine learning. Machine Learning, 81(2):121–148, Nov 2010. [46] J. Devlin, M. Chang, K. Lee, and K. Toutanova. BERT: pre-training
[26] R. Bassily, A. D. Smith, and A. Thakurta. Private empirical risk of deep bidirectional transformers for language understanding.
minimization: Efficient algorithms and tight error bounds. In In Proceedings of the 2019 Conference of the North American Chapter
55th IEEE Annual Symposium on Foundations of Computer Science, of the Association for Computational Linguistics: Human Language
FOCS, Philadelphia, PA, USA, pages 464–473, October 18-21, 2014. Technologies (NAACL-HLT), pages 4171–4186, 2019.
[27] V. Beal. What is structured data? webopedia definition. [47] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li.
https://www.webopedia.com/TERM/S/structured data.html, Boosting adversarial attacks with momentum. In IEEE Conference
Aug. 2018. on Computer Vision and Pattern Recognition, CVPR, Salt Lake City,
[28] B. Biggio, B. Nelson, and P. Laskov. Poisoning attacks against UT, USA, pages 9185–9193, June 18-22, 2018.
support vector machines. In Proceedings of the 29th International [48] X. Du, X. Xie, Y. Li, L. Ma, Y. Liu, and J. Zhao. Deepstellar: model-
Conference on Machine Learning, ICML, Edinburgh, Scotland, UK, based quantitative analysis of stateful deep learning systems. In
June 26 - July 1, 2012. Proceedings of the ACM Joint Meeting on European Software Engi-
[29] B. Biggio, I. Pillai, S. R. Bulò, D. Ariu, M. Pelillo, and F. Roli. Is neering Conference and Symposium on the Foundations of Software
data clustering in adversarial settings secure? In AISec’13, Pro- Engineering, ESEC/SIGSOFT FSE 2019, Tallinn, Estonia, August 26-
ceedings of the ACM Workshop on Artificial Intelligence and Security, 30, 2019., pages 477–487.
Co-located with CCS, Berlin, Germany, pages 87–98, November 4, [49] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor.
2013. Our data, ourselves: Privacy via distributed noise generation. In
[30] F. Z. Borgesius. Discrimination, artificial intelligence, and 25th Annual International Conference on the Theory and Applications
algorithmic decision-making. https://rm.coe.int/discrimin of Cryptographic Techniques, St. Petersburg, Russia, pages 486–503,
ation-artificial-intelligence-and-algorithmic-decision-making/ May 28-June 1, 2006.
1680925d73, 2018.
[50] C. Dwork, F. McSherry, K. Nissim, and A. D. Smith. Calibrating
[31] M. Brückner and T. Scheffer. Nash equilibria of static prediction noise to sensitivity in private data analysis. In Theory of Cryptog-
games. In 23rd Annual Conference on Neural Information Process- raphy, Third Theory of Cryptography Conference, TCC, New York, NY,
ing Systems, Vancouver, British Columbia, Canada, pages 171–179, USA, pages 265–284, March 4-7, 2006.
December 7-10, 2009.
[51] J. Feng, H. Xu, S. Mannor, and S. Yan. Robust logistic regression
[32] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow. Thermometer
and classification. In Annual Conference on Neural Information
encoding: One hot way to resist adversarial examples. In Interna-
Processing Systems, Montreal, Quebec, Canada, pages 253–261, De-
tional Conference on Learning Representations, 2018.
cember 8-13, 2014.
[33] J. Buolamwini and T. Gebru. Gender shades: Intersectional accu-
racy disparities in commercial gender classification. In Conference [52] M. Fredrikson, S. Jha, and T. Ristenpart. Model inversion attacks
on Fairness, Accountability and Transparency, FAT, New York, NY, that exploit confidence information and basic countermeasures.
USA, pages 77–91, February 23-24, 2018. In Proceedings of the 22nd ACM SIGSAC Conference on Computer
and Communications Security, Denver, CO, USA, pages 1322–1333,
[34] C. Burkard and B. Lagesse. Analysis of causative attacks against
October 12-16, 2015.
svms learning from data streams. In Proceedings of the 3rd
ACM on International Workshop on Security And Privacy Analytics, [53] K. Ganju, Q. Wang, W. Yang, C. A. Gunter, and N. Borisov.
IWSPA@CODASPY, Scottsdale, Arizona, USA, pages 31–36, March Property inference attacks on fully connected neural networks
24, 2017. using permutation invariant representations. In Proceedings of
[35] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal the ACM SIGSAC Conference on Computer and Communications
component analysis? J. ACM, 58(3):11:1–11:37, 2011. Security, CCS, Toronto, ON, Canada, pages 619–633, October 15-
19, 2018.
[36] X. Cao and N. Z. Gong. Mitigating evasion attacks to deep neural
networks via region-based classification. In Proceedings of the [54] J. Gao, J. Lanchantin, M. L. Soffa, and Y. Qi. Black-box generation
33rd Annual Computer Security Applications Conference, Orlando, of adversarial text sequences to evade deep learning classifiers. In
FL, USA, pages 278–287, December 4-8, 2017. IEEE Security and Privacy Workshops, SP Workshops, San Francisco,
[37] Y. Cao and J. Yang. Towards making systems forget with machine CA, USA, pages 50–56, May 24, 2018.
unlearning. In IEEE Symposium on Security and Privacy, SP, San [55] T. Gehr, M. Mirman, D. Drachsler-Cohen, P. Tsankov, S. Chaud-
Jose, CA, USA, pages 463–480, May 17-21, 2015. huri, and M. T. Vechev. AI2: safety and robustness certification of
[38] N. Carlini and D. A. Wagner. Towards evaluating the robustness neural networks with abstract interpretation. In IEEE Symposium
of neural networks. In IEEE Symposium on Security and Privacy on Security and Privacy, SP, San Francisco, CA, USA, pages 3–18,
(SP), pages 39–57, 2017. May 21-23, 2018.
[39] N. Carlini and D. A. Wagner. Audio adversarial examples: [56] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. E. Lauter, M. Naehrig,
Targeted attacks on speech-to-text. In IEEE Security and Privacy and J. Wernsing. Cryptonets: Applying neural networks to en-
Workshops, SP Workshops, San Francisco, CA, USA, pages 1–7, May crypted data with high throughput and accuracy. In Proceedings of
24, 2018. the 33nd International Conference on Machine Learning, ICML 2016,
[40] A. Chakarov, A. V. Nori, S. K. Rajamani, S. Sen, and D. Vi- New York City, NY, USA, June 19-24, 2016, pages 201–210, 2016.
jaykeerthy. Debugging machine learning tasks. CoRR, [57] Y. Gong and C. Poellabauer. Crafting adversarial examples for
abs/1603.07292, 2016. speech paralinguistics applications. CoRR, abs/1711.03280, 2017.
[41] K. Chaudhuri and C. Monteleoni. Privacy-preserving logistic re- [58] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and
gression. In Proceedings of the Twenty-Second Annual Conference on harnessing adversarial examples. CoRR, abs/1412.6572, 2014.
Neural Information Processing Systems, Vancouver, British Columbia, [59] S. Gu and L. Rigazio. Towards deep neural network architectures
Canada, pages 289–296, December 8-11, 2008. robust to adversarial examples. CoRR, abs/1412.5068, 2014.
20
[60] C. Guo, J. S. Frank, and K. Q. Weinberger. Low frequency In Proceedings of the 33rd Annual Computer Security Applications
adversarial perturbation. CoRR, abs/1809.08758, 2018. Conference, Orlando, FL, USA, December 4-8, 2017, pages 262–277.
[61] C. Guo, M. Rana, M. Cissé, and L. van der Maaten. Coun- [82] X. Jiang, M. Kim, K. E. Lauter, and Y. Song. Secure outsourced
tering adversarial images using input transformations. CoRR, matrix computation and application to neural networks. In
abs/1711.00117, 2017. Proceedings of the 2018 ACM SIGSAC Conference on Computer and
[62] J. Guo, Y. Jiang, Y. Zhao, Q. Chen, and J. Sun. Dlfuzz: differential Communications Security, CCS 2018, Toronto, ON, Canada, October
fuzzing testing of deep learning systems. In Proceedings of the 15-19, 2018, pages 1209–1222, 2018.
2018 ACM Joint Meeting on European Software Engineering Con- [83] A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov. Bag of tricks
ference and Symposium on the Foundations of Software Engineering, for efficient text classification. In Proceedings of the 15th Confer-
ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, November ence of the European Chapter of the Association for Computational
04-09, 2018, pages 739–743. Linguistics: Volume 2, Short Papers, pages 427–431. Association for
[63] I. A. Hamilton. Amazon built an ai tool to hire people but had Computational Linguistics, April 2017.
to shut it down because it was discriminating against women. [84] M. Juuti, S. Szyller, A. Dmitrenko, S. Marchal, and N. Asokan.
https://www.businessinsider.com/amazon-built-ai-to-hire-peo PRADA: protecting against DNN model stealing attacks. CoRR,
ple-discriminated-against-women-2018-10, Oct. 2018. abs/1805.02628, 2018.
[64] J. Hamm, Y. Cao, and M. Belkin. Learning privately from [85] C. Juvekar, V. Vaikuntanathan, and A. Chandrakasan. GAZELLE:
multiparty data. In Proceedings of the 33nd International Conference A low latency framework for secure neural network inference.
on Machine Learning, ICML, New York City, NY, USA, pages 555– In 27th USENIX Security Symposium, USENIX Security, Baltimore,
563, June 19-24, 2016. MD, USA, pages 1651–1669, August 15-17, 2018.
[65] R. Hasan, R. Sion, and M. Winslett. The case of the fake [86] A. Kantchelian, S. Afroz, L. Huang, A. C. Islam, B. Miller, M. C.
picasso: Preventing history forgery with secure provenance. In Tschantz, R. Greenstadt, A. D. Joseph, and J. D. Tygar. Ap-
7th USENIX Conference on File and Storage Technologies, February proaches to adversarial drift. In Proceedings of the ACM Workshop
24-27, 2009, San Francisco, CA, USA. Proceedings, pages 1–14, 2009. on Artificial Intelligence and Security, AISec, Berlin, Germany, pages
[66] J. Hayes and G. Danezis. Learning universal adversarial per- 99–110, November 4, 2013.
turbations with generative models. In IEEE Security and Privacy [87] G. Katz, C. Barrett, D. L. Dill, K. Julian, and M. J. Kochender-
Workshops, SP Workshops, San Francisco, CA, USA, pages 43–49, fer. Towards proving the adversarial robustness of deep neural
May 24, 2018. networks. In Proceedings First Workshop on Formal Verification
[67] J. Hayes, L. Melis, G. Danezis, and E. D. Cristofaro. LOGAN: of Autonomous Vehicles, FVAV@iFM, Turin, Italy, 19th September.,
evaluating privacy leakage of generative models using genera- pages 19–26, 2017.
tive adversarial networks. CoRR, abs/1705.07663, 2017. [88] M. Kesarwani, B. Mukhoty, V. Arya, and S. Mehta. Model
[68] W. He, B. Li, and D. Song. Decision boundary analysis of extraction warning in mlaas paradigm. CoRR, abs/1711.07221,
adversarial examples. In International Conference on Learning 2017.
Representations, 2018. [89] D. Kifer, A. D. Smith, and A. Thakurta. Private convex optimiza-
[69] B. Hitaj, G. Ateniese, and F. Pérez-Cruz. Deep models under tion for empirical risk minimization with applications to high-
the GAN: information leakage from collaborative deep learning. dimensional regression. In The 25th Annual Conference on Learning
In Proceedings of the ACM SIGSAC Conference on Computer and Theory, COLT, Edinburgh, Scotland, pages 25.1–25.40, June 25-27,
Communications Security, CCS, Dallas, TX, USA, pages 603–618, 2012.
October 30-November 03, 2017. [90] J. Kim, R. Feldt, and S. Yoo. Guiding deep learning system
[70] W. Hu and Y. Tan. Black-box attacks against RNN based malware testing using surprise adequacy. In Proceedings of the 41st Interna-
detection algorithms. In The Workshops of the The Thirty-Second tional Conference on Software Engineering, ICSE 2019, Montreal, QC,
AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, Canada, May 25-31, 2019, pages 1039–1049.
USA, pages 245–251, February 2-7,. [91] J. Kos, I. Fischer, and D. Song. Adversarial examples for genera-
[71] W. Hu and Y. Tan. Generating adversarial malware examples for tive models. In IEEE Security and Privacy Workshops, SP Workshops,
black-box attacks based on GAN. CoRR, abs/1702.05983, 2017. San Francisco, CA, USA, May 24, 2018, pages 36–42.
[72] W. Hua, Z. Zhang, and G. E. Suh. Reverse engineering convolu- [92] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and
tional neural networks through side-channel information leaks. J. Keshet. Deceiving end-to-end deep learning malware detectors
In Proceedings of the 55th Annual Design Automation Conference, using adversarial examples. 2018.
DAC, San Francisco, CA, USA, pages 4:1–4:6, June 24-29, 2018. [93] A. Krizhevsky, V. Nair, and G. Hinton. CIFAR dataset. https:
[73] R. Huang, B. Xu, D. Schuurmans, and C. Szepesvári. Learning //www.cs.toronto.edu/∼kriz/cifar.html, 2019.
with a strong adversary. CoRR, abs/1511.03034, 2015. [94] A. Kurakin, I. J. Goodfellow, and S. Bengio. Adversarial machine
[74] S. H. Huang, N. Papernot, I. J. Goodfellow, Y. Duan, and learning at scale. CoRR, abs/1611.01236, 2016.
P. Abbeel. Adversarial attacks on neural network policies. CoRR, [95] Y. LeCun, C. Cortes, and C. Burges. Mnist dataset. http://yann
abs/1702.02284, 2017. .lecun.com/exdb/mnist/, 2017.
[75] W. Huang and J. W. Stokes. Mtnet: A multi-task neural net- [96] T. Lee, B. Edwards, I. Molloy, and D. Su. Defending against
work for dynamic malware classification. In 13th International model stealing attacks using deceptive perturbations. CoRR,
Conference, Detection of Intrusions and Malware, and Vulnerability abs/1806.00054, 2018.
Assessment, DIMVA, San Sebastián, Spain, pages 399–418, July 7-8, [97] P. Li, J. Yi, and L. Zhang. Query-efficient black-box attack by
2016. active learning. In IEEE International Conference on Data Mining,
[76] N. Hynes, R. Cheng, and D. Song. Efficient deep learning on ICDM , Singapore, November 17-20, 2018, pages 1200–1205.
multi-source private data. CoRR, abs/1807.06689, 2018. [98] B. Liang, H. Li, M. Su, X. Li, W. Shi, and X. Wang. Detecting
[77] A. Ilyas, L. Engstrom, A. Athalye, and J. Lin. Query-efficient adversarial image examples in deep networks with adaptive
black-box adversarial examples. CoRR, abs/1712.07113, 2017. noise reduction. 2017.
[78] A. Ilyas, L. Engstrom, A. Athalye, and J. Lin. Black-box adversar- [99] R. Light. Ai trends: Machine learning as a service (mlaas).
ial attacks with limited queries and information. In Proceedings https://learn.g2.com/trends/machine-learning-service-mlaas,
of the 35th International Conference on Machine Learning, ICML, Jan. 2018.
Stockholmsmässan, Stockholm, Sweden, pages 2142–2151, 2018. [100] C. Liu, B. Li, Y. Vorobeychik, and A. Oprea. Robust linear regres-
[79] M. Jagielski, A. Oprea, B. Biggio, C. Liu, C. Nita-Rotaru, and sion against training data poisoning. In Proceedings of the 10th
B. Li. Manipulating machine learning: Poisoning attacks and ACM Workshop on Artificial Intelligence and Security, AISec@CCS
countermeasures for regression learning. In IEEE Symposium on 2017, Dallas, TX, USA, November 3, 2017, pages 91–102, 2017.
Security and Privacy, SP, San Francisco, California, USA, pages 19– [101] J. Liu, M. Juuti, Y. Lu, and N. Asokan. Oblivious neural network
35, May 21-23, 2018. predictions via minionn transformations. In Proceedings of the
[80] S. Jan, A. Panichella, A. Arcuri, and L. C. Briand. Automatic ACM SIGSAC Conference on Computer and Communications Secu-
generation of tests to exploit XML injection vulnerabilities in web rity, CCS, Dallas, TX, USA, October 30 - November 03, 2017, pages
applications. IEEE Trans. Software Eng., 45(4):335–362, 2019. 619–631.
[81] U. Jang, X. Wu, and S. Jha. Objective metrics and gradient [102] K. S. Liu, B. Li, and J. Gao. Generative model: Membership attack,
descent algorithms for adversarial examples in machine learning. generalization and diversity. CoRR, abs/1805.09898, 2018.
21
[103] Q. Liu, P. Li, W. Zhao, W. Cai, S. Yu, and V. C. M. Leung. A [123] S. A. Ossia, A. S. Shamsabadi, A. Taheri, H. R. Rabiee, N. D.
survey on security threats and defensive techniques of machine Lane, and H. Haddadi. A Hybrid Deep Learning Architecture
learning: A data driven view. IEEE Access, 6:12103–12117, 2018. for Privacy-Preserving Mobile Analytics. CoRR, abs/1703.02952,
[104] Y. Long, V. Bindschaedler, L. Wang, D. Bu, X. Wang, H. Tang, C. A. 2017.
Gunter, and K. Chen. Understanding membership inferences on [124] R. Pan. Static deep neural network analysis for robustness. In
well-generalized learning models. CoRR, abs/1802.04889, 2018. Proceedings of the ACM Joint Meeting on European Software Engi-
[105] J. Lyle and A. P. Martin. Trusted computing and provenance: neering Conference and Symposium on the Foundations of Software
Better together. In 2nd Workshop on the Theory and Practice of Engineering, ESEC/SIGSOFT FSE 2019, Tallinn, Estonia, August 26-
Provenance, TaPP’10, San Jose, CA, USA, February 22, 2010, 2010. 30, 2019., pages 1238–1240.
[106] L. Ma, F. Juefei-Xu, F. Zhang, J. Sun, M. Xue, B. Li, C. Chen, T. Su, [125] T. Pang, C. Du, and J. Zhu. Robust deep learning via re-
L. Li, Y. Liu, J. Zhao, and Y. Wang. Deepgauge: multi-granularity verse cross-entropy training and thresholding test. CoRR,
testing criteria for deep learning systems. In Proceedings of the abs/1706.00633, 2017.
33rd ACM/IEEE International Conference on Automated Software [126] N. Papernot and P. D. McDaniel. On the effectiveness of defen-
Engineering, ASE 2018, Montpellier, France, September 3-7, 2018, sive distillation. CoRR, abs/1607.05113, 2016.
pages 120–131. [127] N. Papernot, P. D. McDaniel, and I. J. Goodfellow. Transferability
[107] X. Ma, B. Li, Y. Wang, S. M. Erfani, S. N. R. Wijewickrema, M. E. in machine learning: from phenomena to black-box attacks using
Houle, G. Schoenebeck, D. Song, and J. Bailey. Characterizing adversarial samples. CoRR, abs/1605.07277, 2016.
adversarial subspaces using local intrinsic dimensionality. CoRR, [128] N. Papernot, P. D. McDaniel, I. J. Goodfellow, S. Jha, Z. B. Celik,
abs/1801.02613, 2018. and A. Swami. Practical black-box attacks against machine
[108] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. learning. In Proceedings of the 2017 ACM on Asia Conference
Towards deep learning models resistant to adversarial attacks. on Computer and Communications Security, AsiaCCS, Abu Dhabi,
CoRR, abs/1706.06083, 2017. United Arab Emirates, April 2-6, 2017, pages 506–519.
[109] M. McCoyd and D. A. Wagner. Background class defense against [129] N. Papernot, P. D. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik,
adversarial examples. In 2018 IEEE Security and Privacy Work- and A. Swami. The limitations of deep learning in adversarial
shops, SP Workshops 2018, San Francisco, CA, USA, May 24, 2018, settings. In IEEE European Symposium on Security and Privacy,
pages 96–102, 2018. EuroS&P 2016, Saarbrücken, Germany, March 21-24, 2016, pages
[110] S. Mei and X. Zhu. Using machine teaching to identify optimal 372–387, 2016.
training-set attacks on machine learners. In Proceedings of the [130] N. Papernot, P. D. McDaniel, A. Sinha, and M. P. Wellman. Sok:
Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25- Security and privacy in machine learning. In 2018 IEEE European
30, 2015, Austin, Texas, USA., pages 2871–2877, 2015. Symposium on Security and Privacy, EuroS&P 2018, London, United
[111] L. Melis, C. Song, E. D. Cristofaro, and V. Shmatikov. Inference Kingdom, April 24-26, 2018, pages 399–414, 2018.
attacks against collaborative learning. CoRR, abs/1805.04049, [131] N. Papernot, P. D. McDaniel, A. Swami, and R. E. Harang. Craft-
2018. ing adversarial input sequences for recurrent neural networks.
[112] D. Meng and H. Chen. Magnet: A two-pronged defense against In 2016 IEEE Military Communications Conference, MILCOM 2016,
adversarial examples. In Proceedings of the ACM SIGSAC Confer- Baltimore, MD, USA, November 1-3, 2016, pages 49–54, 2016.
ence on Computer and Communications Security, CCS, Dallas, TX,
[132] N. Papernot, P. D. McDaniel, X. Wu, S. Jha, and A. Swami.
USA, October 30 - November 03, 2017, pages 135–147.
Distillation as a defense to adversarial perturbations against deep
[113] S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Uni- neural networks. In IEEE Symposium on Security and Privacy, SP
versal adversarial perturbations. In 2017 IEEE Conference on 2016, San Jose, CA, USA, May 22-26, 2016, pages 582–597, 2016.
Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI,
[133] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, and
USA, July 21-26, 2017, pages 86–94, 2017.
A. Thomas. Malware classification with recurrent networks. In
[114] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. Deepfool: A
2015 IEEE International Conference on Acoustics, Speech and Signal
simple and accurate method to fool deep neural networks. In
Processing, ICASSP 2015, South Brisbane, Queensland, Australia,
IEEE Conference on Computer Vision and Pattern Recognition, CVPR,
April 19-24, 2015, pages 1916–1920, 2015.
Las Vegas, NV, USA, June 27-30, 2016, pages 2574–2582.
[134] L. T. Phong, Y. Aono, T. Hayashi, L. Wang, and S. Moriai. Privacy-
[115] L. Muñoz-González, B. Biggio, A. Demontis, A. Paudice, V. Won-
preserving deep learning via additively homomorphic encryp-
grassamee, E. C. Lupu, and F. Roli. Towards poisoning of
tion. IEEE Trans. Information Forensics and Security, 13(5):1333–
deep learning algorithms with back-gradient optimization. In
1345, 2018.
Proceedings of the 10th ACM Workshop on Artificial Intelligence and
Security, AISec@CCS 2017, Dallas, TX, USA, November 3, 2017, [135] L. T. Phong and T. T. Phuong. Privacy-preserving deep learning
pages 27–38, 2017. for any activation function. CoRR, abs/1809.03272, 2018.
[116] N. Narodytska and S. P. Kasiviswanathan. Simple black-box [136] S. Pouyanfar, S. Sadiq, Y. Yan, H. Tian, Y. Tao, M. P. Reyes, M.-L.
adversarial attacks on deep neural networks. In IEEE Confer- Shyu, S.-C. Chen, and S. S. Iyengar. A survey on deep learn-
ence on Computer Vision and Pattern Recognition Workshops, CVPR ing: Algorithms, techniques, and applications. ACM Computing
Workshops, Honolulu, HI, USA, July 21-26, 2017, pages 1310–1318. Surveys, 51(5):92:1–92:36, Sept. 2018.
[117] M. Nasr, R. Shokri, and A. Houmansadr. Machine learning [137] A. Pyrgelis, C. Troncoso, and E. D. Cristofaro. Knock knock,
with membership privacy using adversarial regularization. In who’s there? membership inference on aggregate location data.
Proceedings of the 2018 ACM SIGSAC Conference on Computer and 2017.
Communications Security (CCS), pages 634–646, 2018. [138] M. Rigaki and S. Garcia. Bringing a GAN to a knife-fight:
[118] B. Nelson, M. Barreno, F. J. Chi, A. D. Joseph, B. I. P. Rubinstein, Adapting malware communication to avoid detection. In 2018
U. Saini, C. A. Sutton, J. D. Tygar, and K. Xia. Exploiting machine IEEE Security and Privacy Workshops, SP Workshops 2018, San
learning to subvert your spam filter. In First USENIX Workshop Francisco, CA, USA, May 24, 2018, pages 70–75, 2018.
on Large-Scale Exploits and Emergent Threats, LEET, 2008. [139] R. L. Rivest, L. Adleman, M. L. Dertouzos, et al. On data banks
[119] S. J. Oh, M. Augustin, M. Fritz, and B. Schiele. Towards reverse- and privacy homomorphisms. Foundations of secure computation,
engineering black-box neural networks. In International Confer- 4(11):169–180, 1978.
ence on Learning Representations, 2018. [140] I. Rosenberg, A. Shabtai, Y. Elovici, and L. Rokach. Low resource
[120] O. Ohrimenko, F. Schuster, C. Fournet, A. Mehta, S. Nowozin, black-box end-to-end attack against state of the art API call based
K. Vaswani, and M. Costa. Oblivious multi-party machine learn- malware classifiers. CoRR, abs/1804.08778, 2018.
ing on trusted processors. In 25th USENIX Security Symposium, [141] I. Rosenberg, A. Shabtai, L. Rokach, and Y. Elovici. Generic black-
USENIX Security 16, Austin, TX, USA, August 10-12, 2016., pages box end-to-end attack against state of the art API call based
619–636, 2016. malware classifiers. In 21st International Symposium, Research
[121] H. Olufowobi, R. Engel, N. Baracaldo, L. A. D. Bathen, S. Tata, in Attacks, Intrusions, and Defenses, RAID 2018, Heraklion, Crete,
and H. Ludwig. Data provenance model for internet of things Greece, September 10-12, 2018, Proceedings, pages 490–510.
(iot) systems. In Service-Oriented Computing, ICSOC 2016 Work- [142] A. S. Ross and F. Doshi-Velez. Improving the adversarial robust-
shops, Banff, AB, Canada, October 10-13, 2016., pages 85–91. ness and interpretability of deep neural networks by regularizing
[122] T. Orekondy, B. Schiele, and M. Fritz. Knockoff nets: Stealing their input gradients. In Proceedings of the Thirty-Second AAAI
functionality of black-box models. June 2019. Conference on Artificial Intelligence (AAAI), pages 1660–1669, 2018.
22
[143] A. Salem, Y. Zhang, M. Humbert, M. Fritz, and M. Backes. [164] S. Wagh, D. Gupta, and N. Chandran. Securenn: Efficient and
Ml-leaks: Model and data independent membership inference private neural network training. IACR Cryptology ePrint Archive,
attacks and defenses on machine learning models. CoRR, 2018:442, 2018.
abs/1806.01246, 2018. [165] B. Wang and N. Z. Gong. Stealing hyperparameters in machine
[144] A. Shafahi, W. R. Huang, M. Najibi, O. Suciu, C. Studer, T. Du- learning. In IEEE Symposium on Security and Privacy (SP), San
mitras, and T. Goldstein. Poison frogs! targeted clean-label Francisco, California, USA, 21-23 May 2018, pages 36–52, 2018.
poisoning attacks on neural networks. CoRR, abs/1804.00792, [166] D. Wang, M. Ye, and J. Xu. Differentially private empirical
2018. risk minimization revisited: Faster and more general. In Annual
[145] M. Sharif, L. Bauer, and M. K. Reiter. On the suitability of lp- Conference on Neural Information Processing Systems, Long Beach,
norms for creating and preventing adversarial examples. In 2018 CA, USA, 4-9 December 2017, pages 2719–2728, 2017.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) [167] J. Wang, G. Dong, J. Sun, X. Wang, and P. Zhang. Adversarial
Workshops, Salt Lake City, UT, USA, June 18-22, 2018, pages 1605– sample detection for deep neural network through model muta-
1613. tion testing. In Proceedings of the 41st International Conference on
[146] R. Shokri and V. Shmatikov. Privacy-preserving deep learning. In Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25-31,
Proceedings of the 22nd ACM SIGSAC Conference on Computer and 2019, pages 1245–1256.
Communications Security, Denver, CO, USA, October 12-16, 2015, [168] J. Wang, J. Sun, P. Zhang, and X. Wang. Detecting adversarial
pages 1310–1321, 2015. samples for deep neural networks through mutation testing.
[147] R. Shokri, M. Stronati, C. Song, and V. Shmatikov. Membership CoRR, abs/1805.05010, 2018.
inference attacks against machine learning models. In 2017 IEEE [169] S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana. Formal
Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, security analysis of neural networks using symbolic intervals. In
May 22-26, 2017, pages 3–18, 2017. 27th USENIX Security Symposium, USENIX Security 2018, Balti-
[148] C. Song, T. Ristenpart, and V. Shmatikov. Machine learning more, MD, USA, August 15-17, 2018., pages 1599–1614, 2018.
models that remember too much. In Proceedings of ACM SIGSAC [170] X. O. Wang, K. Zeng, K. Govindan, and P. Mohapatra. Chaining
Conference on Computer and Communications Security, CCS 2017, for securing data provenance in distributed information net-
Dallas, TX, USA, October 30 - November 03, 2017, pages 587–601. works. In 31st IEEE Military Communications Conference, MIL-
[149] S. Song, K. Chaudhuri, and A. D. Sarwate. Stochastic gradient COM, Orlando, FL, USA, October 29 - November 1, 2012, pages 1–6.
descent with differentially private updates. In IEEE Global Con- [171] Y. Wang and K. Chaudhuri. Data poisoning attacks against online
ference on Signal and Information Processing, GlobalSIP 2013, Austin, learning. CoRR, abs/1808.08994, 2018.
TX, USA, December 3-5, 2013, pages 245–248, 2013.
[172] H. Xiao, B. Biggio, G. Brown, G. Fumera, C. Eckert, and F. Roli.
[150] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman. Pixelde-
Is feature selection secure against training data poisoning? In
fend: Leveraging generative models to understand and defend
Proceedings of the 32nd International Conference on Machine Learning,
against adversarial examples. CoRR, abs/1710.10766, 2017.
ICML 2015, Lille, France, 6-11 July 2015, pages 1689–1698, 2015.
[151] A. Stasinopoulos, C. Ntantogian, and C. Xenakis. Commix:
[173] H. Xiao, B. Biggio, B. Nelson, H. Xiao, C. Eckert, and F. Roli.
automating evaluation and exploitation of command injection
Support vector machines under adversarial label contamination.
vulnerabilities in web applications. Int. J. Inf. Sec., 18(1):49–72,
Neurocomputing, 160:53–62, 2015.
2019.
[152] J. Steinhardt, P. W. Koh, and P. S. Liang. Certified defenses for [174] H. Xiao, H. Xiao, and C. Eckert. Adversarial label flips attack on
data poisoning attacks. In Annual Conference on Neural Information support vector machines. In 20th European Conference on Artificial
Processing Systems, 4-9 December 2017, Long Beach, CA, USA, pages Intelligence (ECAI), Montpellier, France, August 27-31, 2012, pages
3520–3532, 2017. 870–875, 2012.
[153] Y. Sun, M. Wu, W. Ruan, X. Huang, M. Kwiatkowska, and [175] Q. Xiao, K. Li, D. Zhang, and W. Xu. Security risks in deep
D. Kroening. Concolic testing for deep neural networks. In learning implementations. In 2018 IEEE Security and Privacy (SP)
Proceedings of the 33rd ACM/IEEE International Conference on Auto- Workshops, San Francisco, CA, USA, May 24, 2018, pages 123–128.
mated Software Engineering, ASE 2018, Montpellier, France, Septem- [176] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. L. Yuille. Mitigating ad-
ber 3-7, 2018, pages 109–119. versarial effects through randomization. CoRR, abs/1711.01991,
[154] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. 2017.
Goodfellow, and R. Fergus. Intriguing properties of neural [177] X. Xie, L. Ma, F. Juefei-Xu, M. Xue, H. Chen, Y. Liu, J. Zhao,
networks. CoRR, abs/1312.6199, 2013. B. Li, J. Yin, and S. See. Deephunter: a coverage-guided fuzz
[155] K. Talwar, A. Thakurta, and L. Zhang. Private empirical risk testing framework for deep neural networks. In Proceedings of the
minimization beyond the worst case: The effect of the constraint 28th ACM SIGSOFT International Symposium on Software Testing
set geometry. CoRR, abs/1411.5417, 2014. and Analysis, ISSTA 2019, Beijing, China, July 15-19, 2019., pages
[156] S. Tian, G. Yang, and Y. Cai. Detecting adversarial examples 146–157.
through image transformation. In Proceedings of the Thirty-Second [178] L. Xu, W. Jia, W. Dong, and Y. Li. Automatic exploit generation
AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, for buffer overflow vulnerabilities. In 2018 IEEE International
USA, February 2-7, 2018, pages 4139–4146, 2018. Conference on Software Quality, Reliability and Security (QRS) Com-
[157] Y. Tian, K. Pei, S. Jana, and B. Ray. Deeptest: automated testing panion, Lisbon, Portugal, July 16-20, 2018, pages 463–468.
of deep-neural-network-driven autonomous cars. In Proceedings [179] W. Xu, D. Evans, and Y. Qi. Feature squeezing: Detecting
of the 40th International Conference on Software Engineering, ICSE adversarial examples in deep neural networks. In 25th Annual
2018, Gothenburg, Sweden, May 27 - June 03, 2018, pages 303–314. Network and Distributed System Security Symposium, NDSS 2018,
[158] F. Tramèr, A. Kurakin, N. Papernot, D. Boneh, and P. D. Mc- San Diego, California, USA, February 18-21, 2018, 2018.
Daniel. Ensemble adversarial training: Attacks and defenses. [180] A. C. Yao. Protocols for secure computations (extended abstract).
CoRR, abs/1705.07204, 2017. In 23rd Annual Symposium on Foundations of Computer Science,
[159] F. Tramèr, N. Papernot, I. J. Goodfellow, D. Boneh, and P. D. Chicago, Illinois, USA, 3-5 November 1982, pages 160–164, 1982.
McDaniel. The space of transferable adversarial examples. CoRR, [181] W. You, P. Zong, K. Chen, X. Wang, X. Liao, P. Bian, and
abs/1704.03453, 2017. B. Liang. Semfuzz: Semantics-based automatic generation of
[160] F. Tramèr, F. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart. proof-of-concept exploits. In Proceedings of the 2017 ACM SIGSAC
Stealing machine learning models via prediction apis. In 25th Conference on Computer and Communications Security (CCS), Dallas,
USENIX Security Symposium, USENIX Security 16, Austin, TX, TX, USA, October 30 - November 03, 2017, pages 2139–2154.
USA, August 10-12, 2016., pages 601–618, 2016. [182] X. Yuan, Y. Chen, Y. Zhao, Y. Long, X. Liu, K. Chen, S. Zhang,
[161] S. Truex, L. Liu, M. E. Gursoy, L. Yu, and W. Wei. Towards de- H. Huang, X. Wang, and C. A. Gunter. Commandersong: A
mystifying membership inference attacks. CoRR, abs/1807.09173, systematic approach for practical adversarial voice recognition.
2018. In 27th USENIX Security Symposium, USENIX Security 2018, Bal-
[162] M. Veale, R. Binns, and L. Edwards. Algorithms that remem- timore, MD, USA, August 15-17, 2018., pages 49–64, 2018.
ber: Model inversion attacks and data protection law. CoRR, [183] V. Zantedeschi, M. Nicolae, and A. Rawat. Efficient defenses
abs/1807.04644, 2018. against adversarial attacks. In Proceedings of the 10th ACM
[163] J. S. Vitter. Random sampling with a reservoir. ACM Trans. Math. Workshop on Artificial Intelligence and Security, AISec@CCS 2017,
Softw., 11(1):37–57, 1985. Dallas, TX, USA, November 3, 2017, pages 39–49, 2017.
23
[184] J. Zhang, K. Zheng, W. Mou, and L. Wang. Efficient private

ERM for smooth objectives. In Proceedings of the Twenty-Sixth
International Joint Conference on Artificial Intelligence, IJCAI 2017,
Melbourne, Australia, August 19-25, 2017, pages 3922–3928, 2017.
[185] M. Zhang, Y. Zhang, L. Zhang, C. Liu, and S. Khurshid.
Deeproad: Gan-based metamorphic testing and input validation
framework for autonomous driving systems. In Proceedings of
the 33rd ACM/IEEE International Conference on Automated Software
Engineering, ASE 2018, Montpellier, France, September 3-7, 2018,
pages 132–142.
[186] T. Zhang, Z. He, and R. B. Lee. Privacy-preserving machine
learning through data obfuscation. CoRR, abs/1807.01860, 2018.
[187] T. Zhang and Q. Zhu. A dual perturbation approach for differen-
tial private admm-based distributed empirical risk minimization.
In Proceedings of ACM Workshop on Artificial Intelligence and Secu-
rity (AISec), Vienna, Austria, October 28, 2016, pages 129–137.

Adversarial ML Survey Paper

Uploaded by

Copyright:

Available Formats

Adversarial ML Survey Paper

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Adversarial ML Survey Paper

Uploaded by

Copyright:

Available Formats

1

Towards Privacy and Security of Deep Learning

D EEP learning has gained a tremendous success and

In this study, we first introduce the background of deep

Fig. 2: Deep learning systems and the encountered attacks

System Functionality Output Cost/M-times

training meta- and substitute model. It can restore precise

4.3 Extracted Information

TABLE 3: Evaluation on model extraction attacks as per stolen information.

Recovery Rate (%) for Models

corresponding outputs as well as class probabilities and

methods: mislabel original data, and craft confused data.

6.2 Poisoning Approach

Z(·) is the output of second-to-last layer, usually indicates

Finding 16. Different positions should have different weights for

vulnerabilities in future. At present, some work is about security and robustness

tization and smooth spatial filtering to reduce the effect 9 C ONCLUSION

8.7 Future direction of attack and defense

[184] J. Zhang, K. Zheng, W. Mou, and L. Wang. Efficient private

You might also like