Adversarial ML Survey Paper
arXiv:1911.12562v1 [cs.CR] 28 Nov 2019

Abstract—Deep learning has gained tremendous success and great popularity in the past few years. However, recent research has found that it suffers from several inherent weaknesses, which can threaten the security and privacy of its stakeholders. Deep learning's wide use further magnifies the resulting consequences. To this end, much research has been conducted with the purpose of exhaustively identifying intrinsic weaknesses and subsequently proposing feasible mitigations. Yet little is clear about how these weaknesses are incurred and how effective such attack approaches are in assaulting deep learning. In order to unveil these security weaknesses and aid in the development of robust deep learning systems, we undertake a comprehensive investigation of attacks on deep learning and extensively evaluate these attacks from multiple perspectives. In particular, we focus on four types of attacks associated with the security and privacy of deep learning: model extraction attack, model inversion attack, poisoning attack and adversarial attack. For each type of attack, we construct its essential workflow as well as adversary capabilities and attack goals. We devise many pivotal metrics for evaluating the attack approaches, by which we perform a quantitative and qualitative analysis. From the analysis, we identify significant and indispensable factors in an attack vector, e.g., how to reduce queries to target models and which distance to use for measuring perturbation. We spotlight 17 findings covering these approaches' merits and demerits, success probability, deployment complexity and prospects. Moreover, we discuss other potential security weaknesses and possible mitigations that can inspire researchers in this area.
Index Terms—deep learning, poisoning attack, adversarial attack, model extraction attack, model inversion attack
1 INTRODUCTION

…marize 4 types of attacks. For each attack, we construct their attack vectors and pivotal properties, i.e., workflow, adversary model and attack goal. This eases the understanding of how these attacks are executed and facilitates the development of countermeasures.
• Quantitative and qualitative analysis. We develop a number of metrics that are pertinent to each type of attack, for a better assessment of different approaches. These metrics also serve as highlights in the development of attack approaches that facilitate more robust attacks.
• New findings. Based on the analysis, we have concluded 17 findings that span the four attacks and uncover implicit properties of these attack methods. Beyond these attacks, we have identified other related security problems such as secure implementation, interpretability, discrimination and defense techniques, which are promising research topics for the future.

TABLE 1: Notations used in this paper

Notation                      Explanation
D                             dataset
x = {x^1, ..., x^n}           inputs in D
y = {y^1, ..., y^n}           predicted labels of x
y_t = {y_t^1, ..., y_t^n}     true labels of x
||x − y||_2                   the Euclidean distance between x and y
F                             model function
Z                             output of the second-to-last layer
L                             loss function
w                             weight parameters
b                             bias parameters
λ                             hyperparameters
L_p                           distance metric
δ                             perturbation to input x

2 RELATED WORK

There is a line of works that survey and evaluate attacks toward machine learning or deep learning.

Barreno et al. conduct a survey of machine learning security and present a taxonomy of attacks against machine learning systems [25]. They experiment on a popular statistical spam filter to illustrate their effectiveness. Attacks are dissected along three dimensions: workable manners, influence on input, and generality. Amodei et al. [18] introduce five possible research problems related to accident risk and discuss probable approaches, with the example of a cleaning robot and how it works. Papernot et al. [130] study the security and privacy of machine learning systematically. They summarize some attack and defense methods, and propose a threat model for machine learning. Their work introduces attack methods in the training and inference processes, under both black-box and white-box models. However, the methods they summarize for each attack are not comprehensive enough. Besides, they do not cover much about defenses or the most widely used deep learning models.

Bae et al. [22] review the attack and defense methods under a security and privacy AI concept. They inspect evasion and poisoning attacks, in both black-box and white-box settings. In addition, their study focuses on privacy with no mention of other attack types.

Liu et al. [103] aim to provide a comprehensive literature review of the two phases of machine learning, i.e., the training phase and the testing/inference phase. As for the corresponding defenses, they sum them up in four categories. In addition, this survey focuses more on data distribution drifting caused by adversarial samples, and on sensitive information violation problems in statistical machine learning algorithms.

Akhtar et al. [14] first conduct a comprehensive study on adversarial attacks on deep learning in computer vision. They summarize 12 attack methods for classification, and study attacks on models or algorithms such as autoencoders, generative models, RNNs and so on. They also study attacks in the real world and summarize defenses. However, they only study the computer vision side of adversarial attacks.

3 OVERVIEW

3.1 Deep Learning System
Deep learning is inspired by biological nervous systems and composed of thousands of neurons that transfer information. Figure 2 demonstrates a classic deep learning model. Typically, it exhibits to the public an overall process including: 1) Model Training, where it converts a large volume of data into a data model, and 2) Model Prediction, where the model can be used for prediction on given input data. Prediction tasks are widely used in different fields. For instance, image classification, speech recognition, natural language processing and malware detection are all pertinent applications of deep learning.

To formalize the process of deep learning systems, we present some notations in Table 1. Given a learning task, the training data can be represented as x = {x^1, x^2, ..., x^n} ∈ D. Let F be the deep learning model; it computes the corresponding outcomes y for the given input x, i.e., y = F(x). y_t denotes the true labels of input x. During model training, a loss function L measures the prediction error between the predicted result and the true label, and the training process aims to reach a minimal error value by fine-tuning parameters. The loss function can be computed as L = Σ_{1≤i≤n} ||y_t^i − y^i||_2. So the process of model training can be formalized as [136]:

    arg min_F  Σ_{1≤i≤n} ||y_t^i − y^i||_2        (1)

3.2 Risks in Deep Learning
A deep learning system involves several pivotal assets that are confidential and significant for the owner. As per the phases in Figure 2, risks stem from three types of concerned assets in deep learning systems: 1) the training dataset; 2) the trained model, including structure, algorithms and parameters; 3) the inputs and results of predictions.

1) Training dataset. High-quality training data is vital for good performance of a deep learning model. As a deep learning system has to absorb plenty of data to form a qualified model, mislabelled or inferior data can hinder this formation and affect the model's quality. Such data can be intentionally appended to the benign data by attackers, which is referred to as a poisoning attack (cf. Section 6). On the other hand, the collection of training data takes a lot of human resources and time. Industry giants such as Google have far more data than other companies. They are inclined to share their state-of-the-art algorithms [83] [46], but they barely share data. Therefore, training data is crucial and considerably valuable for a company, and its leakage means a big loss of assets. However, recent research found that there is an inverse flow from prediction results to training data [161]. As a result, an attacker can infer confidential information in the training data, merely relying on authorized access to the victim system. This is known as a model inversion attack, whose goal is to uncover the composition of the training data or its specific properties (cf. Section 5).

2) Trained model. The trained model is a kind of data model, an abstract representation of its training data. Modern deep learning systems have to cope with a large volume of data in the training phase, which demands high-performance computing and mass storage. Therefore, the trained model is regarded as the core competitiveness of a deep learning system, endowed with commercial value and creative achievements. Once it is cloned, leaked or extracted, the interests of the model owners will be seriously damaged. More specifically, attackers have started to steal model parameters [160], functionality [122] or decision boundaries [128], which are collectively known as model extraction attacks (cf. Section 4).

3) Inputs and results of predictions. As for prediction data and results, curious service providers may retain users' prediction data and results to extract sensitive information. These data may also be attacked by miscreants who intend to utilize them for their own profit. On the other hand, attackers may submit carefully modified inputs to fool models, which are dubbed adversarial examples [154]. An adversarial example is crafted by inserting slight perturbations, which are not easy to perceive, into an originally normal sample. This is recognized as an adversarial attack or evasion attack (cf. Section 7).

3.3 Commercial Off-The-Shelf
Machine Learning as a Service (MLaaS) has gained momentum in recent years [99], and lets its clients benefit from machine learning without establishing their own predictive models. To ease usage, MLaaS suppliers provide a number of APIs for clients to accomplish machine learning tasks, e.g., classifying an image, recognizing a slice of audio or identifying the intent of a passage. Certainly, these services are the core competence, and clients are charged for their queries. Table 2 shows representative COTS systems as well as their functionalities, outputs to clients, and usage charges. Taking Amazon Image Recognition as an example, it can recognize the person in a profile photo and tell his/her gender, age range and emotions. Amazon charges 1,300 USD per one million queries for this service.

3.4 Dataset
Here we present common datasets used in our paper. In the image field, there are MNIST [95], CIFAR-10 [93], ImageNet [2], GTSRB [5], GSS [4], IJB-A [7] and so on. In the text field, reviews from IMDB [8] are usually used. In the speech field, corpora such as Mozilla Common Voice [10] are used. In the malware field, datasets include DREBIN [1], Microsoft Kaggle [9], and the millions of files or programs researchers have collected.

4 MODEL EXTRACTION ATTACK: YOUR MODEL IS MINE

4.1 Introduction
A model extraction attack attempts to duplicate a machine learning model through the provided APIs, without prior knowledge of its training data and algorithms [160]. To formalize, given specifically selected inputs X, an attacker queries the target model F and obtains the corresponding prediction results Y. The attacker can then infer or even extract the entire in-use model F. With regard to a simple neural network expressed as y = wx + b, a model extraction attack can approximate the values of w and b. Model extraction attacks not only destroy the confidentiality of the model, thus damaging the interests of its owners, but also construct a near-equivalent white-box model for further attacks such as adversarial attacks [128].

Adversary Model. This attack is mostly carried out in a black-box setting where attackers only have access to prediction APIs. Their capabilities are limited in three ways:
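For the linear case y = w·x + b above, extraction can be made concrete by equation solving: querying a d-dimensional linear model at d + 1 points yields a solvable linear system for w and b. A minimal sketch, where the "black-box" target and its toy weights are hypothetical stand-ins for a real prediction API:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# Hypothetical black-box target: a linear model y = w.x + b.
w_true = np.array([0.5, -1.2, 2.0])
b_true = 0.7

def query(x):
    """Stand-in for a prediction API that returns raw scores."""
    return x @ w_true + b_true

# Query the model at d + 1 points.
X = rng.normal(size=(d + 1, d))
y = np.array([query(x) for x in X])

# Augment inputs with a constant 1 so b becomes an extra unknown,
# then solve the resulting (d + 1) x (d + 1) linear system.
A = np.hstack([X, np.ones((d + 1, 1))])
params = np.linalg.solve(A, y)
w_est, b_est = params[:d], params[d]

print(np.allclose(w_est, w_true), np.isclose(b_est, b_true))  # True True
```

With random real-valued queries the system is almost surely non-singular, which is why d + 1 queries suffice here; non-linear deep models admit no such closed-form recovery.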
TABLE 2: Commercial MLaaS systems and the provided functionalities, output for clients and charges per 1M queries
samples, which moved to the nearest boundary between the current class and all other classes. This technique aims not to maximize the accuracy of substitute models, but to ensure that samples reach decision boundaries within few queries. Juuti et al. [84] extended JbDA to Jb-topk, where samples move to the nearest k boundaries between the current class and any other class. They produced transferable targeted adversarial samples rather than untargeted ones [128]. In terms of model knowledge, Papernot et al. [127] found that knowledge of the model architecture was unnecessary, because a simple model could be extracted by a more complex model such as a DNN.

4.3.4 Functionalities
Similar functionality refers to replicating the original model's classification results as closely as possible. The primary goal is to construct a predictive model whose input-output pairs are closest to the original's. In [122] [45], the authors try to improve the classification accuracy of the substitute model. Silva et al. [45] used a problem-domain dataset, a non-problem-domain dataset, and their mixture to train a model respectively. They found that the model trained with the non-problem-domain dataset also achieved good accuracy. Besides, Orekondy et al. [122] assumed attackers had no semantic knowledge of the model outputs. They chose very large datasets and selected suitable samples one by one to query the black-box model. A reinforcement learning approach was introduced to improve query efficiency and reduce query counts.

4.4 Analysis
Model extraction attack is an emerging field of attack. In this study, we survey 8 related papers in total and classify them by target information, as shown in Table 3. We sort them by the stolen information and evaluate them on multiple aspects, including employed approaches, strategies for reducing queries, and recovery rate for the applicable models. Recovery rate means what percentage of information can be stolen, and is computed by comparing the inferred data with that of the original model. However, attacks on decision boundaries cannot be directly measured in this way. Thus, we use the misclassification rate of generated adversarial examples as an alternative, since it reveals the similarity between the simulated model and the original model to some extent. Based on the statistics, we draw the following conclusions.

Finding 1. Training a substitute model is without doubt the dominant method in model extraction attacks, with manifold advantages.

Equation solving is deemed an efficient way to recover parameters [160] or hyperparameters [165] in linear algorithms, since it has an upper bound on the number of sufficient queries. As claimed in [160], d-dimensional weights can be cracked with only d + 1 queries. However, this approach is hardly applicable to non-linear deep learning algorithms. So researchers turn to the compelling training-based approach. For instance, [119] trains a classifier, dubbed a metamodel, over the target model so as to predict architectural information that is categorical or takes limited real values. This approach cannot cope with complex model attributes such as decision boundaries and functionality. That drives the prevalence of the substitute model, as it serves as an incarnation of the target model that behaves quite similarly. As such, the substitute model has approximated attributes and prediction results. Additionally, it can be further used to steal the model's training data [84] and to generate adversarial examples [127].

Finding 2. Learning a substitute model of a deep learning model demands more queries than inferring the parameters or hyperparameters of simple machine learning models.

To be specific, attackers require thousands of queries against classic machine learning models, but have to issue over 11,000 queries to steal the parameters of even a simple neural network [160]. Deep learning models are more challenging because they are highly nonlinear, non-convex, and possibly over-fitted. Additionally, the number of parameters increases drastically with the number of layers and neurons.

Finding 3. Reducing queries, which can save monetary costs in a pay-per-query MLaaS commercial model and also resist attack detection, has become an intriguing research direction in recent years.

The requirement of query reduction arises from the high expense of queries and limits on query volume. Among our investigated papers, [119] trains a metamodel, KENNEN-IO, to optimize the query inputs; [128] leverages reservoir sampling to select representative samples for querying; and [122] proposes two sampling strategies, i.e., random and adaptive, to reduce queries. Moreover, active learning [97], natural evolutionary strategies [78] and optimization-based approaches [44] [140] have been adopted for query reduction.

Finding 4. Model extraction attack is evolving from a puzzle-solving game to a simulation game with cost-profit tradeoffs.

MLaaS magnates like Amazon and Google have a tremendous scale of networks running behind their services. To
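Reservoir sampling, one of the query-reduction tactics above, keeps a fixed-size uniform sample of candidate inputs in a single pass, capping query volume regardless of pool size. A generic sketch of the classic Algorithm R, not tied to any particular paper's implementation:

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from an arbitrarily
    long stream, using O(k) memory (Vitter's Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1),
            # evicting a uniformly chosen current entry.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# e.g., pick 100 candidate inputs to query out of a pool of 1,000,000.
chosen = reservoir_sample(range(1_000_000), 100, seed=0)
print(len(chosen))  # 100
```

An attacker would then spend the query budget only on the sampled candidates rather than the whole pool.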
• Generate samples by model. This method aims to produce training records by training generative models such as GANs. The generated samples are similar to those from the target training dataset, and improving this similarity makes the method more useful.
Both [102] and [67] attacked generative models. Liu et al. [102] presented a new white-box method for single-membership and co-membership attacks. The basic idea was to train a generative model against the target model, which took the output of the target model as input and produced data similar to the target model's input as output. After training, the attack model could generate data similar to the target training dataset. Considering the difficult implementation of the approach of [147] on CNNs, Hitaj et al. [69] proposed a more general MIA method. They performed a white-box attack in the scenario of collaborative deep learning. They constructed a generator for the target classification model, and used the two together to form a GAN. After training, the GAN could generate data similar to the target training set. However, this method was limited in that all samples belonging to the same class need to be visually similar, and it could not generate an actual target training pattern or distinguish samples within the same class.

5.2.2 Shadow Model Training
Attackers sometimes have to transform the initial data for further determination. In particular, the shadow model is proposed to imitate the target model's behavior by training on a similar dataset [147]. This dataset takes records produced by data synthesis as inputs and their labels as outputs, and the shadow model is trained on it. It can provide the class probability vector and classification result of a record. Shokri et al. [147] designed, implemented and evaluated the first MIA method against a black-box model via machine learning API calls. They produced datasets similar to the target training dataset and used the same MLaaS to train several shadow models. These datasets were produced by model-based synthesis, statistics-based synthesis, noisy real data and other methods. The shadow models were used to provide the training set (class labels, prediction probabilities, and whether the data record belongs to the shadow training set) for the attack model. Salem et al. [143] relaxed the constraints of [147] (the need to train shadow models on the same MLaaS, and the same distribution between the datasets of the shadow models and the target model), and used only one shadow model without knowledge of the target model's structure or training dataset distribution. Here, the shadow model just tries to capture the membership status of records in a different dataset.

5.2.3 Attack Model Training
The attack model is a binary classifier. Its input is the class probabilities and label of the record to be judged, and its output is yes (the record belongs to the dataset of the target model) or no. A training dataset is usually required to train the attack model. The problem is that the ground-truth label of whether a record belongs to the dataset of the target model cannot be obtained. So attackers often generate a substituted dataset by data synthesis. The input of this training is generated either by the shadow model [147] [143] or the target model [137] [111]. The training process is to select some records from both inside and outside the substituted dataset, and obtain the class probability vector through the target model or shadow model. The vector and the label of the record are taken as input, and whether this record belongs to the substituted dataset is taken as output. The attack model is then trained on these pairs.

5.2.4 Membership Determination
Given one input, this component is responsible for determining whether the query input is a member of the training set of the target system. To accomplish this goal, the contemporary approaches can be categorized into the following classes:
• Attack model-based method. In the inference phase, attackers first put the record to be judged into the target model and get its class probability vector, then put the vector and label of the record into the attack model, and obtain the membership of this record. Pyrgelis et al. [137] implemented MIA on aggregate location data. The main idea was to use a priori position information and to attack through a distinguishability game with a distinguishing function. They trained a classifier (the attack model) as the distinguishing function to determine whether data is in the target dataset.
• Heuristic method. This method uses the prediction probability, instead of an attack model, to determine membership. Intuitively, the maximum value in the class probabilities of a record in the target dataset is usually greater than that of a record not in it. But these methods require some preconditions and auxiliary information to obtain reliable probability vectors or binary results, which limits their application to more general scenarios. How to lower attack cost and reduce the required auxiliary information can be considered in future study. Fredrikson et al. [52] tried to construct the probability of whether certain data appears in the target training dataset, according to the prediction probability and auxiliary information such as error statistics or marginal priors of the training data. They then searched for the input data that maximized this probability, and the obtained data was similar to data in the target training dataset. The third attack method of Salem et al. [143] only required the probability vector output by the target model, and used a statistical measurement to check whether the maximum classification probability exceeds a certain value.
Long et al. [104] put forward a generalized MIA method, which, unlike [147], found it easier to attack non-overfitted data. They trained a number of reference models similar to the target model, chose vulnerable data according to the pre-Softmax output of the reference models, then compared the outputs of the target model and the reference models to calculate the probability of data belonging to the target training dataset. The reference models in this paper were used to mimic the target model, like shadow models, but no attack model was needed. Hayes et al. [67] proposed a method for attacking generative models. The idea was that attackers determined which of their datasets belonged to the target training set according to the probability vector output by a classifier; a higher probability was more likely to come from the target training set (they selected the top n records). In the white-box setting, the classifier was constructed from that of the target model. In the black-box setting, they used data obtained by querying the target model to reproduce the classifier with a GAN.

5.3 Property Inference Attack
Property inference attack (PIA) mainly deduces properties of the training dataset. For instance, how many people have long hair or wear dresses in a generic gender classifier? Are there enough women or minorities in the datasets of common classifiers? The approach is largely the same as for a membership inference attack. In this section, we only remark on the main differences from membership inference.
Data Synthesis. In PIA, training datasets are classified by including or not including a specific attribute [19].
Shadow Model Training. In PIA, shadow models are trained on training sets with or without a certain property. In [19] [53], the authors used several training datasets with or without a certain property, then built corresponding shadow models to provide training data for a meta-classifier.
Attack Model Training. Here, the attack model is usually also a binary classifier. Ateniese et al. [19] proposed a white-box PIA method by training a meta-classifier. It took model features as input, and output whether the corresponding dataset contained a certain property. However, this approach did not work well on DNNs. To address this, Ganju et al. [53] mainly studied how to extract feature values of DNNs; the meta-classifier part was similar to [19]. Melis et al. [111] trained a binary classifier to judge dataset properties in collaborative learning, which took updated gradient values as input. Here the model is continuously updated, so the attacker can analyze the updated information at each stage to infer properties.

5.4 Analysis
As shown in Table 4, we have surveyed 13 model/property inversion attack papers in total.

Finding 5. The shadow model has a number of advantages over other methods in model inversion attacks.

Shadow models (4/13) are used in both MIA (2/13) [147] [143] and PIA (2/13) [19] [53]. They are superior to other methods in several ways: 1) requiring no additional auxiliary information [52], which is underpinned by the assumption that a higher prediction confidence indicates a higher probability that the data record is present; 2) providing true information as the training dataset for the attack model. For a model F and its training dataset D, training the attack model needs the information of label x, F(x), and whether x ∈ D. When using a shadow model, the shadow model F′ and its dataset D′ are known, so all information comes from the shadow model and its corresponding dataset. When using the target model, F is the target model and D is its training dataset. However, attackers do not know D, so the information of whether x ∈ D has to be replaced by whether x ∈ D′, where D′ is similar to D.

Finding 6. Data synthesis is a common practice for conducting model inversion attacks, compared to direct querying.

Data synthesis can conveniently generate data similar to the target dataset [147] [52] [69] [102] without querying too many times. The synthesized data, which can be generated either from the statistical distribution of known training data or by a generative adversarial network, can effectively sample the original data. Hence, it is employed to train a shadow model, a substitute for the target. It avoids too many queries to the target model and thereby lowers the chance of detection by security mechanisms.

Finding 7. MIA is essentially a process that explicitly expresses the logical relations contained in the trained model.

This kind of attack requires many datasets and much time, but the obtained information is really limited (only 1 bit [19] [53]). So one direction of development for model inversion attacks is to obtain more holistic information, for example, the relationship between different training datasets. Another direction is to increase the amount of obtained information, for example, how to get the details of a single record.

Finding 8. There is more research on membership inference (10/13) than on property inference (4/13).

This is because membership inference currently has a more general application scenario, and it emerged earlier. Furthermore, MIA can get more information than PIA in a one-time attack (such as training one attack model). A trained attack model can be applied to many records in MIA, but only to a few properties in PIA. In [19], attackers want to know if their speech classifier was trained only with voices from people who speak Indian English. In [53], they try to find out whether some classifiers have enough women or minorities in the training dataset. In [33], they are interested in the global distribution of skin color. In [111], they want to know the proportion between black and Asian people.

Finding 9. Studies of heuristic methods (6/13) and attack models (7/13) split nearly fifty-fifty.

In heuristic methods, using probabilities is easy to implement, but barely works (0.5 precision and 0.54 recall) on the MNIST dataset [143]. Obtaining similar datasets usually requires training a generative model [67] [102] [69]. In attack model methods, attackers need to train an attack model [137] [19]. Shadow models [147] [143] [19] are proposed to provide datasets for the attack model, but they increase training costs.

6 POISONING ATTACK: CREATE A BACKDOOR IN YOUR MODEL

A poisoning attack seeks to degrade a deep learning system's predictions by polluting its training data. Since it happens before the training phase, the caused contamination is usually inextricable by tuning the involved parameters or adopting alternative models.

6.1 Introduction
In the early age of machine learning, poisoning attacks had already been proposed as a non-trivial threat to the mainstream algorithms. For instance, Bayes classifiers [118], Support Vector Machines (SVM) [28] [31] [174] [173] [34], Hierarchical Clustering [29] and Logistic Regression [110] are all suffering
TABLE 4: Evaluation on model inversion attacks. It presents how the "Workflow" proceeds for each work, and its "Goal", either MIA or PIA. We select one experimental "Dataset" from each work and the corresponding "Precision" achieved, as well as the target "Model". "Knowledge" denotes the attacker's knowledge of the model, and "Application" is the applicable domain of the target model. "structured data" refers to any data in a fixed field within a record or file [27].

Paper                   | Workflow (Steps 1-4) | Goal    | Precision | Dataset   | Model | Knowledge | Application
Truex et al. [161]      | X X                  | MIA     | 61.75%    | MNIST     | DT    | Black     | image
Fredrikson et al. [52]  | X                    | MIA     | 38.8%     | GSS       | DT    | Black     | image
Pyrgelis et al. [137]   | X X                  | MIA     | -         | TFL       | MLP   | Black     | structured data
Shokri et al. [147]     | X X X X              | MIA     | 51.7%     | MNIST     | DNN   | Black     | image
Hayes et al. [67]       | X X                  | MIA     | 58%       | CIFAR-10  | GAN   | Black     | image
Long et al. [104]       | X                    | MIA     | 93.36%    | MNIST     | NN    | Black     | image
Melis et al. [111]      | X X                  | MIA/PIA | -         | FaceScrub | DNN   | White     | image
Liu et al. [102]        | X X                  | MIA     | -         | MNIST     | GAN   | White     | image
Salem et al. [143]      | X X X X              | MIA     | 75%       | MNIST     | CNN   | Black     | image
Ateniese et al. [19]    | X X X X              | PIA     | 95%       | -         | SVM   | White     | speech
Buolamwini et al. [33]  | X                    | PIA     | 79.6%     | IJB-A     | DNN   | Black     | image
Ganju et al. [53]       | X X X X              | PIA     | 85%       | MNIST     | NN    | White     | image
Hitaj et al. [69]       | X X                  | MIA     | -         | -         | CNN   | White     | image
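The heuristic membership test of Section 5.2.4 (e.g., the third attack of Salem et al. [143]) reduces to comparing the maximum class probability against a threshold. A minimal sketch with made-up probability vectors; in practice the threshold would be calibrated on auxiliary data:

```python
import numpy as np

def is_member(prob_vector, threshold=0.9):
    """Guess 'member' when the model is unusually confident:
    records seen during training tend to receive a higher
    maximum class probability than unseen records."""
    return float(np.max(prob_vector)) >= threshold

# Made-up outputs: an overconfident prediction vs. a diffuse one.
seen = np.array([0.97, 0.02, 0.01])
unseen = np.array([0.40, 0.35, 0.25])
print(is_member(seen), is_member(unseen))  # True False
```

This illustrates why the heuristic needs auxiliary information: without a well-chosen threshold, confident-but-unseen records and diffident-but-seen records are both misjudged.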
TABLE 5: Evaluation on poisoning attacks. The data denotes what percentage of the training data an attacker needs to contaminate ("Poison Percent") and what "Success Rate" is achieved under a specific "Dataset". "Model" indicates the attacked model. "Timeliness" denotes whether the poisoning attack is in an online or offline setting. "Damage" means how many predictions can be impacted. Attackers may possess two different kinds of "Knowledge", either black-box or white-box, and may make the poisoned model predict as expected, i.e., "Targeted", or not. "structured data" is the same as in Table 4. "LR" is linear regression, "OLR" is online logistic regression, and "SLHC" is single-linkage hierarchical clustering.

Paper                       | Success Rate | Dataset             | Poison Percent | Model | Timeliness | Damage    | Knowledge | Targeted | Application
Xiao et al. [172]           | 20%          | 11944 files         | 5%             | LASSO | offline    | -         | Black     | No       | malware
Muñoz-González et al. [115] | 25%          | MNIST               | 15%            | CNN   | offline    | 30% error | Black     | No       | image, malware
Jagielski et al. [79]       | 75%          | Health care dataset | 20%            | LASSO | offline    | 75% error | Black     | No       | structured data
Alfeld et al. [17]          | -            | -                   | -              | LR    | offline    | -         | White     | Yes      | -
Shafahi et al. [144]        | 60%          | CIFAR-10            | 5%             | DNN   | offline    | 20% error | White     | Yes      | image
Wang et al. [171]           | 90%          | MNIST               | 100%           | OLR   | online     | -         | White     | Both     | image
Biggio et al. [29]          | -            | MNIST               | 1%             | SLHC  | offline    | -         | White     | Yes      | image, malware
statistical attack which only required limited knowledge of the training process.

The major research focuses on an offline environment where the classifier is trained on fixed inputs. However, training often happens as data arrives sequentially in a stream, i.e., in an online setting. Wang et al. [171] conducted poisoning attacks on online learning. They formalized the problem into semi-online and fully-online settings, with three attack algorithms: incremental, interval, and teach-and-reinforce.

6.2.2 Confused Data

Learning algorithms elicit representative features from a large amount of information for learning and training. However, if attackers submit crafted data with special features, the classifier may learn misleading features. For example, if figures marked with the number "6" are labeled as a turn-left sign and put into the dataset, then an image containing a bomb may later be identified as a turn-left sign, even if it is in fact a STOP sign.

Xiao et al. [172] directly investigated the robustness of popular feature selection algorithms under poisoning attack. They reduced LASSO to almost random choices of feature sets by inserting less than 5% poisoned training samples. Shafahi et al. [144] found a specific test instance to control the behavior of a backdoored classifier, without any access to the data collection or labeling process. They proposed a watermarking strategy and trained a classifier with multiple poisoned instances. A low-opacity watermark of the target instance is added to poisoned instances to allow overlap of some indivisible features.

6.3 Analysis

We investigated 7 papers on poisoning attack in total and evaluated them over 9 metrics in Table 5. Based on the analysis, we conclude the following findings.

Finding 10. Most attacks (6/7) are under an offline setting, and only one [171] implements an online attack via online gradient descent.

In an offline setting, model owners collect the training data from multiple sources and train the models once. Attackers have to contaminate the data before the training. However, in an online setting, the trained model can be updated periodically with newly arriving training data. This allows attackers to feed poisonous data into the models gradually and get them compromised. That causes varying difficulties for a successful attack. In particular, online attacks have to consider more factors, such as the order of fed data and the evasiveness of poisonous data. This explains why more studies start with offline attacks. However, in reality, more and more models are trained online. Driven by profits, more attacks against online training are expected to emerge in the near future.

Finding 11. A few (2/7) papers use confused data with the purpose of implanting a backdoor into the model.

In terms of difficulty, making a model err inadvertently or imperceptibly is harder than making it misclassify publicly. A backdoor is such an imperceptible mistake: the model performs well on normal inputs, while it opens the door for attackers when they need it. In [144], the attacker adds a low-transparency watermark into samples to allow some indivisible features to overlap. In the prediction phase, the attacker can use this watermark to open the backdoor, causing misclassification. In addition, attackers may use curious characteristics to cheat a model because it just learns useless features [172].

Finding 12. Poisoning attacks essentially seek a globally or locally distributional disturbance over training data.

It is well known that the performance of learning largely depends on the quality of training data. Quality data is commonly acknowledged as being comprehensive, unbiased, and representative. In the process of data poisoning, wrongly labeled or biased data is deliberately crafted and added into the training data, degrading the overall quality.

7 ADVERSARIAL ATTACK: UTILIZE THE WEAKNESS OF YOUR MODEL

Similar to poisoning attack, adversarial attack also makes a model classify a malicious sample wrongly. The difference is that poisoning attack inserts malicious samples into the training data, directly contaminating the model, while adversarial attack leverages adversarial examples to exploit the weaknesses of a trained model and obtain a wrong prediction result.
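The contrast drawn above can be made concrete with a toy sketch (our own illustration, not code from any surveyed paper): poisoning corrupts the training data before the model is fit, while an adversarial example perturbs a test input against a cleanly trained, fixed model. The nearest-centroid classifier and all numbers below are invented for illustration.

```python
def train_centroids(xs, ys):
    """Fit a 1-D nearest-centroid classifier: one mean per class."""
    c0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    c1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return c0, c1

def predict(model, x):
    c0, c1 = model
    return 0 if abs(x - c0) <= abs(x - c1) else 1

# Clean training set: class 0 clusters near 0.0, class 1 near 10.0.
xs = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]
ys = [0, 0, 0, 1, 1, 1]
clean = train_centroids(xs, ys)          # centroids (1.0, 9.0)

# Poisoning attack: flip two class-1 labels to 0, dragging the
# class-0 centroid toward class 1. A true class-1 point near the
# boundary (e.g. 6.5) is now misclassified by the poisoned model.
poisoned_ys = [0, 0, 0, 0, 0, 1]
poisoned = train_centroids(xs, poisoned_ys)

# Adversarial attack: leave the clean model alone and nudge a test
# input just across its decision boundary (here, the midpoint 5.0).
x_test = 4.0                              # clean model says class 0
delta = 1.5                               # small test-time perturbation
x_adv = x_test + delta                    # clean model now says class 1
```

Both attacks end in a wrong prediction, but through different channels: the poisoned model errs on unmodified inputs, while the adversarial example fools an unmodified model.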
β_pq represents the impact on all other outputs. A larger value in this map means a greater possibility of fooling the network. They pick (p∗, q∗) to attack.

$$\alpha_{pq} = \sum_{i \in \{p,q\}} \frac{\partial Z(x)_t}{\partial x_i}$$
$$\beta_{pq} = \Big( \sum_{i \in \{p,q\}} \sum_j \frac{\partial Z(x)_j}{\partial x_i} \Big) - \alpha_{pq} \qquad (8)$$
$$(p^*, q^*) = \arg\max_{(p,q)} \; (-\alpha_{pq} \cdot \beta_{pq}) \cdot (\alpha_{pq} > 0) \cdot (\beta_{pq} < 0)$$

NewtonFool attack. NewtonFool [81] uses the softmax output Z(x). In Equation 9, x_0 is the original sample and l = F(x_0). δ_i = x_{i+1} − x_i is the perturbation at iteration i. They try to find a small δ so that Z(x_0 + δ)_l ≈ 0. Starting with x_0, they approximate Z(x_i)_l with a linear function step by step as follows.

$$Z(x_{i+1})_l \approx Z(x_i)_l + \nabla Z(x_i)_l \cdot (x_{i+1} - x_i), \quad i = 0, 1, 2, \cdots \qquad (9)$$

C&W attack. C&W [38] tries to find a small δ in the L_0, L_2, and L_∞ norms. Different from L-BFGS, C&W optimizes the following goal:

$$\min_{\delta} \; \|\delta\|_p + c \cdot f(x + \delta) \quad \text{s.t.} \;\; x + \delta \in [0,1]^n \qquad (10)$$

c is a hyperparameter and f(·) is defined as:

$$f(x + \delta) = \max\big(\max\{Z(x+\delta)_i : i \neq t\} - Z(x+\delta)_t, \; -K\big) \qquad (11)$$

f(·) is an artificially defined function; the above is just one case. Here, f(·) ≤ 0 if and only if the classification result is the adversarial targeted label t. K guarantees that x + δ will be classified as t with high confidence.

EAD attack. EAD [42] combines L_1 and L_2 penalty functions. In Equation 12, f(x + δ) is the same as in C&W and t is the targeted label. Obviously, the C&W attack becomes a special EAD case when β = 0 [42].

$$\min_{\delta} \; c \cdot f(x + \delta) + \beta \|\delta\|_1 + \|\delta\|_2^2 \quad \text{s.t.} \;\; x + \delta \in [0,1]^n \qquad (12)$$

OptMargin attack. OptMargin [68] is an extension of the C&W L_2 attack that adds many objective functions around x. In Equation 13, x_0 is the original example, x = x_0 + δ is adversarial, y is the true label of x_0, and the v_i are perturbations applied to x. OptMargin guarantees that not only x fools the network, but also its neighbors x + v_i.

$$\min_{\delta} \; \|\delta\|_2^2 + c \cdot (f_1(x) + \cdots + f_m(x)) \quad \text{s.t.} \;\; x + \delta \in [0,1]^n$$
$$f_i(x) = \max\big(Z(x + v_i)_y - \max\{Z(x + v_i)_j : j \neq y\}, \; -K\big) \qquad (13)$$

UAP attack. UAP [113] computes universal perturbations which suit almost all samples of a certain dataset. In Equation 14, µ is the distribution that contains all samples, P represents probability, and generally 0 < ζ ≪ 1. The purpose is to seek a δ which can fool F(·) on almost any sample from µ.

$$F(x + \delta) \neq F(x) \;\; \text{for most} \;\; x \sim \mu$$
$$\text{s.t.} \;\; \|\delta\|_p \leq \xi, \qquad P_{x \sim \mu}\big(F(x + \delta) \neq F(x)\big) \geq 1 - \zeta \qquad (14)$$

7.2.2 Black-box attack in image classification

Finding small perturbations often requires white-box models to calculate gradients. However, this approach does not work in a black-box setting due to constraints such as inaccessible gradients. Therefore, researchers have proposed several methods to overcome these constraints.

Step 2.1. Training a substitute model. As mentioned in Section 4, stealing decision boundaries in model extraction attacks and training a substitute model can facilitate black-box adversarial attacks [128] [127] [84]. Papernot et al. [128] proposed a method based on an alternative training algorithm using synthetic data generation in black-box settings.

Training a substitute model requires that AEs can transfer from the substitute model to the target model. Gradient Aligned Adversarial Subspace [159] estimated previously unknown dimensions of the input space. They found that a large part of the subspace is shared between two different models, thus achieving transferability. Further, they determined sufficient conditions for the transferability of model-agnostic perturbations.

Step 2.2. Estimating gradients. This method needs many queries to estimate gradients and then search for AEs. Narodytska et al. [116] used a technique based on local search to construct a numerical approximation of network gradients, and then constructed perturbations in an image. Moreover, Ilyas et al. [77] introduced a more rigorous and practical black-box threat model. They applied a natural evolution strategy to estimate gradients and perform black-box attacks, using 2∼3 orders of magnitude fewer queries.

7.2.3 Attack in other fields

Beyond image classification, adversarial attacks are also used in other fields, such as speech recognition [57] [182], text processing [54], malware detection [75] [131] [133] [92], and so on.

In the speech field, Yuan et al. [182] embedded voice commands into songs, and thereby attacked speech recognition systems without being detected by humans. Carlini and Wagner [39] could convert any given waveform into any desired target phrase by adding small perturbations against speech-to-text neural networks.

In the text processing field, DeepWordBug [54] generated adversarial text sequences in black-box settings. They adopted different score functions to better mutate words. They minimized the edit distance between the original and modified texts, and reduced text classification accuracy from 90% to 30∼60%.

In the malware field, Rigaki et al. [138] used GANs to avoid malware detection by modifying network behavior to imitate the traffic of legitimate applications. They can adjust command and control channels to simulate Facebook chat network traffic by modifying the source code of the malware.
Hu et al. [70] [71] and Rosenberg et al. [141] proposed methods to generate adversarial malware examples in black-box settings to attack detection models. Dujaili et al. [15] proposed SLEIPNIR for adversarial attacks on binary-encoded malware detection.

7.2.4 Attack against other models

There is furthermore research beyond DNNs, covering generative models, reinforcement learning, and some classic machine learning algorithms. Mei et al. [110] identified the optimal training-set attack for SVM, logistic regression, and linear regression. They proved the optimal attack can be described as a bilevel optimization problem, which can be solved by gradient methods. Huang et al. [74] demonstrated that adversarial attack policies are also effective in reinforcement learning, such as A3C, TRPO, and DQN. Kos et al. [91] attempted to produce AEs against deep generative models such as the variational autoencoder. Their methods include a classifier-based attack and an attack on the latent space.

distance, 18.2% use L1 distance and 18.2% use L0 distance. Considering image classification only, 70% of attacks use L2 distance, 45% use L∞ distance, 10% use L1 distance and 20% use L0 distance.

L0 distance reflects the number of changed elements, but it is unable to limit the variation of each element. It suits scenarios that only care about the number of perturbed pixels, not the size of the variation. L1 distance is the summation of the absolute values of every element in the perturbation, equivalent to Manhattan distance in 2D space. It limits the sum of all variations, but does not limit large perturbations of individual elements. L∞ distance does not care about how many elements have been changed, but only about the maximum perturbation, equivalent to Chebyshev distance in 2D space. L2 distance is the Euclidean distance that considers all pixel perturbations, which is a more balanced and the most widespread metric. It takes into account both the largest perturbation and the number of changed elements.
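The four metrics discussed above can be compared side by side on the same perturbation; the sketch below (illustrative values only) computes each distance with NumPy.

```python
import numpy as np

# A made-up perturbation over a six-"pixel" input: only two pixels change.
delta = np.array([0.0, 0.5, 0.0, -0.25, 0.0, 0.0])

l0   = int(np.count_nonzero(delta))       # number of changed elements
l1   = float(np.sum(np.abs(delta)))       # total absolute change (Manhattan)
l2   = float(np.sqrt(np.sum(delta**2)))   # Euclidean length
linf = float(np.max(np.abs(delta)))       # largest single change (Chebyshev)

print(l0, l1, round(l2, 4), linf)   # 2 0.75 0.559 0.5
```

Note how the metrics emphasize different aspects of the same δ: L0 ignores how large each change is, L∞ ignores how many pixels changed, and L1/L2 aggregate both, consistent with the trade-offs described above.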
TABLE 6: Evaluation on adversarial attacks. This table presents the "Success Rate" of these attacks on a specific "Dataset" with varying target "System" and "Model". "Distance" indicates how these works measure the distance between samples. "Real-world" distinguishes the works that are also suitable for physical adversarial attacks. "Knowledge" is either black-box or white-box. "Iterative" illustrates whether the optimization steps are iterative. "Targeted" indicates whether an attack is targeted or not. "Application" covers the practical areas.
| Paper | Success Rate | Dataset | System | Distance | Model | Real-world | Knowledge | Iterative | Targeted | Application |
|---|---|---|---|---|---|---|---|---|---|---|
| L-BFGS [154] | 20% | MNIST | FC10(1) | L2 | DNN | No | White | Yes | Yes | image |
| FGSM [58] | 54.6% | MNIST | a shallow softmax network | L∞ | DNN | No | White | No | No | image |
| BIM [16] | 24% | ImageNet | Inception v3 | L∞ | CNN | Yes | White | Yes | No | image |
| MI-FGSM [47] | 37.6% | ImageNet | Inception v3 | L∞ | CNN | No | White | Yes | Both | image |
| JSMA [129] | 97.05% | MNIST | LeNet | L0 | CNN | No | White | Yes | Yes | image |
| C&W [38] | 100% | ImageNet | Inception v3 | L0, L2, L∞ | CNN | No | White | Yes | Yes | image |
| EAD [42] | 100% | ImageNet | Inception v3 | L1, L2, L∞ | CNN | No | White | Yes | Yes | image |
| OptMargin [68] | 100% | CIFAR-10 | ResNet | L0, L2, L∞ | CNN | No | White | Yes | No | image |
| Guo et al. [60] | 95.5% | ImageNet | ResNet-50 | L2 | CNN | No | Both | Yes | No | image |
| Deepfool [114] | 68.7% | ILSVRC2012 | GoogLeNet | L2 | CNN | No | White | Yes | No | image |
| NewtonFool [81] | 81.63% | GTSRB | CNN(3Conv+1FC) | L2 | CNN | No | White | Yes | No | image |
| UAP [113] | 90.7% | ILSVRC2012 | VGG-16 | L2, L∞ | CNN | No | White | Yes | No | image |
| UAN [66] | 91.8% | ImageNet | ResNet-152 | L2, L∞ | CNN | No | White | Yes | Yes | image |
| ATN [23] | 89.2% | MNIST | CNN(3Conv+1FC) | L2 | CNN | No | White | Yes | Yes | image |
| Athalye et al. [20] | 83.4% | 3D-printed turtle | Inception-v3 | L2 | CNN | Yes | White | No | Yes | image |
| Ilyas et al. [77] | 99.2% | ImageNet | Inception-v3 | - | CNN | No | Black | No | Both | image |
| Narodytska et al. [116] | 97.51% | CIFAR-10 | VGG | L0 | CNN | No | Black | No | No | image |
| Kos et al. [91] | 76% | MNIST | VAE-GAN | L2 | GAN | No | White | No | Yes | image |
| Mei et al. [110] | - | - | - | L2 | SVM | No | Black | Yes | No | image |
| Huang et al. [74] | - | - | A3C, TRPO, DQN | L1, L2, L∞ | RL | No | Both | No | No | image |
| Papernot et al. [131] | 100% | Reviews | LSTM | L2 | RNN | Yes | White | No | No | text |
| DeepWordBug [54] | 51.80% | IMDB Review | LSTM | L0 | RNN | Yes | Black | Yes | Yes | text |
| DeepSpeech [39] | 100% | Mozilla Common Voice | LSTM | L∞ | RNN | No | White | No | Yes | speech |
| Gong et al. [57] | 72% | IEMOCAP | LSTM | L2 | RNN | Yes | White | No | No | speech |
| CommanderSong [182] | 96% | Fisher | ASpIRE Chain Model | L1 | RNN | Yes | White | No | Yes | speech |
| Rosenberg et al. [141] | 99.99% | 500000 files | LSTM | L2 | RNN | Yes | Black | Yes | No | malware |
| MtNet [75] | 97% | 4500000 files | DNN(4 Hidden layers) | L2 | DNN | Yes | Black | No | No | malware |
| SLEIPNIR [15] | 99.7% | 55000 PEs | DNN | L2, L∞ | DNN | Yes | Black | No | No | malware |
| Rigaki et al. [138] | 63% | - | GAN | L0 | GAN | Yes | Black | No | No | malware |
| Pascanu et al. [133] | 69% | DREBIN | DNN | L1 | DNN | Yes | Black | No | No | malware |
| Kreuk et al. [92] | 88% | Microsoft Kaggle 2015 | CNN | L2, L∞ | CNN | Yes | White | No | Yes | malware |
| Hu et al. [70] | 90.05% | 180 programs | BiLSTM | L1 | RNN | Yes | Black | Yes | No | malware |
| Hu et al. [71] | 99.80% | 180000 programs | MalGAN | L1 | GAN | Yes | Black | No | No | malware |
8.1 Regulations on privacy protection

As shown in Sections 4 and 5, both enterprises and users are exposed to privacy risks. In addition to removing private information from the data, governments and related organizations can issue laws and regulations against privacy violations in the course of data use and transmission. In particular, it is recommended: 1) introducing regulatory authorities to monitor these deep learning systems and strictly supervise the use of data; the involved systems are only allowed to extract features and predict results within the permitted range, and private information is forbidden from being extracted or inferred without authorization; 2) establishing and improving relevant laws and regulations (e.g., GDPR [3]) for supervising the process of data collection, use, storage and deletion; 3) adding digital watermarks into the data for leak source tracking [21]; the watermarks help to quickly find the rule breakers that are liable for exposing privacy.

8.2 Secure implementation of deep learning systems

Most of the research on deep learning security concentrates on the leak of private data and the correctness of classification. As a software system, deep learning can be easily built on mature frameworks such as TensorFlow, Torch or Caffe. The vulnerabilities residing in these frameworks can make the constructed deep learning systems vulnerable to other types of attacks. The work [175] enumerates security issues such as heap overflow, integer overflow and use after free in these widespread frameworks. These vulnerabilities can result in denial of service, control-flow hijacking or system compromise. Moreover, deep learning systems often depend on third-party libraries to provide auxiliary functions. For instance, OpenCV is commonly used to process images, and Sound eXchange (SoX) is oftentimes used for audio. Once these vulnerabilities are exploited, the attacker can cause more severe losses to deep learning systems. Therefore, the security auditing of deep learning implementations deserves more research attention and effort in future work.

On the other hand, a large number of research works are emerging that leverage deep learning to detect and exploit software vulnerabilities automatically [181] [178] [80] [151]. It is believed that these techniques are also applicable to deep learning systems. Even more, deep learning might help uncover the interpretation and fix the classification
by adding/removing data about under/over-represented subsets; 2) modifying data or the trained model where training data reflects past discrimination [6]; 3) importing testing techniques to test the fairness of models, such as symbolic execution and local interpretability [12]; 4) enacting non-discrimination law and data protection law, such as GDPR [3].

8.6 Corresponding defense methods

There is a line of approaches for preventing the aforementioned attacks.

MEA defense. Blurring the prediction results is an effective way to prevent model stealing, for instance, rounding parameters [165] [160] or adding noise into class probabilities [96] [84]. On the other hand, detecting and preventing abnormal queries can also resolve MEA. Kesarwani et al. [88] recorded all requests made by clients and calculated the explored feature space to detect attacks. PRADA [84] detected attacks based on sudden changes in the distribution of samples submitted by a given customer.

MIA defense. To defend against model inversion attacks, researchers propose the following approaches:
• Differential privacy (DP), which is a privacy-preserving scheme designed to maximize the accuracy of data queries while minimizing the opportunity to identify individual records when querying a statistical database [50]. Individual features are removed to preserve user privacy. It was first proposed in [49] and proved to be effective for privacy preservation in databases. DP can be applied to prediction outputs [41] [64] [166] [184] [76], the loss function [89] [155], and gradients [149] [26] [155] [11] [184] [187].
• Homomorphic encryption (HE), which is an encryption function that makes the following two operations value-equivalent [139]: performing an arithmetic operation ⊕ on the ring of plaintext and then encrypting the result, versus encrypting the operands first and then carrying out the same arithmetic operation, i.e., En(x) ⊕ En(y) = En(x + y). In this way, clients can encrypt their data and then send it to MLaaS. The server returns encrypted predictions without learning anything about the plain data. In the meantime, the clients have no idea about the model attributes [56] [101] [85] [82].
• Secure multi-party computation (SMC), stemming from Yao's Millionaires' problem [180] and enabling safe calculation of contract functions without trusted third parties. In the context of deep learning, it extends to multiple parties collectively training a model while preserving their own data [164] [146] [134] [135]. As such, the training data cannot be easily inferred by attackers residing at either the computing servers or the client side.
• Training reconstitution. Cao et al. [37] put forward machine unlearning, which makes ML models completely forget a piece of training data and recover its effects on models and features. Ohrimenko et al. [120] proposed a data-oblivious machine learning algorithm. Osia et al. [123] broke down large, complex deep models to enable scalable and privacy-preserving analytics by removing sensitive information with a feature extractor.

PA defense. Poisoning attacks can be mitigated from two aspects: protecting data, including avoiding data tampering, denial and falsification, and detecting poisonous data [170] [105] [65]; in particular, Olufowobi et al. [121] described the context of creation or modification of data points to enhance the trustworthiness and dependability of the data, Chakarov et al. [40] evaluated the effect of individual data points on the performance of the trained model, and Baracaldo et al. [24] used the source information of training data points and the transformation context to identify poisonous data; and protecting the algorithm, which adjusts training algorithms, e.g., robust PCA [35], robust linear regression [43] [100], and robust logistic regression [51].

AA defense. As adversarial attacks draw the most attention, defensive work is correspondingly more comprehensive and ample. The mainstream defense approaches are as follows:
• Adversarial training. This method selects AEs as part of the training dataset to make the trained model learn the characteristics of AEs [73] [94]. Furthermore, Ensemble Adversarial Training [158] augments training data with perturbed inputs transferred from other pre-trained models.
• Region-based method. Understanding the properties of adversarial regions and using more robust region-based classification can also defend against adversarial attacks. Cao et al. [36] developed DNNs using region-based classification instead of point-based classification. They predicted the label by randomly selecting several points from the hypercube centered at the testing sample. In [125], the classifier mapped normal samples to the neighborhood of low-dimensional manifolds in the final-layer hidden space. Local Intrinsic Dimensionality [107] characterized the dimensional properties of adversarial regions and evaluated their space-filling capability. Background Class [109] added a large and diverse class of background images into datasets.
• Transformation. Transforming inputs can defend against adversarial attacks to a large extent. Song et al. [150] found that AEs mainly lie in the low-probability regions of the training distribution, so they purified an AE by adaptively moving it back towards the distribution. Guo et al. [61] explored model-agnostic defenses on image-classification systems via image transformations. Xie et al. [176] used randomization at inference time, including random resizing and padding. Tian et al. [156] observed that AEs are more sensitive than normal images to certain image transformation operations, such as rotation and shifting. Wang et al. [168] [167] held that AEs are more sensitive to random perturbations than normal samples. Buckman et al. [32] used thermometer code and one-hot code discretization to increase the robustness of networks to AEs.
• Gradient regularization/masking. This method hides gradients or reduces the sensitivity of models. Madry et al. [108] realized it by optimizing a saddle-point formulation, which consists of an inner maximization and an outer minimization. Ross et al. [142] trained differentiable models that penalize the degree to which infinitesimal changes in inputs affect predictions.
• Distillation. Papernot et al. [126] proposed Defensive Distillation, which could successfully mitigate AEs constructed by FGSM and JSMA. Papernot et al. [132] also used the knowledge extracted in distillation to reduce the magnitude of network gradients.
• Data preprocessing. Liang et al. [98] introduced scalar quan-
[21] A. Awad, J. Traub, and S. Sakr. Adaptive watermarks: A concept [42] P. Chen, Y. Sharma, H. Zhang, J. Yi, and C. Hsieh. EAD: elastic-
drift-based approach for predicting event-time progress in data net attacks to deep neural networks via adversarial examples.
streams. In 22nd International Conference on Extending Database In Proceedings of the Thirty-Second AAAI Conference on Artificial
Technology (EDBT), Lisbon, Portugal, pages 622–625, March 26-29, Intelligence, New Orleans, Louisiana, USA, pages 10–17, February
2019. 2-7, 2018.
[22] H. Bae, J. Jang, D. Jung, H. Jang, H. Ha, and S. Yoon. Security [43] Y. Chen, C. Caramanis, and S. Mannor. Robust high dimensional
and privacy issues in deep learning. CoRR, abs/1807.11655, 2018. sparse regression and matching pursuit. CoRR, abs/1301.2725,
[23] S. Baluja and I. Fischer. Learning to attack: Adversarial trans- 2013.
formation networks. In Proceedings of the Thirty-Second AAAI [44] M. Cheng, T. Le, P. Chen, H. Zhang, J. Yi, and C. Hsieh. Query-
Conference on Artificial Intelligence, New Orleans, Louisiana, USA, efficient hard-label black-box attack: An optimization-based ap-
February 2-7, 2018. proach. In 7th International Conference on Learning Representations,
[24] N. Baracaldo, B. Chen, H. Ludwig, and J. A. Safavi. Mitigating ICLR, New Orleans, LA, USA, May 6-9, 2019.
poisoning attacks on machine learning models: A data prove- [45] J. R. C. da Silva, R. F. Berriel, C. Badue, A. F. de Souza, and
nance based approach. In Proceedings of the 10th ACM Workshop T. Oliveira-Santos. Copycat CNN: stealing knowledge by per-
on Artificial Intelligence and Security, AISec@CCS, Dallas, TX, USA, suading confession with random non-labeled data. In Interna-
pages 103–110, November 3, 2017. tional Joint Conference on Neural Networks, IJCNN, Rio de Janeiro,
[25] M. Barreno, B. Nelson, A. D. Joseph, and J. D. Tygar. The security Brazil, pages 1–8, July 8-13, 2018.
of machine learning. Machine Learning, 81(2):121–148, Nov 2010. [46] J. Devlin, M. Chang, K. Lee, and K. Toutanova. BERT: pre-training
[26] R. Bassily, A. D. Smith, and A. Thakurta. Private empirical risk of deep bidirectional transformers for language understanding.
minimization: Efficient algorithms and tight error bounds. In In Proceedings of the 2019 Conference of the North American Chapter
55th IEEE Annual Symposium on Foundations of Computer Science, of the Association for Computational Linguistics: Human Language
FOCS, Philadelphia, PA, USA, pages 464–473, October 18-21, 2014. Technologies (NAACL-HLT), pages 4171–4186, 2019.
[27] V. Beal. What is structured data? webopedia definition. [47] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li.
https://www.webopedia.com/TERM/S/structured data.html, Boosting adversarial attacks with momentum. In IEEE Conference
Aug. 2018. on Computer Vision and Pattern Recognition, CVPR, Salt Lake City,
[28] B. Biggio, B. Nelson, and P. Laskov. Poisoning attacks against UT, USA, pages 9185–9193, June 18-22, 2018.
support vector machines. In Proceedings of the 29th International [48] X. Du, X. Xie, Y. Li, L. Ma, Y. Liu, and J. Zhao. Deepstellar: model-
Conference on Machine Learning, ICML, Edinburgh, Scotland, UK, based quantitative analysis of stateful deep learning systems. In
June 26 - July 1, 2012. Proceedings of the ACM Joint Meeting on European Software Engi-
[29] B. Biggio, I. Pillai, S. R. Bulò, D. Ariu, M. Pelillo, and F. Roli. Is neering Conference and Symposium on the Foundations of Software
data clustering in adversarial settings secure? In AISec’13, Pro- Engineering, ESEC/SIGSOFT FSE 2019, Tallinn, Estonia, August 26-
ceedings of the ACM Workshop on Artificial Intelligence and Security, 30, 2019., pages 477–487.
Co-located with CCS, Berlin, Germany, pages 87–98, November 4, [49] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor.
2013. Our data, ourselves: Privacy via distributed noise generation. In
[30] F. Z. Borgesius. Discrimination, artificial intelligence, and 25th Annual International Conference on the Theory and Applications
algorithmic decision-making. https://rm.coe.int/discrimin of Cryptographic Techniques, St. Petersburg, Russia, pages 486–503,
ation-artificial-intelligence-and-algorithmic-decision-making/ May 28-June 1, 2006.
1680925d73, 2018.
[50] C. Dwork, F. McSherry, K. Nissim, and A. D. Smith. Calibrating
[31] M. Brückner and T. Scheffer. Nash equilibria of static prediction noise to sensitivity in private data analysis. In Theory of Cryptog-
games. In 23rd Annual Conference on Neural Information Process- raphy, Third Theory of Cryptography Conference, TCC, New York, NY,
ing Systems, Vancouver, British Columbia, Canada, pages 171–179, USA, pages 265–284, March 4-7, 2006.
December 7-10, 2009.
[51] J. Feng, H. Xu, S. Mannor, and S. Yan. Robust logistic regression
[32] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow. Thermometer
and classification. In Annual Conference on Neural Information
encoding: One hot way to resist adversarial examples. In Interna-
Processing Systems, Montreal, Quebec, Canada, pages 253–261, De-
tional Conference on Learning Representations, 2018.
cember 8-13, 2014.
[33] J. Buolamwini and T. Gebru. Gender shades: Intersectional accu-
racy disparities in commercial gender classification. In Conference [52] M. Fredrikson, S. Jha, and T. Ristenpart. Model inversion attacks
on Fairness, Accountability and Transparency, FAT, New York, NY, that exploit confidence information and basic countermeasures.
USA, pages 77–91, February 23-24, 2018. In Proceedings of the 22nd ACM SIGSAC Conference on Computer
and Communications Security, Denver, CO, USA, pages 1322–1333,
[34] C. Burkard and B. Lagesse. Analysis of causative attacks against
October 12-16, 2015.
svms learning from data streams. In Proceedings of the 3rd
ACM on International Workshop on Security And Privacy Analytics, [53] K. Ganju, Q. Wang, W. Yang, C. A. Gunter, and N. Borisov.
IWSPA@CODASPY, Scottsdale, Arizona, USA, pages 31–36, March Property inference attacks on fully connected neural networks
24, 2017. using permutation invariant representations. In Proceedings of
[35] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal the ACM SIGSAC Conference on Computer and Communications
component analysis? J. ACM, 58(3):11:1–11:37, 2011. Security, CCS, Toronto, ON, Canada, pages 619–633, October 15-
19, 2018.
[36] X. Cao and N. Z. Gong. Mitigating evasion attacks to deep neural
networks via region-based classification. In Proceedings of the [54] J. Gao, J. Lanchantin, M. L. Soffa, and Y. Qi. Black-box generation
33rd Annual Computer Security Applications Conference, Orlando, FL, USA, pages 278–287, December 4-8, 2017.
[37] Y. Cao and J. Yang. Towards making systems forget with machine unlearning. In IEEE Symposium on Security and Privacy, SP, San Jose, CA, USA, pages 463–480, May 17-21, 2015.
[38] N. Carlini and D. A. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (SP), pages 39–57, 2017.
[39] N. Carlini and D. A. Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. In IEEE Security and Privacy Workshops, SP Workshops, San Francisco, CA, USA, pages 1–7, May 24, 2018.
[40] A. Chakarov, A. V. Nori, S. K. Rajamani, S. Sen, and D. Vijaykeerthy. Debugging machine learning tasks. CoRR, abs/1603.07292, 2016.
[41] K. Chaudhuri and C. Monteleoni. Privacy-preserving logistic regression. In Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, pages 289–296, December 8-11, 2008.
of adversarial text sequences to evade deep learning classifiers. In IEEE Security and Privacy Workshops, SP Workshops, San Francisco, CA, USA, pages 50–56, May 24, 2018.
[55] T. Gehr, M. Mirman, D. Drachsler-Cohen, P. Tsankov, S. Chaudhuri, and M. T. Vechev. AI2: safety and robustness certification of neural networks with abstract interpretation. In IEEE Symposium on Security and Privacy, SP, San Francisco, CA, USA, pages 3–18, May 21-23, 2018.
[56] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. E. Lauter, M. Naehrig, and J. Wernsing. Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, pages 201–210, 2016.
[57] Y. Gong and C. Poellabauer. Crafting adversarial examples for speech paralinguistics applications. CoRR, abs/1711.03280, 2017.
[58] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014.
[59] S. Gu and L. Rigazio. Towards deep neural network architectures robust to adversarial examples. CoRR, abs/1412.5068, 2014.
[60] C. Guo, J. S. Frank, and K. Q. Weinberger. Low frequency adversarial perturbation. CoRR, abs/1809.08758, 2018.
[61] C. Guo, M. Rana, M. Cissé, and L. van der Maaten. Countering adversarial images using input transformations. CoRR, abs/1711.00117, 2017.
[62] J. Guo, Y. Jiang, Y. Zhao, Q. Chen, and J. Sun. Dlfuzz: differential fuzzing testing of deep learning systems. In Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, November 04-09, 2018, pages 739–743.
[63] I. A. Hamilton. Amazon built an AI tool to hire people but had to shut it down because it was discriminating against women. https://www.businessinsider.com/amazon-built-ai-to-hire-people-discriminated-against-women-2018-10, Oct. 2018.
[64] J. Hamm, Y. Cao, and M. Belkin. Learning privately from multiparty data. In Proceedings of the 33rd International Conference on Machine Learning, ICML, New York City, NY, USA, pages 555–563, June 19-24, 2016.
[65] R. Hasan, R. Sion, and M. Winslett. The case of the fake picasso: Preventing history forgery with secure provenance. In 7th USENIX Conference on File and Storage Technologies, February 24-27, 2009, San Francisco, CA, USA, pages 1–14, 2009.
[66] J. Hayes and G. Danezis. Learning universal adversarial perturbations with generative models. In IEEE Security and Privacy Workshops, SP Workshops, San Francisco, CA, USA, pages 43–49, May 24, 2018.
[67] J. Hayes, L. Melis, G. Danezis, and E. D. Cristofaro. LOGAN: evaluating privacy leakage of generative models using generative adversarial networks. CoRR, abs/1705.07663, 2017.
[68] W. He, B. Li, and D. Song. Decision boundary analysis of adversarial examples. In International Conference on Learning Representations, 2018.
[69] B. Hitaj, G. Ateniese, and F. Pérez-Cruz. Deep models under the GAN: information leakage from collaborative deep learning. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS, Dallas, TX, USA, pages 603–618, October 30 - November 03, 2017.
[70] W. Hu and Y. Tan. Black-box attacks against RNN based malware detection algorithms. In The Workshops of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, pages 245–251, February 2-7, 2018.
[71] W. Hu and Y. Tan. Generating adversarial malware examples for black-box attacks based on GAN. CoRR, abs/1702.05983, 2017.
[72] W. Hua, Z. Zhang, and G. E. Suh. Reverse engineering convolutional neural networks through side-channel information leaks. In Proceedings of the 55th Annual Design Automation Conference, DAC, San Francisco, CA, USA, pages 4:1–4:6, June 24-29, 2018.
[73] R. Huang, B. Xu, D. Schuurmans, and C. Szepesvári. Learning with a strong adversary. CoRR, abs/1511.03034, 2015.
[74] S. H. Huang, N. Papernot, I. J. Goodfellow, Y. Duan, and P. Abbeel. Adversarial attacks on neural network policies. CoRR, abs/1702.02284, 2017.
[75] W. Huang and J. W. Stokes. Mtnet: A multi-task neural network for dynamic malware classification. In 13th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA, San Sebastián, Spain, pages 399–418, July 7-8, 2016.
[76] N. Hynes, R. Cheng, and D. Song. Efficient deep learning on multi-source private data. CoRR, abs/1807.06689, 2018.
[77] A. Ilyas, L. Engstrom, A. Athalye, and J. Lin. Query-efficient black-box adversarial examples. CoRR, abs/1712.07113, 2017.
[78] A. Ilyas, L. Engstrom, A. Athalye, and J. Lin. Black-box adversarial attacks with limited queries and information. In Proceedings of the 35th International Conference on Machine Learning, ICML, Stockholmsmässan, Stockholm, Sweden, pages 2142–2151, 2018.
[79] M. Jagielski, A. Oprea, B. Biggio, C. Liu, C. Nita-Rotaru, and B. Li. Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. In IEEE Symposium on Security and Privacy, SP, San Francisco, California, USA, pages 19–35, May 21-23, 2018.
[80] S. Jan, A. Panichella, A. Arcuri, and L. C. Briand. Automatic generation of tests to exploit XML injection vulnerabilities in web applications. IEEE Trans. Software Eng., 45(4):335–362, 2019.
[81] U. Jang, X. Wu, and S. Jha. Objective metrics and gradient descent algorithms for adversarial examples in machine learning. In Proceedings of the 33rd Annual Computer Security Applications Conference, Orlando, FL, USA, December 4-8, 2017, pages 262–277.
[82] X. Jiang, M. Kim, K. E. Lauter, and Y. Song. Secure outsourced matrix computation and application to neural networks. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, ON, Canada, October 15-19, 2018, pages 1209–1222, 2018.
[83] A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov. Bag of tricks for efficient text classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 427–431. Association for Computational Linguistics, April 2017.
[84] M. Juuti, S. Szyller, A. Dmitrenko, S. Marchal, and N. Asokan. PRADA: protecting against DNN model stealing attacks. CoRR, abs/1805.02628, 2018.
[85] C. Juvekar, V. Vaikuntanathan, and A. Chandrakasan. GAZELLE: A low latency framework for secure neural network inference. In 27th USENIX Security Symposium, USENIX Security, Baltimore, MD, USA, pages 1651–1669, August 15-17, 2018.
[86] A. Kantchelian, S. Afroz, L. Huang, A. C. Islam, B. Miller, M. C. Tschantz, R. Greenstadt, A. D. Joseph, and J. D. Tygar. Approaches to adversarial drift. In Proceedings of the ACM Workshop on Artificial Intelligence and Security, AISec, Berlin, Germany, pages 99–110, November 4, 2013.
[87] G. Katz, C. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer. Towards proving the adversarial robustness of deep neural networks. In Proceedings of the First Workshop on Formal Verification of Autonomous Vehicles, FVAV@iFM, Turin, Italy, September 19, 2017, pages 19–26.
[88] M. Kesarwani, B. Mukhoty, V. Arya, and S. Mehta. Model extraction warning in MLaaS paradigm. CoRR, abs/1711.07221, 2017.
[89] D. Kifer, A. D. Smith, and A. Thakurta. Private convex optimization for empirical risk minimization with applications to high-dimensional regression. In The 25th Annual Conference on Learning Theory, COLT, Edinburgh, Scotland, pages 25.1–25.40, June 25-27, 2012.
[90] J. Kim, R. Feldt, and S. Yoo. Guiding deep learning system testing using surprise adequacy. In Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25-31, 2019, pages 1039–1049.
[91] J. Kos, I. Fischer, and D. Song. Adversarial examples for generative models. In IEEE Security and Privacy Workshops, SP Workshops, San Francisco, CA, USA, May 24, 2018, pages 36–42.
[92] F. Kreuk, A. Barak, S. Aviv-Reuven, M. Baruch, B. Pinkas, and J. Keshet. Deceiving end-to-end deep learning malware detectors using adversarial examples. 2018.
[93] A. Krizhevsky, V. Nair, and G. Hinton. CIFAR dataset. https://www.cs.toronto.edu/~kriz/cifar.html, 2019.
[94] A. Kurakin, I. J. Goodfellow, and S. Bengio. Adversarial machine learning at scale. CoRR, abs/1611.01236, 2016.
[95] Y. LeCun, C. Cortes, and C. Burges. MNIST dataset. http://yann.lecun.com/exdb/mnist/, 2017.
[96] T. Lee, B. Edwards, I. Molloy, and D. Su. Defending against model stealing attacks using deceptive perturbations. CoRR, abs/1806.00054, 2018.
[97] P. Li, J. Yi, and L. Zhang. Query-efficient black-box attack by active learning. In IEEE International Conference on Data Mining, ICDM, Singapore, November 17-20, 2018, pages 1200–1205.
[98] B. Liang, H. Li, M. Su, X. Li, W. Shi, and X. Wang. Detecting adversarial image examples in deep networks with adaptive noise reduction. 2017.
[99] R. Light. AI trends: Machine learning as a service (MLaaS). https://learn.g2.com/trends/machine-learning-service-mlaas, Jan. 2018.
[100] C. Liu, B. Li, Y. Vorobeychik, and A. Oprea. Robust linear regression against training data poisoning. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec@CCS 2017, Dallas, TX, USA, November 3, 2017, pages 91–102, 2017.
[101] J. Liu, M. Juuti, Y. Lu, and N. Asokan. Oblivious neural network predictions via minionn transformations. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS, Dallas, TX, USA, October 30 - November 03, 2017, pages 619–631.
[102] K. S. Liu, B. Li, and J. Gao. Generative model: Membership attack, generalization and diversity. CoRR, abs/1805.09898, 2018.
[103] Q. Liu, P. Li, W. Zhao, W. Cai, S. Yu, and V. C. M. Leung. A survey on security threats and defensive techniques of machine learning: A data driven view. IEEE Access, 6:12103–12117, 2018.
[104] Y. Long, V. Bindschaedler, L. Wang, D. Bu, X. Wang, H. Tang, C. A. Gunter, and K. Chen. Understanding membership inferences on well-generalized learning models. CoRR, abs/1802.04889, 2018.
[105] J. Lyle and A. P. Martin. Trusted computing and provenance: Better together. In 2nd Workshop on the Theory and Practice of Provenance, TaPP'10, San Jose, CA, USA, February 22, 2010.
[106] L. Ma, F. Juefei-Xu, F. Zhang, J. Sun, M. Xue, B. Li, C. Chen, T. Su, L. Li, Y. Liu, J. Zhao, and Y. Wang. Deepgauge: multi-granularity testing criteria for deep learning systems. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, September 3-7, 2018, pages 120–131.
[107] X. Ma, B. Li, Y. Wang, S. M. Erfani, S. N. R. Wijewickrema, M. E. Houle, G. Schoenebeck, D. Song, and J. Bailey. Characterizing adversarial subspaces using local intrinsic dimensionality. CoRR, abs/1801.02613, 2018.
[108] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. CoRR, abs/1706.06083, 2017.
[109] M. McCoyd and D. A. Wagner. Background class defense against adversarial examples. In 2018 IEEE Security and Privacy Workshops, SP Workshops 2018, San Francisco, CA, USA, May 24, 2018, pages 96–102, 2018.
[110] S. Mei and X. Zhu. Using machine teaching to identify optimal training-set attacks on machine learners. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, Austin, Texas, USA, pages 2871–2877, 2015.
[111] L. Melis, C. Song, E. D. Cristofaro, and V. Shmatikov. Inference attacks against collaborative learning. CoRR, abs/1805.04049, 2018.
[112] D. Meng and H. Chen. Magnet: A two-pronged defense against adversarial examples. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS, Dallas, TX, USA, October 30 - November 03, 2017, pages 135–147.
[113] S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 86–94, 2017.
[114] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. Deepfool: A simple and accurate method to fool deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Las Vegas, NV, USA, June 27-30, 2016, pages 2574–2582.
[115] L. Muñoz-González, B. Biggio, A. Demontis, A. Paudice, V. Wongrassamee, E. C. Lupu, and F. Roli. Towards poisoning of deep learning algorithms with back-gradient optimization. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec@CCS 2017, Dallas, TX, USA, November 3, 2017, pages 27–38, 2017.
[116] N. Narodytska and S. P. Kasiviswanathan. Simple black-box adversarial attacks on deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops, Honolulu, HI, USA, July 21-26, 2017, pages 1310–1318.
[117] M. Nasr, R. Shokri, and A. Houmansadr. Machine learning with membership privacy using adversarial regularization. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 634–646, 2018.
[118] B. Nelson, M. Barreno, F. J. Chi, A. D. Joseph, B. I. P. Rubinstein, U. Saini, C. A. Sutton, J. D. Tygar, and K. Xia. Exploiting machine learning to subvert your spam filter. In First USENIX Workshop on Large-Scale Exploits and Emergent Threats, LEET, 2008.
[119] S. J. Oh, M. Augustin, M. Fritz, and B. Schiele. Towards reverse-engineering black-box neural networks. In International Conference on Learning Representations, 2018.
[120] O. Ohrimenko, F. Schuster, C. Fournet, A. Mehta, S. Nowozin, K. Vaswani, and M. Costa. Oblivious multi-party machine learning on trusted processors. In 25th USENIX Security Symposium, USENIX Security 16, Austin, TX, USA, August 10-12, 2016, pages 619–636, 2016.
[121] H. Olufowobi, R. Engel, N. Baracaldo, L. A. D. Bathen, S. Tata, and H. Ludwig. Data provenance model for internet of things (IoT) systems. In Service-Oriented Computing, ICSOC 2016 Workshops, Banff, AB, Canada, October 10-13, 2016, pages 85–91.
[122] T. Orekondy, B. Schiele, and M. Fritz. Knockoff nets: Stealing functionality of black-box models. June 2019.
[123] S. A. Ossia, A. S. Shamsabadi, A. Taheri, H. R. Rabiee, N. D. Lane, and H. Haddadi. A hybrid deep learning architecture for privacy-preserving mobile analytics. CoRR, abs/1703.02952, 2017.
[124] R. Pan. Static deep neural network analysis for robustness. In Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2019, Tallinn, Estonia, August 26-30, 2019, pages 1238–1240.
[125] T. Pang, C. Du, and J. Zhu. Robust deep learning via reverse cross-entropy training and thresholding test. CoRR, abs/1706.00633, 2017.
[126] N. Papernot and P. D. McDaniel. On the effectiveness of defensive distillation. CoRR, abs/1607.05113, 2016.
[127] N. Papernot, P. D. McDaniel, and I. J. Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. CoRR, abs/1605.07277, 2016.
[128] N. Papernot, P. D. McDaniel, I. J. Goodfellow, S. Jha, Z. B. Celik, and A. Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, AsiaCCS, Abu Dhabi, United Arab Emirates, April 2-6, 2017, pages 506–519.
[129] N. Papernot, P. D. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy, EuroS&P 2016, Saarbrücken, Germany, March 21-24, 2016, pages 372–387, 2016.
[130] N. Papernot, P. D. McDaniel, A. Sinha, and M. P. Wellman. SoK: Security and privacy in machine learning. In 2018 IEEE European Symposium on Security and Privacy, EuroS&P 2018, London, United Kingdom, April 24-26, 2018, pages 399–414, 2018.
[131] N. Papernot, P. D. McDaniel, A. Swami, and R. E. Harang. Crafting adversarial input sequences for recurrent neural networks. In 2016 IEEE Military Communications Conference, MILCOM 2016, Baltimore, MD, USA, November 1-3, 2016, pages 49–54, 2016.
[132] N. Papernot, P. D. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, SP 2016, San Jose, CA, USA, May 22-26, 2016, pages 582–597, 2016.
[133] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, and A. Thomas. Malware classification with recurrent networks. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, South Brisbane, Queensland, Australia, April 19-24, 2015, pages 1916–1920, 2015.
[134] L. T. Phong, Y. Aono, T. Hayashi, L. Wang, and S. Moriai. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Information Forensics and Security, 13(5):1333–1345, 2018.
[135] L. T. Phong and T. T. Phuong. Privacy-preserving deep learning for any activation function. CoRR, abs/1809.03272, 2018.
[136] S. Pouyanfar, S. Sadiq, Y. Yan, H. Tian, Y. Tao, M. P. Reyes, M.-L. Shyu, S.-C. Chen, and S. S. Iyengar. A survey on deep learning: Algorithms, techniques, and applications. ACM Computing Surveys, 51(5):92:1–92:36, Sept. 2018.
[137] A. Pyrgelis, C. Troncoso, and E. D. Cristofaro. Knock knock, who's there? membership inference on aggregate location data. 2017.
[138] M. Rigaki and S. Garcia. Bringing a GAN to a knife-fight: Adapting malware communication to avoid detection. In 2018 IEEE Security and Privacy Workshops, SP Workshops 2018, San Francisco, CA, USA, May 24, 2018, pages 70–75, 2018.
[139] R. L. Rivest, L. Adleman, M. L. Dertouzos, et al. On data banks and privacy homomorphisms. Foundations of Secure Computation, 4(11):169–180, 1978.
[140] I. Rosenberg, A. Shabtai, Y. Elovici, and L. Rokach. Low resource black-box end-to-end attack against state of the art API call based malware classifiers. CoRR, abs/1804.08778, 2018.
[141] I. Rosenberg, A. Shabtai, L. Rokach, and Y. Elovici. Generic black-box end-to-end attack against state of the art API call based malware classifiers. In 21st International Symposium on Research in Attacks, Intrusions, and Defenses, RAID 2018, Heraklion, Crete, Greece, September 10-12, 2018, pages 490–510.
[142] A. S. Ross and F. Doshi-Velez. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), pages 1660–1669, 2018.
[143] A. Salem, Y. Zhang, M. Humbert, M. Fritz, and M. Backes. Ml-leaks: Model and data independent membership inference attacks and defenses on machine learning models. CoRR, abs/1806.01246, 2018.
[144] A. Shafahi, W. R. Huang, M. Najibi, O. Suciu, C. Studer, T. Dumitras, and T. Goldstein. Poison frogs! targeted clean-label poisoning attacks on neural networks. CoRR, abs/1804.00792, 2018.
[145] M. Sharif, L. Bauer, and M. K. Reiter. On the suitability of lp-norms for creating and preventing adversarial examples. In 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Salt Lake City, UT, USA, June 18-22, 2018, pages 1605–1613.
[146] R. Shokri and V. Shmatikov. Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, October 12-16, 2015, pages 1310–1321, 2015.
[147] R. Shokri, M. Stronati, C. Song, and V. Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, pages 3–18, 2017.
[148] C. Song, T. Ristenpart, and V. Shmatikov. Machine learning models that remember too much. In Proceedings of ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30 - November 03, 2017, pages 587–601.
[149] S. Song, K. Chaudhuri, and A. D. Sarwate. Stochastic gradient descent with differentially private updates. In IEEE Global Conference on Signal and Information Processing, GlobalSIP 2013, Austin, TX, USA, December 3-5, 2013, pages 245–248, 2013.
[150] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman. Pixeldefend: Leveraging generative models to understand and defend against adversarial examples. CoRR, abs/1710.10766, 2017.
[151] A. Stasinopoulos, C. Ntantogian, and C. Xenakis. Commix: automating evaluation and exploitation of command injection vulnerabilities in web applications. Int. J. Inf. Sec., 18(1):49–72, 2019.
[152] J. Steinhardt, P. W. Koh, and P. S. Liang. Certified defenses for data poisoning attacks. In Annual Conference on Neural Information Processing Systems, 4-9 December 2017, Long Beach, CA, USA, pages 3520–3532, 2017.
[153] Y. Sun, M. Wu, W. Ruan, X. Huang, M. Kwiatkowska, and D. Kroening. Concolic testing for deep neural networks. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, September 3-7, 2018, pages 109–119.
[154] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013.
[155] K. Talwar, A. Thakurta, and L. Zhang. Private empirical risk minimization beyond the worst case: The effect of the constraint set geometry. CoRR, abs/1411.5417, 2014.
[156] S. Tian, G. Yang, and Y. Cai. Detecting adversarial examples through image transformation. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7, 2018, pages 4139–4146, 2018.
[157] Y. Tian, K. Pei, S. Jana, and B. Ray. Deeptest: automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, Gothenburg, Sweden, May 27 - June 03, 2018, pages 303–314.
[158] F. Tramèr, A. Kurakin, N. Papernot, D. Boneh, and P. D. McDaniel. Ensemble adversarial training: Attacks and defenses. CoRR, abs/1705.07204, 2017.
[159] F. Tramèr, N. Papernot, I. J. Goodfellow, D. Boneh, and P. D. McDaniel. The space of transferable adversarial examples. CoRR, abs/1704.03453, 2017.
[160] F. Tramèr, F. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart. Stealing machine learning models via prediction apis. In 25th USENIX Security Symposium, USENIX Security 16, Austin, TX, USA, August 10-12, 2016, pages 601–618, 2016.
[161] S. Truex, L. Liu, M. E. Gursoy, L. Yu, and W. Wei. Towards demystifying membership inference attacks. CoRR, abs/1807.09173, 2018.
[162] M. Veale, R. Binns, and L. Edwards. Algorithms that remember: Model inversion attacks and data protection law. CoRR, abs/1807.04644, 2018.
[163] J. S. Vitter. Random sampling with a reservoir. ACM Trans. Math. Softw., 11(1):37–57, 1985.
[164] S. Wagh, D. Gupta, and N. Chandran. Securenn: Efficient and private neural network training. IACR Cryptology ePrint Archive, 2018:442, 2018.
[165] B. Wang and N. Z. Gong. Stealing hyperparameters in machine learning. In IEEE Symposium on Security and Privacy (SP), San Francisco, California, USA, 21-23 May 2018, pages 36–52, 2018.
[166] D. Wang, M. Ye, and J. Xu. Differentially private empirical risk minimization revisited: Faster and more general. In Annual Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4-9 December 2017, pages 2719–2728, 2017.
[167] J. Wang, G. Dong, J. Sun, X. Wang, and P. Zhang. Adversarial sample detection for deep neural network through model mutation testing. In Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25-31, 2019, pages 1245–1256.
[168] J. Wang, J. Sun, P. Zhang, and X. Wang. Detecting adversarial samples for deep neural networks through mutation testing. CoRR, abs/1805.05010, 2018.
[169] S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana. Formal security analysis of neural networks using symbolic intervals. In 27th USENIX Security Symposium, USENIX Security 2018, Baltimore, MD, USA, August 15-17, 2018, pages 1599–1614, 2018.
[170] X. O. Wang, K. Zeng, K. Govindan, and P. Mohapatra. Chaining for securing data provenance in distributed information networks. In 31st IEEE Military Communications Conference, MILCOM, Orlando, FL, USA, October 29 - November 1, 2012, pages 1–6.
[171] Y. Wang and K. Chaudhuri. Data poisoning attacks against online learning. CoRR, abs/1808.08994, 2018.
[172] H. Xiao, B. Biggio, G. Brown, G. Fumera, C. Eckert, and F. Roli. Is feature selection secure against training data poisoning? In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, pages 1689–1698, 2015.
[173] H. Xiao, B. Biggio, B. Nelson, H. Xiao, C. Eckert, and F. Roli. Support vector machines under adversarial label contamination. Neurocomputing, 160:53–62, 2015.
[174] H. Xiao, H. Xiao, and C. Eckert. Adversarial label flips attack on support vector machines. In 20th European Conference on Artificial Intelligence (ECAI), Montpellier, France, August 27-31, 2012, pages 870–875, 2012.
[175] Q. Xiao, K. Li, D. Zhang, and W. Xu. Security risks in deep learning implementations. In 2018 IEEE Security and Privacy (SP) Workshops, San Francisco, CA, USA, May 24, 2018, pages 123–128.
[176] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. L. Yuille. Mitigating adversarial effects through randomization. CoRR, abs/1711.01991, 2017.
[177] X. Xie, L. Ma, F. Juefei-Xu, M. Xue, H. Chen, Y. Liu, J. Zhao, B. Li, J. Yin, and S. See. Deephunter: a coverage-guided fuzz testing framework for deep neural networks. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2019, Beijing, China, July 15-19, 2019, pages 146–157.
[178] L. Xu, W. Jia, W. Dong, and Y. Li. Automatic exploit generation for buffer overflow vulnerabilities. In 2018 IEEE International Conference on Software Quality, Reliability and Security (QRS) Companion, Lisbon, Portugal, July 16-20, 2018, pages 463–468.
[179] W. Xu, D. Evans, and Y. Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. In 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18-21, 2018.
[180] A. C. Yao. Protocols for secure computations (extended abstract). In 23rd Annual Symposium on Foundations of Computer Science, Chicago, Illinois, USA, 3-5 November 1982, pages 160–164, 1982.
[181] W. You, P. Zong, K. Chen, X. Wang, X. Liao, P. Bian, and B. Liang. Semfuzz: Semantics-based automatic generation of proof-of-concept exploits. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS), Dallas, TX, USA, October 30 - November 03, 2017, pages 2139–2154.
[182] X. Yuan, Y. Chen, Y. Zhao, Y. Long, X. Liu, K. Chen, S. Zhang, H. Huang, X. Wang, and C. A. Gunter. Commandersong: A systematic approach for practical adversarial voice recognition. In 27th USENIX Security Symposium, USENIX Security 2018, Baltimore, MD, USA, August 15-17, 2018, pages 49–64, 2018.
[183] V. Zantedeschi, M. Nicolae, and A. Rawat. Efficient defenses against adversarial attacks. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec@CCS 2017, Dallas, TX, USA, November 3, 2017, pages 39–49, 2017.