Technology, second cycle, 30 credits
Stockholm, Sweden 2018
Jin Hu
Abstract
Deep learning methods achieve impressive performance in many Natural Language Processing (NLP) tasks, but it is still difficult to know what happens inside a deep neural network. In this thesis, a general overview of explainable AI and of how explainable deep learning methods are applied to NLP tasks is given. Then the bi-directional LSTM and CRF (Bi-LSTM-CRF) model for the Named Entity Recognition (NER) task is introduced, as well as the approach used to make this model explainable. An approach to visualize the importance of neurons in the Bi-LSTM layer of the model for NER by Layer-wise Relevance Propagation (LRP) is proposed, which can measure how neurons contribute to each prediction of a word in a sequence. Ideas about how to measure the influence of the CRF layer of the Bi-LSTM-CRF model are also described.
Sammanfattning
Deep learning methods achieve impressive performance in many Natural Language Processing (NLP) tasks, but it is still difficult to know what happens inside a deep neural network. In this thesis, a general overview of explainable AI and of how explainable deep learning methods are applied to NLP tasks is given. Then the bi-directional LSTM and CRF (Bi-LSTM-CRF) model for the Named Entity Recognition (NER) task is introduced, as well as the approach used to make this model explainable. The approach of visualizing the importance of neurons in the Bi-LSTM layer of the model for NER through Layer-wise Relevance Propagation (LRP) is proposed, which can measure how neurons contribute to each prediction of a word in a sequence. Ideas about how to measure the influence of the CRF layer in the Bi-LSTM-CRF model are also described.
Contents

1 Introduction
    1.1 Motivation
    1.2 Problem Statement
    1.3 Purpose
    1.4 Goals
    1.5 Sustainability and Ethics
    1.6 Research Methodology
    1.7 Project Environment
    1.8 Delimitations
    1.9 Thesis Outline
3 Approaches
    3.1 Bi-LSTM-CRF Model for NER
    3.2 t-SNE for Embedding Visualization
    3.3 Layer-wise Relevance Propagation for Bi-LSTM
    3.4 Explanation for the CRF Layer
Bibliography
Abbreviations
AI Artificial Intelligence.
NN Neural Network.
Chapter 1
Introduction
1.1 Motivation
Machine learning methods have been applied widely in many fields such as recommender systems [1], chatbots [2], and self-driving cars [3]. However, some machine learning methods are opaque, non-intuitive, and difficult for people to understand, which limits the effectiveness of machine learning.
In recent years, deep learning methods, as a type of machine learning method, have yielded state-of-the-art performance on many problems. But people may worry about how to make sure a deep learning model makes the right decision, given the huge numbers of parameters and the many rounds of iteration and optimization that deep learning models involve. For example, many medical systems apply deep learning techniques to help diagnosis, as Wang et al. [4] do to detect breast cancer, but the result needs to be checked by human doctors in order to make sure the diagnosis is reasonable. The more complicated the model is, the more difficult it is to explain how the result comes out, so that people may distrust the prediction. If the model can explain itself, it will gain more trust from users.
1.3 Purpose
After a literature review, the purpose of this thesis project is stated as follows: investigate different explainable deep learning methods for NLP, implement an appropriate explainable method for the named entity recognizer based on Bi-LSTM-CRF, and let people who have some background knowledge gain some insight into the model from the visualization.
1.4 Goals
The goal of this thesis is to help developers at Seavus Stockholm AB to debug and evaluate the performance of their product in the NER field, and to provide comparisons of different explainable methods so that they can choose among them when developing AI solutions in the future.
1.8 Delimitations
The research on explainable AI includes two aspects, namely the explainable model and the explainable interface. It is important to design methods that make the explanation acceptable to ordinary users; however, this is out of the scope of this thesis. This thesis focuses on developing an explainable model in the NLP field for people who have a technical background.
The tanh activation function is defined as:
\[
\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} \tag{2.4}
\]
Figure 2.2 shows an example of a multi-layer NN. Multiple neurons form layers of units; each neuron is connected to all neurons in the previous layer by edges with different weights, and several connected layers compose the neural network. The first layer, which receives the inputs, is the input layer, the last layer, which yields the output, is called the output layer, and the intermediate layers are regarded as hidden layers. When information comes in, it flows from the input layer to the output layer and activates the corresponding neuron in the output layer to indicate the result.
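To make this information flow concrete, the following is a minimal sketch (our own illustration, not code from the thesis) of a forward pass through a small fully connected network with one tanh hidden layer (equation 2.4) and a sigmoid output layer; all layer sizes and weights are arbitrary placeholders.

```python
import numpy as np

def tanh(z):
    # Equation 2.4: tanh(z) = (e^z - e^-z) / (e^z + e^-z)
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Propagate an input vector through one hidden layer and an output layer."""
    h = tanh(W1 @ x + b1)      # hidden layer activations
    y = sigmoid(W2 @ h + b2)   # output layer activations
    return y

# Arbitrary example: 4 inputs, 5 hidden units, 3 output units
rng = np.random.RandomState(0)
W1, b1 = rng.randn(5, 4), np.zeros(5)
W2, b2 = rng.randn(3, 5), np.zeros(3)
x = rng.randn(4)
print(forward(x, W1, b1, W2, b2))  # the "activated" output neuron is the argmax
```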
The parameters θ are updated by gradient descent with learning rate η according to equation 2.6:
\[
\theta = \theta - \eta \frac{\partial L(\theta)}{\partial \theta} \tag{2.6}
\]
Usually, there are many training samples, so it is not efficient to calculate the gradient separately for each training input. To deal with this issue, we choose a small number of training samples randomly; each such set of training samples is referred to as a mini-batch, and we update the parameters after calculating the loss of each mini-batch.
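As an illustration only (not the training code used in this project), the sketch below applies the update rule of equation 2.6 once per mini-batch for a simple linear model with a squared loss; the batch size, learning rate, and data are arbitrary.

```python
import numpy as np

def minibatch_sgd(X, y, batch_size=32, lr=0.01, epochs=5):
    """Mini-batch SGD for linear regression with a squared loss."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        idx = np.random.permutation(n)              # shuffle samples each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            pred = X[batch] @ theta
            # Gradient of the mean squared loss over the mini-batch
            grad = 2.0 * X[batch].T @ (pred - y[batch]) / len(batch)
            theta = theta - lr * grad               # update rule of equation 2.6
    return theta

X = np.random.randn(1000, 3)
true_theta = np.array([1.5, -2.0, 0.5])
y = X @ true_theta + 0.1 * np.random.randn(1000)
print(minibatch_sgd(X, y))   # should approach true_theta
```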
LSTM
RNNs can keep the memory of every timestep before the current one to help predict the result. For example, to predict the next word of a text, we often only need to consider a few adjacent words, because the word to be predicted is a short distance from its related words. In this case, the RNN works. But what if the relevant information is far away from the word to be predicted? It is difficult for standard RNNs to associate the information to be predicted with its relevant information as the gap between them increases.
To solve the problem caused by "long-term dependencies", Hochreiter and Schmidhuber [17] proposed Long Short-Term Memory (LSTM), which aims to remember information over long periods of time. Compared to standard RNNs, LSTM is better at overcoming the vanishing gradient problem by using a unit with a more complex architecture, as shown in Figure 2.4. There are three gates designed to control how information passes through the cell. These three gates are described as follows:
• Forget gate: The forget gate controls how the information from the current input xt and the output from the previous time step ht−1 flows into the current cell, after being transformed by an activation function σ. Following equation 2.7, it outputs a vector ft with all elements between 0 and 1. This vector indicates which information is allowed to pass.
• Input gate: The input gate decides how much new input information should be added to the cell state. First, a sigmoid function decides which information needs to be updated, and a tanh function generates the candidate values Ĉt, i.e. the contents available for the update. Then, the old cell state Ct−1 is updated by adding the new information to the cell state, giving Ct. This process is represented by equations 2.8:
\[
\begin{aligned}
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\
C_t &= f_t * C_{t-1} + i_t * \tilde{C}_t
\end{aligned}
\tag{2.8}
\]
Note: in equations 2.7, 2.8, and 2.9, W∗ and b∗ denote the weights and biases of the same cell, shared across different time steps.
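A minimal sketch of a single LSTM time step that implements the gate computations above (the forget and input gates of equations 2.7 and 2.8, plus the standard output gate) is given below. This is an illustrative re-implementation, not the Keras LSTM used later in this project, and all dimensions are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM time step. W and b hold the parameters of the four gates,
    each acting on the concatenation [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate (eq. 2.7)
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate
    C_hat = np.tanh(W["C"] @ z + b["C"])    # candidate cell contents
    C_t = f_t * C_prev + i_t * C_hat        # new cell state (eq. 2.8)
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate
    h_t = o_t * np.tanh(C_t)                # new hidden state
    return h_t, C_t

# Arbitrary sizes: input dimension 4, hidden dimension 3
rng = np.random.RandomState(0)
W = {k: rng.randn(3, 7) for k in ("f", "i", "C", "o")}
b = {k: np.zeros(3) for k in ("f", "i", "C", "o")}
h, C = np.zeros(3), np.zeros(3)
for x_t in rng.randn(5, 4):                 # a sequence of 5 time steps
    h, C = lstm_step(x_t, h, C, W, b)
print(h)
```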
Bi-directional LSTM
A standard LSTM only considers the influence of past information. Sometimes future features can also influence the prediction. The bi-directional LSTM (Bi-LSTM) [18] enables the use of both past features and future features for a certain time step. The structure of the Bi-LSTM is shown in Figure 2.5. For example, if an input sample sentence has 10 words (10 timesteps) x1, x2, · · · , x10, there exist two separate LSTM cells: one processes the sequence in the original order and the other in the reverse order, and their hidden states are combined at each time step, as sketched below.
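In Keras (the framework used later in this project), a bi-directional LSTM over such a 10-step sequence can be sketched as follows. The hidden size and input dimension are arbitrary here; this only illustrates how the forward and backward LSTMs are combined, and is not the exact configuration of the thesis model.

```python
from keras.models import Sequential
from keras.layers import Bidirectional, LSTM

model = Sequential()
# 10 time steps, each a 50-dimensional word vector; hidden size 64 per direction.
# return_sequences=True keeps one output per time step, so the forward and
# backward hidden states are concatenated into a 128-dimensional vector per step.
model.add(Bidirectional(LSTM(64, return_sequences=True), input_shape=(10, 50)))
model.summary()
```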
Word2Vec
One of the most frequently used word embedding methods is Word2Vec [20, 21], which is based on neural networks that map words to a low-dimensional space. Word2Vec is based on two neural network models: the Continuous Bag-of-Words model (CBOW) and the Skip-Gram model. For example, in figure 2.6, the center word is 'hiking' and the window size is 2. If the Skip-Gram model is used, it aims to generate each word in the context given the center word, which means the model would learn by predicting the words 'to', 'go', 'on', and 'Sunday' from the word 'hiking'. The CBOW model, in contrast, tries to learn by predicting 'hiking' from 'to', 'go', 'on', and 'Sunday'. The word embeddings produced by Word2Vec can capture certain semantic patterns because it uses context words for training. Due to its ability to evaluate semantic similarities and make semantic analogies between words (like "King" − "Queen" ≈ "man" − "woman"), Word2Vec has become very popular.
Glove
After the Word2Vec papers were published, Pennington, Socher, and Manning [22] tried to make use of the statistical information of the corpus and proposed another word embedding method called Glove. Based on the observation that two words are more relevant if they have a shorter semantic distance, Glove uses word-pair co-occurrence to make related words more distinct. For example, given three words i, j, and k, if word i and word j are relevant while word i and word k are irrelevant, the co-occurrence ratio P(j|i)/P(k|i) should have a comparatively large value; if the word pairs i, j and i, k are both irrelevant or both relevant, the ratio is close to 1. This ratio can therefore differentiate relevant words from irrelevant words and identify relevant word pairs. Glove is a count-based method; it can be regarded as a method for reducing the dimension of the co-occurrence matrix. There is not much difference in performance between Glove and Word2Vec, but Glove is more appropriate for tasks with a large volume of data because it parallelizes better. In this thesis, we use pre-trained Glove word vectors to initialize the embedding layer of the Bi-LSTM-CRF model for NER.
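As a sketch of how pre-trained Glove vectors can initialize an embedding layer (the thesis' own preprocessing code is not shown here), each line of a downloaded glove.6B file contains a word followed by its vector components. The vocabulary dictionary word_index below is a hypothetical placeholder.

```python
import numpy as np

def load_glove_embeddings(glove_path, word_index, dim=100):
    """Build an embedding matrix whose i-th row is the Glove vector of word i.
    Words not found in Glove keep a zero vector (random init is another option)."""
    vectors = {}
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    matrix = np.zeros((len(word_index) + 1, dim))   # index 0 reserved for padding
    for word, i in word_index.items():
        if word in vectors:
            matrix[i] = vectors[word]
    return matrix

# Hypothetical usage:
# word_index = {"eu": 1, "rejects": 2, "german": 3, "call": 4}
# embedding_matrix = load_glove_embeddings("glove.6B.100d.txt", word_index)
```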
Deep neural networks are applied to various tasks in the NLP field. For example, for dialogue systems, Zhou et al. [37] used an LSTM encoder on top of a CNN for multi-turn human-computer conversation, which takes utterances into consideration to capture utterance-level relations in the text. For POS tagging, Andor et al. [38] proposed a transition-based approach combined with a feed-forward neural network, and Huang, Xu, and Yu [28] also used a Bi-LSTM to predict POS tags. Various deep models have become the new state-of-the-art methods in the NLP field. Correspondingly, some explainable deep models focusing on NLP tasks have appeared, or existing explainable methods have been adapted to NLP tasks; related works are introduced further in section 2.5.5.
features of the input data. Usually, simple models like linear classifiers are easy to interpret, while a complicated model like a deep neural network is difficult to understand owing to its layer-wise structure and the nonlinearity of its computations. In this thesis, we define the explainability of an AI system as the ability to explain, in a human-understandable way, the reason why it makes a decision, and we focus on the explainable model for deep neural networks.
This raises an intuitive question: why do we need explainability? According to Samek, Wiegand, and Müller [45], the reasons can be described from four aspects: trust from users, modification of the system, learning from the system, and moral and legal issues.
• Moral and legal issues. Though AI gradually affects our daily life, related legal issues, such as how to assign responsibility when AI makes a wrong decision, have not received much attention until recent years. It is difficult to find perfect solutions for these legal issues because we rely on black-box models.
which has been studied thoroughly and whose application has been tested well, then explainability is unlikely to be a prerequisite for the model.
Model-unaware explanations
Most works on model-unaware explanations derive explanations based on sensitivity. For example, Simonyan, Vedaldi, and Zisserman [48] use the squared partial derivatives of an image classifier with respect to a given input image as the class saliency score, and highlight the most sensitive parts of the image, which give spatial support to the prediction. Similarly, Li et al. [49] compute the first-order derivative as the salience score of a unit in different RNN classifiers for sentiment analysis, and generate heatmap matrices as explanations.
Model-aware explanations
Methods that derive model-aware explanations often make use of the parameters of the model. For CNNs, Selvaraju et al. [53] use the gradients from the final convolutional layer to visualize the important pixels in the input images corresponding to the classification. Bach et al. [54] use Layer-wise Relevance Propagation (LRP) to propagate a relevance score from the output layer to the input layer of CNNs with linear sum-pooling and convolution layers, or of simple multilayer perceptrons. This method can measure the contribution of each input variable. Hendricks et al. [55] implement a CNN-based image classifier that can output explanatory text via an RNN trained on descriptive labels and descriptions of the images. Kuwajima and Tanaka [56] proposed a general idea to infer the reasons for decisions in visual recognition tasks by extracting the overlaps between highly activated features and frequently activated features in each class. For RNNs, Murdoch and Szlam [57] propose an approach that generates representative phrases from the input of an LSTM model; they validate that these phrases are important features by using them to construct a simple rule-based model
Model-unaware explanations
There is not much work on how to generate model-unaware explanations globally in a model-agnostic way. Ribeiro, Singh, and Guestrin [50] build global explanations by presenting a set of representative local explanations to users one at a time. This method easily fails when there is too much training data, and users cannot remember many representative local explanations to form a global view.
Model-aware explanations
Bau et al. [60] take the activation of each neuron for an image as a semantic segmentation of the concepts represented by that image. Through this dissection process, they align neurons with human-understandable concepts to assess how well a concept is represented by a single unit. Based on the idea of disentangled patterns, Zhang et al. [61] use an explanatory graph to represent the knowledge hierarchy of a pre-trained CNN, which enables logic-based rules to represent the inner logic of the CNN's knowledge, so that the explanation can be more concise and meaningful.
Chapter 3

Approaches
In this chapter, we describe the Bi-LSTM-CRF model for NER and how to apply LRP [54] to explain the Bi-LSTM layer of the model, as well as the approaches to visualize word vectors and to measure the effectiveness of the CRF layer. First, we introduce the model as a whole, then analyze the explainability of the Bi-LSTM-CRF model for NER layer by layer.
word vectors Glove¹ [22] to initialize the embedding layer, followed by a bi-directional LSTM layer and a CRF layer. The structure of the model is shown in Figure 3.1.
In this model, information from the past (via forward states) and the future (via backward states) can be used by the Bi-LSTM layer, while the CRF layer can capture the relations between contextual labels to assist the prediction of the current label. To clarify this model, we explain the different layers of the model using the example shown in Figure 3.1.
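The overall structure can be sketched in Keras as follows. This is a simplified stand-in, not the exact model of the thesis: the final layer here is a per-token softmax, whereas the actual model ends with a CRF layer (for example, the implementation provided by the keras-contrib package), and all sizes are illustrative.

```python
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, TimeDistributed, Dense

MAX_LEN, VOCAB_SIZE, EMB_DIM, NUM_TAGS = 50, 20000, 100, 9  # illustrative values

model = Sequential()
# Embedding layer, initialized from pre-trained Glove vectors in the real model
model.add(Embedding(input_dim=VOCAB_SIZE, output_dim=EMB_DIM, input_length=MAX_LEN))
# Bi-directional LSTM producing one hidden state per token
model.add(Bidirectional(LSTM(100, return_sequences=True)))
# Stand-in output layer; the thesis model replaces this with a CRF layer
model.add(TimeDistributed(Dense(NUM_TAGS, activation="softmax")))
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```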
In the input layer, each input sequence is padded to the same length. We denote by l the length of each input sequence. The length of the input sequence is equal to the number of timesteps. At time t, the input layer takes the t-th token of the current input sequence, which can be represented as a one-hot-encoded word vector or a dense vector. For example, if t = 1 and the input sentence is "EU rejects German call", the neural network takes the word representation of the word "EU" in this timestep. After l timesteps have passed, the whole input sentence has been processed.
In the Bi-LSTM layer, each neuron consists of an input gate, a forget gate, and an output gate, as described in section 2.1.2. These neurons are divided into two directions, which process each input sequence in its original order and in reverse order respectively, so that the neural network can use both past and future information.

¹ http://nlp.stanford.edu/data/wordvecs/glove.6B.zip
calculated as:
\[
\begin{aligned}
R_1^{(1)} = R_{1\leftarrow 6}^{(1,3)} + R_{1\leftarrow 7}^{(1,3)}
&= \frac{a_1 \cdot w_{14}}{a_1 \cdot w_{14} + a_2 \cdot w_{24} + a_3 \cdot w_{34}} \cdot \left(R_{4\leftarrow 6}^{(2,3)} + R_{4\leftarrow 7}^{(2,3)}\right) \\
&\quad + \frac{a_1 \cdot w_{15}}{a_1 \cdot w_{15} + a_2 \cdot w_{25} + a_3 \cdot w_{35}} \cdot \left(R_{5\leftarrow 6}^{(2,3)} + R_{5\leftarrow 7}^{(2,3)}\right)
\end{aligned}
\tag{3.7}
\]
We can also calculate $R_2^{(1)}$ and $R_3^{(1)}$ in a similar way.
Substituting equation 3.10 into 3.9, we get the same result for 3.9 as the result from 3.7.
Now we can state the general rule of how LRP works. Suppose Rk is the relevance score of neuron k in layer l + 1 of a multilayer neural network, and Rj is the relevance score of neuron j in layer l. We can follow equation 3.11 to calculate Rj as:
\[
R_j = \sum_k \frac{x_j w_{jk}}{\sum_{j'} x_{j'} w_{j'k} + \epsilon} R_k \tag{3.11}
\]
The relevance message from an upper-layer neuron k to a lower-layer neuron j is
\[
R_{j\leftarrow k} = \frac{x_j w_{jk} + \frac{\epsilon \cdot \mathrm{sign}(x_k)}{N}}{x_k + \epsilon \cdot \mathrm{sign}(x_k)} \cdot R_k \tag{3.12}
\]
where sign(x) = (1_{x≥0} − 1_{x≤0}), xj is the activation of neuron j, xk is the total pre-activation of neuron k, ε is a small stabilizer, and N is the number of neurons in the lower layer connected to the upper-layer neuron k.
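The sketch below (our own illustration, not the thesis implementation) applies the propagation rule of equation 3.11 to one fully connected layer: given the lower-layer activations x, the weights w, and the upper-layer relevance scores R, it redistributes relevance to the lower layer with a small epsilon stabilizer.

```python
import numpy as np

def lrp_epsilon(x, w, relevance_upper, eps=0.01):
    """Propagate relevance from an upper layer to the lower layer (eq. 3.11).
    x: lower-layer activations, shape (J,)
    w: weights connecting the lower to the upper layer, shape (J, K)
    relevance_upper: relevance of the upper-layer neurons, shape (K,)"""
    z = x[:, None] * w                      # contributions x_j * w_jk, shape (J, K)
    denom = z.sum(axis=0)                   # sum_j x_j * w_jk for each upper neuron k
    denom = denom + eps * np.sign(denom)    # epsilon stabilizer keeps the ratio finite
    messages = z / denom * relevance_upper  # R_{j<-k}
    return messages.sum(axis=1)             # R_j = sum over k of R_{j<-k}

x = np.array([1.0, 0.5, -0.3])
w = np.random.RandomState(0).randn(3, 2)
R_upper = np.array([0.7, 0.3])
R_lower = lrp_epsilon(x, w, R_upper)
print(R_lower, R_lower.sum())   # total relevance is approximately conserved
```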
To quantify how much one contextual word vector contributes to a
hidden state, equation 3.13 is introduced, where u is the word vector,
v is the current hidden state.
\[
R_{u \leftarrow v} = \sum_{u} \sum_{v} r_{u \leftarrow v} \tag{3.13}
\]
Chapter 4

Experiment and Result Analysis

4.1 Dataset
In the experiments with the named entity recognizer implemented by the Bi-LSTM-CRF model in this thesis, we use the CoNLL 2003 English named entity dataset [70], a public dataset containing named entity tags. Table 4.1 shows the number of sentences and tokens in the training, validation, and test datasets.
• 8 GB DDR4 RAM
• Windows 10 64-bit OS
Library        Version
Python         3.5.4
Keras          2.1.5
TensorFlow     1.5.0
Scikit-learn   0.19.1
Pandas         0.22.0
NumPy          1.14.2
Matplotlib     2.2.2
Zooming in on the figure above, we can see how words cluster close to each other in a certain region, as shown in figure 4.2. The two axes represent the positions of the word vectors in each direction after the dimensionality reduction. For example, in figure 4.2, the words that are close to
contribute to the prediction of the target word. Figures 4.4(a) and 4.4(b) represent the forward hidden states and the backward hidden states, respectively.
From figure 4.4 we can observe that the word "manager" (the one before "clive") and the word "lloyd" (the one after "clive") contribute more to the hidden states of "clive" than the other words, which means these two adjacent words show more importance when the model is predicting the named entity tag for "clive". Considering that the word before "clive" describes a person's occupation, that the word after "clive" is the middle word of the person's name, and that these two words are closest to the target word, it is easy to understand why the neural network considers these two words to contribute most to the prediction of the target word. The relevance of the other contextual words decreases as their distance from the target word increases. It is reasonable to say that the model captures the relation between nearby words and the target word in this instance. But from 4.4(b), it is clear that the relevance of the word "clive" is concentrated more on itself.

To be more specific, we visualize how every unit in the Bi-LSTM layer contributes to the result of a certain word.

Figure 4.5: Heatmap of every unit for a content word in the Bi-LSTM layer, in two directions

In figure 4.5, the relevance score of every dimension of the hidden states for predicting the tag of the word "clive" is shown. It follows the trend in figure 4.4, namely that nearby contextual words contribute more to the target word. But we notice that there are some salient dimensions in both the forward and backward directions. From 4.5(a), the 63rd unit of the word vectors representing "tour" and "clive" contributes more relevance, while in the backward states, the 69th unit of the word representation of "clive" shows the most importance. According to this visualization of every unit's relevance, we can get a more direct impression of which units influence the prediction.
We can also check the contribution of contextual words to any target word. For example, from figure 4.3, we can see that the predicted tag for the word "australia" is "B-LOC" (the beginning of a location). However, from the heat map shown in figure 4.6(a), the important words are "manager" and "dressing" (both a bit far away from the target word), though "australia" itself also shows considerable relevance. It is hard to explain why a location can be tagged as the named entity "B-LOC" based on these words; we can only conclude that the contextual words mentioned above contribute most to the prediction for this target word.
Figure 4.7: Visualization of the relevance of the embedding layer for the last hidden state
observe that the Bi-LSTM keeps its focus on contextual words near the last word.
For the case mentioned above, we can also visualize the relevance of the word vectors, as shown in figure 4.8. We can observe that for the forward sequence the word "tour" shows more relevance, while in the backward sequence the word "on" shows the most importance when predicting the tag of the target word "clive". Compared to figure 4.4, this shows how relevance transfers among units in different layers.
tags, from which we can observe that the tag of the word "major" is predicted wrongly. The true tag is 'B-PER' while the prediction is 'O'. We visualize the forward and backward hidden states for the word "major", taking it as the target word, as shown in figure 4.11. According to 4.11(a), in the forward hidden states of the word "major",
4.12. From this figure, we can observe the difference between the results from the Bi-LSTM layer and the final prediction generated by the CRF layer, in order to understand how the CRF layer changes the results. We can also observe the influence of the transition score matrix in the CRF layer from the difference between the 'emission' column and the output of the CRF layer (the 'crfout' column). However, the result does not show a clear relation between the results from the intermediate layer and the final output. Perhaps this is because the parameters of the CRF layer are trained together with the other parameters rather than step by step. Therefore, further investigation regarding the effect of the transition score matrix on every word should be made, but this is out of the scope of this thesis, since it focuses more on the explanation of deep neural networks while the CRF is a statistical modeling method. This could be a part of future work, as mentioned in section 5.2.
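To make the comparison between the 'emission' column and the 'crfout' column concrete, the sketch below (illustrative only; the actual transition matrix is learned inside the model, and the scores here are random placeholders) decodes a sequence of emission scores once by per-token argmax and once by Viterbi decoding with a transition score matrix, showing where the CRF's transition scores can change the predicted tags.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """emissions: (T, K) per-token tag scores; transitions: (K, K) transition scores."""
    T, K = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # Score of reaching tag k at step t through tag j at step t-1
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

rng = np.random.RandomState(0)
emissions = rng.randn(6, 4)            # 6 tokens, 4 tags (illustrative)
transitions = rng.randn(4, 4)
print("emission argmax:", emissions.argmax(axis=1).tolist())
print("viterbi (crf)  :", viterbi_decode(emissions, transitions))
```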
Chapter 5

Discussion and Conclusion

5.1 Discussion
In this thesis, we applied different methods to make the Bi-LSTM-CRF model explainable according to the features of each layer. To understand the deep model used in the named entity recognizer, we visualize the behavior of this model. For example, t-SNE is used to visualize the relations between different word vectors. Since the pre-trained Glove embedding has a high dimension for each word, we use t-SNE to reduce the dimension of the word vectors, map them to a 2-dimensional coordinate system, and use the distance between coordinates to represent the similarity among words. Though t-SNE performs well for visualization, it can cause large memory usage and long running times. If there is a high-dimensional dataset and we do not know whether it is separable, it is suitable to project it to a low-dimensional space by t-SNE and check the separability of the dataset.
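A minimal sketch of the t-SNE projection described above, using scikit-learn (which is listed among the project's libraries); the word list, vectors, and perplexity are placeholders, not the settings used for the actual Glove embedding.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Placeholder data standing in for the high-dimensional word vectors
words = ["king", "queen", "man", "woman", "paris", "london"]
vectors = np.random.RandomState(0).randn(len(words), 100)

# Project to 2 dimensions; perplexity must be smaller than the number of samples
coords = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(vectors)

plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), w in zip(coords, words):
    plt.annotate(w, (x, y))   # distance on the plot reflects similarity
plt.show()
```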
To visualize how the deep neural network behaves, sensitivity-based or saliency-based methods, which measure the importance of each neuron, can be useful. Though they are simple and intuitive, they suffer from the noise generated by the high non-linearity of complex models. In this thesis, we applied LRP to evaluate the importance of each unit in each layer of the neural network. Although LRP is a model-aware method, we adapted it to the NER problem, and it can now be used for multiple types of NN. In the experiments in [45] and [62], LRP outperformed the saliency-based method Sensitivity Analysis [48] because LRP showed better performance in recognizing the units with the most contribution to the prediction. Thus, if the deep model of
Bibliography
[33] Danqi Chen and Christopher Manning. “A fast and accurate de-
pendency parser using neural networks”. In: Proceedings of the
2014 conference on empirical methods in natural language processing
(EMNLP). 2014, pp. 740–750.
[34] Ronan Collobert and Jason Weston. “A unified architecture for
natural language processing: Deep neural networks with multi-
task learning”. In: Proceedings of the 25th international conference
on Machine learning. ACM. 2008, pp. 160–167.
[35] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. “Neu-
ral machine translation by jointly learning to align and trans-
late”. In: arXiv preprint arXiv:1409.0473 (2014).
[36] Jacob Devlin et al. “Fast and robust neural network joint models
for statistical machine translation”. In: Proceedings of the 52nd An-
nual Meeting of the Association for Computational Linguistics (Vol-
ume 1: Long Papers). Vol. 1. 2014, pp. 1370–1380.
[37] Xiangyang Zhou et al. “Multi-view response selection for human-
computer conversation”. In: Proceedings of the 2016 Conference on
Empirical Methods in Natural Language Processing. 2016, pp. 372–
381.
[38] Daniel Andor et al. “Globally normalized transition-based neu-
ral networks”. In: arXiv preprint arXiv:1603.06042 (2016).
[39] William John Clancey. The epistemology of a rule-based expert system: A framework for explanation. Tech. rep. Stanford University, Department of Computer Science, 1982.
[40] William Swartout, Cecile Paris, and Johanna Moore. “Explana-
tions in knowledge systems: Design for explainable expert sys-
tems”. In: IEEE Expert 6.3 (1991), pp. 58–64.
[41] Robert Neches, William R. Swartout, and Johanna D. Moore. “En-
hanced maintenance and explanation of expert systems through
explicit models of their development”. In: IEEE Transactions on
Software Engineering 11 (1985), pp. 1337–1351.
[42] Finale Doshi-Velez and Been Kim. “Towards a rigorous science
of interpretable machine learning”. In: arXiv preprint arXiv:1702.08608
(2017).
[43] Tim Miller. “Explanation in artificial intelligence: insights from
the social sciences”. In: arXiv preprint arXiv:1706.07269 (2017).
[44] Derek Doran, Sarah Schulz, and Tarek R Besold. “What does ex-
plainable AI really mean? A new conceptualization of perspec-
tives”. In: arXiv preprint arXiv:1710.00794 (2017).
[45] Wojciech Samek, Thomas Wiegand, and Klaus-Robert Müller.
“Explainable artificial intelligence: Understanding, visualizing
and interpreting deep learning models”. In: arXiv preprint arXiv:1708.08296
(2017).
[46] Leo Breiman. Classification and regression trees. Routledge, 2017.
[47] Benjamin Letham et al. “Interpretable classifiers using rules and
bayesian analysis: Building a better stroke prediction model”. In:
The Annals of Applied Statistics 9.3 (2015), pp. 1350–1371.
[48] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. “Deep
inside convolutional networks: Visualising image classification
models and saliency maps”. In: arXiv preprint arXiv:1312.6034
(2013).
[49] Jiwei Li et al. “Visualizing and understanding neural models in
NLP”. In: arXiv preprint arXiv:1506.01066 (2015).
[50] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ""Why should I trust you?": Explaining the predictions of any classifier". In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM. 2016, pp. 1135–1144.
[51] Mike Wu et al. “Beyond sparsity: Tree regularization of deep
models for interpretability”. In: arXiv preprint arXiv:1711.06178
(2017).
[52] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “Model-
agnostic interpretability of machine learning”. In: arXiv preprint
arXiv:1606.05386 (2016).
[53] Ramprasaath R Selvaraju et al. “Grad-CAM: Visual Explanations
from Deep Networks via Gradient-Based Localization.” In: ICCV.
2017, pp. 618–626.
[54] Sebastian Bach et al. “On pixel-wise explanations for non-linear
classifier decisions by layer-wise relevance propagation”. In: PloS
one 10.7 (2015), e0130140.
[55] Lisa Anne Hendricks et al. “Generating visual explanations”. In:
European Conference on Computer Vision. Springer. 2016, pp. 3–19.