Deep learning, based on artificial neural networks, has become prevalent in anomaly detection due to its capability to learn expressive feature representations and/or anomaly scores for complex data such as text, audio, images, videos and graphs [161]. A wealth of deep anomaly detection methods, including those based on AE, Long Short-Term Memory (LSTM), CNN, Generative Adversarial Network (GAN) and other neural networks, have been proposed and shown to be more accurate than traditional methods at detecting anomalies in complex data. However, although deep anomaly detection methods tend to have high detection accuracy, they are often criticized for their poor interpretability. For this reason, some studies have attempted to leverage post-hoc XAI techniques to improve the interpretability of the corresponding neural networks. Importantly, which XAI techniques are applicable may vary depending on the specific neural network used. For instance, AE-based models typically employ reconstruction errors to explain anomalies, while LSTM-based models generally leverage SHAP techniques to interpret anomalies. Therefore, we present the review results according to the type of neural network used to perform anomaly detection, which is correlated with the data type that can be handled (e.g., CNNs for images and RNNs for sequential data).
7.1 Explaining AutoEncoders
An AE is a type of neural network that first encodes the given data instances into a low-dimensional feature representation space and then decodes them back under the constraint of minimizing the reconstruction error. Several types of AEs have been introduced, including vanilla AEs such as the replicator neural network, Sparse AutoEncoders (SAE), DAE, Contractive AutoEncoders (CAE), VAE, and other variants [16]. AEs are widely used for anomaly detection, based on the assumption that anomalies are more difficult to reconstruct from the compressed feature representation space than normal instances.
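To make this detection mechanism concrete, the following minimal sketch scores instances by their reconstruction error with a small fully connected AE in PyTorch. It is a generic illustration, not any specific model from the cited works; the class name `TinyAE` and the scoring helper are hypothetical.

```python
# A minimal sketch: AE-based anomaly scoring via reconstruction error.
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    def __init__(self, n_features: int, n_latent: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(),
                                     nn.Linear(16, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 16), nn.ReLU(),
                                     nn.Linear(16, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_score(model: TinyAE, x: torch.Tensor) -> torch.Tensor:
    """Mean squared reconstruction error per instance; higher means more anomalous."""
    with torch.no_grad():
        recon = model(x)
    return ((x - recon) ** 2).mean(dim=1)
```

In practice, the model is trained on (mostly) normal data and the anomaly threshold is often chosen as a high quantile of the scores observed on that training data.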
First of all, Shapley value-based techniques such as SHAP are typically used to obtain feature contributions for explaining AEs. For instance, Giurgiu & Schumann [72] extend SHAP to explain anomalies identified via a GRU-based AutoEncoder in multivariate time series data. Specifically, they modify kernel SHAP [128] to output, as explanations, the windows that contribute the most to the anomaly as well as the windows that counteract it the most. Besides, to detect and explain anomalies in mobile Radio Access Network (RAN) data, Chawla et al. [40] set up an SAE-based anomaly detection algorithm and then apply kernel SHAP to explain the results. Furthermore, Jakubowski et al. [91] propose a VAE model combined with Shapley values to detect and interpret anomalies in an asset degradation process. Concretely, they compute Shapley values to generate both local and global explanations for anomalies. Additionally, Serradilla et al. [192] utilise different machine learning approaches to detect, predict and explain anomalies in press machines to achieve predictive maintenance. To interpret an anomaly detected by an AE, they first leverage t-SNE [214] to visualise the learned latent feature spaces. Next, they employ the GradientExplainer tool [127], which combines SHAP, Integrated Gradients [206], and SmoothGrad [201], to analyze which input features are associated with the anomaly.
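As a rough illustration of how model-agnostic Shapley values can be obtained for such detectors, the sketch below wraps an AE's reconstruction-error score and passes it to kernel SHAP. This is an assumption-laden example, not the cited authors' pipelines: a trained AE `model`, the `anomaly_score` helper from the earlier sketch, and numpy arrays `X_normal` (normal data) and `X_anomaly` (flagged instances) are assumed to exist.

```python
# A hedged sketch: kernel SHAP over an AE reconstruction-error score.
import numpy as np
import shap
import torch

def score_fn(X: np.ndarray) -> np.ndarray:
    # Wrap the AE anomaly score as a plain numpy function for KernelExplainer.
    x = torch.as_tensor(X, dtype=torch.float32)
    return anomaly_score(model, x).numpy()

background = X_normal[np.random.choice(len(X_normal), 100, replace=False)]
explainer = shap.KernelExplainer(score_fn, background)

# Per-feature contributions to one flagged instance's anomaly score;
# positive values push the score towards "anomalous".
shap_values = explainer.shap_values(X_anomaly[:1])
```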
Second, many methods attempt to track reconstruction errors to obtain feature contributions by exploring the internal structure of AEs. Consequently, these methods are generally model-specific. For instance, Ikeda et al. [88] design a Multimodal AutoEncoder (MAE) model to detect anomalies emerging in ICT systems. More importantly, using sparse optimization, they also propose an algorithm that estimates which input dimensions of the AE contribute to an anomaly and presents them as explanations. Besides, Nguyen et al. [151] introduce a framework called GEE to detect and explain anomalies in network traffic. Specifically, they train a VAE model on a normal dataset to learn the normal behaviour of a network, and then employ a gradient-based fingerprinting technique to identify the main features causing the anomaly. Similarly, Memarzadeh et al. [140] propose a DGM based on a VAE. Particularly, they achieve model interpretability by evaluating feature importance through the random-permutation method. Additionally, Chen et al. [45] put forward DAEMON, which trains an Adversarial AutoEncoder (AAE) to learn the typical pattern of multivariate time series, and then uses the reconstruction errors to identify and explain anomalies. Meanwhile, to monitor the wireless spectrum and identify unexpected behaviour, Rajendran et al. [171] present an AAE-based anomaly detection method named SAIFE. Since the AAE is trained in three phases, viz. reconstruction, regularization, and semi-supervised [130], SAIFE attempts to localize the anomalous regions based on the reconstruction errors coupled with the semi-supervised features, providing explanations for the anomalies. Furthermore, Ikeda et al. [89] set up an anomaly detection model based on a VAE, and then estimate the features that contribute the most to the identified anomalies as explanations. Concretely, they present an approximative probabilistic model based on the trained VAE to estimate contributing features by exploring the so-called true latent distribution, which defines how an anomalous instance would look if it were normal. Importantly, they argue that directly estimating feature contributions based on the deviating latent distribution or reconstruction errors leads to high false positive and/or false negative rates.
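The simplest instance of this idea is to attribute an anomaly to the features the AE reconstructs worst. The sketch below illustrates that heuristic generically (it is not any specific paper's algorithm); the trained `model` and the flagged instance `x_anomalous` are assumed.

```python
# A minimal sketch: per-feature reconstruction error as feature contribution.
import torch

def feature_contributions(model, x: torch.Tensor) -> torch.Tensor:
    """Squared reconstruction error per feature; larger values indicate features
    the AE failed to reconstruct and hence drive the anomaly score."""
    with torch.no_grad():
        recon = model(x)
    return (x - recon) ** 2

contrib = feature_contributions(model, x_anomalous)
top_features = torch.argsort(contrib, descending=True)[:5]  # top-5 suspect features
```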
Third, some researchers attempt to utilise surrogate models such as LIME and rule learners to explain AEs. For example, Wu & Wang [220] propose a neural network-based model incorporating LIME techniques to detect and interpret fraudulent credit card transactions. Specifically, the anomaly detection model contains an AE and an MLP classifier, which are trained in an adversarial manner. To interpret an anomaly, they apply three independent LIME-based models to explain the AE, the MLP, and the combined AE & MLP model, respectively. Besides, Song et al. [203] develop the EXAD system to identify and interpret anomalies in Apache Spark traces. First, the EXAD system adapts AE and LSTM to perform anomaly detection. Second, they propose three ways to explain anomalies. The first is to build a conjunction of atomic predicates, which can be constructed with a greedy algorithm but comes without performance guarantees. To overcome this limitation, the second uses an entropy-based reward function to build the atomic predicates and presents the constructed predicates in Conjunctive Normal Form. The third is to approximate the anomaly detection neural networks with a decision tree, from which they generate explanations in DNF. Additionally, De Moura et al. [56] present the Lane Change Detector (LCD) model to detect and explain when the vehicles surrounding an ego vehicle change lanes. Specifically, the LCD model consists of three independent AE models trained on three different datasets. On this basis, they set up a decision-rule-set-based model, extracting rules from the reconstruction errors produced by these three separate models to determine when an anomaly happens. Besides, Gnoss et al. [73] first annotate journal entries with previously trained AutoEncoders and then train three XAI models using these annotations. Specifically, they utilise a Decision Tree and Linear Regression, two intrinsically interpretable models, to mimic the AE; the feature importance values of the Decision Tree and the odds ratio values are calculated to show which features are relevant to the anomalies. Additionally, they also leverage SHAP to explain the AE model.
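A common baseline in this family is to fit a shallow decision tree to the detector's decisions and read rules off its paths. The sketch below illustrates that idea under stated assumptions (the `score_fn` wrapper from the earlier sketch, a feature matrix `X`, and `feature_names`); it is not the cited systems' exact procedure, and the 95th-percentile pseudo-labelling rule is an arbitrary illustrative choice.

```python
# A hedged sketch: shallow decision-tree surrogate for rule-style explanations.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

scores = score_fn(X)                                        # AE reconstruction-error scores
labels = (scores > np.quantile(scores, 0.95)).astype(int)   # pseudo-labels: top 5% = anomaly

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, labels)

# Each root-to-leaf path reaching an "anomaly" leaf is a candidate rule.
print(export_text(surrogate, feature_names=list(feature_names)))
```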
Fourth, visualisation techniques such as Heatmaps and Saliency Maps are often constructed to help explain AEs. For instance, Kitamura & Nonaka [104] set up an encoder-decoder-based model to detect anomalies in images. To generate explanations for an anomaly, they first develop a feature extractor that is trained on a dataset consisting of normal images and their corresponding reconstructed images. Second, using this feature extractor to extract latent features, their method finds the feature-level differences between the input image and the reconstructed image. On this basis, their method localizes and visualizes abnormal regions as explanations for the anomaly. Besides, Feng et al. [68] develop a Two-Stream AE-based model to detect abnormal events in videos and then utilise a Feature Map Visualization method to interpret the anomalies. Moreover, Guo et al. [77] set up a Sequence-to-Sequence VAE-based model to detect anomalies in event sequences. To reveal anomalous events, they compare the anomalous sequence and its reconstructed sequence with a set of normal sequences that are close to the anomalous sequence in the latent space. Importantly, they build a visualization tool to facilitate these comparisons. In addition, Szymanowicz et al. [208] develop a method for detecting and automatically explaining anomalous events in video. They first design an encoder-decoder architecture based on U-Net [177] to detect anomalies, generating saliency maps by computing per-pixel differences between actual and predicted frames. Second, based on the per-pixel squared errors in the saliency maps, they introduce an explanation module that provides the spatial location and a human-understandable representation of the identified anomalous event.
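A minimal version of such a saliency map can be computed directly from per-pixel errors. The sketch below is a generic illustration, not the cited pipelines; it assumes some reconstruction- or prediction-based image model, represented here by a hypothetical `predict_frame` function.

```python
# A minimal sketch: per-pixel squared-error saliency map for one image/frame.
import numpy as np

def saliency_map(actual: np.ndarray, predicted: np.ndarray) -> np.ndarray:
    """Per-pixel squared error, normalized to [0, 1] for display as a heatmap."""
    err = (actual.astype(np.float32) - predicted.astype(np.float32)) ** 2
    if err.ndim == 3:          # colour image: aggregate over channels
        err = err.mean(axis=-1)
    return (err - err.min()) / (err.max() - err.min() + 1e-8)

# Usage: overlay saliency_map(frame, predict_frame(frame)) on the frame;
# bright regions indicate where the model's output deviates most.
```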
Finally, a wide range of other techniques, such as feature selection, Markov Chain Monte Carlo, and the retrieval of similar historical anomalies, have also been explored to facilitate the interpretability of AE-based anomaly detection. For example, Chakraborttii & Litz [35] develop an AE-based model to detect Solid-State Drive (SSD) failures. To produce explanations, they investigate the reconstruction error per feature, wherein a feature with a reconstruction error greater than the average error is considered a significant cause. Particularly, they apply three types of feature selection techniques, viz. Filter, Wrapper and Embedded, to select important features to train the AE model, facilitating the interpretability of the resulting anomaly detection model. Besides, Li et al. [115] develop a VAE and genetic algorithm (GA)-based framework, called VAGA, to detect anomalies in high-dimensional data and search for the corresponding abnormal subspaces. Concretely, for each identified anomaly, they utilize a GA to search for the subspace in which the anomaly deviates most. Additionally, Li et al. [117] introduce InterFusion, a model based on a hierarchical Variational AutoEncoder (HVAE) and Markov Chain Monte Carlo (MCMC), for detecting and explaining anomalies in multivariate time series data. Specifically, given an anomaly, they set up an MCMC-based method to find a set of the most anomalous metrics as explanations. Furthermore, Assaf et al. [14] develop a Convolutional AutoEncoder (ConvAE)-based anomaly detection method and an accompanying explainability framework to detect and explain anomalies in data storage systems. Particularly, for each anomaly, they use cosine similarity over the embedding space to find similar historical anomalies, thereby explaining the anomaly through association.
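The retrieval step in such association-based explanations can be as simple as a cosine-similarity search in the AE's latent space. The sketch below illustrates this under stated assumptions (the earlier hypothetical `TinyAE` encoder and a matrix `Z_history` of embeddings of previously explained anomalies); it is not the cited framework.

```python
# A hedged sketch: explain-by-association via nearest anomalies in latent space.
import torch
import torch.nn.functional as F

def most_similar_anomalies(model, x_new: torch.Tensor,
                           Z_history: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Return indices of the k historical anomalies closest to x_new in latent space."""
    with torch.no_grad():
        z_new = model.encoder(x_new.unsqueeze(0))          # shape (1, n_latent)
    sims = F.cosine_similarity(z_new, Z_history, dim=1)    # shape (n_history,)
    return torch.topk(sims, k=k).indices

# Usage: show the retrieved anomalies (and their past explanations or labels)
# alongside the new anomaly to explain it through association.
```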
Discussion: AEs are the most widely used deep learning method to detect anomalies in tabular data, sequence data, image data, video data and graph data. As a result, a plethora of methods have also been proposed to explain AEs. Concretely, XAD techniques such as reconstruction error-based feature contribution, kernel SHAP, GradientExplainer, LIME, rule extraction and feature map visualisation are often leveraged to obtain explanations. Importantly, most of these explanation methods only provide weak interpretability, as they only explain a single anomaly at a time by exploring some important properties of AE-based detection models.
7.2 Explaining Recurrent Neural Networks
A Recurrent Neural Network (RNN) is a specific type of neural network that is capable of learning features and long-term dependencies in sequential data [183]. Specifically, sequential data refers to any data that is ordered into sequences, including time series, text streams, DNA sequences, audio clips, video clips, and so on. To address the different challenges of modelling sequential data, various RNN architectures have been proposed. More concretely, frequently used RNNs include deep RNNs with MLP, Bidirectional RNN (BiRNN), Recurrent Convolutional Neural Networks (RCNN), Multi-Dimensional Recurrent Neural Networks (MDRNN), LSTM, Gated Recurrent Unit (GRU), Memory Networks, Structurally Constrained Recurrent Neural Network (SCRNN), Unitary Recurrent Neural Networks (Unitary RNN), and so on. Particularly, by assuming normal instances are temporally more predictable than anomalous instances, RNNs are extensively used to identify anomalies in sequential data because of their ability to model temporal dependencies.
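The sketch below illustrates this predictability assumption in its simplest form: a forecasting LSTM predicts the next value of a univariate series, and the prediction error serves as the anomaly score. It is a generic sketch under stated assumptions (window length W, univariate input), not any cited paper's detector.

```python
# A minimal sketch: forecasting-based anomaly scoring with an LSTM.
import torch
import torch.nn as nn

class NextStepLSTM(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, windows):                        # windows: (batch, W, 1)
        out, _ = self.lstm(windows)
        return self.head(out[:, -1, :]).squeeze(-1)    # predicted next value

def anomaly_scores(model, windows, targets):
    """Absolute prediction error per window; large errors flag anomalous steps."""
    with torch.no_grad():
        preds = model(windows)
    return (preds - targets).abs()
```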
First of all, Shapley value-based techniques such as SHAP are the most typical methods used to obtain feature contributions, aiming to explain anomalies identified by RNNs. For instance, Zou & Petrosian [210] utilise Decision Trees [42] and DeepLog [63] to detect anomalies in system logs, and then explain the results using the Shapley value approach. To explain an anomaly, they treat each event in the logs as a player and compute Shapley values without examining the model structure. Moreover, Hwang & Lee [87] propose a bidirectional stackable LSTM-based model for industrial control system anomaly detection. For each identified anomaly, they employ SHAP values to obtain a contribution score for each feature as an explanation. Similarly, Jakubowski et al. [92] examine the issue of anomaly detection when hot rolling slabs into coils. They utilise an LSTM to construct a modified AutoEncoder architecture in order to find anomalies. Importantly, they are able to pinpoint the origin of the majority of the abnormalities identified by the deep learning model through analysis of the SHAP interpretations. Furthermore, Nor et al. [152] present a probabilistic LSTM-based model combined with SHAP to detect and interpret anomalies in gas turbines. More importantly, they evaluate the quality of post-hoc explanations from two aspects, viz. local accuracy and consistency. Specifically, local accuracy describes the relationship between feature contributions and predictions, while consistency checks whether the interpretation is consistent with changes in the input features.
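The "each event is a player" view can be approximated model-agnostically by Monte Carlo sampling of event permutations. The sketch below is a generic estimator, not the cited authors' exact method; `score` (a function mapping a sequence of event ids to an anomaly score) and `pad_id` (the token used to mask absent events) are assumptions.

```python
# A hedged sketch: Monte Carlo Shapley values over the events of one log sequence.
import random

def shapley_events(score, events, pad_id=0, n_samples=200):
    n = len(events)
    phi = [0.0] * n
    for _ in range(n_samples):
        order = random.sample(range(n), n)      # random permutation of the events
        masked = [pad_id] * n
        prev = score(masked)
        for i in order:
            masked[i] = events[i]               # add event i to the coalition
            curr = score(masked)
            phi[i] += curr - prev               # marginal contribution of event i
            prev = curr
    return [v / n_samples for v in phi]

# Events with large positive Shapley estimates push the sequence's anomaly score up most.
```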
Second, some researchers attempt to utilise surrogate models such as LIME to explain RNNs. For example, Herskind Sejr et al. [84] create an unsupervised system based on a predictive neural network by training an LSTM model and using reconstruction errors to assess data abnormalities. Importantly, the system offers two layers of anomaly interpretation, namely deviations from model predictions and interpretations of model predictions, in order to make the process transparent to developers and users. At the first level, they employ the Mean Absolute Error to illustrate how observations diverge from expectations. At the second level, they approximate the black-box model with LIME to provide an explanation. Additionally, Mathonsi & van Zyl [136] present Multivariate Exponential Smoothing Long Short-Term Memory (MES-LSTM), which combines statistics and deep learning. Particularly, they integrate SHAP and LIME and introduce a metric, called Mean Discovery Score, that aims to show which predictors are most strongly associated with the anomalies.
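As a rough illustration of the surrogate idea, the sketch below runs LIME in regression mode on an anomaly-scoring function. It is an assumption-laden example, not the cited systems: `score_fn`, training data `X_train`, `feature_names`, and flagged instances `X_anomaly` are assumed to exist, e.g. as in the earlier sketches.

```python
# A hedged sketch: LIME explanation of one instance's anomaly score.
import numpy as np
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X_train,
    feature_names=list(feature_names),
    mode="regression",            # the "prediction" being explained is the anomaly score
)

explanation = explainer.explain_instance(
    X_anomaly[0],                 # the flagged instance
    lambda X: score_fn(np.asarray(X)),
    num_features=5,               # report the 5 most influential features
)
print(explanation.as_list())      # (feature condition, weight) pairs
```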
Third, other methods, such as Layer-wise Relevance Propagation (LRP), Integrated Gradients, and attention mechanisms, are also leveraged to explain RNN-based anomaly detection. For instance, due to the complexity of log systems and the unstructured nature of the resulting logs, Patil et al. [166] use an LSTM to detect anomalies in such systems. To generate explanations for each identified anomaly, they utilise LRP to generate relevance scores for every feature at every timestep. Moreover, Han et al. [80] present InterpretableSAD, a negative sampling-based method for detecting and interpreting anomalies in sequential log data. First, due to the scarcity of anomalous instances, they adopt a data augmentation strategy via negative sampling to generate a dataset that contains sufficient anomalous samples. Second, they train an LSTM model on this augmented labelled dataset. Third, they apply Integrated Gradients to identify the anomalous events that lead to the outlyingness. Furthermore, to detect anomalies in system logs, Brown et al. [28] implement four attention mechanisms in LSTMs and show that, compared to a Bidirectional LSTM, the attention-augmented LSTM not only retains high performance but also provides information about feature importance and relationships between features, which affords explainability.
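Integrated Gradients itself is compact enough to sketch directly: attributions are obtained by averaging gradients of the score along a straight path from a baseline to the input. The sketch below is a generic implementation of that recipe, not the cited works' code; `score_model` (mapping a batch of shape (n, n_features) to per-instance scores) and `baseline` are assumptions.

```python
# A minimal sketch: straight-line Integrated Gradients for a differentiable anomaly score.
import torch

def integrated_gradients(score_model, x, baseline, steps=50):
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)    # interpolation coefficients
    path = baseline + alphas * (x - baseline)               # points along the straight path
    path.requires_grad_(True)
    score_model(path).sum().backward()
    avg_grad = path.grad.mean(dim=0)                        # average gradient along the path
    return (x - baseline).squeeze(0) * avg_grad             # per-feature attributions
```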
Discussion: RNNs are primarily employed to detect anomalies in sequence data. Typical XAD techniques for interpreting anomalies identified by RNN-based models include Shapley value-based methods, surrogate models, and other versatile techniques such as LRP, Integrated Gradients, and attention mechanisms. These post-hoc explanation methods are usually computationally expensive, making it difficult to provide real-time explanations.
7.3 Explaining Convolutional Neural Networks
A CNN is a specific type of neural network inspired by the visual cortex of animals. CNNs are widely used in the computer vision field because of their strong ability to extract features from image data with convolutional structures. Moreover, CNNs have also been shown to be useful for extracting complex hidden features in sequential data [74]. Accordingly, a variety of CNN architectures have been proposed, including LeNet, AlexNet, GoogLeNet, VGGNet, Inception V4, ResNet, and so on. Some studies have attempted to utilize CNNs for anomaly detection, especially in fields such as intrusion detection and image anomaly detection.
First, one line of research attempts to utilise surrogate models such as LIME and rule learners to explain CNNs. For example, Cheong et al. [47] set up a SpatioTemporal Convolutional Neural Network-based Relational Network (STCNN-RN) to detect anomalous events in financial markets. For each anomaly, they apply LIME to provide a local explanation indicating the contribution of each feature. Besides, Levy et al. [114] propose an end-to-end anomaly detection model named AnoMili, which can also provide real-time explanations. Specifically, AnoMili consists of four stages. First, they introduce a physical intrusion detection mechanism using an AE. Second, if no anomalous device is discovered, they train a CNN-based classifier on the voltage signals of each device, aiming to detect spoofing attacks. Third, they utilise an LSTM to build a context-based anomaly detection mechanism, which detects anomalous messages based on their context. Finally, to interpret an anomalous message, they leverage a decision tree to locally approximate the detection result and also apply SHAP TreeExplainer [128] to identify the most important features in real time.
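Once a local tree surrogate is available, tree-specific SHAP values can be computed cheaply. The sketch below is a generic illustration, not AnoMili's implementation; the fitted `surrogate` tree and flagged instances `X_anomaly` from the earlier sketches are assumptions, and the shape of the returned values varies across shap versions.

```python
# A hedged sketch: tree SHAP on a decision-tree surrogate of the detector.
import shap

tree_explainer = shap.TreeExplainer(surrogate)          # `surrogate` from the earlier sketch
shap_values = tree_explainer.shap_values(X_anomaly[:1])
# For a binary surrogate, the per-feature values towards the "anomaly" class are
# in shap_values[1] or in the single returned array, depending on the shap version.
```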
Second, visualisation techniques are often combined with other techniques such as Gradient Backpropagation and LRP to explain CNN-based anomaly detection. For instance, Saeki et al. [182] present a CNN-based method to detect and explain machinery faults based on vibration data. For each detected anomaly, they utilize Grad-CAM [190], a gradient-based localization approach, to obtain an importance map in the feature space. They then combine the results of Grad-CAM with a visualization approach called Guided Backpropagation [204], which can visualize the predictions via backpropagation from the output space to the input space, generating explanations for the anomaly. Moreover, Chong et al. [48] introduce a CNN-based Teacher-Student Network model combined with the LRP technique to detect and explain anomalies. To interpret an anomaly, they provide an example-based explanation by showing its top prototypes (namely, its top nearest neighbours). Importantly, they apply LRP to show the pixel-level similarity between the anomaly and each of its top prototypes. Additionally, Szymanowicz et al. [207] introduce a model to detect and explain anomalies in videos. Specifically, they implement an R-CNN to detect objects in the video and then employ a Dual Relation Graph for human-object interaction recognition, encoding each frame with a collection of human-object interaction (HOI) vectors. After using PCA to reduce the dimensionality of the non-anomalous HOI vectors, they train a Gaussian Mixture Model (GMM); a video frame is deemed abnormal if the likelihood of any of its HOI vectors under the GMM falls below a threshold. The distance between the anomalous HOI vector and the usual HOI vectors is then weighted and visualized as a 2D heatmap to help understand the abnormality.
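The core of the Grad-CAM step can be sketched with forward and backward hooks in PyTorch. The code below follows the standard Grad-CAM recipe rather than any cited paper's implementation; the CNN `net`, the choice of `target_layer` (typically the last convolutional layer), and a scalar anomaly score or class logit as the explained output are assumptions.

```python
# A hedged sketch: Grad-CAM heatmap for a scalar anomaly score of one image.
import torch
import torch.nn.functional as F

def grad_cam(net, x, target_layer):
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    score = net(x).squeeze()                  # assumed scalar output for one image
    net.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)         # pool gradients per channel
    cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze()                 # heatmap in [0, 1]
```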
Third, some researchers attempt to directly utilise semantic anomaly scores as explanations. For instance, Hinami et al. [85] utilise a general CNN model and context-sensitive anomaly detectors to identify and explain abnormal events in videos. Specifically, they set up a Fast R-CNN-based model to learn multiple concepts in videos and then extract semantic features. On this basis, they apply a context-sensitive anomaly detector to obtain semantic anomaly scores, which can be seen as explanations for the anomalies.
Discussion: CNN-based anomaly detection models are mainly leveraged to detect anomalies in image data. To explain anomalies identified by CNNs, XAD techniques such as surrogate models (LIME and rule learners), Gradient Backpropagation, LRP, and visualisations are commonly used. However, some post-hoc explanation methods, especially surrogate models, may suffer from poor explanation fidelity. In other words, the generated explanations may not reflect the actual anomaly detection process of CNNs.
7.4 Explaining Other Deep Neural Networks
In addition to AEs, RNNs and CNNs, other DNNs—such as GANs, Deep OCSVM, and Deviation Network (DevNet)—can also be used for anomaly detection. Therefore, the interpretation of these types of networks is also relevant.
First, some studies propose explanation methods for general DNNs. For instance, Amarasinghe et al. [8] propose a framework for explainable DNN-based anomaly detection. Specifically, they assume the anomaly detection is performed in a supervised setting and leverage LRP to obtain the relevance of the input features to the decision. Besides, Sipple [196] trains a neural network-based anomaly detector with negative sampling to detect device failures in the Internet of Things. For each identified anomaly, they leverage Integrated Gradients to attribute the anomaly score to individual features and provide a contrastive nearest normal instance as an explanation.
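The contrastive part of such an explanation can be as simple as a nearest-neighbour lookup among normal instances. The sketch below illustrates this generically rather than reproducing the cited procedure; the arrays `X_normal` and `x_anomaly` are assumptions.

```python
# A hedged sketch: contrastive nearest normal instance for one anomaly.
import numpy as np
from sklearn.neighbors import NearestNeighbors

nn_index = NearestNeighbors(n_neighbors=1).fit(X_normal)
_, idx = nn_index.kneighbors(x_anomaly.reshape(1, -1))
reference = X_normal[idx[0, 0]]

# Features with the largest absolute difference from the contrastive reference
# suggest what would need to change for the instance to look normal.
delta = np.abs(x_anomaly - reference)
top = np.argsort(delta)[::-1][:5]
```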
Second, some researchers utilise techniques such as self-attention learning-based feature selection or gradient backpropagation-based feature contribution to explain a DevNet. For instance, Xu et al. [221] propose the Attention-guided Triplet deviation network for Outlier interpretatioN (ATON) to explain anomalies in a post-hoc fashion. Specifically, ATON is composed of two main modules, viz. the feature embedding module and the customized self-attention learning module. The feature embedding module transforms the original feature space into an embedding space with extended high-level information, while, given an anomaly, the customized self-attention learning module obtains the contribution of each learned feature to its separability. Based on the embedding module and the corresponding attention coefficients, they distil a subset of the original features that lead to the separability of the anomalous instance. Meanwhile, Pang et al. [160] put forward FASD, a weakly-supervised framework to detect anomalies when a few labeled anomalies of interest are available. Specifically, they instantiate this framework as a DevNet model, which assumes that the anomaly scores of normal instances are drawn from a Gaussian prior distribution and that the anomaly scores of anomalies come from the upper tail of the prior. To interpret an anomaly, they compute the contribution of each input feature to the final anomaly score through gradient-based backpropagation.
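In its plainest form, such gradient-based attribution is one backward pass through the scoring network. The sketch below shows the gradient-times-input variant as a generic illustration, not the cited method; the differentiable scorer `score_net` (feature vector to scalar anomaly score) is an assumption.

```python
# A minimal sketch: gradient-based feature contributions for an anomaly score.
import torch

def gradient_contributions(score_net, x: torch.Tensor) -> torch.Tensor:
    x = x.clone().detach().requires_grad_(True)
    score = score_net(x)                 # assumed scalar anomaly score
    score.backward()
    return x.grad * x                    # gradient * input, a common saliency variant

# Features with large positive values push the anomaly score up the most.
```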
Third, Deep Taylor Decomposition [144] has been leveraged to explain models such as OCSVM, KDE, and so on. For example, Kauffmann et al. [96] first convert OCSVM models to neural networks and then modify the Deep Taylor Decomposition method to be applicable to these neural networks. In addition, they show its superiority to other explanation methods commonly used in deep learning to produce pixel-wise explanations of decisions, such as Distance Decomposition, Gradient-Based Methods, SHAP Values, and Edge Detection. However, the method itself has many parameters to tune when applied to different methods or datasets, sometimes rendering the explanation method itself not explainable. Moreover, it also makes many strong assumptions and approximations. Similarly, Kauffmann et al. [97] reveal the widespread occurrence of Clever Hans phenomena in unsupervised anomaly detection models. Concretely, they propose an XAI procedure based on Deep Taylor Decomposition to highlight relevant features for detecting anomalies and apply it to models including AutoEncoder reconstruction-based detectors, Deep One-Class and KDE-based detectors, generating pixel-wise explanations of outlyingness.
Finally, visualisation techniques can be leveraged to help explain anomalies. For instance, Liu et al. [125] create the deep temporal clustering framework seq2cluster, which can cluster and detect anomalies in time series of varying lengths. Seq2cluster consists of three modules: Temporal Segmentation, a Temporal Compression network, and GMM Estimation. In particular, the Temporal Segmentation module divides each sequence into non-overlapping temporal segments, the Temporal Compression network learns a low-dimensional representation of each time segment, and the GMM Estimation network uses the latent space representation to perform density estimation. Data instances can therefore be clustered in the latent space to find anomalies based on the likelihood of each segment sample. Moreover, the results of anomaly detection can be more easily interpreted when the anomalies found in the latent space are adequately visualized.
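The density-based flagging step can be illustrated with a plain Gaussian Mixture Model over latent embeddings. The sketch below is a generic example, not the cited framework; the embedding matrix `Z` and the 1% threshold are assumptions.

```python
# A hedged sketch: GMM log-likelihood scoring of latent segment embeddings.
import numpy as np
from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=5, random_state=0).fit(Z)
log_likelihood = gmm.score_samples(Z)              # per-segment log-density
threshold = np.quantile(log_likelihood, 0.01)      # bottom 1% treated as anomalous
anomalous_segments = np.where(log_likelihood < threshold)[0]

# Plotting 2-D (e.g., PCA-projected) embeddings coloured by log-likelihood gives
# a visual account of why these segments were flagged.
```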
Discussion: In addition to the above-mentioned DNNs, namely, AEs, RNNs, CNNs, GANs, Deep OCSVM, and DevNet, other DNNs such as Graph Neural Networks [39] and Transformers [119] have become prevalent in anomaly detection. Therefore, the interpretation methods of these DNNs are also relevant.
7.5 Summary
To wrap up our review of post-model XAD techniques for DNNs, Table 4 gives an overview of all techniques discussed, and we make several high-level observations.
First, most deep post-model XAD techniques are model-specific in the sense that they are only applicable to a specific family of neural networks or to neural networks in general. This is in stark contrast with most shallow post-model XAD techniques, which are typically model-agnostic. The reason is that these deep post-model XAD techniques provide explanations by exploring the internal structure of the neural network. Although such explanation methods cannot be generalized to other anomaly detection models, the resulting explanations are usually faithful, as the Explanation-Definition is in compliance with the Detection-Definition. In contrast, techniques such as SHAP, LIME, and some rule learners are model-agnostic and are therefore more likely to suffer from poor fidelity.
Second, nearly all deep post-model XAD techniques provide only feature-based explanations; the only exceptions are References [14, 48, 72], which also produce sample-based explanations. Regarding the techniques used, the Shapley values-based approach is the most popular one. More importantly, one can see that most deep post-model XAD techniques are proposed to explain anomalies detected in sequential data such as time series and system logs.
Third, nearly all deep post-model XAD techniques can only provide local explanations. In other words, they can only explain a single anomaly at a time. Due to the complexity of neural networks, it is extremely challenging, if at all possible, to understand the entire decision-making process. To help end-users understand why an instance is reported as anomalous, deep post-model XAD techniques often inspect some important properties of the neural networks, such as feature contributions to reconstruction errors, thereby providing weak interpretability.