
Slides PyConfr Bordeaux Calcagno


LSTM VARIATIONAL

AUTOENCODERS FOR
NETWORK SIGNAL
ANOMALY DETECTION
FACUNDO CALCAGNO
@FMCALCAGNO
1.
Tessella

Let’s start with where I work.

Tessella

Who are we? What do we do? What makes us different?


Tessella, Altran World Class Centre for Analytics, is an international data science, analytics and AI technology consulting services provider.

We partner with our clients to build exceptional digital business capabilities that connect people and data intelligently.

We are more than 200 Data Scientists worldwide implementing DS techniques to accelerate decision-making processes.
2.
Unsupervised
Learning

What to do when data is not labelled?

Unsupervised Learning

○ Finding unknown patterns without pre-existing labels.

○ Clearest example: cluster analysis, used to group data points with shared attributes and extrapolate algorithmic relations.

○ More sophisticated example: density estimation, which attempts to infer an a priori probability distribution of the data. (A small clustering example follows.)
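As an illustration only (this snippet is not part of the talk; the data and cluster count are arbitrary), grouping unlabelled observations with k-means:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(500, 2)                               # 500 unlabelled 2-D observations
labels = KMeans(n_clusters=3, random_state=0).fit_predict(X)
# `labels` assigns each observation to one of 3 clusters, without any pre-existing labels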
3.
Anomaly
Detection

Some important concepts

Anomaly Detection

○ Identification of rare events or observations that raise suspicion by differing significantly from the majority of the data.

○ Common cases: bank fraud, medical problems, abnormal server activity, etc.

○ Classical methods to attack this problem: SVMs, HMMs, cluster analysis, Bayesian networks and… autoencoders. (A small example follows.)
4.
Deep Learning

What’s all the buzz about?

DEEP
LEARNING

Deep learning is a collection of machine-learning algorithms used to model high-level abstractions in data through model architectures composed of multiple nonlinear transformations. It is part of a broader family of methods based on learning representations of data.
DEEP LEARNING

Large amounts of data: availability of really big datasets, open and free to use.

Different types of filters for different tasks: FC layers, convolutional filters, LSTMs, Dropout, etc.; a collection of building blocks that can be selected for different tasks.

Optimization algorithms: application of SGD and its variants.

DL frameworks: Google's and Facebook's competition to rule the DL world has helped the ecosystem become more mature.

Computing capabilities: GPUs, TPUs and TPU Pods have reduced training times enormously.
5.
RNN Networks

Let's go into some applicable AI!

RECURRENT NEURAL NETWORKS
RNN's

Networks with loops: with this type of network we can allow information to persist.

A copy of itself: a recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor (see the recurrence below).
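For reference, the update that each of those copies applies is the standard vanilla RNN recurrence (W, U and b are the shared weights; this equation is added here and is not on the original slide):

h_t = \tanh(W h_{t-1} + U x_t + b)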
THE PROBLEM OF LONG TERM
DEPENDENCIES

Gaps: the gap between the relevant information and the point where it is needed can be very large, and RNNs fail to learn to connect distant information.

Theory vs practice: in theory, RNNs should be capable of learning long-term dependencies; in practice, this is not what happens. The problem was explored in depth by Hochreiter and Bengio.
6.
LSTM Networks

Solving the Long Term Dependency problem!

LONG SHORT TERM MEMORY NETWORKS
LSTM's

A conveyor belt: the cell state, the horizontal line running through the top of the diagram, runs through the entire chain and controls the flow of information.

Introduction of gates: the LSTM has the ability to remove or add information to the cell state, carefully regulated by structures called gates.
LSTM 101
FORGET GATE

The forget gate: the first step in our LSTM is to decide what information we are going to throw away from the cell state; this gate has that function.

Importance of forgetting: LSTMs excel at knowing which information to save for the future, which information won't be relevant, and when to forget it. (The gate equation is given below.)
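In the standard LSTM formulation, the forget gate is a sigmoid layer over the previous hidden state and the current input (equation added here for reference; it is not on the original slide):

f_t = \sigma\big(W_f \cdot [h_{t-1}, x_t] + b_f\big)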
LSTM 101
INPUT LAYER GATE

The input layer gate: this gate is used to decide which information we are going to store in the cell state. (The standard equations follow below.)
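In the standard formulation, the input gate and the candidate values together update the cell state (added for reference; not on the original slide):

i_t = \sigma\big(W_i \cdot [h_{t-1}, x_t] + b_i\big), \qquad \tilde{C}_t = \tanh\big(W_C \cdot [h_{t-1}, x_t] + b_C\big)

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t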
LSTM 101
OUTPUT LAYER GATE

The output layer gate: finally, we need to decide what we are going to output. This output will be based on our cell state, but will be a filtered version of it. (See the equation below.)
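In the standard formulation, the output gate filters the cell state to produce the new hidden state (added for reference; not on the original slide):

o_t = \sigma\big(W_o \cdot [h_{t-1}, x_t] + b_o\big), \qquad h_t = o_t \odot \tanh(C_t)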
7.
Autoencoders

How can we learn data representations using Deep Learning?

AUTOENCODERS

Dimensionality Reduction
An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs.
By making the network learn how to reproduce the input, it forces the compressing layer (the latent variables) to learn a "compressed" representation of the input. (A minimal sketch follows.)
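As an illustration only (this snippet is not from the talk; the layer sizes are arbitrary), a minimal dense autoencoder in Keras where the target equals the input:

from keras.models import Model
from keras.layers import Input, Dense

inputs = Input(shape=(784,))                         # e.g. a flattened 28x28 image
encoded = Dense(32, activation='relu')(inputs)       # compressing layer / latent variables
decoded = Dense(784, activation='sigmoid')(encoded)  # reconstruction of the input

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
# train with the input as its own target:
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)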
VARIATIONAL AUTOENCODERS

Gaussian distributed Latent Variables

VAEs inherit the architecture of traditional autoencoders and use it to learn a data-generating distribution, which allows us to take random samples from the latent space. The variational autoencoder uses variational inference to generate its approximation to the posterior distribution. In the variational autoencoder, the prior p(z) is specified as a standard Normal distribution with mean zero and variance one.
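Sampling from the learned latent distribution relies on the standard reparameterization trick (shown here for reference; it is the trick that the sampling function in the code later is based on):

z = \mu + \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)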
VARIATIONAL AUTOENCODERS
KL DIVERGENCE

Kullback-Leibler Divergence

The KL divergence is a measure of how "off" two probability distributions P(X) and Q(X) are. In other words, it measures how different two probability distributions are (it is not a true distance, since it is not symmetric). Variational autoencoders use a reversed KL divergence to minimize the difference between the true posterior P(z|x) and a Gaussian distribution. (The closed form used in the loss is given below.)
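For a diagonal Gaussian q(z|x) = N(mu, sigma^2) against the standard Normal prior, the KL term has the well-known closed form (this is the kl_loss term that appears in the vae_loss code later):

D_{KL}\big(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, I)\big) = -\tfrac{1}{2} \sum_j \big(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\big)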
8.
Croissant Model

Let’s go deep into our “Croissant” Model!

A Bi-Directional LSTM Variational Autoencoder


CROISSANT MODEL

[Architecture diagram: Sequence Input Data → Bi-Directional LSTM encoder (average merge) → Mean and Sigma (latent variables) → Random Sampling → Bi-Directional LSTM decoder (average merge) → Reconstructed Sequence Data]
CROISSANT MODEL
Inference Time

[Same encoder/decoder architecture as above; at inference time the latent variable mean is read out directly.]

Latent Variable Mean

We use this output as our Anomaly Score! We apply k-means to generate a threshold per signal. (A rough sketch of this step follows.)
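As a rough sketch only (this code is not shown in the talk): assuming the per-window anomaly score is derived from the encoder's latent mean, and that a 2-cluster k-means on one signal's scores yields the threshold, the step could look like this (windows, batch_size and the norm-based score are illustrative assumptions):

import numpy as np
from sklearn.cluster import KMeans

# windows: shape (n_windows, timesteps, input_dim), the sequences of one signal (assumed)
latent_means = enc.predict(windows, batch_size=batch_size)   # 'enc' returned by generate_model()
scores = np.linalg.norm(latent_means, axis=1)                # one anomaly score per window (assumed)

# split the scores into two clusters and place the threshold between the centers
km = KMeans(n_clusters=2, random_state=0).fit(scores.reshape(-1, 1))
centers = np.sort(km.cluster_centers_.ravel())
threshold = centers.mean()

anomalous_windows = scores > threshold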
TensorFlow Code

Imports: the necessary TensorFlow and Keras imports.

from keras import backend as K
from keras.models import Model
from keras.layers import Input, RepeatVector
from keras.layers import CuDNNLSTM
from keras.layers import Bidirectional
from keras.layers import Concatenate
from keras.layers.core import Dense, Lambda
from keras import objectives
from keras.optimizers import RMSprop
from keras.utils import multi_gpu_model
import tensorflow as tf
TensorFlow Code

Parameters: the model's parameters.

class CroissantModel:

    def __init__(self,
                 input_dim,
                 timesteps,
                 batch_size,
                 intermediate_dim,
                 latent_dim,
                 epsilon_std=1.,
                 gpus=1,
                 learning_rate=0.001):
        # store the hyper-parameters used later in generate_model()
        # (these assignments are implied but not shown on the slide)
        self.input_dim = input_dim
        self.timesteps = timesteps
        self.batch_size = batch_size
        self.intermediate_dim = intermediate_dim
        self.latent_dim = latent_dim
        self.epsilon_std = epsilon_std
        self.gpus = gpus
        self.learning_rate = learning_rate
TensorFlow Code

The Encoder: coding the Bi-Directional LSTM encoder, the latent variables (z mean and z log sigma) and the sampling of the learned distribution.

def generate_model(self):
    with tf.device('/gpu:0'):
        x = Input(shape=(self.timesteps, self.input_dim,),
                  name="Main_input_VAE")
        # LSTM encoding
        h1 = Bidirectional(
            CuDNNLSTM(self.intermediate_dim,
                      kernel_initializer='random_uniform',
                      input_shape=(self.timesteps, self.input_dim,)),
            merge_mode='ave')(x)

        # VAE Z layer: z mean and z log sigma
        z_mean = Dense(self.latent_dim)(h1)
        z_log_sigma = Dense(self.latent_dim)(h1)

        # sampling the generated distribution with parameters z mean and z log sigma
        def sampling(args):
            z_mean, z_log_sigma = args
            epsilon = K.random_normal(
                shape=(self.batch_size, self.latent_dim),
                mean=0., stddev=self.epsilon_std)
            # note: the canonical reparameterization would be
            # z_mean + K.exp(z_log_sigma / 2) * epsilon
            return z_mean + z_log_sigma * epsilon

        z = Lambda(sampling, output_shape=(self.latent_dim,)
                   )([z_mean, z_log_sigma])
TensorFlow Code

The Decoder: coding the Bi-Directional LSTM decoder.

decoder_h = Bidirectional(
    CuDNNLSTM(
        self.intermediate_dim,
        kernel_initializer='random_uniform',
        input_shape=(self.timesteps, self.latent_dim,),
        return_sequences=True
    ),
    merge_mode='ave'
)
# decoder_mean (used on the next slide to map the decoded sequence back to
# input_dim) is not shown on the slides
TensorFlow Code

Model Generation: generating the 3 output models: the VAE, the encoder and the decoder (generator).

h_decoded = RepeatVector(self.timesteps)(z)
h_decoded = decoder_h(h_decoded)

# decoded layer
x_decoded_mean = decoder_mean(h_decoded)

# end-to-end autoencoder
vae = Model(x, x_decoded_mean)

# encoder, from inputs to latent space
encoder = Model(x, z_mean)

# generator, from latent space to reconstructed inputs
decoder_input = Input(shape=(self.latent_dim,))
_h_decoded = RepeatVector(self.timesteps)(decoder_input)
_h_decoded = decoder_h(_h_decoded)
_x_decoded_mean = decoder_mean(_h_decoded)
generator = Model(decoder_input, _x_decoded_mean)
TensorFlow Code

Loss Function: 50% mean squared error and 50% KL divergence.
Optimization Algorithm: RMSProp, with a decaying learning rate handled through TensorFlow callbacks.

def vae_loss(x_loss, x_decoded_mean_loss):
    """
    Loss function for the Variational Auto-Encoder
    :param x_loss:
    :param x_decoded_mean_loss:
    :return:
    """
    xent_loss = objectives.mse(x_loss, x_decoded_mean_loss)
    kl_loss = - 0.5 * K.mean(1 + z_log_sigma - K.square(z_mean) - K.exp(z_log_sigma))
    loss = xent_loss + kl_loss
    return loss

opt_rmsprop = RMSprop(lr=self.learning_rate, rho=0.9, epsilon=1e-4, decay=0)
TensorFlow Code

Keras MultiGPU: multi-GPU activation when possible.
Model Compilation: compile the VAE and return the 3 models.

if self.gpus > 1:
    try:
        vae = multi_gpu_model(vae, gpus=self.gpus)
    except:
        print("Error in Multi GPU")

vae.compile(optimizer=opt_rmsprop, loss=vae_loss)

return vae, encoder, generator
TensorFlow Code
Training

croissant = Cm.CroissantModel(
    input_dim=input_dim,
    timesteps=timesteps,
    batch_size=args.batch_size,
    intermediate_dim=args.intermediate_dim,
    latent_dim=args.latent_dim,
    epsilon_std=1.,
    gpus=gpus,
    learning_rate=args.learning_rate
)
vae, enc, dec = croissant.generate_model()
history = vae.fit_generator(
    generator=sequence_train,
    validation_data=sequence_test,
    use_multiprocessing=args.multi_proc,
    max_queue_size=args.gen_queue_size,
    workers=args.gen_workers,
    epochs=args.epochs,
    verbose=1,
    callbacks=all_callbacks,
    shuffle=args.shuffle_dataset
)
TensorFlow Code
Inference

croissant_gpu = Cm.CroissantModel(
    input_dim=input_dim,
    timesteps=args.window_size,
    batch_size=args.batch_size,
    intermediate_dim=args.intermediate_dim,
    latent_dim=args.latent_dim,
    gpus=len(args.n_gpus))
vae, enc, dec = croissant_gpu.generate_model()
vae.load_weights(ckp_filepath)
inference_model(vae,
                enc,
                interfaces_files,
                str(folder_ingestion),
                args.window_size,
                args.window_shift,
                args.batch_size,
                columns_to_load,
                args.max_files,
                outputFolderName,
                len(args.n_gpus),
                export_files=args.export_files,
                export_encoding=args.include_encoding)
9.
Results

Let’s show some of the signals.

Results

[Result figures: examples of the analysed signals, shown one per slide.]

Deep learning will transform every single industry. Healthcare and transportation will be transformed by deep learning. I want to live in an AI-powered society.

Andrew Ng

LSTM VARIATIONAL
AUTOENCODERS FOR
NETWORK SIGNAL
ANOMALY DETECTION
FACUNDO CALCAGNO
@FMCALCAGNO
