
Slides PyConfr Bordeaux Calcagno


LSTM VARIATIONAL

AUTOENCODERS FOR
NETWORK SIGNAL
ANOMALY DETECTION
FACUNDO CALCAGNO
@FMCALCAGNO
1.
Tessella

Let’s start with where I work.

Tessella

Who are we? What do we do? What makes us different?


Tessella, Altran World Class Centre for Analytics, is an international data science, analytics and AI technology consulting services provider.

We partner with our clients to build exceptional digital business capabilities that connect people and data intelligently.

We are more than 200 Data Scientists worldwide implementing DS techniques to accelerate decision-making processes.
2.
Unsupervised
Learning

What to do when data is not labelled?

Unsupervised Learning

○ Finding unknown patterns without pre-existing labels.

○ Clearest example: cluster analysis, used to group data points with shared attributes and extrapolate algorithmic relations.

○ More sophisticated example: density estimation, which attempts to infer an a priori probability distribution of the data. (A small clustering example follows.)
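As an illustration only (this snippet is not part of the talk; the data and cluster count are arbitrary), grouping unlabelled observations with k-means:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(500, 2)                               # 500 unlabelled 2-D observations
labels = KMeans(n_clusters=3, random_state=0).fit_predict(X)
# `labels` assigns each observation to one of 3 clusters, without any pre-existing labels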
3.
Anomaly
Detection

Some important concepts

Anomaly Detection

○ Identification of rare events or observations that raise suspicion by differing significantly from the majority of the data.

○ Common cases: bank fraud, medical problems, abnormal server activity, etc.

○ Classical methods to attack this problem: SVMs, HMMs, cluster analysis, Bayesian networks and… autoencoders. (A small example follows.)
4.
Deep Learning

What’s all the buzz about?

DEEP
LEARNING

Deep learning is a collection of machine-learning algorithms used to model high-level abstractions in data through model architectures composed of multiple nonlinear transformations. It is part of a broader family of methods based on learning representations of data.
DEEP LEARNING

Large amounts of data: availability of really big datasets, open and free to use.

Different types of filters for different tasks: FC layers, convolutional filters, LSTMs, Dropout, etc.; a collection of building blocks that can be selected for different tasks.

Optimization algorithms: application of SGD and its variants.

DL frameworks: Google's and Facebook's competition to rule the DL world has helped the ecosystem become more mature.

Computing capabilities: GPUs, TPUs and TPU Pods have reduced training times enormously.
5.
RNN Networks

Let's go into some applicable AI!

RECURRENT NEURAL NETWORKS
RNN's

Networks with loops: with this type of network we can allow information to persist.

A copy of itself: a recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor (see the recurrence below).
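For reference, the update that each of those copies applies is the standard vanilla RNN recurrence (W, U and b are the shared weights; this equation is added here and is not on the original slide):

h_t = \tanh(W h_{t-1} + U x_t + b)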
THE PROBLEM OF LONG TERM
DEPENDENCIES

Gaps: the gap between the relevant information and the point where it is needed can be very large, and RNNs fail to learn to connect distant information.

Theory vs practice: in theory, RNNs should be capable of learning long-term dependencies; in practice, this is not what happens. The problem was explored in depth by Hochreiter and Bengio.
6.
LSTM Networks

Solving the Long Term Dependency problem!

LONG SHORT TERM MEMORY NETWORKS
LSTM's

A conveyor belt: the cell state, the horizontal line running through the top of the diagram, runs through the entire chain and controls the flow of information.

Introduction of gates: the LSTM has the ability to remove or add information to the cell state, carefully regulated by structures called gates.
LSTM 101
FORGET GATE

The forget gate: the first step in our LSTM is to decide what information we are going to throw away from the cell state; this gate has that function.

Importance of forgetting: LSTMs excel at knowing which information to save for the future, which information won't be relevant, and when to forget it. (The gate equation is given below.)
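In the standard LSTM formulation, the forget gate is a sigmoid layer over the previous hidden state and the current input (equation added here for reference; it is not on the original slide):

f_t = \sigma\big(W_f \cdot [h_{t-1}, x_t] + b_f\big)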
LSTM 101
INPUT LAYER GATE

The input layer gate: this gate is used to decide which information we are going to store in the cell state. (The standard equations follow below.)
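In the standard formulation, the input gate and the candidate values together update the cell state (added for reference; not on the original slide):

i_t = \sigma\big(W_i \cdot [h_{t-1}, x_t] + b_i\big), \qquad \tilde{C}_t = \tanh\big(W_C \cdot [h_{t-1}, x_t] + b_C\big)

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t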
LSTM 101
OUTPUT LAYER GATE

The output layer gate: finally, we need to decide what we are going to output. This output will be based on our cell state, but will be a filtered version of it. (See the equation below.)
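In the standard formulation, the output gate filters the cell state to produce the new hidden state (added for reference; not on the original slide):

o_t = \sigma\big(W_o \cdot [h_{t-1}, x_t] + b_o\big), \qquad h_t = o_t \odot \tanh(C_t)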
7.
Autoencoders

How can we learn data representations using Deep Learning?

AUTOENCODERS

Dimensionality Reduction
An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs.
By making the network learn how to reproduce the input, it forces the compressing layer (the latent variables) to learn a "compressed" representation of the input. (A minimal sketch follows.)
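As an illustration only (this snippet is not from the talk; the layer sizes are arbitrary), a minimal dense autoencoder in Keras where the target equals the input:

from keras.models import Model
from keras.layers import Input, Dense

inputs = Input(shape=(784,))                         # e.g. a flattened 28x28 image
encoded = Dense(32, activation='relu')(inputs)       # compressing layer / latent variables
decoded = Dense(784, activation='sigmoid')(encoded)  # reconstruction of the input

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
# train with the input as its own target:
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)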
VARIATIONAL AUTOENCODERS

Gaussian distributed Latent Variables

VAEs inherit the architecture of traditional autoencoders and use it to learn a data-generating distribution, which allows us to take random samples from the latent space. The variational autoencoder uses variational inference to generate its approximation to the posterior distribution. In the variational autoencoder, the prior p(z) is specified as a standard Normal distribution with mean zero and variance one.
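Sampling from the learned latent distribution relies on the standard reparameterization trick (shown here for reference; it is the trick that the sampling function in the code later is based on):

z = \mu + \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)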
VARIATIONAL AUTOENCODERS
KL DIVERGENCE

Kullback-Leibler Divergence

The KL divergence is a measure of how "off" two probability distributions P(X) and Q(X) are. In other words, it measures how different two probability distributions are (it is not a true distance, since it is not symmetric). Variational autoencoders use a reversed KL divergence to minimize the difference between the true posterior P(z|x) and a Gaussian distribution. (The closed form used in the loss is given below.)
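For a diagonal Gaussian q(z|x) = N(mu, sigma^2) against the standard Normal prior, the KL term has the well-known closed form (this is the kl_loss term that appears in the vae_loss code later):

D_{KL}\big(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, I)\big) = -\tfrac{1}{2} \sum_j \big(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\big)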
8.
Croissant Model

Let’s go deep into our “Croissant” Model!

A Bi-Directional LSTM Variational Autoencoder


CROISSANT MODEL

[Architecture diagram: Sequence Input Data → Bi-Directional LSTM encoder (average merge) → Mean and Sigma (latent variables) → Random Sampling → Bi-Directional LSTM decoder (average merge) → Reconstructed Sequence Data]
CROISSANT MODEL
Inference Time

[Same encoder/decoder architecture as above; at inference time the latent variable mean is read out directly.]

Latent Variable Mean

We use this output as our Anomaly Score! We apply k-means to generate a threshold per signal. (A rough sketch of this step follows.)
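As a rough sketch only (this code is not shown in the talk): assuming the per-window anomaly score is derived from the encoder's latent mean, and that a 2-cluster k-means on one signal's scores yields the threshold, the step could look like this (windows, batch_size and the norm-based score are illustrative assumptions):

import numpy as np
from sklearn.cluster import KMeans

# windows: shape (n_windows, timesteps, input_dim), the sequences of one signal (assumed)
latent_means = enc.predict(windows, batch_size=batch_size)   # 'enc' returned by generate_model()
scores = np.linalg.norm(latent_means, axis=1)                # one anomaly score per window (assumed)

# split the scores into two clusters and place the threshold between the centers
km = KMeans(n_clusters=2, random_state=0).fit(scores.reshape(-1, 1))
centers = np.sort(km.cluster_centers_.ravel())
threshold = centers.mean()

anomalous_windows = scores > threshold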
TensorFlow Code

Imports: the necessary TensorFlow and Keras imports.

from keras import backend as K
from keras.models import Model
from keras.layers import Input, RepeatVector
from keras.layers import CuDNNLSTM
from keras.layers import Bidirectional
from keras.layers import Concatenate
from keras.layers.core import Dense, Lambda
from keras import objectives
from keras.optimizers import RMSprop
from keras.utils import multi_gpu_model
import tensorflow as tf
TensorFlow Code

Parameters: the model's parameters.

class CroissantModel:

    def __init__(self,
                 input_dim,
                 timesteps,
                 batch_size,
                 intermediate_dim,
                 latent_dim,
                 epsilon_std=1.,
                 gpus=1,
                 learning_rate=0.001):
        # store the hyper-parameters used later in generate_model()
        # (these assignments are implied but not shown on the slide)
        self.input_dim = input_dim
        self.timesteps = timesteps
        self.batch_size = batch_size
        self.intermediate_dim = intermediate_dim
        self.latent_dim = latent_dim
        self.epsilon_std = epsilon_std
        self.gpus = gpus
        self.learning_rate = learning_rate
TensorFlow Code

The Encoder: coding the Bi-Directional LSTM encoder, the latent variables (z mean and z log sigma) and the sampling of the learned distribution.

def generate_model(self):
    with tf.device('/gpu:0'):
        x = Input(shape=(self.timesteps, self.input_dim,),
                  name="Main_input_VAE")
        # LSTM encoding
        h1 = Bidirectional(
            CuDNNLSTM(self.intermediate_dim,
                      kernel_initializer='random_uniform',
                      input_shape=(self.timesteps, self.input_dim,)),
            merge_mode='ave')(x)

        # VAE Z layer: z mean and z log sigma
        z_mean = Dense(self.latent_dim)(h1)
        z_log_sigma = Dense(self.latent_dim)(h1)

        # sampling the generated distribution with parameters z mean and z log sigma
        def sampling(args):
            z_mean, z_log_sigma = args
            epsilon = K.random_normal(
                shape=(self.batch_size, self.latent_dim),
                mean=0., stddev=self.epsilon_std)
            # note: the canonical reparameterization would be
            # z_mean + K.exp(z_log_sigma / 2) * epsilon
            return z_mean + z_log_sigma * epsilon

        z = Lambda(sampling, output_shape=(self.latent_dim,)
                   )([z_mean, z_log_sigma])
TensorFlow Code

The Decoder: coding the Bi-Directional LSTM decoder.

decoder_h = Bidirectional(
    CuDNNLSTM(
        self.intermediate_dim,
        kernel_initializer='random_uniform',
        input_shape=(self.timesteps, self.latent_dim,),
        return_sequences=True
    ),
    merge_mode='ave'
)
# decoder_mean (used on the next slide to map the decoded sequence back to
# input_dim) is not shown on the slides
TensorFlow Code

Model Generation: generating the 3 output models: the VAE, the encoder and the decoder (generator).

h_decoded = RepeatVector(self.timesteps)(z)
h_decoded = decoder_h(h_decoded)

# decoded layer
x_decoded_mean = decoder_mean(h_decoded)

# end-to-end autoencoder
vae = Model(x, x_decoded_mean)

# encoder, from inputs to latent space
encoder = Model(x, z_mean)

# generator, from latent space to reconstructed inputs
decoder_input = Input(shape=(self.latent_dim,))
_h_decoded = RepeatVector(self.timesteps)(decoder_input)
_h_decoded = decoder_h(_h_decoded)
_x_decoded_mean = decoder_mean(_h_decoded)
generator = Model(decoder_input, _x_decoded_mean)
TensorFlow Code

Loss Function: 50% mean squared error and 50% KL divergence.
Optimization Algorithm: RMSProp, with a decaying learning rate handled through TensorFlow callbacks.

def vae_loss(x_loss, x_decoded_mean_loss):
    """
    Loss function for the Variational Auto-Encoder
    :param x_loss:
    :param x_decoded_mean_loss:
    :return:
    """
    xent_loss = objectives.mse(x_loss, x_decoded_mean_loss)
    kl_loss = - 0.5 * K.mean(1 + z_log_sigma - K.square(z_mean) - K.exp(z_log_sigma))
    loss = xent_loss + kl_loss
    return loss

opt_rmsprop = RMSprop(lr=self.learning_rate, rho=0.9, epsilon=1e-4, decay=0)
TensorFlow Code

Keras MultiGPU: multi-GPU activation when possible.
Model Compilation: compile the VAE and return the 3 models.

if self.gpus > 1:
    try:
        vae = multi_gpu_model(vae, gpus=self.gpus)
    except:
        print("Error in Multi GPU")

vae.compile(optimizer=opt_rmsprop, loss=vae_loss)

return vae, encoder, generator
TensorFlow Code
Training

croissant = Cm.CroissantModel(
    input_dim=input_dim,
    timesteps=timesteps,
    batch_size=args.batch_size,
    intermediate_dim=args.intermediate_dim,
    latent_dim=args.latent_dim,
    epsilon_std=1.,
    gpus=gpus,
    learning_rate=args.learning_rate
)
vae, enc, dec = croissant.generate_model()
history = vae.fit_generator(
    generator=sequence_train,
    validation_data=sequence_test,
    use_multiprocessing=args.multi_proc,
    max_queue_size=args.gen_queue_size,
    workers=args.gen_workers,
    epochs=args.epochs,
    verbose=1,
    callbacks=all_callbacks,
    shuffle=args.shuffle_dataset
)
TensorFlow Code
Inference

croissant_gpu = Cm.CroissantModel(
    input_dim=input_dim,
    timesteps=args.window_size,
    batch_size=args.batch_size,
    intermediate_dim=args.intermediate_dim,
    latent_dim=args.latent_dim,
    gpus=len(args.n_gpus))
vae, enc, dec = croissant_gpu.generate_model()
vae.load_weights(ckp_filepath)
inference_model(vae,
                enc,
                interfaces_files,
                str(folder_ingestion),
                args.window_size,
                args.window_shift,
                args.batch_size,
                columns_to_load,
                args.max_files,
                outputFolderName,
                len(args.n_gpus),
                export_files=args.export_files,
                export_encoding=args.include_encoding)
9.
Results

Let’s show some of the signals.

Results

[Result figures: examples of the analysed signals, shown one per slide.]

Deep learning will transform every single industry. Healthcare and transportation will be transformed by deep learning. I want to live in an AI-powered society.

Andrew Ng

LSTM VARIATIONAL
AUTOENCODERS FOR
NETWORK SIGNAL
ANOMALY DETECTION
FACUNDO CALCAGNO
@FMCALCAGNO
