
Neural Networks


Neural Networks
• The idea behind an artificial neural network is to mimic the human brain.

• One of the most studied and researched topics in machine learning.

• Have many real-life applications.

• E.g. Google Translate, Google Assistant


Human Brain
• The human brain has a very large number of neurons.

• All neurons are interconnected in a network.

• This connectivity helps them pass information and data, process it, and generate output.

• The human brain can learn from experience and train itself on data.

• This capacity for learning and training makes the human brain a very important organ.
Neural Network

Neuron
We want to build machines, in the form of artificial neural networks, that work in the same manner as the human brain.
Artificial Neural Network

• Inspired by the biological neurons within the human body, which activate under certain circumstances, resulting in a related action performed by the body in response.

• Consist of various layers of interconnected artificial neurons powered by activation functions that help in switching them ON/OFF.

• Like traditional machine learning algorithms, here too there are certain values that the neural network learns in the training phase.
Artificial Neural Network

• Is a computational learning system that uses a network of functions to understand and translate a data input of one form into a desired output, usually in another form. The concept of the artificial neural network was inspired by human biology and the way neurons of the human brain function together to understand inputs from human senses.

• Neural networks are just one of many tools and approaches used in machine learning algorithms. The neural network itself may be used as a piece in many different machine learning algorithms to process complex data inputs into a space that computers can understand.

• Neural networks are being applied to many real-life problems today, including speech and image recognition, spam email filtering, finance, and medical diagnosis, to name a few.
Components of Neural Network

Weights are numeric values that are multiplied by inputs. In backpropagation, they are modified to reduce the loss. In simple words, weights are machine-learned values from neural networks. They self-adjust depending on the difference between the predicted outputs and the training targets.
Activation Function is a mathematical formula that helps the neuron to switch ON/OFF.

•Input layer represents dimensions of the input vector.

•Hidden layer represents the intermediary nodes that divide the input space into regions with (soft) boundaries. It takes in a set of weighted inputs and produces output through an activation function.

•Output layer represents the output of the neural network.


Single hidden layer Neural Network
Two hidden layer Neural Network
Importance of Neural Network

Without Neural Network: Let's have a look at the example given below. Here we have a machine that we have trained with four types of cats, as you can see in the image below. Once we are done with the training, we provide a random image to that machine that contains a cat. Since this cat is not similar to the cats on which we trained our system, and the background has also changed, without a neural network our machine would not identify the cat in the picture. Basically, the machine would get confused in figuring out where the cat is.

With Neural Network: However, in the case with a neural network, even if we have not trained our machine with that particular cat, it can still identify certain features of a cat that we have trained on, match those features with the cat in that particular image, and identify the cat. So, with the help of this example, you can clearly see the importance of the concept of a neural network.
Working of Artificial Neural Network

• Each neuron receives a multiplied version of the inputs and random weights, which is then added to a static bias value (unique to each neuron layer); this is then passed to an appropriate activation function which decides the final value to be given out of the neuron (a small sketch of this computation follows below).

• There are various activation functions available, chosen according to the nature of the input values. Once the output is generated from the final neural net layer, the loss function (input vs. output) is calculated, and backpropagation is performed, in which the weights are adjusted to make the loss minimum. Finding optimal values of the weights is what the overall operation focuses on.
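
A minimal sketch of this per-neuron computation in Python/NumPy; the input values, weights, bias, and the squared-error loss used here are illustrative assumptions, not values from the slides:

import numpy as np

def sigmoid(z):
    # squashes the weighted sum into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# illustrative inputs, weights, and bias for a single neuron
x = np.array([0.5, 0.2, 0.8])      # inputs to the neuron
w = np.array([0.4, -0.6, 0.3])     # one weight per input
b = 0.1                            # static bias added to the weighted sum

z = np.dot(x, w) + b               # multiplied inputs and weights, plus bias
a = sigmoid(z)                     # activation decides the neuron's output

target = 1.0
loss = 0.5 * (target - a) ** 2     # squared-error loss for this single output
print(a, loss)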
Working of Artificial Neural Network

Perceptron
Working of Artificial Neural Networks

Instead of directly getting into the working of artificial neural networks, let's break down and try to understand the neural network's basic unit, which is called a Perceptron.

So, a perceptron can be defined as a neural network with a single layer that classifies linear data. It further constitutes four major components, which are as follows:

1.Inputs

2.Weights and Bias

3.Summation Function

4.Activation or Transformation Function


The main logic behind the concept of Perceptron is as follows:

The inputs (x) are fed into the input layer, where they are multiplied by the allotted weights (w) and then added together to form a weighted sum. This weighted sum is then passed through the pertinent activation function, as the small sketch below shows.
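
As an illustration, here is a minimal perceptron sketch in Python; the step activation and the hand-picked weights implementing a logical AND of two binary inputs are assumptions for illustration, not values learned by training:

import numpy as np

def step(z):
    # threshold activation: fires (1) only if the weighted sum is non-negative
    return 1 if z >= 0 else 0

def perceptron(x, w, b):
    # summation function: weighted sum of the inputs plus the bias
    z = np.dot(x, w) + b
    # activation/transformation function maps the sum to the output
    return step(z)

# hand-picked weights and bias that realize logical AND (illustrative)
w = np.array([1.0, 1.0])
b = -1.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron(np.array(x), w, b))
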
Weights and Bias

As and when an input variable is fed into the network, a random value is given as the weight of that particular input, such that each individual weight represents the importance of that input in making correct predictions of the result.

The bias, on the other hand, helps in adjusting the curve of the activation function so as to accomplish a precise output.

Summation Function

After the weights are assigned to the inputs, the network computes the product of each input and its weight. Then the weighted sum is calculated by the summation function, in which all of these products are added.
Activation Function

The main objective of the activation function is to perform a mapping of the weighted sum onto the output. The transformation function comprises activation functions such as tanh, ReLU, sigmoid, etc.

The activation function is categorized into two main parts:

1.Linear Activation Function

2.Non-Linear Activation Function


Linear Activation Function
In the linear activation function, the output of the function is not restricted to any range; its range is from -infinity to infinity. For each individual neuron, the inputs get multiplied by the weight of that neuron, which in turn leads to the creation of an output signal proportional to the input. If all the layers are linear in nature, then the final activation of the last layer is actually just a linear function of the initial layer's input, as the short illustration below shows.
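
A short numerical check of that last point, with two purely linear layers of arbitrary illustrative weights collapsing into a single equivalent linear layer:

import numpy as np

rng = np.random.default_rng(0)

# two purely linear "layers" with arbitrary weights and biases
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=3)
W2, b2 = rng.normal(size=(2, 3)), rng.normal(size=2)

x = rng.normal(size=4)

# forward pass through the two linear layers (no non-linearity anywhere)
h = W1 @ x + b1
y = W2 @ h + b2

# the same result from one equivalent linear layer
W_eq = W2 @ W1
b_eq = W2 @ b1 + b2
y_single = W_eq @ x + b_eq

print(np.allclose(y, y_single))   # True: the extra layer added no expressive power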
Non-Linear Activation Function
These are among the most widely used activation functions. They help the model generalize and adapt to any sort of data in order to correctly differentiate among the outputs. They solve the following problems faced by linear activation functions:

•Since non-linear functions come with derivative functions, the problems related to backpropagation are successfully solved.

•For the creation of deep neural networks, they permit the stacking up of several layers of neurons.
1. Sigmoid or Logistic Activation Function

•Formula: σ(x) = 1 / (1 + e^(−x))

•Range: (0, 1)

•Common Use: Output layer of binary classification problems.
2. Tanh or Hyperbolic Tangent Activation Function

The tanh activation function works much better than the sigmoid function; we can simply say it is an advanced version of the sigmoid activation function. Since it has a value range between -1 and 1, it is utilized by the hidden layers in the neural network, and for this reason it makes the process of learning much easier.
3. ReLU (Rectified Linear Unit) Activation Function

ReLU is one of the most widely used activation functions in the hidden layers of a neural network. Its value ranges from 0 to infinity. It clearly helps in solving the problem of backpropagation (vanishing gradients). It turns out to be less expensive to compute than the sigmoid and tanh activation functions. It allows only a few neurons to be activated at a particular instance, which leads to effective as well as easier computation.
4. Softmax Function

The softmax is a kind of generalization of the sigmoid function used for solving classification problems. It is mainly used to handle multiple classes, for which it squeezes the output of each class between 0 and 1 and divides each by the sum of the outputs. This kind of function is specially used by the classifier in the output layer.
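
The four activation functions above, written out as a short NumPy sketch:

import numpy as np

def sigmoid(z):
    # (0, 1); common in binary-classification output layers
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # (-1, 1); often used in hidden layers
    return np.tanh(z)

def relu(z):
    # [0, inf); max(0, z), cheap to compute and gives sparse activations
    return np.maximum(0.0, z)

def softmax(z):
    # squeezes scores into probabilities that sum to 1 (multi-class output)
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 1.5])
print(sigmoid(z), tanh(z), relu(z), softmax(z))
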
Feed Forward / Forward Propagation
“The process of receiving an input to produce some kind of output to make some kind of prediction is known as Feed Forward.” The feed-forward neural network is the core of many other important neural networks, such as the convolutional neural network.

Each input to an artificial neuron has a weight associated with it. The inputs are first multiplied by their respective weights, and a bias is added to the result; we can call this the weighted sum. The weighted sum then goes through an activation function.

So, an artificial neuron can be thought of as a simple or multiple linear regression model with an activation function at the end.
Feed Forward Propagation

Calculating the input data multiplied by the network's weights, plus the bias, and then passing the result through the activation function.
Input values

X1=0.05

X2=0.10

Initial weights

w1=0.15 w5=0.40

w2=0.20 w6=0.45

w3=0.25 w7=0.50

w4=0.30 w8=0.55

Bias Values

b1=0.35 b2=0.60

Target Values

T1=0.01
T2=0.99
Forward Pass

To find the value of H1, we first multiply the input values by the weights and add the bias:

H1=x1×w1+x2×w2+b1

H1=0.05×0.15+0.10×0.20+0.35

H1=0.3775

To calculate the final result of H1, we apply the sigmoid function:

H1 = 1/(1 + e^(−0.3775)) = 0.593269992
We will calculate the value of H2 in the same way as H1

H2=x1×w3+x2×w4+b1

H2=0.05×0.25+0.10×0.30+0.35
H2=0.3925

To calculate the final result of H2, we apply the sigmoid function:

H2 = 1/(1 + e^(−0.3925)) = 0.596884378
Now, we calculate the values of y1 and y2 in the same way as we calculate the
H1 and H2.

To find the value of y1, we first multiply the input value i.e., the outcome of H1
and H2 from the weights as

y1=H1×w5+H2×w6+b2

y1=0.593269992×0.40+0.596884378×0.45+0.60
y1=1.10590597

To calculate the final result of y1, we apply the sigmoid function:

y1 = 1/(1 + e^(−1.10590597)) = 0.75136507
We will calculate the value of y2 in the same way as y1

y2=H1×w7+H2×w8+b2

y2=0.593269992×0.50+0.596884378×0.55+0.60
y2=1.2249214

To calculate the final result of y2, we apply the sigmoid function:

y2 = 1/(1 + e^(−1.2249214)) = 0.772928465
Now, we will find the total error, which is simply the difference between the actual outputs and the target outputs. Taking the error of each output as half the squared difference from its target and summing over the two outputs, the total error is calculated as

E_total = ½(T1 − y1)² + ½(T2 − y2)²

So, the total error is

E_total = ½(0.01 − 0.75136507)² + ½(0.99 − 0.772928465)² ≈ 0.274811 + 0.023560 = 0.298371
Now, we will backpropagate this error to update the weights using a


backward pass.
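
Before moving on, here is a compact NumPy sketch that reproduces the forward pass above with the given inputs, weights, biases, and targets; the printed values should match the hand calculation, and the error term follows the ½·(target − output)² convention used above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# values from the example
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
t1, t2 = 0.01, 0.99

# hidden layer: weighted sum plus bias, then sigmoid
h1 = sigmoid(x1 * w1 + x2 * w2 + b1)   # sigmoid(0.3775)
h2 = sigmoid(x1 * w3 + x2 * w4 + b1)   # sigmoid(0.3925)

# output layer
y1 = sigmoid(h1 * w5 + h2 * w6 + b2)
y2 = sigmoid(h1 * w7 + h2 * w8 + b2)

# total error: sum of half squared differences from the targets
e_total = 0.5 * (t1 - y1) ** 2 + 0.5 * (t2 - y2) ** 2
print(h1, h2, y1, y2, e_total)
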
Backpropagation

Backpropagation is one of the important concepts of a neural network. Our task is to classify our data as well as possible. For this, we have to update the weights and biases, but how can we do that in a deep neural network? In the linear regression model, we use gradient descent to optimize the parameters. Similarly, here we also use the gradient descent algorithm, via backpropagation.

For a single training example, the backpropagation algorithm calculates the gradient of the error function. Backpropagation can be written as a function of the neural network. Backpropagation algorithms are a set of methods used to efficiently train artificial neural networks following a gradient descent approach which exploits the chain rule.
The main features of backpropagation are the iterative, recursive, and efficient method through which it calculates the updated weights to improve the network until it can perform the task for which it is being trained. Backpropagation requires the derivatives of the activation functions to be known at network design time.

Now, how is the error function used in backpropagation, and how does backpropagation work? Let's start with an example and work through it mathematically to understand exactly how the weights are updated using backpropagation.
Cost /Loss /Error function

When training a neural network, the cost value J quantifies the network’s error, i.e.,
its output’s deviation from the ground truth. We calculate it as the average error over
all the objects in the training set, and our goal is to minimize it.

For example, let's say we have a network that classifies animals either as cats or dogs. It has two output neurons, where the former represents the probability that the animal is a cat and the latter that it's a dog. Given an image of a cat, we expect the cat output to be close to 1 and the dog output to be close to 0.

If the network outputs other values, we can quantify our error on that image as the squared distance between the actual and expected outputs:
J = (1/n) Σᵢ (yᵢ − ŷᵢ)²
We use the cost to update the weights and biases so that the actual outputs
get as close as possible to the desired values. To decide whether to increase
or decrease a coefficient, we calculate its partial derivative using
backpropagation. Let’s explain it with an example.

We update each weight according to the following rule:

w ← w − η · (∂J/∂w)

where η is the learning rate.


EXAMPLE:
Let’s say we have only one neuron in the input, hidden, and output
layers:

To update the weights and biases, we need to see how J reacts to small changes in those parameters. We can do that by computing the partial derivatives of J with respect to them. But before that, let's recap how the variables in our problem are related:
So, if we want to see how changing a particular weight affects the cost function, we should compute the partial derivative by applying the chain rule of calculus:
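
As a concrete sketch, assuming the output neuron computes z = w·h + b and ŷ = σ(z), with h the hidden neuron's output (the labels here are illustrative assumptions, since the slide's figure uses its own symbols), the chain rule gives:

∂J/∂w = (∂J/∂ŷ) · (∂ŷ/∂z) · (∂z/∂w) = (∂J/∂ŷ) · σ′(z) · h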
Different optimization algorithms are available for error minimization; a brief sketch comparing two of them follows this list:

• Gradient descent

• Adagrad

• Momentum

• Adam

• Ftrl

• RMSProp, etc.
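
A minimal sketch of how two of these update rules differ on a toy one-dimensional problem; the loss J(w) = (w − 3)², the learning rate, and the momentum coefficient are illustrative choices:

# illustrative comparison of two update rules on J(w) = (w - 3)^2
def grad(w):
    return 2.0 * (w - 3.0)   # dJ/dw

lr = 0.1

# plain gradient descent: step directly against the gradient
w_gd = 0.0
for _ in range(200):
    w_gd -= lr * grad(w_gd)

# momentum: accumulate a velocity, which smooths and speeds up the steps
w_m, v, beta = 0.0, 0.0, 0.9
for _ in range(200):
    v = beta * v - lr * grad(w_m)
    w_m += v

print(w_gd, w_m)   # both approach the minimum at w = 3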
Types of Neural Network

• Multi-Layer Perceptrons (MLP)

• Convolutional Neural Networks (CNN)

• Recurrent Neural Networks (RNN)


Multilayer Perceptrons (MLPs)

• A multilayer perceptron (MLP) is a class of feedforward artificial neural network (ANN).

• MLP models are the most basic deep neural networks, composed of a series of fully connected layers.

• Today, MLP machine learning methods can be used to overcome the requirement
of high computing power required by modern deep learning architectures.

• Each new layer is a set of nonlinear functions of a weighted sum of all outputs (fully connected) from the prior one, as the brief sketch below illustrates.
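
A brief sketch of such a stack of fully connected layers, assuming TensorFlow/Keras is available; the layer sizes and the 20-dimensional input are arbitrary choices for illustration:

import tensorflow as tf

# a small MLP: each Dense layer is a nonlinear function of a weighted
# sum of all outputs from the previous (fully connected) layer
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                     # 20-dimensional input vector
    tf.keras.layers.Dense(64, activation="relu"),    # hidden layer 1
    tf.keras.layers.Dense(32, activation="relu"),    # hidden layer 2
    tf.keras.layers.Dense(10, activation="softmax")  # 10-class output
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()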
Convolutional Neural Network (CNN)

• A convolutional neural network (CNN, or ConvNet) is another class of deep neural networks.

• CNNs are most commonly employed in computer vision. Given a series of images
or videos from the real world, with the utilization of CNN, the AI system learns
to automatically extract the features of these inputs to complete a specific task,
e.g., image classification, face authentication, and image semantic segmentation.

• Different from fully connected layers in MLPs, in CNN models, one or multiple
convolution layers extract the simple features from input by executing
convolution operations. Each layer is a set of nonlinear functions of weighted
sums at different coordinates of spatially nearby subsets of outputs from the prior
layer, which allows the weights to be reused.
It is a specialized type of neural network that can learn spatial hierarchies of
features directly from pixel values by using filters that scan over the input
image and extract relevant features.

The key components of a CNN include convolutional layers, pooling layers, and fully connected layers. The convolutional layer is the core building block of the CNN and is responsible for learning the features of the input image. It applies a set of filters to the input image, each of which extracts a specific feature, such as edges, textures, or shapes.
The pooling layer is used to downsample the output of the convolutional
layer and reduce the dimensionality of the features. This helps to reduce
overfitting and improve the efficiency of the network. The fully connected
layer is the last layer of the network and is used to classify the input image
based on the learned features.

During training, the CNN uses backpropagation to adjust the weights and
biases of the network to optimize its performance. This process involves
propagating the error back through the network and updating the parameters
of the network to minimize the error.
CNNs have been used to achieve state-of-the-art performance in a variety of
image recognition and classification tasks, including object detection, face
recognition, and image segmentation. They are also used in natural language
processing for tasks such as text classification and sentiment analysis.

CNNs are a powerful tool for image analysis and recognition, and they have
shown great potential in many applications. Their ability to learn spatial
hierarchies of features directly from pixel values makes them well-suited for
a wide range of image-related tasks.
Convolutional Layers:

•Convolutional layers are the fundamental building blocks of CNNs.

•They apply convolution operations to the input data using learnable filters
or kernels.

•It applies a set of filters to the input image, each of which extracts a
specific feature, such as edges, textures, or shapes.

•The output of a convolutional layer is often referred to as feature maps.


Pooling Layers:

•Pooling layers (such as MaxPooling or AveragePooling) are used to reduce the spatial dimensions of the feature maps.

•Pooling helps decrease the computational load and make the network more
robust to variations in the input data.

•The pooling layer is used to downsample the output of the convolutional layer and reduce the dimensionality of the features.

•This helps to reduce overfitting and improve the efficiency of the network.
Fully Connected Layers:

•Fully connected layers are traditional neural network layers where each
neuron is connected to every neuron in the previous and subsequent layers.

•They are typically used in the later stages of the network to make
predictions based on the learned features.
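
A minimal NumPy sketch of the two distinctive operations described above: a single 3×3 filter slid over a small image (one weighted sum per position), followed by 2×2 max pooling; the image and filter values are illustrative:

import numpy as np

# a tiny 6x6 "image" and a 3x3 vertical-edge filter (illustrative values)
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

# convolution (valid): slide the filter over the image, one weighted sum per position
out_h = image.shape[0] - kernel.shape[0] + 1
out_w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + 3, j:j + 3]
        feature_map[i, j] = np.sum(patch * kernel)

# 2x2 max pooling: keep the largest value in each 2x2 block (downsampling)
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))

print(feature_map.shape, pooled.shape)   # (4, 4) -> (2, 2)
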
• AlexNet. For image classification, as the first CNN neural network to win the ImageNet
Challenge in 2012, AlexNet consists of five convolution layers and three fully connected
layers. Thus, AlexNet requires 61 million weights and 724 million MACs (multiply-add
computation) to classify the image with a size of 227×227.

• VGG-16. To achieve higher accuracy, VGG-16 is trained with a deeper structure of 16 layers, consisting of 13 convolution layers and three fully connected layers, requiring 138 million weights and 15.5G MACs to classify an image with a size of 224×224.
• GoogleNet. To improve accuracy while reducing the computation of DNN inference, GoogleNet introduces an inception module composed of different-sized filters. As a result, GoogleNet achieves better accuracy than VGG-16 while only requiring seven million weights and 1.43G MACs to process an image of the same size.

• ResNet. ResNet, the state-of-the-art effort, uses the "shortcut" structure to reach human-level accuracy with a top-5 error rate below 5%. In addition, the "shortcut" module is used to solve the gradient vanishing problem during the training process, making it possible to train a DNN model with a deeper structure. The performance of popular CNNs applied to AI vision tasks has gradually increased over the years, surpassing human vision (a 5% error rate).
Recurrent Neural Network (RNN)

• A recurrent neural network (RNN) is another class of artificial neural networks that uses sequential data feeding. RNNs have been developed to address the time-series problem of sequential input data.

• The input of RNN consists of the current input and the previous samples.
Therefore, the connections between nodes form a directed graph along a
temporal sequence. Furthermore, each neuron in an RNN owns an internal
memory that keeps the information of the computation from the previous
samples.
• RNN models are widely used in Natural Language Processing (NLP) due to their superiority in processing data whose input length is not fixed. The task of the AI here is to build a system that can comprehend natural language spoken by humans, e.g., natural language modeling, word embedding, and machine translation.

• In RNNs, each subsequent layer is a collection of nonlinear functions of weighted sums of outputs and the previous state. Thus, the basic unit of an RNN is called a "cell", and each cell consists of layers and a series of cells that enables the sequential processing of recurrent neural network models; a minimal sketch of such a cell follows below.
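
A minimal NumPy sketch of one recurrent cell: at every time step it combines the current input with the hidden state carried over from the previous step; the sizes and random weights are illustrative:

import numpy as np

rng = np.random.default_rng(42)

input_size, hidden_size = 4, 3
W_x = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input weights
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # recurrent weights
b = np.zeros(hidden_size)

def rnn_cell(x_t, h_prev):
    # the cell's internal memory: combine the current input with the previous state
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# process a sequence of 5 time steps, carrying the hidden state forward
sequence = rng.normal(size=(5, input_size))
h = np.zeros(hidden_size)
for x_t in sequence:
    h = rnn_cell(x_t, h)
print(h)   # final hidden state summarizes the whole sequence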
