Neural Networks
Part I
Lecture 8
The output Y is given by:

    Y = \begin{cases} +1 & \text{if } X \ge b \\ -1 & \text{if } X < b \end{cases}
• This type of activation function is called a sign activation function
• Thus the actual output of the neuron with a sign activation function
can be represented as:

    Y = \mathrm{sign}(X), \quad \text{where } X = \sum_{i=1}^{n} x_i w_i - b
    \;\text{ or }\; X = \sum_{i=1}^{n} x_i w_i + b \qquad (2)
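Below is a minimal Python sketch of such a neuron (the function name,
NumPy usage, and the example values are my own, not from the lecture):

    import numpy as np

    def sign_neuron(x, w, b):
        # Weighted sum with the threshold b folded in, as in Eq. (2)
        X = np.dot(x, w) - b
        # Sign activation: +1 if X >= 0, otherwise -1
        return 1 if X >= 0 else -1

    x = np.array([1.0, 0.0])
    w = np.array([0.3, 0.4])
    print(sign_neuron(x, w, b=0.5))  # X = 0.3 - 0.5 = -0.2, so Y = -1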
    f(x) = \tanh(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}}
• It looks very similar to the sigmoid function; in fact, tanh is a
scaled sigmoid function (checked numerically in the sketch below).
• Like the sigmoid, this is also a nonlinear function, with output
values in the range (-1, 1).
• The gradient (or slope) is stronger for tanh than for the sigmoid
(the derivatives of its exponential components are steeper).
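The "scaled sigmoid" claim can be verified numerically with a small
Python sketch (the identity tanh(x) = 2*sigmoid(2x) - 1 follows from
the formula above; the helper name is mine):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    xs = np.linspace(-3.0, 3.0, 7)
    # tanh is a shifted, rescaled sigmoid: tanh(x) = 2*sigmoid(2x) - 1
    print(np.allclose(np.tanh(xs), 2.0 * sigmoid(2.0 * xs) - 1.0))  # True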
Activation Function: tanh
• Deciding between sigmoid and tanh depends on the gradient-strength
requirement of the application (the two are compared numerically
below). Like the sigmoid, tanh also suffers from the missing-slope
(vanishing-gradient) problem. Figure 5 shows the tanh activation
function:
Figure 5: The tanh activation function
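The gradient-strength difference can be made concrete with a short
sketch (the derivative formulas tanh'(x) = 1 - tanh(x)^2 and
sigmoid'(x) = sigmoid(x)(1 - sigmoid(x)) are standard results, not
taken from the slides):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    for x in [0.0, 2.0, 5.0]:
        d_tanh = 1.0 - np.tanh(x) ** 2           # peaks at 1 when x = 0
        d_sig = sigmoid(x) * (1.0 - sigmoid(x))  # peaks at 0.25 when x = 0
        print(f"x = {x}: tanh' = {d_tanh:.4f}, sigmoid' = {d_sig:.4f}")

At x = 0 the tanh slope (1.0) is four times the sigmoid slope (0.25);
both flatten toward zero for large |x|, which is the missing-slope
(vanishing-gradient) problem mentioned above.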
Activation Function: ReLU
• Rectified Linear Unit (ReLU) is the most used activation function
for applications based on convolutional neural networks (CNNs).
• It is defined by a simple condition and has advantages over the
other functions.
• The function is defined by the following formula:
    f(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } x \ge 0 \end{cases}

• Equivalently, f(x) = \max(x, 0).
• The range of output is between 0 and infinity.
• ReLU finds applications in computer vision and speech
recognition.
Figure 6: The ReLU activation function
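A minimal Python sketch of ReLU, assuming a vectorized NumPy
implementation (the test values are illustrative):

    import numpy as np

    def relu(x):
        # ReLU: 0 for x < 0, x for x >= 0, i.e. elementwise max(x, 0)
        return np.maximum(x, 0.0)

    print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # [0. 0. 0. 1.5 3.]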
Which activation functions to use?
• The sigmoid is the most used activation function, but it suffers
from the following setbacks:
– Since it uses the logistic model, its computations are
time-consuming and complex.
– It causes gradients to vanish, so that at some point no signal
passes through the neurons (see the numerical sketch below).
– It is slow to converge.
– It is not zero-centered.
Figure 11
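As a rough numerical illustration of the vanishing-gradient setback
(the layer counts are my own assumption): backpropagation through a
sigmoid multiplies the gradient by sigmoid'(x) <= 0.25 per layer, so
the signal decays geometrically with depth:

    # Upper bound on the gradient after n sigmoid layers: 0.25 ** n
    for n in [1, 5, 10, 20]:
        print(f"{n} layers: gradient scale <= {0.25 ** n:.2e}")
    # 20 layers: <= 9.09e-13 -- essentially no signal passes through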