Neural Network

A neural network is defined as a mathematical model of the interconnection of human neurons.


Although human neurons are very sophisticated, abstract mathematical modelling of neurons
can help to simulate how neurons learn. The brain consists of a densely interconnected set
of nerve cells, or basic information-processing units, called neurons. The human brain consists of
nearly 10 billion neurons and 60 trillion connections (synapses) between them [9]. By using a
massive number of neurons simultaneously, the brain can perform its functions much faster than
the fastest available computers. This is the main inspiration for the endeavor towards ANNs.

An individual neuron has a very simple structure, but an assembly of such elementary units
constitutes tremendous processing power. A neuron consists of a cell body (soma), a number
of fibers called dendrites, and a single long fiber called the axon. While dendrites branch into
a network around the soma, the axon stretches out to the dendrites and somas of other neurons.
The human brain can be considered a highly complex, nonlinear, and parallel information-
processing system. Information is stored and processed in a neural network simultaneously
throughout the whole network, rather than at specific locations.

Owing to this plasticity, connections between neurons leading to the ‘right answer’ are
strengthened while those leading to the ‘wrong answer’ are weakened. As a result, neural networks
have the ability to learn through experience. Learning is a fundamental and essential
characteristic of biological neural networks.

An artificial neural network consists of a number of very simple and highly interconnected
processors, also called neurons, which are analogous to the biological neurons in the brain. The
neurons are connected by weighted links passing signals from one neuron to another. Each
neuron receives a number of input signals through its connections; however, it never produces
more than a single output signal. The output signal is transmitted through the neuron’s outgoing
connection (corresponding to the biological axon). The outgoing connection, in turn, splits into
a number of branches that transmit the same signal (the signal is not divided among these
branches in any way). The outgoing branches terminate at the incoming connections of other
neurons in the network. Figure 1 represents the connections of a typical ANN, and Table 1 shows
the analogy between biological and artificial neural networks [10].

Table 1: Analogy between biological and artificial neural networks

Biological neural network      Artificial neural network
Soma                           Neuron
Dendrite                       Input
Axon                           Output
Synapse                        Weight

Figure 1: Architecture of a Typical ANN and Human Neuron

1.1 Mathematical Model

An ANN is a layered structure: between the input layer and the output layer there may be several
hidden layers. If the mapping to be learned is simple (for example, linear), few layers are needed,
but the number of layers grows with the non-linearity of the problem. The neurons of the input and
output layers are connected to the external environment. The weights are updated to bring the
network behavior towards the desired output [11]. An individual neuron is an elementary
information-processing unit with a means for computing its activation level given the inputs and
numerical weights. As a computing element, a neuron receives several signals from its input links,
computes a new activation level, and sends it as an output signal through the output links [12].
Figure 2: Weight in Neuron
The neuron computes the weighted sum of the input signals and compares the result with a
threshold value θ. If the net input is less than or equal to the threshold, the neuron output is 0;
if the net input is greater than the threshold, the neuron becomes activated and its output attains
the value 1 [13]:

$$
Y = \begin{cases}
0 & \text{if } \sum_i w_i x_i \le \theta \\
1 & \text{if } \sum_i w_i x_i > \theta
\end{cases}
\qquad (2.1)
$$

where xi is the value of input i, wi is the weight of input i, n is the number of neuron inputs,
and Y is the output of the neuron. This type of activation function is called a step (hard-limit)
function.

The above equation (2.1) can be simplified as

$$
Y = \begin{cases}
0 & \text{if } w \cdot x + b \le 0 \\
1 & \text{if } w \cdot x + b > 0
\end{cases}
\qquad (2.2)
$$

where $w \cdot x \equiv \sum_i w_i x_i$, w and x are vectors whose components are the weights and
inputs respectively, and $b \equiv -\theta$ (the negative of the threshold) is the bias.
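
As an illustration, the following is a minimal sketch of such a threshold neuron in Python with
NumPy; the inputs, weights, and threshold value are arbitrary example values, not taken from the
text.

    import numpy as np

    def threshold_neuron(x, w, b):
        """Step-activation neuron of equation (2.2): output 1 if w.x + b > 0, else 0."""
        z = np.dot(w, x) + b          # weighted sum plus bias, where b = -threshold
        return 1 if z > 0 else 0

    # Example: two inputs with arbitrary weights and a threshold of 0.5 (so b = -0.5)
    x = np.array([1.0, 0.0])
    w = np.array([0.7, 0.3])
    print(threshold_neuron(x, w, b=-0.5))   # prints 1, since 0.7 - 0.5 > 0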

The ANN converges by making small adjustments to the weights that reduce the difference between
the actual and desired outputs. The initial weights are assigned randomly and then adjusted to
make the output consistent with the training examples, reducing the error, i.e. using the gradient
descent technique. What is desirable is a technique that lets us find weights and biases so that
the output from the network approximates y(x) for all training inputs x. To quantify how well this
is achieved, a cost function is defined:
$$
C(w,b) \equiv \frac{1}{2n} \sum_{x} \left\| y(x) - a^{L}(x) \right\|^{2}
\qquad (2.3)
$$

Here, w denotes the collection of all weights in the network, b all the biases, n is the total
number of training inputs, a^L(x) is the vector of outputs from the network when x is input, and
the sum is over all training inputs x. C is the quadratic cost function, also known as the mean
squared error. The cost C(w,b) becomes small, i.e. C(w,b) ≈ 0, precisely when y(x) is
approximately equal to the output a for all training inputs x. So the training algorithm has done
a good job if it can find weights and biases such that C(w,b) ≈ 0; conversely, a large C(w,b)
means that y(x) is not close to the output a for a large number of inputs. The aim of the training
algorithm is therefore to minimize the cost C(w,b) as a function of the weights and biases. In
other words, to find a set of weights and biases that makes the cost as small as possible,
gradient descent is implemented to solve this minimization problem. The idea is to use gradient
descent to find the weights wk and biases bl which minimize the cost in equation (2.3). Gradient
descent is a standard strategy for searching for a minimum: each weight wk and bias bl is updated
using the corresponding partial derivatives ∂C/∂wk and ∂C/∂bl, i.e.

$$
w_k \;\rightarrow\; w_k' = w_k - \eta \frac{\partial C}{\partial w_k}
\qquad (2.4)
$$

$$
b_l \;\rightarrow\; b_l' = b_l - \eta \frac{\partial C}{\partial b_l}
\qquad (2.5)
$$

where η is a small, positive parameter (known as the learning rate).
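
Below is a minimal sketch of these update rules in Python, applied to a single linear neuron
trained on a toy data set with the quadratic cost of equation (2.3); the data, learning rate, and
number of iterations are illustrative assumptions, not values from the text.

    import numpy as np

    # Toy training set: inputs x and desired outputs y(x), here y = 2x + 1
    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    Y = np.array([1.0, 3.0, 5.0, 7.0])

    w, b = np.random.randn(1), 0.0              # random initial weight, zero bias
    eta = 0.05                                  # learning rate

    for epoch in range(2000):
        a = X @ w + b                           # network output a(x) for a linear neuron
        # Gradients of the quadratic cost C(w,b) = (1/2n) * sum ||y(x) - a(x)||^2
        grad_w = (X.T @ (a - Y)) / len(X)       # dC/dw
        grad_b = np.mean(a - Y)                 # dC/db
        w -= eta * grad_w                       # w -> w' = w - eta * dC/dw   (2.4)
        b -= eta * grad_b                       # b -> b' = b - eta * dC/db   (2.5)

    print(w, b)                                 # approaches w ~ 2, b ~ 1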

1.2 Feedforward Neural Network


As its name suggests, in a feedforward neural network input signals are propagated in a forward
direction on a layer-by-layer basis. Each layer in a multilayer neural network has its own
specific function. The input layer accepts input signals from the outside world and redistributes
these signals to all neurons in the hidden layer. The input layer rarely includes computing
neurons, and thus does not process input patterns. The more non-linear the mapping to be learned,
the more hidden layers (and corresponding weight adjustments) are required. The output
layer accepts output signals, or in other words a stimulus pattern, from the hidden layer and
establishes the output pattern of the entire network. Neurons in the hidden layer detect the
features; the weights of the neurons represent the features hidden in the input patterns. These
features are then used by the output layer in determining the output pattern. With one hidden
layer, we can represent any continuous function of the input signals, and with two hidden layers
even discontinuous functions can be represented.

Figure 3: Feedforward Neural Network


There is no obvious way to know what the desired output of the hidden layer should be. In
other words, the desired output of the hidden layer is determined by the layer itself. Learning
in a multilayer network proceeds the same way as for a perceptron. A training set of input
patterns is presented to the network. The network computes its output pattern, and if there is
an error – or in other words a difference between actual and desired output patterns – the
weights are adjusted to reduce this error. In a perceptron, there is only one weight for each
input and only one output. But in the multilayer network, there are many weights, each of which
contributes to more than one output.
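
A sketch of one forward pass through such a network is given below in Python; the layer sizes,
the random weights, and the use of the sigmoid activation (introduced formally in the next
section) are illustrative assumptions.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def feedforward(x, weights, biases):
        """Propagate an input vector layer by layer; the last activation is the network output."""
        a = x
        for W, b in zip(weights, biases):
            a = sigmoid(W @ a + b)              # each layer computes sigma(W a + b)
        return a

    # Example: 3 inputs -> 4 hidden neurons -> 2 outputs, with random weights and biases
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
    biases  = [rng.standard_normal(4), rng.standard_normal(2)]
    print(feedforward(np.array([0.5, -1.0, 2.0]), weights, biases))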

1.3 Back-Propagation Neural Network


The back-propagation technique is a two-phase procedure. First, a training input pattern is
presented to the input layer. The network then propagates the input pattern from layer to layer
until the output pattern is generated by the output layer. If this pattern deviates from the
desired output, an error is calculated and then propagated backwards through the network from
the output layer to the input layer. The weights are adjusted as the error is propagated back. As
with any other neural network, a back-propagation network is determined by the connections
between neurons (the network’s architecture), the activation function used by the neurons, and
the learning algorithm (or the learning law) that specifies the procedure for adjusting weights.
Typically, in a back-propagation network each neuron is connected to every neuron in the adjacent
forward layer. A neuron determines its net input in a manner similar to the single-neuron
perceptron of equation (2.2): the weighted inputs are summed and the threshold θ is applied.
Next, this net input value is passed through the activation function. However, unlike a
perceptron, neurons in the back-propagation network use a sigmoid activation function:

$$
\sigma(z) \equiv \frac{1}{1 + e^{-z}}
\qquad (2.7)
$$

With inputs x1, x2, …, weights w1, w2, …, and bias b, this can be written in the form used for
learning in back-propagation:

$$
\sigma(z) \equiv \frac{1}{1 + \exp\!\left(-\sum_i w_i x_i - b\right)}
\qquad (2.8)
$$

The indices i, j and k here refer to neurons in the input, hidden and output layers, respectively.

Figure 4: Back-Propagation Neural Network
Input signals x1, x2, …, xn are propagated through the network from left to right, and error
signals C1, C2, …, Cn from right to left. The symbol wij denotes the weight for the connection
between neuron i in the input layer and neuron j in the hidden layer, and the symbol wjk the
weight between neuron j in the hidden layer and neuron k in the output layer. If z ≡ w · x + b is
a large positive number, then e^{-z} ≈ 0 and so σ(z) ≈ 1. In other words, when z = w · x + b is
large and positive, the output from the sigmoid neuron is approximately 1. If z = w · x + b is
very negative, then e^{-z} → ∞ and σ(z) ≈ 0. So when z = w · x + b is very negative, the behavior
of a sigmoid neuron also closely approximates that of a perceptron.

The activation function of the neuron in each layer is expressed in terms of the biases and
weights, where b_j^l denotes the bias of the jth neuron in the lth layer, w_{jk}^l denotes the
weight for the connection from the kth neuron in the (l−1)th layer to the jth neuron in the lth
layer, and the activation a_j^l of the jth neuron in the lth layer is related to the activations
in the (l−1)th layer. Finally, the equation can be written as below:
$$
a_j^{l} = \sigma\!\left(\sum_{k} w_{jk}^{l}\, a_k^{l-1} + b_j^{l}\right)
\qquad (2.9)
$$

The goal of backpropagation is to compute the partial derivatives ∂C/∂w and ∂C/∂b of the cost
function C with respect to any weight w or bias b in the network.
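
A compact sketch of this computation for a single training example is shown below in Python,
assuming a network with one hidden layer, sigmoid activations, and the quadratic cost; the layer
sizes and parameter values are arbitrary illustrative choices.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop(x, y, W1, b1, W2, b2):
        """Return dC/dW and dC/db for each layer, for a single example (x, y)."""
        # Forward pass, layer by layer as in equation (2.9)
        z1 = W1 @ x + b1;  a1 = sigmoid(z1)          # hidden layer
        z2 = W2 @ a1 + b2; a2 = sigmoid(z2)          # output layer
        # Backward pass for the quadratic cost C = 0.5 * ||y - a2||^2
        delta2 = (a2 - y) * a2 * (1.0 - a2)          # output-layer error
        delta1 = (W2.T @ delta2) * a1 * (1.0 - a1)   # error propagated back to the hidden layer
        # Partial derivatives of the cost with respect to each weight and bias
        dW2 = np.outer(delta2, a1); db2 = delta2
        dW1 = np.outer(delta1, x);  db1 = delta1
        return dW1, db1, dW2, db2

    # Example call with random parameters (2 inputs, 3 hidden neurons, 1 output)
    rng = np.random.default_rng(1)
    grads = backprop(np.array([0.2, 0.8]), np.array([1.0]),
                     rng.standard_normal((3, 2)), rng.standard_normal(3),
                     rng.standard_normal((1, 3)), rng.standard_normal(1))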

1.4 Activation Function

The term activation function is biologically inspired: brain neurons receive signals from other
neurons and decide whether or not to fire by taking the cumulative input into account. An
activation function is used by a unit in a neural network to decide what the activation value of
the unit should be, based on a set of input values. Neural networks have to implement complex
mapping functions, so they need activation functions that are non-linear in order to bring in the
much-needed non-linearity property that enables them to approximate any function. The activation
values of many such units can then be used to make a classification decision or to predict a
value from the input [14].

A sigmoid function, for example, produces a curve with an elongated “S” shape. It takes a real-
valued input and maps it to the range (0, 1). It is a special case of the logistic function, is
differentiable, and has a positive derivative at each point.

$$
f(z) = \frac{1}{1 + \exp(-z)}
\qquad (2.10)
$$

Figure 5: Output of the Sigmoid Activation Function [15]

Learning in a multilayer neural network proceeds much faster when the sigmoidal activation
function is represented by a hyperbolic tangent. The tanh(z) function is a rescaled version of
the sigmoid, and its output range is [−1, 1] instead of [0, 1].

$$
f(z) = \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}
\qquad (2.11)
$$

Figure 6: Hyperbolic Tangent Activation Function [16]
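
For comparison, the short Python sketch below evaluates both activation functions (2.10) and
(2.11) at a few arbitrary sample points, showing the (0, 1) range of the sigmoid against the
(−1, 1) range of the hyperbolic tangent.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))       # output in (0, 1), equation (2.10)

    def tanh(z):
        return np.tanh(z)                     # output in (-1, 1), equation (2.11)

    z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
    print(sigmoid(z))   # ~ [0.018, 0.269, 0.5, 0.731, 0.982]
    print(tanh(z))      # ~ [-0.999, -0.762, 0.0, 0.762, 0.999]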
