
SCT - QB - Answers - p1


Unit-1

1. What is soft computing? State its applications.

Ans: The term soft computing was introduced by Lotfi Aliasker Zadeh in 1992, and it is defined as "a
collection of methodologies that aim to exploit the tolerance for imprecision and uncertainty to achieve
tractability, robustness, and low solution cost." It follows three computing paradigms: fuzzy logic,
neural computing, and probabilistic reasoning. In simple terms, soft computing is an emerging approach
that tries to capture the remarkable abilities of the human mind; the human mind serves as its role model.
Elements of soft computing
Fuzzy Logic (FL): Fuzzy logic is designed to reach the best possible solution to complex problems
from all the available information and input data.
Neural Network (ANN): An artificial neural network (ANN) emulates the network of neurons that makes
up a human brain, so that the computer or machine can learn and make decisions in a human-like way.
Genetic Algorithms (GA): Genetic algorithms take their inspiration from nature. They are search-based
algorithms rooted in natural selection and the concepts of genetics.
Characteristics of soft computing
• It does not require mathematical modeling to solve a given problem.
• It may give different solutions for the same input at different times.
• It uses biologically inspired methodologies such as genetics, evolution, particle swarming, and
the human nervous system.
• It is adaptive in nature.
Applications of soft computing
• It is widely used in gaming products like Poker and Checker.
• In kitchen appliances, such as Microwave and Rice cooker.
• In most used home appliances - Washing Machine, Heater, Refrigerator, and AC as well.
• Apart from all these usages, it is also used in robotics work.
• Image processing and Data compression are also popular applications of soft computing.
• Used for handwriting recognition.

2. Give comparison between soft and hard computing.(Any 6)
1. Soft computing is a computing paradigm that can deal with real-life events, so a computational model can be constructed from them. Hard computing cannot work directly with real-life events; a proper computational model cannot be constructed from them.
2. Soft computing involves approximate reasoning and probabilistic thinking. Hard computing involves binary logic with precise computational models and strategies.
3. Soft computing supports approximation and can tolerate ambiguous parameters if they arise during computation. Hard computing requires the extraction of specific, categorical features.
4. Soft computing makes the computational process stochastic. Hard computing makes the computational process deterministic.
5. Soft computing handles noisy and ambiguous data well. Hard computing works best with exact data, as it is a model of exactness.
6. Soft computing programs are written to deal with approximate results. Hard computing programs must be written for exact and precise results.
7. Soft computing performs multivalued logic. Hard computing performs computation using two-valued (binary) logic.
8. Soft computing involves randomness, taking random values into consideration for evaluation. Hard computing does not involve random values; it uses values that are fixed and accurate.

3. Explain the structure of neural network with suitable model.
Ans: The idea behind ANNs comes from the way the human brain works by making the right connections,
but here the living neurons and dendrites are replaced by silicon and wires.
The human brain is composed of about 86 billion nerve cells (neurons), each connected to thousands of
other cells by axons. Inputs from the sensory organs are accepted by dendrites, which create electric
impulses that travel through the neural network. To handle different issues, a neuron sends a message
to another neuron.
Similarly, ANNs are composed of multiple nodes that imitate the biological neurons of the human brain.
These neurons are connected by links and interact with each other. A node takes input data and performs
a simple operation on it; the result is passed on to other neurons. The output at each node is called
its activation or node value.
Each link is associated with a weight, and the network is capable of learning, which takes place by
altering the weight values.
Types of Artificial Neural Networks
Generally, there are two types of ANN: FeedForward and FeedBack.
a. FeedForward ANN
In this network, the flow of information is unidirectional: a unit sends information to other units but
does not receive any information back, so there are no feedback loops. Feedforward networks have fixed
inputs and outputs and are used in pattern recognition.
b. FeedBack ANN
This type of Artificial Neural Network allows feedback loops and is used in content-addressable
memories.
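A single node of the kind described above (weighted inputs summed, then passed through an activation) can be sketched in a few lines; the weights and bias values below are illustrative, not learned:

```python
# A single artificial neuron: weighted inputs are summed and passed
# through an activation function to give the node's activation value.
# The weights and bias here are illustrative, not learned.

def neuron(inputs, weights, bias):
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if net > 0 else 0   # step activation: fire or not

# Feedforward: the output of one neuron becomes the input of the next.
x = [1, 0, 1]
hidden = neuron(x, [0.5, -0.4, 0.3], -0.2)   # net = 0.5 + 0.3 - 0.2 = 0.6 -> 1
out = neuron([hidden], [1.0], -0.5)          # net = 1.0 - 0.5 = 0.5 -> 1
print(out)                                   # prints 1
```

Chaining the output of one such node into the next, with no path back, is exactly the unidirectional flow of a feedforward network.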

4. What is Fuzzy logic? List applications Of Fuzzy Logic Control System.
Ans: Fuzzy Logic (FL) is a method of reasoning that resembles human reasoning. This approach is
similar to how humans perform decision making. And it involves all intermediate possibilities between
YES and NO. The conventional logic block that a computer understands takes precise input and
produces a definite output as TRUE or FALSE, which is equivalent to a human being’s YES or NO. The
Fuzzy logic was invented by Lotfi Zadeh, who observed that unlike computers, humans have a range of
possibilities between YES and NO, such as: certainly yes, possibly yes, cannot say, possibly no,
and certainly no.

The Fuzzy logic works on the levels of possibilities of input to achieve a definite output. Now, talking
about the implementation of this logic:
• It can be implemented in systems with different sizes and capabilities such as micro-controllers,
large networked or workstation-based systems.
• Also, it can be implemented in hardware, software or a combination of both.
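The idea of intermediate possibilities between YES and NO can be sketched with a membership function; the temperature breakpoints below are illustrative assumptions, not values from the source:

```python
# Sketch of a fuzzy membership function: instead of a crisp YES/NO,
# a temperature belongs to the fuzzy set "hot" to a degree in [0, 1].
# The breakpoints (25 and 35 degrees Celsius) are illustrative.

def hot(temp_c):
    if temp_c <= 25:
        return 0.0             # definitely not hot (crisp NO)
    if temp_c >= 35:
        return 1.0             # definitely hot (crisp YES)
    return (temp_c - 25) / 10  # partially hot: between YES and NO

for t in (20, 28, 30, 40):
    print(t, hot(t))           # 20 -> 0.0, 28 -> 0.3, 30 -> 0.5, 40 -> 1.0
```

An FLC system combines several such membership degrees through rules ("IF temperature is hot THEN increase fan speed") to drive a controller output.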
Applications: (Any 6)
FLC systems find a wide range of applications in various industrial and commercial products and
systems. In several applications- related to nonlinear, time-varying, ill-defined systems and also complex
systems – FLC systems have proved to be very efficient in comparison with other conventional control
systems. The applications of FLC systems include:
• Traffic Control
• Steam Engine
• Aircraft Flight Control
• Missile Control
• Adaptive Control
• Liquid-Level Control
• Helicopter Model
• Automobile Speed Controller
• Braking System Controller
• Process Control (includes cement kiln control)
• Robotic Control
• Elevator (Automatic Lift) Control
• Automatic Running Control
• Cooling Plant Control
• Water Treatment
• Boiler Control
• Nuclear Reactor Control
• Power Systems Control
• Air Conditioner Control (Temperature Controller)
• Biological Processes
• Knowledge-Based System
• Fault Detection Control Unit

5. What is neural network? List applications Of Neural Network
Ans: A neural network is a processing device, either an algorithm or actual hardware, whose design was
inspired by the design and functioning of animal brains and their components. It is also known as an
artificial neural network or neural net. Neural networks are very flexible and powerful: they have the
ability to learn. There is no need to devise an algorithm to perform a specific task, i.e., no need to
understand the internal mechanisms of that task. Their parallel architecture and fast response and
computation times make them best suited for real-time systems.
APPLICATION SCOPE OF NEURAL NETWORKS
1. Air traffic control
2. Appraisal and valuation of property
3. Fraud detection
4. Medical diagnosis and analysis
5. Betting on horse races, stock markets, sporting events, etc. could be based on neural network predictions.
6. Data mining, cleaning and validation could be achieved by determining which records suspiciously diverge
from the pattern of their peers.
7. Expert consultants could package their intuitive expertise into a neural network to automate their services.
8. Echo patterns from sonar, radar, seismic and magnetic instruments could be used to predict their targets.
9. Econometric modeling based on neural networks should be more realistic than older models based on
classical statistics.

6. What is Genetic Algorithm? List applications Of Genetic Algorithm.
Ans: Genetic Algorithm (GA) is a search-based optimization technique based on the principles of
Genetics and Natural Selection. It is frequently used to find optimal or near-optimal solutions to difficult
problems which would otherwise take a lifetime to solve. It is frequently used to solve optimization
problems, in research, and in machine learning. The aim of optimization is to find the point or set of
points in the search space that gives the best value of the objective function.
GAs were developed by John Holland and his students and colleagues at the University of Michigan,
most notably David E. Goldberg, and have since been tried on various optimization problems with a high
degree of success.
In GAs, we have a pool or a population of possible solutions to the given problem. These solutions then
undergo recombination and mutation (like in natural genetics), producing new children, and the process
is repeated over various generations. Each individual (or candidate solution) is assigned a fitness value
(based on its objective function value) and the fitter individuals are given a higher chance to mate and
yield more “fitter” individuals. This is in line with the Darwinian Theory of “Survival of the Fittest”. In
this way we keep “evolving” better individuals or solutions over generations, till we reach a stopping
criterion. Genetic Algorithms are sufficiently randomized in nature, but they perform much better than
random local search (in which we just try various random solutions, keeping track of the best so far), as
they exploit historical information as well.

Applications Of Genetic Algorithm(Any 5-6)


Optimization − Genetic Algorithms are most commonly used in optimization problems wherein we
have to maximize or minimize a given objective function value under a given set of constraints. The
approach to solve Optimization problems has been highlighted throughout the tutorial.
Economics − GAs are also used to characterize various economic models like the cobweb model, game
theory equilibrium resolution, asset pricing, etc.
Neural Networks − GAs are also used to train neural networks, particularly recurrent neural networks.
Parallelization − GAs also have very good parallel capabilities, and prove to be very effective means in
solving certain problems, and also provide a good area for research.
Image Processing − GAs are used for various digital image processing (DIP) tasks as well like dense
pixel matching.
Vehicle routing problems − With multiple soft time windows, multiple depots and a heterogeneous
fleet.
Scheduling applications − GAs are used to solve various scheduling problems as well, particularly the
time tabling problem.
Machine Learning − as already discussed, genetics based machine learning (GBML) is a niche area in
machine learning.
Robot Trajectory Generation − GAs have been used to plan the path which a robot arm takes by
moving from one point to another.
Parametric Design of Aircraft − GAs have been used to design aircrafts by varying the parameters and
evolving better solutions.
DNA Analysis − GAs have been used to determine the structure of DNA using spectrometric data about
the sample.
Multimodal Optimization − GAs are obviously very good approaches for multimodal optimization in
which we have to find multiple optimum solutions.
Traveling salesman problem and its applications − GAs have been used to solve the TSP, which is a
well-known combinatorial problem using novel crossover and packing strategies.
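The loop described above (fitness evaluation, higher mating chance for fitter individuals, recombination, mutation, repeated over generations) can be sketched on a toy problem. The "OneMax" objective, the population size, the tournament selection scheme, and the mutation rate are all illustrative choices, not from the source:

```python
import random

# Minimal genetic algorithm for the "OneMax" toy problem (maximize the
# number of 1-bits in a chromosome), showing selection, crossover, mutation.

random.seed(0)
N, LEN = 20, 12                      # population size, chromosome length

def fitness(c):
    return sum(c)                    # objective: count of 1-bits (optimum 12)

pop = [[random.randint(0, 1) for _ in range(LEN)] for _ in range(N)]
for gen in range(50):
    # tournament selection: fitter individuals get a higher chance to mate
    def select():
        return max(random.sample(pop, 3), key=fitness)
    nxt = []
    for _ in range(N):
        a, b = select(), select()
        cut = random.randrange(1, LEN)          # one-point crossover
        child = a[:cut] + b[cut:]
        if random.random() < 0.2:               # mutation: flip one bit
            child[random.randrange(LEN)] ^= 1
        nxt.append(child)
    pop = nxt                                   # next generation

best = max(pop, key=fitness)
print(fitness(best))   # close to the optimum of 12
```

Even on this tiny problem the GA beats a purely random search by exploiting the historical information stored in the population.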

7. Explain Classification with help of example.
Ans: Classification is the process of learning a model that separates data into different predetermined
classes. It specifies the class to which data elements belong and is best used when the output
has finite and discrete values. It predicts a class for a given input variable.
Classification categorizes a given set of data into classes and can be performed on both
structured and unstructured data. The process starts with predicting the class of given data points; the
classes are often referred to as targets, labels, or categories. It is a two-step process, comprised of a
learning step and a classification step: in the learning step a classification model is constructed, and in
the classification step the constructed model is used to predict the class labels for given data.
Classification belongs to supervised learning, which means that we know the input data (labeled in this
case) and we know the possible output of the algorithm.
There are 2 types of Classification:
1. Binomial/Binary: responds to problems with categorical answers (such as "yes" and "no", for
example)
2. Multi-Class: responds to more open answers, such as "great", "regular", and "insufficient"

Example: Classification is commonly used in the financial sector. In a banking application, a customer
who applies for a loan may be classified as safe or risky according to his/her age and salary. This type
of activity is also called supervised learning. In the era of online transactions, where the use of cash has
decreased markedly, it is necessary to determine whether movements made through cards are safe.
Entities can classify transactions as correct or fraudulent using historical data on customer behavior,
detecting fraud very accurately.
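The loan example can be sketched as a tiny rule-based classifier with a finite, discrete output. The age and salary thresholds below are made-up illustrations; a real system would learn the decision boundary from labeled data:

```python
# The banking example as a toy classifier: each applicant is mapped to
# one of two discrete classes. Thresholds are illustrative, not learned.

def classify_applicant(age, salary):
    # finite, discrete output classes: "safe" or "risky"
    if age >= 25 and salary >= 30000:
        return "safe"
    return "risky"

print(classify_applicant(40, 50000))  # safe
print(classify_applicant(19, 12000))  # risky
```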

8. Explain Clustering with help of example.


Ans: Clustering, or cluster analysis, is a machine learning technique that groups an unlabelled dataset.
It can be defined as "a way of grouping the data points into different clusters consisting of similar data
points; objects with possible similarities remain in a group that has few or no similarities with another
group." It does this by finding similar patterns in the unlabelled dataset, such as shape, size, color, or
behavior, and divides the data according to the presence or absence of those patterns.
It is an unsupervised learning method, so no supervision is provided to the algorithm and it deals with
an unlabelled dataset. After applying the clustering technique, each cluster or group is given a
cluster-ID, which an ML system can use to simplify the processing of large and complex datasets. Clustering
is commonly used for statistical data analysis.

Example: Let's understand the clustering technique with the real-world example of Mall which contains
plenty of things: When we visit any shopping mall, we can observe that the things with similar usage are
grouped together. Such as the Grocery Section which contains all the foodie items such as bread, grains,
dairy products etc. Other section like clothing section where t-shirts are grouped in one section, and
trousers are at other sections, similarly, at accessory sections, Jewellery stores, shoe stores, etc., are
grouped in separate sections, so that we can easily find out the things. The clustering technique also
works in the same way.
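Grouping unlabeled items purely by similarity can be sketched with a tiny one-dimensional k-means, as one common clustering algorithm; the price data and initial cluster centers below are illustrative:

```python
# Sketch of clustering with 1-D k-means: unlabeled prices are grouped
# into k = 2 clusters by similarity alone, with no supervision.

def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        # assignment step: each point joins its nearest center's cluster
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[i].append(p)
        # update step: each center moves to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

prices = [10, 12, 11, 95, 99, 101]
centers, clusters = kmeans_1d(prices, [10, 100])
print(clusters)   # low-priced and high-priced groups end up in separate clusters
```

The cluster index returned for each point plays the role of the cluster-ID mentioned above.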

9. What is a Bayesian Network? Explain with help of example.
Ans: Bayesian networks aim to model conditional dependence, and therefore causation, by representing
conditional dependencies as edges in a directed graph. Through these relationships, one can efficiently
conduct inference on the random variables in the graph through the use of factors. Bayesian networks are
probabilistic: they are built from a probability distribution and use probability theory for prediction
and anomaly detection. A Bayesian network represents a set of variables and their conditional
probabilities with a Directed Acyclic Graph (DAG). They are primarily suited for taking an event
that has occurred and predicting the likelihood that any one of several possible known causes was the
contributing factor.
Example: Let’s assume that we’re creating a Bayesian Network that will model the marks (m) of a
student on his examination. The marks will depend on:
• Exam Level (e)– This discrete variable denotes the difficulty of the exam and has two values (0
for easy and 1 for difficult)
• IQ Level (i) – This represents the Intelligence Quotient level of the student and is also discrete in
nature having two values (0 for low and 1 for high)
The marks will in turn predict whether or not he/she will get admitted (a) to a university.
The IQ will also predict the aptitude score (s) of the student.
With this information, we can build a Bayesian Network that will model the performance of a student
on an exam. The Bayesian Network can be represented as a DAG where each node denotes a variable
that predicts the performance of the student.
This distribution can be represented through a DAG and a Conditional Probability Table. The Joint
Probability Distribution of these 5 variables is then the product of the conditional probabilities:
P(m, e, i, a, s) = P(e) · P(i) · P(m | e, i) · P(a | m) · P(s | i)

The DAG clearly shows how each variable (node) depends on its parent node, i.e., the marks of the
student depends on the exam level (parent node) and IQ level (parent node). Similarly, the aptitude
score depends on the IQ level (parent node) and finally, his admission into a university depends on his
marks (parent node). This relationship is represented by the edges of the DAG.
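The factorisation for the student example can be computed directly from the DAG structure. The numeric CPT values below are illustrative assumptions, since the original conditional probability table was given only as a figure:

```python
# Joint probability for the student Bayesian network described above.
# CPT values are illustrative assumptions, not from the original table.

P_e = {0: 0.7, 1: 0.3}                    # exam level: 0 = easy, 1 = difficult
P_i = {0: 0.8, 1: 0.2}                    # IQ level:   0 = low,  1 = high
P_m = {(0, 0): 0.6, (0, 1): 0.9,          # P(m = good | e, i)
       (1, 0): 0.3, (1, 1): 0.8}
P_a = {0: 0.2, 1: 0.9}                    # P(admitted | m)
P_s = {0: 0.25, 1: 0.75}                  # P(good aptitude score | i)

def joint(e, i, m, a, s):
    # P(e, i, m, a, s) = P(e) P(i) P(m|e,i) P(a|m) P(s|i)
    pm = P_m[(e, i)] if m else 1 - P_m[(e, i)]
    pa = P_a[m] if a else 1 - P_a[m]
    ps = P_s[i] if s else 1 - P_s[i]
    return P_e[e] * P_i[i] * pm * pa * ps

# e.g. easy exam, high IQ, good marks, admitted, good aptitude score:
print(joint(e=0, i=1, m=1, a=1, s=1))   # 0.7 * 0.2 * 0.9 * 0.9 * 0.75 = 0.08505
```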

10. What is probabilistic reasoning? Explain.
Ans: Probabilistic reasoning is a way of knowledge representation where we apply the concept of
probability to indicate the uncertainty in knowledge. In probabilistic reasoning, we combine probability
theory with logic to handle the uncertainty. We use probability in probabilistic reasoning because it
provides a way to handle the uncertainty that is the result of someone's laziness and ignorance.
In the real world, there are lots of scenarios, where the certainty of something is not confirmed, such as
"It will rain today," "behavior of someone for some situations," "A match between two teams or two
players." These are probable sentences for which we can assume that it will happen but not sure about it,
so here we use probabilistic reasoning. In probabilistic reasoning, there are two ways to solve problems
with uncertain knowledge:
o Bayes' rule
o Bayesian Statistics
Probability can be defined as a chance that an uncertain event will occur: it is the numerical measure
of the likelihood that an event will occur. The value of a probability always lies between 0 and 1,
the endpoints representing the ideal certainties:
1. 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
2. P(A) = 0 indicates total uncertainty in event A.
3. P(A) = 1 indicates total certainty in event A.
We can find the probability of an uncertain event by using the formula:
Probability of occurrence = (Number of desired outcomes) / (Total number of outcomes)
Event: Each possible outcome of a variable is called an event.


Sample space: The collection of all possible events is called sample space.
Random variables: Random variables are used to represent the events and objects in the real world.
Prior probability: The prior probability of an event is probability computed before observing new
information.
Posterior Probability: The probability that is calculated after all evidence or information has been
taken into account. It is a combination of prior probability and new information.

Example:
In a class, 70% of the students like English and 40% of the students like both English and
Mathematics. What percent of the students who like English also like Mathematics?
Solution:
Let A be the event that a student likes Mathematics and B be the event that a student likes English.
Then P(B) = 0.7 and P(A ∩ B) = 0.4, so
P(A | B) = P(A ∩ B) / P(B) = 0.4 / 0.7 ≈ 0.57
Hence, 57% of the students who like English also like Mathematics.
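The worked example above is just one application of the conditional probability rule, and can be checked directly:

```python
# Checking the worked example: P(A | B) = P(A and B) / P(B).

p_B = 0.70          # fraction of students who like English
p_A_and_B = 0.40    # fraction who like both English and Mathematics

p_A_given_B = p_A_and_B / p_B
print(round(p_A_given_B * 100))   # prints 57 (percent)
```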

11. Write a short note on Associative Neural Network
Ans: These neural networks work on the basis of pattern association: they can store different patterns,
and at output time they produce one of the stored patterns by matching it with the given input pattern.
Such memories are also called Content-Addressable Memories (CAM). An associative memory performs a
parallel search over the stored patterns, treated as data files.
Following are the two types of associative memories we can observe −
• Auto Associative Memory
• Hetero Associative memory

Auto Associative Memory

This is a single layer neural network in which the input training vector and the output target vectors are
the same. The weights are determined so that the network stores a set of patterns.
Architecture
As shown in the following figure, the architecture of Auto Associative memory network has ‘n’ number
of input training vectors and similar ‘n’ number of output target vectors.

Training Algorithm
For training, this network uses the Hebb or Delta learning rule.
Step 1 − Initialize all the weights to zero: wij = 0 (i = 1 to n, j = 1 to n)
Step 2 − Perform steps 3-4 for each input vector.
Step 3 − Activate each input unit as follows:
xi = si (i = 1 to n)
Step 4 − Activate each output unit as follows:
yj = sj (j = 1 to n)
Step 5 − Adjust the weights as follows:
wij(new) = wij(old) + xi·yj
Testing Algorithm
Step 1 − Set the weights obtained during training for Hebb’s rule.
Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Set the activation of the input units equal to that of the input vector.
Step 4 − Calculate the net input to each output unit j = 1 to n:
yinj = Σi=1..n xi·wij
Step 5 − Apply the following activation function to calculate the output:
yj = f(yinj) = +1 if yinj > 0; −1 if yinj ≤ 0
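The training and testing algorithms above can be sketched in a few lines of Python; the stored bipolar pattern and the noisy cue below are illustrative:

```python
# Hebbian auto-associative memory: store bipolar patterns with the
# outer-product rule w_ij += x_i * y_j, then recall from a noisy cue.

def train(patterns, n):
    # Step 5 of training, accumulated over all stored patterns
    w = [[0] * n for _ in range(n)]
    for s in patterns:
        for i in range(n):
            for j in range(n):
                w[i][j] += s[i] * s[j]   # auto-associative: x = y = s
    return w

def recall(w, x, n):
    # testing steps 4-5: y_j = f(sum_i x_i w_ij), f = +1 if net > 0 else -1
    return [1 if sum(x[i] * w[i][j] for i in range(n)) > 0 else -1
            for j in range(n)]

pattern = [1, -1, 1, -1]
w = train([pattern], 4)
noisy = [1, -1, 1, 1]          # last component flipped
print(recall(w, noisy, 4))     # recovers the stored pattern [1, -1, 1, -1]
```

Because the cue still correlates positively with the stored pattern, the net input at every unit has the sign of the stored component, so the original pattern is recovered.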

Hetero Associative memory

Similar to the Auto Associative Memory network, this is also a single-layer neural network. However, in
this network the input training vectors and the output target vectors are not the same. The weights are
determined so that the network stores a set of patterns. The hetero associative network is static in
nature; hence, there are no non-linear or delay operations.
Architecture
As shown in the following figure, the architecture of Hetero Associative Memory network
has ‘n’ number of input training vectors and ‘m’ number of output target vectors

Training Algorithm
For training, this network uses the Hebb or Delta learning rule.
Step 1 − Initialize all the weights to zero: wij = 0 (i = 1 to n, j = 1 to m)
Step 2 − Perform steps 3-4 for each input vector.
Step 3 − Activate each input unit as follows:
xi = si (i = 1 to n)
Step 4 − Activate each output unit as follows:
yj = sj (j = 1 to m)
Step 5 − Adjust the weights as follows:
wij(new) = wij(old) + xi·yj
Testing Algorithm
Step 1 − Set the weights obtained during training for Hebb’s rule.
Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Set the activation of the input units equal to that of the input vector.
Step 4 − Calculate the net input to each output unit j = 1 to m:
yinj = Σi=1..n xi·wij
Step 5 − Apply the following activation function to calculate the output:
yj = f(yinj) = +1 if yinj > 0; 0 if yinj = 0; −1 if yinj < 0

12. Write a short note on Adaptive Resonance Theory
Ans: This network was developed by Stephen Grossberg and Gail Carpenter in 1987. It is based on
competition and uses an unsupervised learning model. Adaptive Resonance Theory (ART) networks, as the
name suggests, are always open to new learning (adaptive) without losing the old patterns (resonance).
Basically, an ART network is a vector classifier which accepts an input vector and classifies it into one
of the categories depending upon which of the stored patterns it resembles the most.
Operating Principle
The main operation of ART classification can be divided into the following phases −
• Recognition phase − The input vector is compared with the classification presented at every
node in the output layer. The output of the neuron becomes “1” if it best matches with the
classification applied, otherwise it becomes “0”.
• Comparison phase − In this phase, a comparison of the input vector to the comparison layer
vector is done. The condition for reset is that the degree of similarity would be less than
vigilance parameter.
• Search phase − In this phase, the network will search for reset as well as the match done in the
above phases. Hence, if there would be no reset and the match is quite good, then the
classification is over. Otherwise, the process would be repeated and the other stored pattern
must be sent to find the correct match.
ART1: It is a type of ART, which is designed to cluster binary vectors. We can understand about this
with the architecture of it.
Architecture of ART1: It consists of the following two units −
Computational Unit − It is made up of the following −
• Input unit (F1 layer) − It further has the following two portions:
o F1(a) layer (input portion) − In ART1, there is no processing in this
portion; it holds the input vectors only. It is connected to the
F1(b) layer (interface portion).
o F1(b) layer (interface portion) − This portion combines the signal from
the input portion with that of the F2 layer. The F1(b) layer is connected
to the F2 layer through bottom-up weights bij, and the F2 layer is
connected to the F1(b) layer through top-down weights tji.
• Cluster Unit (F2 layer) − This is a competitive layer. The unit with the largest net
input is selected to learn the input pattern; the activations of all other cluster units
are set to 0.
• Reset Mechanism − The work of this mechanism is based upon the similarity between
the top-down weights and the input vector. If the degree of this similarity is less than
the vigilance parameter, then the cluster is not allowed to learn the pattern and a reset
happens.
Supplement Unit − The issue with the reset mechanism is that the F2 layer must be inhibited under
certain conditions yet must also be available when some learning happens. That is why two supplemental
units, G1 and G2, are added along with the reset unit R. They are called gain control units. These units
receive and send signals to the other units present in the network. '+' indicates an excitatory signal,
while '−' indicates an inhibitory signal.

Algorithm
Step 1 − Initialize the learning rate, the vigilance parameter, and the weights as follows:
α > 1 and 0 < ρ ≤ 1
0 < bij(0) < α / (α − 1 + n) and tij(0) = 1
Step 2 − Continue steps 3-9 while the stopping condition is not true.
Step 3 − Continue steps 4-6 for every training input.
Step 4 − Set the activations of all F2 and F1(a) units as follows:
F2 = 0 and F1(a) = input vector
Step 5 − The input signal from the F1(a) to the F1(b) layer is sent as:
si = xi
Step 6 − For every F2 node that is not inhibited (the condition is yj ≠ −1):
yj = Σi bij·xi
Step 7 − Perform steps 8-10 while the reset is true.
Step 8 − Find J such that yJ ≥ yj for all nodes j.
Step 9 − Recalculate the activation on F1(b) as follows:
xi = si·tJi
Step 10 − After calculating the norms of vector x and vector s, check the reset
condition as follows:
If ||x|| / ||s|| < vigilance parameter ρ, then inhibit node J and go to step 7;
else if ||x|| / ||s|| ≥ vigilance parameter ρ, then proceed further.
Step 11 − Weight updating for node J is done as follows:
biJ(new) = α·xi / (α − 1 + ||x||)
tJi(new) = xi
Step 12 − The stopping condition for the algorithm must be checked; it may be:

• No change in the weights.
• Reset is not performed for any unit.
• Maximum number of epochs reached.
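The reset condition of step 10 can be sketched as a stand-alone vigilance test. This is an illustrative fragment, not a full ART1 implementation; the binary vectors and vigilance values are made up:

```python
# Minimal sketch of the ART1 vigilance test: compare the match ratio
# ||x|| / ||s|| against the vigilance parameter rho (step 10 above).

def vigilance_test(s, top_down, rho):
    # x_i = s_i * t_Ji: AND of the input with the winner's top-down weights
    x = [si * ti for si, ti in zip(s, top_down)]
    norm_x, norm_s = sum(x), sum(s)   # ||.|| = number of 1s (binary vectors)
    return norm_x / norm_s >= rho     # True -> resonance, False -> reset

s = [1, 0, 1, 1]
t = [1, 0, 1, 0]
print(vigilance_test(s, t, 0.6))   # 2/3 >= 0.6 -> True (cluster may learn)
print(vigilance_test(s, t, 0.8))   # 2/3 <  0.8 -> False (inhibit node J)
```

Raising ρ makes the network pickier: fewer inputs resonate with an existing cluster, so more clusters are created.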

Unit-2
1. Explain the architecture of simple artificial neural network.
Ans: A neural network consists of three layers. The first layer is the input layer; it contains the
input neurons that send information to the hidden layer. The hidden layer performs the computations on
the input data and transfers the output to the output layer. The network includes weights, an
activation function, and a cost function.
The connections between neurons are known as weights, which are numerical values. The weights between
neurons determine the learning ability of the neural network; during learning, the weights between the
neurons change.
Working of ANN
Firstly, the information is fed into the input layer, which transfers it to the hidden layer; the
interconnections between these two layers assign a random weight to each input at the initial point.
A bias is then added to each input neuron, and the weighted sum (a combination of weights and bias) is
passed through the activation function. The activation function decides which nodes to fire for feature
extraction, and finally the output is calculated. This whole process is known as forward propagation.
The output of the model is then compared with the original output to obtain the error, and the weights
are updated in backward propagation to reduce the error. This process continues for a certain number of
epochs (iterations). Finally, the model weights are updated and prediction is done.
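The forward-propagation step described above can be sketched as follows; the weights and biases are illustrative values rather than learned ones:

```python
import math

# Forward propagation through one hidden layer: weighted sum + bias,
# then the activation function, at each node. Values are illustrative.

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    # hidden activations: sigmoid(weighted sum of inputs + bias) per node
    h = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
         for w, b in zip(w_hidden, b_hidden)]
    # output node: weighted sum of hidden activations + bias, then sigmoid
    return sigmoid(sum(wo * hi for wo, hi in zip(w_out, h)) + b_out)

x = [0.5, 0.2]
w_hidden = [[0.4, -0.6], [0.3, 0.8]]   # one weight vector per hidden node
b_hidden = [0.1, -0.2]
y = forward(x, w_hidden, b_hidden, [0.7, -0.5], 0.05)
print(round(y, 3))   # a value between 0 and 1
```

Backward propagation would then compare `y` with the target output and adjust the weights to reduce the error, repeating over epochs.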
Applications of ANN: There are many applications of ANN. Some of them are:
Medical: detecting cancer cells and analysing MRI images to give detailed results.
Forecast: business decisions in fields such as finance and the stock market, and in economic and
monetary policy.
Image Processing: satellite imagery processing for agricultural and defense use.
2. Give comparison between BNN and ANN.
• In short: ANN stands for Artificial Neural Network; BNN stands for Biological Neural Network.
• Speed: ANN is faster in processing information (response time in nanoseconds); BNN is slower
(response time in milliseconds).
• Processing: ANN uses serial/parallel processing; BNN uses massively parallel processing.
• Size and complexity: ANN has less size and complexity and does not perform complex pattern
recognition tasks; BNN is a highly complex and dense network of interconnected neurons, containing
on the order of 10^11 neurons with 10^15 interconnections.
• Storage: In an ANN, storage allocated to a new process is strictly irreplaceable, as the old
location is saved for the previous process; in a BNN, allocating storage to a new process is easy,
as it is added just by adjusting the interconnection strengths.
• Fault tolerance: ANN is fault intolerant; information once corrupted cannot be retrieved in case
of failure of the system. In a BNN, information storage is adaptable: new information is added by
adjusting the interconnection strengths without destroying old information.
• Control mechanism: ANN has a control unit for controlling computing activities; BNN has no
specific control mechanism external to the computing task.

3. List and explain five basic types of neuron connection architectures.
Ans: Interconnection can be defined as the way processing elements (Neuron) in ANN are
connected to each other. Hence, the arrangements of these processing elements and geometry of
interconnections are very essential in ANN.
There exist five basic types of neuron connection architecture :
1. Single-layer feed-forward network
2. Multilayer feed-forward network
3. Single node with its own feedback
4. Single-layer recurrent network
5. Multilayer recurrent network

1. Single-layer feed-forward network


In this type of network, we have only two layers, the input
layer and the output layer, but the input layer does not
count because no computation is performed in this layer.
The output layer is formed when different weights are
applied to the input nodes and the cumulative effect per
node is taken. The neurons of the output layer then
collectively compute the output signals.
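As a sketch (not from the notes; the weights, bias, and inputs are illustrative values), a single-layer feed-forward pass is just one weighted sum per output node followed by an activation:

```python
import numpy as np

# Minimal single-layer feed-forward pass: weighted sum per output
# neuron, then a binary step activation. All values are made up.
def single_layer_forward(x, W, b):
    net = W @ x + b                   # one weighted sum per output neuron
    return np.where(net >= 0, 1, 0)   # binary step activation

x = np.array([1.0, 0.5])
W = np.array([[0.4, -0.2],   # weights feeding output neuron 1
              [0.1,  0.3]])  # weights feeding output neuron 2
b = np.array([-0.1, -0.5])
print(single_layer_forward(x, W, b))  # -> [1 0]
```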

2. Multilayer feed-forward network


This network also has a hidden layer that is internal to the
network and has no direct contact with the external
layer. The existence of one or more hidden layers
makes the network computationally stronger. It is a
feed-forward network because information flows
through the input function and the intermediate
computations used to determine the output Z; there are
no feedback connections in which outputs of the model
are fed back into itself.

3. Single node with its own feedback


When outputs can be directed back as inputs to the same
layer or preceding layer nodes, then it results in feedback
networks. Recurrent networks are feedback networks with
closed loops. The above figure shows a single recurrent
network having a single neuron with feedback to itself.

4. Single-layer recurrent network


The network is a single-layer network with a feedback connection
in which the processing element’s output can be directed back to
itself or to another processing element or both. A recurrent neural
network is a class of artificial neural networks where connections
between nodes form a directed graph along a sequence. This

allows it to exhibit dynamic temporal behavior for a time sequence. Unlike feedforward neural
networks, RNNs can use their internal state (memory) to process sequences of inputs.

5. Multilayer recurrent network


In this type of network, processing element output can be directed to the processing element in the
same layer and in the preceding layer forming a
multilayer recurrent network. They perform the
same task for every element of a sequence, with
the output being dependent on the previous
computations. Inputs are not needed at each time
step. The main feature of a Recurrent Neural
Network is its hidden state, which captures some
information about a sequence.

4. List and explain Learning or training algorithm.


Ans: At its most basic, machine learning uses programmed algorithms that receive and analyse input data to
predict output values within an acceptable range. As new data is fed to these algorithms, they learn and
optimise their operations to improve performance, developing ‘intelligence’ over time.
There are four types of machine learning algorithms: supervised, semi-supervised, unsupervised and
reinforcement.
Supervised learning
In supervised learning, the machine is taught by example. The operator provides the machine learning
algorithm with a known dataset that includes desired inputs and outputs, and the algorithm must find a
method to determine how to arrive at those inputs and outputs. While the operator knows the correct
answers to the problem, the algorithm identifies patterns in data, learns from observations and makes
predictions. The algorithm makes predictions and is corrected by the operator – and this process
continues until the algorithm achieves a high level of accuracy/performance.
Under the umbrella of supervised learning fall: Classification, Regression and Forecasting.
1. Classification: In classification tasks, the machine learning program must draw a conclusion
from observed values and determine to
what category new observations belong. For example, when filtering emails as ‘spam’ or ‘not
spam’, the program must look at existing observational data and filter the emails accordingly.
2. Regression: In regression tasks, the machine learning program must estimate – and understand
– the relationships among variables. Regression analysis focuses on one dependent variable
and a series of other changing variables – making it particularly useful for prediction
and forecasting.
3. Forecasting: Forecasting is the process of making predictions about the future based on the
past and present data, and is commonly used to analyse trends.
Semi-supervised learning
Semi-supervised learning is similar to supervised learning, but instead uses both labelled and
unlabelled data. Labelled data is essentially information that has meaningful tags so that the algorithm
can understand the data, whilst unlabelled data lacks that information. By using this
combination, machine learning algorithms can learn to label unlabelled data.

Unsupervised learning
Here, the machine learning algorithm studies data to identify patterns. There is no answer key or
human operator to provide instruction. Instead, the machine determines the correlations and
relationships by analysing available data. In an unsupervised learning process, the machine learning
algorithm is left to interpret large data sets and address that data accordingly. The algorithm tries to
organise that data in some way to describe its structure. This might mean grouping the data into
clusters or arranging it in a way that looks more organised.
As it assesses more data, its ability to make decisions on that data gradually improves and becomes
more refined.
Under the umbrella of unsupervised learning, fall:
1. Clustering: Clustering involves grouping sets of similar data (based on defined criteria).
It’s useful for segmenting data into several groups and performing analysis on each data
set to find patterns.
2. Dimension reduction: Dimension reduction reduces the number of variables being
considered to find the exact information required.
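As an illustration of clustering, here is a minimal K-means pass on 1-D data (the data, k = 2, and the initial centroids are made-up values for the example, not from the notes):

```python
import numpy as np

# Tiny 1-D K-means sketch: assign each point to its nearest centroid,
# then move each centroid to the mean of its assigned points.
def kmeans_1d(data, centroids, iters=10):
    data = np.asarray(data, dtype=float)
    c = np.asarray(centroids, dtype=float)
    for _ in range(iters):
        # assignment step: nearest centroid per point
        labels = np.argmin(np.abs(data[:, None] - c[None, :]), axis=1)
        # update step: recompute each centroid as the cluster mean
        for k in range(len(c)):
            if np.any(labels == k):
                c[k] = data[labels == k].mean()
    return c, labels

centroids, labels = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.0], [0.0, 5.0])
print(np.round(centroids, 6).tolist())  # -> [1.0, 9.5]
```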
Reinforcement learning
Reinforcement learning focuses on regimented learning processes, where a machine learning
algorithm is provided with a set of actions, parameters and end values. By defining the rules, the
machine learning algorithm then tries to explore different options and possibilities, monitoring and
evaluating each result to determine which one is optimal. Reinforcement learning teaches the machine
trial and error. It learns from past experiences and begins to adapt its approach in response to the
situation to achieve the best possible result.
Algorithms
• Naïve Bayes Classifier Algorithm (Supervised Learning - Classification)
The Naïve Bayes classifier is based on Bayes’ theorem and classifies every value as
independent of any other value. It allows us to predict a class/category, based on a given set of
features, using probability.

Despite its simplicity, the classifier does surprisingly well and is often used due to the fact it
outperforms more sophisticated classification methods.
• K Means Clustering Algorithm (Unsupervised Learning - Clustering)
The K Means Clustering algorithm is a type of unsupervised learning, which is used to
categorise unlabelled data, i.e. data without defined categories or groups. The algorithm works
by finding groups within the data, with the number of groups represented by the variable K. It
then works iteratively to assign each data point to one of K groups based on the features
provided.
• Support Vector Machine Algorithm (Supervised Learning - Classification)
Support Vector Machine algorithms are supervised learning models that analyse data used for
classification and regression analysis. They essentially filter data into categories, which is
achieved by providing a set of training examples, each set marked as belonging to one or the
other of the two categories. The algorithm then works to build a model that assigns new values
to one category or the other.
• Linear Regression (Supervised Learning/Regression)
Linear regression is the most basic type of regression. Simple linear regression allows us to
understand the relationships between two continuous variables.

• Logistic Regression (Supervised learning – Classification)
Logistic regression focuses on estimating the probability of an event occurring based on the
previous data provided. It is used to cover a binary dependent variable, that is where only two
values, 0 and 1, represent outcomes.
• Artificial Neural Networks (Reinforcement Learning)
An artificial neural network (ANN) comprises ‘units’ arranged in a series of layers, each of
which connects to layers on either side. ANNs are inspired by biological systems, such as the
brain, and how they process information. ANNs are essentially a large number of
interconnected processing elements, working in unison to solve specific problems.

ANNs also learn by example and through experience, and they are extremely useful for
modelling non-linear relationships in high-dimensional data or where the relationship amongst
the input variables is difficult to understand.
• Decision Trees (Supervised Learning – Classification/Regression)
A decision tree is a flow-chart-like tree structure that uses a branching method to illustrate
every possible outcome of a decision. Each node within the tree represents a test on a specific
variable – and each branch is the outcome of that test.
• Random Forests (Supervised Learning – Classification/Regression)
Random forests or ‘random decision forests’ is an ensemble learning method, combining
multiple algorithms to generate better results for classification, regression and other tasks. Each
individual classifier is weak, but when combined with others, can produce excellent results.
The algorithm starts with a ‘decision tree’ (a tree-like graph or model of decisions) and an
input is entered at the top. It then travels down the tree, with data being segmented into smaller
and smaller sets, based on specific variables.
• Nearest Neighbours (Supervised Learning)
The K-Nearest-Neighbour algorithm estimates how likely a data point is to be a member of one
group or another. It essentially looks at the data points around a single data point to
determine what group it is actually in. For example, if one point is on a grid and the algorithm
is trying to determine what group that data point is in (Group A or Group B, for example) it
would look at the data points near it to see what group the majority of the points are in.
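The majority-vote idea can be sketched in a few lines (the points, labels, and k = 3 are hypothetical values for illustration):

```python
from collections import Counter

# Toy k-nearest-neighbour classifier: label a query point by a
# majority vote among its k nearest training points.
def knn_predict(points, labels, query, k=3):
    nearest = sorted(
        range(len(points)),
        key=lambda i: sum((p - q) ** 2 for p, q in zip(points[i], query)),
    )[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ['A', 'A', 'A', 'B', 'B', 'B']
print(knn_predict(points, labels, (0.5, 0.5)))  # -> A
print(knn_predict(points, labels, (5.5, 5.5)))  # -> B
```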


5. List and explain different activation functions in neural networks.
Ans: The activation function is applied over the net input to calculate the output of an ANN.
There are several activation functions. Let us discuss a few in this section:
1. Identity function: It is a linear function and can be defined as
f(x) = x for all x
The output here remains the same as the input. The input layer uses the identity activation function.
2. Binary step function: This function can be defined as
f(x) = 1 if x ≥ θ; 0 if x < θ
where θ represents the threshold value. This function is most widely used in single-layer nets to
convert the net input to an output that is binary (1 or 0).
3. Bipolar step function: This function can be defined as
f(x) = +1 if x ≥ θ; −1 if x < θ
where θ represents the threshold value. This function is also used in single-layer nets to convert
the net input to an output that is bipolar (+1 or -1).

4. Sigmoidal functions: The sigmoidal functions are widely used in back-propagation


nets because of the relationship between the value of the functions at a point and the
value of the derivative at that point which reduces the computational burden during
training.
▸ Sigmoidal functions are of two types:
▸ Binary sigmoid function: It is also termed as logistic sigmoid function or unipolar sigmoid
function. It can be defined as
f(x) = 1 / (1 + e^(−λx))
▸ where λ is the steepness parameter. The derivative of this function is
f′(x) = λ f(x) [1 − f(x)]
▸ Here the range of the sigmoid function is from 0 to 1.

Bipolar sigmoid function: This function is defined as
f(x) = (1 − e^(−λx)) / (1 + e^(−λx))
▸ where λ is the steepness parameter and the sigmoid function range is
between -1 and +1. The derivative of this function is
f′(x) = (λ/2) [1 + f(x)] [1 − f(x)]
▸ The bipolar sigmoidal function is closely related to the hyperbolic tangent
function, which is written as
h(x) = (e^x − e^(−x)) / (e^x + e^(−x))
The derivative of the hyperbolic tangent function is
h′(x) = [1 + h(x)] [1 − h(x)]
▸ If the network uses binary data, it is better to convert it to bipolar form and
use the bipolar sigmoidal activation function or hyperbolic tangent function.

5. Ramp function: The ramp function is defined as
f(x) = 1 if x > 1; x if 0 ≤ x ≤ 1; 0 if x < 0
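The sigmoidal functions above can be written directly in code (a sketch; the steepness λ defaults to 1, and the test value 0.5 is arbitrary):

```python
import numpy as np

# Binary (logistic/unipolar) and bipolar sigmoids with steepness lambda.
# The bipolar sigmoid satisfies bipolar_sigmoid(x) == tanh(x/2) for lam=1.
def binary_sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + np.exp(-lam * x))

def bipolar_sigmoid(x, lam=1.0):
    return (1.0 - np.exp(-lam * x)) / (1.0 + np.exp(-lam * x))

f = binary_sigmoid(0.5)
g = bipolar_sigmoid(0.5)
print(round(float(f), 4), round(float(g), 4))  # -> 0.6225 0.2449
```

The derivative identities quoted above (f′ = λ f (1 − f) and f′ = (λ/2)(1 + f)(1 − f)) are why these functions keep back-propagation cheap: the derivative is computed from the already-known function value.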

6. Define following terms

a. Weights
Ans: Weight is the parameter within a neural network that transforms input data within the
network's hidden layers.

b. Bias
Ans: Bias allows you to shift the activation function by adding a constant (i.e. the given bias) to the
input. Bias in Neural Networks can be thought of as analogous to the role of a constant in a linear
function, whereby the line is effectively transposed by the constant value.

c. Threshold
Ans: Threshold is a set value based upon which the final
output of the network may be calculated. The threshold
value is used in the activation function. The activation
function using threshold can be defined as

d. Learning rate
Ans: The learning rate is denoted by “α”. It is used to control the amount of weight adjustment at
each step of training. The learning rate, ranging from 0 to 1, determines the rate of learning at each
time step.

e. Momentum factor
Ans: Convergence is made faster if a momentum factor is added to the weight updation process.
This is generally done in the back propagation network. If momentum has to be used, the weights
from one or more previous training patterns must be saved. Momentum helps the net in reasonably
large weight adjustments until the corrections are in the same general direction for several patterns

f. Vigilance Parameter
Ans: The vigilance parameter is denoted by “ρ”. It is generally used in adaptive resonance theory
(ART) network.The vigilance parameter is used to control the degree of similarity required for
patterns to be assigned to the same cluster unit. The choice of vigilance parameter ranges
approximately from 0.7 to 1 to perform useful work in controlling the number of clusters

7. For the given network calculate the output y neuron for given inputs and weights.
[x1, x2, x3]=[0.3,0.5,0.6] [w1,w2,w3]=[0.2,0.1,-0.3]
Ans:
[x1,x2,x3]=[0.3,0.5,0.6]
[w1,w2,w3]=[0.2,0.1,−0.3]
The net input can be calculated as,
yin=x1w1+x2w2+x3w3
yin=0.3×0.2+0.5×0.1+0.6×(−0.3)
yin=0.06+0.05−0.18=−0.07
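The hand calculation can be checked with a dot product:

```python
import numpy as np

# Net input for Q7: y_in = x1*w1 + x2*w2 + x3*w3
x = np.array([0.3, 0.5, 0.6])
w = np.array([0.2, 0.1, -0.3])
y_in = float(np.dot(x, w))
print(round(y_in, 2))  # -> -0.07
```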

8. Calculate the net input for given network.
Ans: [x1,x2,b]=[0.2,0.6,0.45]
[w1,w2]=[0.3,0.7]
The net input can be calculated as,
yin=b+x1w1+x2w2
yin=0.45+0.2×0.3+0.6×0.7
yin=0.45+0.06+0.42=0.93

9. Obtain the output of the neuron Y for the network given below, using the activation
functions: (i) binary sigmoidal and (ii) bipolar sigmoidal.
[x1, x2, x3]=[0.8,0.6,0.4] [w1,w2,w3,b]=[0.1,0.3,-0.2,0.35]

Ans: The given network has three input neurons with bias
and one output neuron; these form a single-layer network.
The net input is
yin = b + x1w1 + x2w2 + x3w3 = 0.35 + 0.8×0.1 + 0.6×0.3 + 0.4×(−0.2) = 0.53
(i) Binary sigmoidal: y = 1/(1 + e^(−0.53)) ≈ 0.6295
(ii) Bipolar sigmoidal: y = (1 − e^(−0.53))/(1 + e^(−0.53)) ≈ 0.259
10. Explain McCulloh-Pitts Neuron.
Ans: The McCulloch-Pitts neuron, discovered in 1943, was the earliest neural network model. It is
usually called the M-P neuron.
▸ The M-P neurons are connected by directed weighted paths.
▸ The activation of a M-P neuron is binary, that is, at any time step the neuron may fire or may
not fire.
▸ The weights associated with the communication links may be excitatory (weight is positive) or
inhibitory (weight is negative). All the excitatory connection weights entering into a particular
neuron will have the same weight.
▸ There is a fixed threshold for each neuron, and if the net input to the neuron is greater than
the threshold then the neuron fires. Also, it should be noted that inhibitory input would prevent
the neuron from firing.
▸ The M-P neurons are most widely used in the case of logic functions
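A minimal M-P neuron for the AND logic function, a standard textbook choice: both excitatory weights are 1 and the fixed threshold is 2, so the neuron fires only when both inputs are 1.

```python
# McCulloch-Pitts neuron: fires (outputs 1) only when the net input
# reaches the fixed threshold.
def mp_neuron(inputs, weights, threshold):
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= threshold else 0

# AND function: weights [1, 1], threshold 2.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mp_neuron([x1, x2], [1, 1], threshold=2))
# fires only for input (1, 1)
```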

11. What is Linear Separability concept?
Ans: An ANN does not give an exact solution for a nonlinear problem. However, it provides an
approximate solution to nonlinear problems. Linear separability is the concept wherein the
separation of input space into regions is based on whether the network response is positive or
negative.
A decision line is drawn to separate the positive and negative responses. The decision line may also
be called the decision-making line, decision-support line, or linearly separable line. The
necessity of the linear separability concept was felt in order to classify the patterns based upon
their output responses.
Generally, the net input calculated at the output unit is given as
yin = b + Σ xi wi (i = 1 to n)
The linear separability of the network is based on the decision-boundary line. If there exist weights
for which all the training input vectors having a positive (correct, +1) response lie on one side of the
decision boundary and all the vectors having a negative (−1) response lie on the other side
of the decision boundary, then we can conclude that the problem is "Linearly Separable".
Example: Consider a single-layer network as shown in the figure.
The net input for the network shown in the figure is given
as
yin = b + x1w1 + x2w2
The separating line, for which the boundary lies between the values x1 and x2 so that the net gives
a positive response on one side and a negative response on the other side, is given as
b + x1w1 + x2w2 = 0
If the weight w2 is not equal to 0, then we get
x2 = −(w1/w2) x1 − (b/w2)
Thus, the requirement for a positive response of the net is
b + x1w1 + x2w2 > 0

12. Explain Hebb Network. OR Explain Hebb training algorithm.
Ans: Hebb Network was stated by Donald Hebb in 1949. According to Hebb’s rule, the weights
are found to increase proportionately to the product of input and output. It means that in a Hebb
network if two neurons are interconnected then the weights associated with these neurons can
be increased by changes in the synaptic gap.This network is suitable for bipolar data. The Hebbian
learning rule is generally applied to logic gates.
The weights are updated as:
W (new) = w (old) + x*y
Training Algorithm For Hebbian Learning Rule
The training steps of the algorithm are as follows:
• Initially, the weights are set to zero, i.e. wi = 0 for all i = 1 to n, where n is the total
number of input neurons.
• For each training pair s:t, set the activations of the input units: xi = si. The activation
function for the input units is the identity function.
• The activation of the output unit is set to y = t.
• The weights and bias are adjusted as:
wi(new) = wi(old) + xi·y and b(new) = b(old) + y

13. Design a Hebb net to implement logical AND function (use bipolar inputs and targets)
Ans: Let us implement logical AND function with bipolar inputs using Hebbian Learning
X1 and X2 are inputs, b is the bias taken as 1, the target value is the output of logical AND
operation over inputs.

#1) Initially, the weights are set to zero and the bias is also set to zero:
w1 = w2 = b = 0
#2) The first input vector is taken as [x1 x2 b] = [1 1 1] and the target value is 1.
The new weights will be:
w1 = 0 + 1×1 = 1, w2 = 0 + 1×1 = 1, b = 0 + 1×1 = 1
#3) The above weights are the final new weights. When the second input is passed, these become
the initial weights.
#4) Take the second input = [1 -1 1]. The target is -1. The new weights become:
w1 = 1 + 1×(−1) = 0, w2 = 1 + (−1)×(−1) = 2, b = 1 + 1×(−1) = 0

#5) Similarly, the other inputs and weights are calculated.
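The whole Hebb training cycle for the bipolar AND function can be reproduced in code (weights update as w(new) = w(old) + x·t, with the bias input fixed at 1):

```python
# Hebb's rule over the four bipolar AND training pairs.
patterns = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w1 = w2 = b = 0
for (x1, x2), t in patterns:
    w1 += x1 * t
    w2 += x2 * t
    b += 1 * t          # bias input is fixed at 1
print(w1, w2, b)  # -> 2 2 -2
```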

14. Explain perceptron learning rule with suitable example.


Ans: Perceptron networks are single-layer feed-forward networks. These are also called Single
Perceptron Networks. The perceptron consists of an input layer, a hidden layer, and an output layer.
The input layer is connected to the hidden layer through weights which may be inhibitory, excitatory,
or zero (-1, +1 or 0). The activation function used is a binary step function for the input layer and the
hidden layer.
The output is
Y= f (y)
The activation function is:

The weight updation takes place between the hidden layer and the output layer to match the target
output. The error is calculated based on the actual output and the desired output.

If the output matches the target then no weight updation takes place. The weights are initially set to
0 or 1 and adjusted successively till an optimal solution is found.
The weights in the network can be set to any values initially. The Perceptron learning will converge to
weight vector that gives correct output for all input training pattern and this learning happens in a
finite number of steps.

Example Of Perceptron Learning Rule
Implementation of AND function using a Perceptron network for bipolar inputs and output.
The input pattern will be x1, x2 and bias b. Let the initial weights be 0 and bias be 0. The threshold is
set to zero and the learning rate is 1.

#1) X1=1 , X2= 1 and target output = 1


W1=w2=wb=0 and x1=x2=b=1, t=1
Net input= y =b + x1*w1+x2*w2 = 0+1*0 +1*0 =0
As threshold is zero therefore:

From here we get, output = 0. Now check if output (y) = target (t).
y = 0 but t= 1 which means that these are not same, hence weight updation takes place.

The weight update w(new) = w(old) + η·t·x gives w1 = w2 = wb = 0 + 1×1×1 = 1.
The new weights are 1, 1, and 1 after the first input vector is presented.
#2) X1= 1, X2= -1, b= 1 and target = -1, W1=1, W2=1, Wb=1
Net input= y =b + x1*w1+x2*w2 = 1+1*1 + (-1)*1 =1
The net output for input= 1 will be 1 from:
Therefore again, target = -1 does not match with the actual output =1. Weight updates take place.

Now, from w(new) = w(old) + η·t·x, the new weights are w1 = 0, w2 = 2 and wb = 0
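Running the perceptron rule to convergence on the same bipolar AND data (a sketch; learning rate 1 and threshold 0 as in the example above):

```python
# Perceptron learning: update w += eta*t*x only on a wrong output,
# repeat epochs until a full pass makes no changes.
def step(y_in):
    return 1 if y_in > 0 else (-1 if y_in < 0 else 0)

patterns = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w = [0, 0]
b = 0
eta = 1
changed = True
while changed:
    changed = False
    for (x1, x2), t in patterns:
        y = step(b + x1 * w[0] + x2 * w[1])
        if y != t:
            w[0] += eta * t * x1
            w[1] += eta * t * x2
            b += eta * t
            changed = True
print(w, b)  # -> [1, 1] -1
```

Since AND is linearly separable, the loop terminates in a finite number of steps, as the convergence claim above states.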

15. Explain Adaptive Linear Neuron (Adaline)
Ans: Adaline which stands for Adaptive Linear Neuron, is a network having a single linear unit. It was
developed by Widrow and Hoff in 1960. Some important points about Adaline are as follows −
• It uses bipolar activation function.
• It uses delta rule for training to minimize the Mean-Squared Error (MSE) between the actual
output and the desired/target output.
• The weights and the bias are adjustable.

Architecture
The basic structure of Adaline is similar to
perceptron having an extra feedback loop with
the help of which the actual output is compared
with the desired/target output. After comparison
on the basis of training algorithm, the weights and
bias will be updated.

Training Algorithm
Step 1 − Initialize the following to start the training −
• Weights
• Bias
• Learning rate α
For easy calculation and simplicity, weights and bias must be set equal to 0 and the learning rate
must be set equal to 1.
Step 2 − Continue step 3-8 when the stopping condition is not true.
Step 3 − Continue step 4-6 for every bipolar training pair s:t.
Step 4 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 5 − Obtain the net input with the following relation −
yin = b + Σ xi wi (i = 1 to n)
Here ‘b’ is bias and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output −
f(yin) = +1 if yin ≥ 0; −1 if yin < 0
Step 7 − Adjust the weight and bias as follows −
Case 1 − if y ≠ t then,
wi(new) = wi(old) + α(t − yin) xi
b(new) = b(old) + α(t − yin)
Case 2 − if y = t then,
wi(new) = wi(old)
b(new) = b(old)
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
(t − yin) is the computed error.
Step 8 − Test for the stopping condition, which will happen when there is no change in weight or the
highest weight change occurred during training is smaller than the specified tolerance.
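A small sketch of the delta rule on bipolar AND data (the learning rate 0.1 and the epoch count are illustrative choices, not from the notes): the weights move to shrink the mean squared error between the target t and the net input yin.

```python
# Adaline delta rule: w += alpha * (t - y_in) * x, b += alpha * (t - y_in).
patterns = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w = [0.0, 0.0]
b = 0.0
alpha = 0.1

def mse():
    # mean squared error between targets and net inputs
    return sum((t - (b + x1 * w[0] + x2 * w[1])) ** 2
               for (x1, x2), t in patterns) / len(patterns)

before = mse()
for _ in range(50):                      # a few training epochs
    for (x1, x2), t in patterns:
        y_in = b + x1 * w[0] + x2 * w[1]
        err = t - y_in
        w[0] += alpha * err * x1
        w[1] += alpha * err * x2
        b += alpha * err
after = mse()
print(before > after)  # the error shrinks as training proceeds
```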

16. Explain Multiple Adaptive Linear Neurons
Ans: Madaline which stands for Multiple Adaptive Linear Neuron, is a network which consists of
many Adalines in parallel. It will have a single output unit. Some important points about Madaline are
as follows −
• It is just like a multilayer perceptron, where Adaline will act as a hidden unit between the
input and the Madaline layer.
• The weights and the bias between the input and Adaline layers, as in we see in the Adaline
architecture, are adjustable.
• The Adaline and Madaline layers have fixed weights and bias of 1.
• Training can be done with the help of Delta rule.

Architecture
The architecture of Madaline consists of “n” neurons of the
input layer, “m” neurons of the Adaline layer, and 1 neuron of
the Madaline layer. The Adaline layer can be considered as
the hidden layer as it is between the input layer and the
output layer, i.e. the Madaline layer.

Training Algorithm
By now we know that only the weights and bias between the input and the Adaline layer are to be
adjusted, and the weights and bias between the Adaline and the Madaline layer are fixed.
Step 1 − Initialize the following to start the training −
• Weights
• Bias
• Learning rate α
For easy calculation and simplicity, weights and bias must be set equal to 0 and the learning rate
must be set equal to 1.
Step 2 − Continue step 3-8 when the stopping condition is not true.
Step 3 − Continue step 4-7 for every bipolar training pair s:t.
Step 4 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 5 − Obtain the net input at each hidden layer, i.e. the Adaline layer, with the following relation −
Qinj = bj + Σ xi wij (i = 1 to n), for j = 1 to m
Here ‘b’ is bias and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output at the Adaline and the
Madaline layer −
f(x) = +1 if x ≥ 0; −1 if x < 0
Output at the hidden (Adaline) unit
Qj = f(Qinj)
Final output of the network
y = f(yin), where yin = b0 + Σ Qj vj (j = 1 to m)
Step 7 − Calculate the error and adjust the weights as follows −

Case 1 − if y ≠ t and t = 1 then,
wij(new) = wij(old) + α(1 − Qinj) xi
bj(new) = bj(old) + α(1 − Qinj)
In this case, the weights would be updated on Qj where the net input is close to 0 because t = 1.
Case 2 − if y ≠ t and t = -1 then,
wik(new) = wik(old) + α(−1 − Qink) xi
bk(new) = bk(old) + α(−1 − Qink)
In this case, the weights would be updated on Qk where the net input is positive because t = -1.
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
Case 3 − if y = t then
There would be no change in weights.
Step 8 − Test for the stopping condition, which will happen when there is no change in weight or the
highest weight change occurred during training is smaller than the specified tolerance.

17. Explain Backpropagation Network


Ans: Back Propagation Neural (BPN) is a multilayer neural network consisting of the input layer, at
least one hidden layer and output layer. As its name suggests, back propagating will take place in this
network. The error which is calculated at the output layer, by comparing the target output and the
actual output, will be propagated back towards the input layer.

Architecture
As shown in the diagram, the architecture of BPN has
three interconnected layers having weights on them.
The hidden layer as well as the output layer also has
bias, whose weight is always 1, on them. As is clear
from the diagram, the working of BPN is in two phases.
One phase sends the signal from the input layer to the
output layer, and the other phase back propagates the
error from the output layer to the input layer.

Training Algorithm
For training, BPN will use binary sigmoid activation
function. The training of BPN will have the following three phases.
• Phase 1 − Feed Forward Phase
• Phase 2 − Back Propagation of error
• Phase 3 − Updating of weights
All these steps will be concluded in the algorithm as follows
Step 1 − Initialize the following to start the training −
• Weights
• Learning rate α
For easy calculation and simplicity, take some small random values.
Step 2 − Continue step 3-11 when the stopping condition is not true.
Step 3 − Continue step 4-10 for every training pair.
Phase 1

Step 4 − Each input unit receives input signal xi and sends it to the hidden unit for all i = 1 to n
Step 5 − Calculate the net input at the hidden unit using the following relation −
Qinj = b0j + Σ xi vij (i = 1 to n), for j = 1 to p
Here b0j is the bias on the hidden unit, and vij is the weight on unit j of the hidden layer coming
from unit i of the input layer.
Now calculate the net output by applying the following activation function
Qj = f(Qinj)
Send these output signals of the hidden layer units to the output layer units.
Step 6 − Calculate the net input at the output layer unit using the following relation −
yink = b0k + Σ Qj wjk (j = 1 to p), for k = 1 to m
Here b0k is the bias on the output unit, and wjk is the weight on unit k of the output layer coming
from unit j of the hidden layer.
Calculate the net output by applying the following activation function
yk = f(yink)
Phase 2
Step 7 − Compute the error-correcting term, in correspondence with the target pattern received at
each output unit, as follows −
δk = (tk − yk) f′(yink)
On this basis, update the weight and bias as follows −
Δwjk = α δk Qj
Δb0k = α δk
Then, send δk back to the hidden layer.
Step 8 − Now each hidden unit sums its delta inputs from the output units.
δinj = Σ δk wjk (k = 1 to m)
The error term can be calculated as follows −
δj = δinj f′(Qinj)
On this basis, update the weight and bias as follows −
Δvij = α δj xi
Δb0j = α δj
Phase 3
Step 9 − Each output unit (yk, k = 1 to m) updates its weight and bias as follows −
wjk(new) = wjk(old) + Δwjk
b0k(new) = b0k(old) + Δb0k
Step 10 − Each hidden unit (Qj, j = 1 to p) updates its weight and bias as follows −
vij(new) = vij(old) + Δvij
b0j(new) = b0j(old) + Δb0j
Step 11 − Check for the stopping condition, which may be either the number of epochs reached or
the target output matches the actual output.
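The three phases can be sketched in NumPy for a tiny network (the XOR data, 4 hidden units, and learning rate 0.5 are all illustrative choices, not from the notes):

```python
import numpy as np

# Minimal back-propagation sketch: binary sigmoid, one hidden layer.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

V = rng.normal(0, 0.5, (2, 4))   # input -> hidden weights (vij)
b_v = np.zeros((1, 4))
W = rng.normal(0, 0.5, (4, 1))   # hidden -> output weights (wjk)
b_w = np.zeros((1, 1))
alpha = 0.5

def f(x):                        # binary sigmoid
    return 1 / (1 + np.exp(-x))

def loss():
    Q = f(X @ V + b_v)
    Y = f(Q @ W + b_w)
    return float(np.mean((T - Y) ** 2))

before = loss()
for _ in range(5000):
    # Phase 1: feed forward
    Q = f(X @ V + b_v)
    Y = f(Q @ W + b_w)
    # Phase 2: back-propagate the error (for this sigmoid, f' = f*(1-f))
    delta_k = (T - Y) * Y * (1 - Y)
    delta_j = (delta_k @ W.T) * Q * (1 - Q)
    # Phase 3: update weights and biases
    W += alpha * Q.T @ delta_k
    b_w += alpha * delta_k.sum(axis=0)
    V += alpha * X.T @ delta_j
    b_v += alpha * delta_j.sum(axis=0)
print(loss() < before)  # the error decreases with training
```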

18. List and explain Learning Factors of Back-Propagation Network.
Ans: The backpropagation algorithm was originally introduced in the 1970s, but its importance
wasn’t fully appreciated until a famous paper in 1986 by David Rumelhart, Geoffrey Hinton, and
Ronald Williams. The paper describes several neural networks where backpropagation works far
faster than earlier approaches to learning, making it possible to use neural nets to solve problems
which had previously been insoluble. Today, the backpropagation algorithm is the workhorse of
learning in neural networks.
Although backpropagation is the most widely used and most successful algorithm of all time for the
training of neural networks, there are several factors which affect the error-backpropagation training
algorithm. These factors are as follows.
1. Initial Weights
Weight initialization of the neural network to be trained contributes to the final solution. Before
training, the weights of the network are assigned small random uniform values. If all weights start
out with equal values, there is a major chance of a bad solution when the required final weights are
unequal. Similarly, if the initial random weights are not uniform (between 0 and 1), there is a chance
of getting stuck in a local minimum, and the steps towards the global minimum are very small when
the learning rate is small. Learning of the network is very slow if the initial weights fall far from the
global minimum of the error plot, since the error of the network is a function of the weights of the
network. To get faster learning, the initial weights of the neural network should fall near the global
minimum of the error plot. Many empirical studies of the algorithm point out that continuing
training beyond a certain low error plateau results in an undesirable drift of weights; this causes the
error to increase and the quality of the mapping function implemented by the network to decrease.
To address this issue, training should be restarted with newly initialized random uniform weights.
2. Cumulative weight adjustment vs Incremental Updating
The error backpropagation learning technique is based on single-pattern error detection: it requires
a small adjustment of weights after each training pattern in the training data. This technique of
adjusting the weights at every step, as each pattern is applied to the network, is called incremental
updating. The gradient descent technique also implements minimization of the overall error function
computed over a complete cycle of patterns, provided that the learning constant is sufficiently
small. This scheme is known as cumulative weight adjustment, and the error for this technique is
calculated over the complete cycle, e.g. as the cumulative squared error
E = (1/2) Σp Σk (tpk − ypk)²
taken over all output units k and all patterns p in the training cycle.

Although both these techniques can bring satisfactory solutions, attention should be paid to the fact
that training works best under random conditions. For incremental updating, patterns should be
chosen randomly of different classes from training set so that the network should not overfit for
same class patterns.
3. The steepness of the activation function 𝜆
The gradient descent learning algorithm uses a continuous activation function; the most commonly used is the sigmoid function (unipolar or bipolar). This sigmoid function is characterised by a steepness factor 𝜆. The derivative of the activation function serves as a multiplying factor in building the error signal term of a neuron. Both the choice and the shape of the activation function affect the speed of network learning. The derivative of the activation function (unipolar sigmoid) is given by
f′(net) = 𝜆·exp(−𝜆·net) / [1 + exp(−𝜆·net)]²
The following figure shows the slope of the activation function and illustrates how the steepness 𝜆 affects the learning of the network.

(Figure: derivative of the activation function for different 𝜆 values)

For a fixed learning constant, all weight adjustments are in proportion to the steepness coefficient 𝜆. Using a large value of 𝜆 therefore gives a result similar to using a large learning constant 𝜂.
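This scaling can be checked with a small sketch (the function names and 𝜆 values are illustrative, not from the source): the unipolar sigmoid derivative f′(net) = 𝜆·f(net)·(1 − f(net)) peaks at net = 0 with value 𝜆/4, so doubling 𝜆 doubles every weight adjustment, just as doubling 𝜂 would.

```python
import math

def sigmoid(net, lam=1.0):
    """Unipolar sigmoid with steepness factor lambda."""
    return 1.0 / (1.0 + math.exp(-lam * net))

def sigmoid_deriv(net, lam=1.0):
    """Derivative f'(net) = lambda * f(net) * (1 - f(net))."""
    f = sigmoid(net, lam)
    return lam * f * (1.0 - f)

# The slope at net = 0 equals lambda/4, i.e. it scales linearly with lambda:
for lam in (0.5, 1.0, 2.0):
    print(lam, sigmoid_deriv(0.0, lam))   # 0.125, 0.25, 0.5
```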
4. Learning Constant 𝜂.
The effectiveness and convergence of error backpropagation depend on the value of the learning constant 𝜂. The amount by which the network weights are updated is directly proportional to 𝜂, so it plays an important role in the error signal term of a neuron:
𝚫𝙒 = −𝜂·𝘾·𝙛 ′❪𝙣𝙚𝙩❫
where 𝘾 is the error signal term of the neuron and 𝙛 ′❪𝙣𝙚𝙩❫ is the derivative of the activation function.
When we use a larger value of 𝜂, the network takes wider steps toward the global minimum of the error plot, but there is a chance of overstepping the global minimum if the error plot has a narrow global minimum. Similarly, if we use a smaller value of 𝜂, the network takes shorter steps, but there is a chance of getting stuck in a local minimum of the error plot. To overcome these situations, the network weights are reinitialized with small random values and the network is retrained.
5. Momentum method
Momentum method deals with the convergence of gradient descent learning algorithm. Its purpose
is to accelerate the convergence of learning. This method supplements the current weight
adjustments with the fraction of most recent weight adjustment. This is usually done with the
following formulae,
𝚫𝙒(t) = −𝜂·𝛁𝐸(t) + 𝛼·𝚫𝙒(t−1)
where t and t−1 denote the current and previous steps and 𝛼 is a user-selected momentum constant (it should be positive). The second term on the right-hand side of the equation is the momentum term. Over a total of N steps using the momentum method, the accumulated weight change can be expressed as
𝚫𝙒(t) = −𝜂·Σ (n = 0 to N−1) 𝛼ⁿ·𝛁𝐸(t−n)
Typically 𝛼 is chosen between 0.1 and 0.8. By adding the momentum term, each weight adjustment is enhanced by a fraction of the weight adjustment at the previous step. This approach moves toward the global minimum of the error curve with much greater steps.
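The momentum update can be sketched on a toy quadratic error surface (the surface, constants and initial weights are illustrative assumptions, not from the source):

```python
import numpy as np

# Toy quadratic error surface E(w) = 0.5 * w^T A w, with minimum at w = 0.
A = np.array([[3.0, 0.0],
              [0.0, 1.0]])

def grad_E(w):
    return A @ w

eta, alpha = 0.1, 0.5            # learning constant and momentum constant
w = np.array([4.0, -3.0])        # initial weights
dw = np.zeros_like(w)            # previous weight adjustment, delta_w(t-1)

for t in range(200):
    # delta_w(t) = -eta * grad E(t) + alpha * delta_w(t-1)
    dw = -eta * grad_E(w) + alpha * dw
    w = w + dw

print(w)   # very close to the minimum [0, 0]
```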

19. Write a short note on Radial Basis Function and Time Delay Network.
Ans: The radial basis function (RBF) is a classification and functional approximation neural network
developed by M.J.D. Powell. The network uses the most common nonlinearities such as sigmoidal
and Gaussian kernel functions. The Gaussian functions are also used in regularization networks. The
response of such a function is positive for all values of y; the response decreases to 0 as |y| → ∞. The
Gaussian function is generally defined as
f(y) = e^(−y²)
The graphical representation of this Gaussian function is shown


When the Gaussian potential functions are being used, each
node is found to produce an identical output for inputs existing
within the fixed radial distance from the center of the kernel,
they are found to be radially symmetric, and hence the name
radial basis function network. The entire network forms a linear
combination of the nonlinear basis function.

Time Delay Network


The neural network has to respond to a sequence of patterns. Here the network is required to
produce a particular output sequence in response to a particular sequence of inputs. A shift register
can be considered as a tapped delay line. Consider a case of a multilayer perceptron where the
tapped outputs of the delay line are applied to its inputs. This constitutes a time delay neural
network (TDNN). The output then has a finite temporal dependence on its inputs, of the form
U(t) = F[x(t), x(t−1), ..., x(t−n)]
where F is any nonlinearity function.
The multilayer perceptron with a delay line is shown in Figure 3-14. When the function U(t) is a weighted sum, the TDNN is equivalent to a finite impulse response (FIR) filter. In a TDNN, when the output is fed back through a unit delay into the input layer, the net computed is equivalent to an infinite impulse response (IIR) filter. Figure 3-15 shows a TDNN with output feedback.

Thus, a neuron with a tapped delay line is called a TDNN unit, and a network which consists of TDNN
units is called a TDNN. A specific application of TDNNs is speech recognition. The TDNN can be
trained using the back-propagation learning rule with a momentum factor.

20. Write a short note on Functional Link Networks and Tree Neural Network.
Ans: The Functional Link Network are specifically designed for handling linearly non-separable
problems using appropriate input representation. Thus, suitable enhanced representation of the
input data has to be found out. This can be achieved by increasing the dimensions of the input space.
The input data which is expanded is used for training instead of the actual input data. In this case,
higher order input terms are chosen so that they are linearly independent of the original pattern
components. Thus, the input representation has been enhanced and
linear separability can be achieved in the extended space.
One of the functional link model networks is shown in Figure 3-16. This
model is helpful for learning continuous functions. For this model, the
higher-order input terms are obtained using the orthogonal basis
functions such as sin πx, cos πx, sin 2πx, cos 2πx, etc. The most common example of linear non-separability is the XOR problem, and functional link networks help in solving it; thus, the functional link network in Figure 3-17 is used for solving this problem. Since the functional link network consists of only one layer, it can be trained using the delta learning rule instead of the generalized delta rule of the BPN.
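A tiny sketch of this idea for bipolar XOR (the product term x1·x2 and the unit's weights are illustrative choices, not prescribed by the source): once the input is expanded with a higher-order term, a single linear unit can classify the patterns.

```python
# Expanding [x1, x2] with the higher-order term x1*x2 makes bipolar XOR
# linearly separable: a single linear unit can then classify it.
patterns = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
targets = [-1, 1, 1, -1]                  # bipolar XOR targets

def expand(x1, x2):
    return (x1, x2, x1 * x2)              # enhanced input representation

w = (0.0, 0.0, -1.0)                      # weights of a single linear unit
outputs = [1 if sum(wi * xi for wi, xi in zip(w, expand(*p))) > 0 else -1
           for p in patterns]
print(outputs)   # [-1, 1, 1, -1], matching the XOR targets
```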

The tree neural networks (TNNs) are used for the pattern
recognition problem. The main concept of this network is
to use a small multilayer neural network at each decision-
making node of a binary classification tree for extracting
the non linear features. TNNs completely extract the power
of tree classifiers for using appropriate local features at the
different levels and nodes of the tree. A binary classification
tree is shown in Figure 3-18.
The decision nodes are shown as circular nodes and the terminal nodes as square nodes. Each terminal node has a class label, denoted by ĉ, associated with it. A rule is formed at each decision node (a splitting rule of the form f(x) < θ). The rule determines whether the pattern moves to the right or to the left. Here, f(x) indicates the associated feature of the pattern and θ is the threshold. The pattern is given the class label of the terminal node on which it lands.
The classification here is based on the fact that the appropriate features can be selected at different
nodes and levels in the tree.

The algorithm for a TNN consists of two phases :
1. Tree growing phase: In this phase, a large tree is grown by recursively finding the rules for splitting until all the terminal nodes have pure or nearly pure class membership, or no further splitting is possible.
2. Tree pruning phase: Here a smaller tree is selected from the pruned subtrees to avoid the overfitting of data.

The training of TNN involves two nested optimization problems. In the inner optimization problem,
the BPN algorithm can be used to train the network for a given pair of classes. On the other hand, in
outer optimization problem, a heuristic search method is used to find a good pair of classes. The
TNN when tested on a character recognition problem decreases the error rate and size of the tree
relative to that of the standard classification tree design methods. The TNN can be implemented for
waveform recognition problem. It obtains comparable error rates and the training here is faster than
the large BPN for the same application. Also, TNN provides a structured approach to neural network
classifier design problems

21. Explain training and testing algorithm for auto associative neural net.
Ans: Auto-associative neural networks are neural networks whose input and output vectors are identical. These are special kinds of neural networks that are used to simulate and explore the associative process. Association in this architecture comes from the interconnection of a set of simple processing elements called units, which are connected through weighted connections. In these networks, training is performed to store a vector, either bipolar or binary. A stored vector can then be retrieved from a distorted or noisy input if the input is sufficiently similar to it.

Algorithm
We will be using the Hebb rule in the algorithm for setting the weights, because the input and output vectors are perfectly correlated: the input and the output have the same number of units.
Hebb Rule:
• When A and B are positively correlated, increase the strength of the connection between them.
• When A and B are negatively correlated, decrease the strength of the connection between them.
• In practice, we use the following formula to set the weights:
W = Σ (p = 1 to P) sᵀ(p)·s(p)
where W is the weight matrix and s(p) are the P distinct n-dimensional prototype patterns.

Training Algorithm
1. Initialize all weights for i= 1,2,3 …n and j= 1,2,3 …n such that: wij=0.
2. For each vector to be stored repeat the following steps:
3. Set activation for each input unit i= 1 to n: xi = si.
4. Set activation for each output unit j= 1 to n: yj = sj.
5. Update the weights for i= 1,2,3 …n and j= 1,2,3 …n such that : wij (new) = wij (old) + xiyj
Testing / Inference Algorithm:
For testing whether the input is "known" or "unknown" to the model, we need to perform the following steps:
1. Take the weights that were generated during the training phase using Hebb’s rule.
2. For each input vector perform the following steps:
3. Set activation in the input units equal to input vectors.

4. Calculate the net input to each output unit for j = 1, 2, 3 … n:
y_inj = Σ (i = 1 to n) xi·wij
5. Apply the activation function for j = 1, 2, 3 … n:
yj = f(y_inj) = +1 if y_inj > 0; −1 if y_inj ≤ 0
AANN recognizes the input vector to be known if the output unit after activation generated same
pattern as one stored in it.
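The training and testing steps above can be sketched as follows (a minimal illustration; the stored pattern and function names are assumptions):

```python
import numpy as np

def train_auto(patterns):
    """Hebb rule: accumulate w_ij += x_i * y_j, i.e. W = sum of outer products."""
    n = len(patterns[0])
    W = np.zeros((n, n))
    for s in patterns:
        W += np.outer(s, s)
    return W

def recall(W, x):
    """Testing: y_in_j = sum_i x_i w_ij, then a bipolar step activation."""
    y_in = np.asarray(x, dtype=float) @ W
    return np.where(y_in >= 0, 1, -1)

stored = [1, 1, -1, -1]                   # bipolar pattern to store
W = train_auto([stored])
noisy = [1, -1, -1, -1]                   # one component flipped
print(recall(W, noisy).tolist())          # recovers [1, 1, -1, -1]
```

The net recognizes the noisy vector as "known" because the recalled output matches the stored pattern.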

22. Explain training and testing algorithm for Hetroassociative memory network,
Ans: Similar to the auto-associative memory network, this is also a single-layer neural network. However, in this network the input training vectors and the output target vectors are not the same. The weights are determined so that the network stores a set of patterns. The heteroassociative network is static in nature; hence, there are no non-linear or delay operations.
Training Algorithm
For training, this network is using the Hebb or Delta learning rule.
Step 1 − Initialize all the weights to zero: wij = 0 (i = 1 to n, j = 1 to m)
Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 4 − Activate each output unit as follows −
yj = sj (j = 1 to m)
Step 5 − Adjust the weights as follows −
wij(new) = wij(old) + xi·yj
Testing Algorithm
Step 1 − Set the weights obtained during training for Hebb’s rule.

Step 2 − Perform steps 3-5 for each input vector.
Step 3 − Set the activation of the input units equal to that of the input vector.
Step 4 − Calculate the net input to each output unit j = 1 to m;
y_inj = Σ (i = 1 to n) xi·wij
Step 5 − Apply the following activation function to calculate the output:
yj = f(y_inj) = +1 if y_inj > 0; 0 if y_inj = 0; −1 if y_inj < 0
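A compact sketch of these training and testing steps (the training pairs are illustrative):

```python
import numpy as np

def train_hetero(pairs):
    """Hebb rule: w_ij(new) = w_ij(old) + x_i * y_j for each (s, t) pair."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    W = np.zeros((n, m))
    for s, t in pairs:
        W += np.outer(s, t)
    return W

def recall(W, x):
    """y_in_j = sum_i x_i w_ij, then the +1 / 0 / -1 activation."""
    y_in = np.asarray(x, dtype=float) @ W
    return np.sign(y_in).astype(int)

pairs = [([1, -1, -1, -1], [1, -1]),      # s(1) -> t(1)
         ([-1, -1, -1, 1], [-1, 1])]      # s(2) -> t(2)
W = train_hetero(pairs)
print(recall(W, [1, -1, -1, -1]).tolist())   # [1, -1]
```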

23. Explain bi-directional associative memory.


Ans: Bidirectional associative memory (BAM) was first proposed by Bart Kosko in the year 1988. The BAM network performs forward and backward associative searches for stored stimulus responses. The BAM is a recurrent heteroassociative pattern-matching network that encodes binary or bipolar patterns using the Hebbian learning rule. It associates patterns from set A to patterns from set B, and vice versa. BAM neural nets can respond to input from either layer (the input layer or the output layer).
Bidirectional Associative Memory Architecture
The architecture of the BAM network consists of two layers of neurons which are connected by directed weighted path interconnections. The network dynamics involve the interaction of the two layers: the BAM network iterates by sending signals back and forth between them until all the neurons reach equilibrium. The weights associated with the network are bidirectional; thus, BAM can respond to inputs in either layer. The figure shows a BAM network consisting of n units in the X layer and m units in the Y layer. The layers are connected in both directions (bidirectionally), with the result that the weight matrix for signals sent from the X layer to the Y layer is W and the weight matrix for signals sent from the Y layer to the X layer is Wᵀ. Thus, the weight matrix is calculated in both directions.
Testing Algorithm for Discrete Bidirectional Associative Memory
Step 0: Initialize the weights to store p vectors. Also initialize all the activations to zero.
Step 1: Perform Steps 2-6 for each testing input.
Step 2: Set the activations of the X layer to the current input pattern, i.e., present the input pattern x to the X layer and similarly present the input pattern y to the Y layer. Even though it is a bidirectional memory, at any one time step signals can be sent from only one layer, so either of the input patterns may be the zero vector.
Step 3: Perform Steps 4-6 while the activations have not converged.
Step 4: Update the activations of the units in the Y layer. Calculate the net input,
y_inj = Σ (i = 1 to n) xi·wij
Applying the activations, we obtain
yj = f(y_inj)
Send this signal to the X layer.
Step 5: Update the activations of the units in the X layer. Calculate the net input,
x_ini = Σ (j = 1 to m) yj·wij
Applying the activations, we obtain
xi = f(x_ini)
Send this signal to the Y layer.
Step 6: Test for convergence of the net. Convergence occurs if the activation vectors x and y reach equilibrium. If this occurs then stop; otherwise, continue.
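The forward and backward searches can be sketched like this (the stored pairs are illustrative; a zero net input keeps the previous activation, per the usual BAM convention):

```python
import numpy as np

def bam_weights(pairs):
    """W = sum over stored pairs of outer(x, y); the Y->X direction uses W^T."""
    return sum(np.outer(x, y) for x, y in pairs)

def step(net, prev):
    """Bipolar activation; when the net input is zero, keep the previous value."""
    out = np.array(prev, dtype=float)
    out[net > 0] = 1
    out[net < 0] = -1
    return out

pairs = [(np.array([1, 1, -1, -1]), np.array([1, -1])),
         (np.array([-1, -1, 1, 1]), np.array([-1, 1]))]
W = bam_weights(pairs)

x = np.array([1, 1, -1, -1])
y = step(x @ W, np.zeros(2))         # forward search: X layer -> Y layer
x_back = step(y @ W.T, np.zeros(4))  # backward search: Y layer -> X layer
print(y.tolist(), x_back.tolist())   # [1, -1] and [1, 1, -1, -1]
```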

24. Explain Hopfield networks.


Ans: The Hopfield neural network was proposed by John J. Hopfield in 1982. It is an auto-associative, fully interconnected, single-layer feedback network. It is a symmetrically weighted network (i.e., Wij = Wji). The Hopfield network is commonly used for auto-association and optimization tasks.
The Hopfield network is of two types:
1. Discrete Hopfield Network
2. Continuous Hopfield Network
Discrete Hopfield Network
When operated in a discrete-time fashion, it is called a discrete Hopfield network. The network takes two-valued inputs: binary (0, 1) or bipolar (+1, -1); the use of bipolar inputs makes the analysis easier. The network has symmetrical weights with no self-connections, i.e.,
Wij = Wji;
Wij = 0 if i = j

Architecture of Discrete Hopfield Network


Hopfield's model consists of processing elements with two outputs, one inverting and the other non-inverting. The outputs from each processing element are fed back to the inputs of the other processing elements, but not to itself.

Training Algorithm of Discrete Hopfield Network


During training of discrete Hopfield network, weights will be updated. As we know that we can have
the binary input vectors as well as bipolar input vectors.
Let the input vectors be denoted by s(p), p = 1, ..., P. Then the weight matrix W to store the set of input vectors is as follows.
In case of input vectors being binary, the weight matrix W = {wij} is given by
wij = Σ (p = 1 to P) [2si(p) − 1][2sj(p) − 1] for i ≠ j
When the input vectors are bipolar, the weight matrix W = {wij} can be defined as
wij = Σ (p = 1 to P) si(p)·sj(p) for i ≠ j

Testing Algorithm of Discrete Hopfield Net


Step 0: Initialize the weights to store patterns, i.e., weights obtained from training algorithm using
Hebb rule.
Step 1: When the activations of the net are not converged, then perform Steps 2-8.
Step 2: Perform Steps 3-7 for each input vector X.
Step 3: Make the initial activations of the net equal to the external input vector X:
yi = xi for i = 1 to n

Step 4: Perform Steps 5-7 for each unit yi. (Here, the units are updated in random order.)
Step 5: Calculate the net input of the network:
y_ini = xi + Σj yj·wji
Step 6: Apply the activations over the net input to calculate the output:
yi = 1 if y_ini > θi; yi remains unchanged if y_ini = θi; yi = 0 if y_ini < θi
where θi is the threshold and is normally taken as zero.
Step 7: Now feed back the obtained output yi to all other units. Thus, the activation vectors are
updated.
Step 8: Finally, test the network for convergence.
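A minimal sketch of the discrete (bipolar) Hopfield net following these steps (the stored pattern is illustrative, and the units are updated in a fixed rather than random order for simplicity):

```python
import numpy as np

def hopfield_train(patterns):
    """Bipolar storage: w_ij = sum_p s_i(p) s_j(p), with w_ii = 0."""
    n = len(patterns[0])
    W = np.zeros((n, n))
    for s in patterns:
        W += np.outer(s, s)
    np.fill_diagonal(W, 0.0)
    return W

def hopfield_recall(W, x, sweeps=10):
    """Asynchronous updates: y_in_i = x_i + sum_j y_j w_ji, threshold at 0."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    for _ in range(sweeps):
        for i in range(len(y)):
            y_in = x[i] + y @ W[:, i]
            if y_in > 0:
                y[i] = 1
            elif y_in < 0:
                y[i] = -1          # bipolar low state; would be 0 for binary nets
            # y_in == 0: activation is left unchanged
    return y

stored = [1, 1, -1, -1]
W = hopfield_train([stored])
print(hopfield_recall(W, [1, -1, -1, -1]).tolist())   # [1, 1, -1, -1]
```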

UNIT 3

1. Explain Maxnet.
Ans: This is also a fixed-weight network, which serves as a subnet for selecting the node having the highest input. All the nodes are fully interconnected and there exist symmetrical weights in all these weighted interconnections.
Architecture
It uses the mechanism which is an iterative process and each node
receives inhibitory inputs from all other nodes through connections.
The single node whose value is maximum would be active or winner
and the activations of all other nodes would be inactive. Max Net uses

identity activation function with f(x) = x if x > 0 and f(x) = 0 if x ≤ 0.
The task of this net is accomplished by the self-excitation weight of +1 and the mutual inhibition magnitude ε, which is set in the range 0 < ε < 1/m, where "m" is the total number of nodes.

Testing Algorithm:
Step 0: Initialize the weights and initial activations. The inhibitory weight is set as ε, with 0 < ε < 1/m, where "m" is the total number of nodes, and the self-excitation weight is 1. Let aj(0) denote the initial activation of node j.
Step 1: Perform steps 2-4 while the stopping condition is false.
Step 2: Update the activation of each node. For j = 1 to m,
aj(new) = f[aj(old) − ε·Σ (k ≠ j) ak(old)]
Step 3: Save the activations obtained for use in the next iteration. For j = 1 to m, aj(old) = aj(new).
Step 4: Finally, test the stopping condition for convergence of the network. The stopping condition is: if more than one node has a non-zero activation, continue; else stop.

2. Explain Mexican Hat.
Ans: In 1989, Kohonen developed the Mexican hat network, which is a more generalized contrast-enhancement network compared with the earlier Maxnet. The Mexican hat neural network is part of the unsupervised learning technique. Three types of links are found in such a network:
Each neuron is connected by excitatory links (positive connections/weights) to a number of "cooperative neighbors", neurons in close proximity.
Each neuron is connected over inhibitory weights (negative connections/weights) to a number of "competitive neighbors", neurons present farther away.
There are several other, still farther neurons to which no connections are established.
All of these connections are within a particular layer of the neural net; the neurons also receive some other external signals. This interconnection pattern is repeated for several other neurons in the layer.
Algorithm

3. Construct a Maxnet with four neurons and inhibitory weight =0.2, given the initial
activations (input signals) as follows:
a1(0)=0.3; a2(0)=0.5; a3(0)=0.7; a4(0)=0.9;
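A sketch of the solution, using the standard Maxnet update aj ← f(aj − ε·Σ (k ≠ j) ak) with f the positive-identity activation:

```python
def maxnet(a, eps=0.2, max_iter=100):
    """Iterate until at most one node keeps a non-zero activation."""
    a = list(a)
    for _ in range(max_iter):
        total = sum(a)
        # a_j(new) = f(a_j(old) - eps * sum of the other activations)
        a = [max(0.0, aj - eps * (total - aj)) for aj in a]
        if sum(1 for aj in a if aj > 0) <= 1:
            break
    return a

a = maxnet([0.3, 0.5, 0.7, 0.9], eps=0.2)
print(a)   # node 4 (the largest initial activation) is the winner
```

After five iterations only the fourth activation remains non-zero (≈ 0.4332), so the fourth node is the winner.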

4. Explain Hamming Network
Ans: In most of the neural networks using unsupervised learning, it is essential to compute the distance and perform comparisons. One such network is the Hamming network, in which every given input vector is clustered into one of several groups. Following are some important features of Hamming networks −
• Lippmann started working on Hamming networks in 1987.
• It is a single layer network.
• The inputs can be either binary {0, 1} or bipolar {-1, 1}.
• The weights of the net are calculated by the exemplar vectors.
• It is a fixed weight network which means the weights would remain the same even during
training
Max Net
This is also a fixed-weight network, which serves as a subnet for selecting the node having the highest input. All the nodes are fully interconnected and there exist symmetrical weights in all these weighted interconnections.
Architecture
It uses the mechanism which is an iterative process and each
node receives inhibitory inputs from all other nodes through
connections. The single node whose value is maximum would be
active or winner and the activations of all other nodes would be
inactive. Max Net uses identity activation function with
f(x) = x if x > 0; f(x) = 0 if x ≤ 0
The task of this net is accomplished by the self-excitation weight of +1 and mutual inhibition
magnitude, which is set like [0 < ɛ < 1/m] where “m” is the total number of the nodes.

Testing Algorithm
Step 0: Initialize the weights: for i = 1 to n and j = 1 to m, wij = ej(i)/2, with bias bj = n/2.
Step 1: Perform steps 2-4 for each input vector x.
Step 2: Calculate the net input to each unit yj, i.e.,
y_inj = bj + Σ (i = 1 to n) xi·wij
Step 3: Initialize the activations of the Maxnet:
yj(0) = y_inj, j = 1 to m
Step 4: Here the Maxnet is allowed to iterate to find the best-match exemplar.
The Hamming network retrieves only the closest class index, not the entire vector; hence, the Hamming network is a classifier rather than an associative memory. The Hamming network can be modified into an associative memory by adding an extra layer over the Maxnet, such that the winner unit yi(k+1) present in the Maxnet triggers a corresponding stored weight vector. Such an associative memory network can be called a Hamming memory network.

5. Explain the mechanism for Kohonen self-organizing feature maps.
Ans:
Self Organizing Map (or Kohonen Map or SOM) is a type of artificial neural network, inspired by biological models of neural systems from the 1970s. It follows an unsupervised learning approach and trains its network through a competitive learning algorithm. SOM is used for clustering and mapping (or dimensionality reduction) techniques, mapping multidimensional data onto a lower-dimensional space, which allows people to reduce complex problems for easy interpretation. SOM has two layers: one is the input layer and the other is the output layer. The architecture of the Self Organizing Map with two clusters and n input features of any sample is given below.

How do SOM works?


Let’s say an input data of size (m, n) where m is the number of training examples
and n is the number of features in each example. First, it initializes the weights of
size (n, C), where C is the number of clusters. Then, iterating over the input data, for each training example it updates the winning vector (the weight vector with the shortest distance, e.g., Euclidean distance, from the training example). The weight update rule is given by:
wij = wij(old) + alpha(t) * (xik − wij(old))
where alpha is the learning rate at time t, j denotes the winning vector, i denotes the ith feature of the training example and k denotes the kth training example from the input data. After training the SOM network, the trained weights are used for clustering new examples. A new example falls in the cluster of its winning vector.

Algorithm

The steps involved are :


• Weight initialization
• For 1 to N number of epochs
• Select a training example
• Compute the winning vector
• Update the winning vector
• Repeat steps 3, 4, 5 for all training examples.
• Clustering the test sample
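The steps above can be sketched on toy 2-D data (the data, seed and constants are illustrative assumptions, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(data, n_clusters=2, epochs=50, alpha=0.5):
    """Competitive learning: only the winning weight vector moves toward x."""
    W = rng.random((n_clusters, data.shape[1]))          # weight initialization
    for _ in range(epochs):
        for x in data:
            j = np.argmin(np.linalg.norm(W - x, axis=1))  # compute winning vector
            W[j] += alpha * (x - W[j])                    # update winning vector
        alpha *= 0.9                                      # decay learning rate
    return W

def cluster(W, x):
    return int(np.argmin(np.linalg.norm(W - x, axis=1)))

# two well-separated groups of 2-D samples
data = np.array([[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]])
W = train_som(data)
print([cluster(W, x) for x in data])   # the first two samples share one cluster
```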

6. Construct and test the Hamming network to cluster the vectors. Given the exemplar vectors e(1) = [1 -1 -1 -1] and e(2) = [-1 -1 -1 1], the bipolar input vector is x1 = [-1 -1 1 -1].

Ans:
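A sketch of the calculation, using the standard Hamming-net setup wij = ej(i)/2 with bias bj = n/2 (so that y_inj = n − HammingDistance(x, e(j))):

```python
# Hamming-net calculation for the given exemplars and input.
e = [[1, -1, -1, -1], [-1, -1, -1, 1]]   # exemplar vectors e(1), e(2)
x = [-1, -1, 1, -1]                      # bipolar input vector x1
n = len(x)

# net input y_in_j = n/2 + (1/2) * (x . e(j))
y_in = [n / 2 + 0.5 * sum(xi * ei for xi, ei in zip(x, ej)) for ej in e]
hd = [sum(xi != ei for xi, ei in zip(x, ej)) for ej in e]
print(y_in, hd)   # [2.0, 2.0] and [2, 2]
```

Both net inputs equal 2, i.e., x1 lies at Hamming distance 2 from each exemplar. Since the activations entering the Maxnet stage are equal, this particular input is equidistant from both clusters and the Maxnet cannot single out a winner.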

7. Explain learning vectors quantization.
Ans:
Learning Vector Quantization (LVQ), different from vector quantization (VQ) and Kohonen Self-Organizing Maps (KSOM), is basically a competitive network which uses supervised learning. We may define it as a process of classifying the patterns in which each output unit represents a class. As it uses supervised learning, the network will be given a set of training patterns with known classification, along with an initial distribution of the output classes. After completing the training process, LVQ will classify an input vector by assigning it to the same class as that of its output unit.
Architecture

The following figure shows the architecture of LVQ, which is quite similar to the architecture of KSOM. As we can see, there are "n" input units and "m" output units. The layers are fully interconnected, with weights on the connections.

Training Algorithm
Step 1 − Initialize the reference vectors, which can be done as follows −
• Step 1(a) − From the given set of training vectors, take the first "m" (number of clusters) training vectors and use them as weight vectors. The remaining vectors can be used for training.
• Step 1(b) − Assign the initial weights and classifications randomly.
• Step 1(c) − Apply the K-means clustering method.
Step 2 − Initialize the learning rate α.
Step 3 − Continue with steps 4-9 if the condition for stopping this algorithm is not met.
Step 4 − Follow steps 5-6 for every training input vector x.
Step 5 − Calculate the square of the Euclidean distance for j = 1 to m and i = 1 to n:
D(j) = Σ (i = 1 to n) (xi − wij)²
Step 6 − Obtain the winning unit J for which D(J) is minimum.
Step 7 − Calculate the new weight of the winning unit by the following relation −
if T = CJ then wJ(new) = wJ(old) + α[x − wJ(old)]
if T ≠ CJ then wJ(new) = wJ(old) − α[x − wJ(old)]
Step 8 − Reduce the learning rate α.
Step 9 − Test for the stopping condition. It may be as follows −
Step 9 − Test for the stopping condition. It may be as follows −
• Maximum number of epochs reached.
• Learning rate reduced to a negligible value.
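These steps can be sketched as follows (the toy data, initial weights and constants are illustrative assumptions):

```python
import numpy as np

def lvq_train(X, T, W, classes, alpha=0.1, epochs=20):
    """One reference vector per class; move the winner toward or away from x."""
    W = W.astype(float).copy()
    for _ in range(epochs):
        for x, t in zip(X, T):
            d = np.sum((W - x) ** 2, axis=1)    # squared Euclidean distance D(j)
            j = int(np.argmin(d))               # winning unit J
            if t == classes[j]:
                W[j] += alpha * (x - W[j])      # correct class: move closer
            else:
                W[j] -= alpha * (x - W[j])      # wrong class: move away
        alpha *= 0.9                            # reduce the learning rate
    return W

# toy data: class 0 near the origin, class 1 near (1, 1)
X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.0]])
T = [0, 0, 1, 1]
W0 = np.array([[0.0, 0.1], [1.0, 0.9]])   # first "m" vectors as initial weights
W = lvq_train(X, T, W0, classes=[0, 1])
pred = [int(np.argmin(np.sum((W - x) ** 2, axis=1))) for x in X]
print(pred)   # [0, 0, 1, 1]
```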

8. Explain counterpropagation networks.
Ans: Counterpropagation networks were proposed by Hecht-Nielsen in 1987. They are multilayer networks based on combinations of the input, output and clustering layers. The applications of counterpropagation nets are data compression, function approximation and pattern association. Hecht-Nielsen synthesized the architecture from a combination of a structure known as a competitive network and Grossberg's outstar structure.
The counterpropagation network is basically constructed from an instar-outstar model. This model is a three-layer neural network that performs input-output data mapping, producing an output vector y in response to an input vector x, on the basis of competitive learning. The three layers in an instar-outstar model are the input layer, the hidden (competitive) layer and the output layer. The connections between the input layer and the competitive layer form the instar structure, and the connections between the competitive layer and the output layer form the outstar structure. The competitive layer is a winner-take-all network, or a Maxnet with lateral feedback connections. There are no lateral connections within the input layer or the output layer, and the connections between the layers are fully connected.
A counterpropagation net approximates its training input vector pairs by adaptively constructing a look-up table. By this method, several data points can be compressed to a more manageable number of look-up-table entries. The accuracy of the function approximation and data compression depends on the number of entries in the look-up table, which equals the number of units in the cluster layer of the net.
There are two stages involved in the training process of a counterpropagation net. In the first stage, the input vectors are clustered. Originally, it is assumed that there is no topology included in the counterpropagation network; however, on the inclusion of a linear topology, the performance of the net can be improved. The clusters are formed using the Euclidean distance method or the dot product method. In the second stage of training, the weights from the cluster-layer units to the output units are tuned to obtain the desired response.
There are two types of counterpropagation nets: (i) the full counterpropagation net and (ii) the forward-only counterpropagation net.
Full Counterpropagation Net
The full counterpropagation net (full CPN) efficiently represents a large number of vector pairs x:y by adaptively constructing a look-up table. The approximation here is x*:y*, which is based on the vector pairs x:y, possibly with some distorted or missing elements in either vector or both.
Forward-Only Counterpropagation Net
A simplified version of the full CPN is the forward-only CPN. The approximation of the function y = f(x), but not of x = f(y), can be performed using the forward-only CPN; i.e., it may be used if the mapping from x to y is well defined but the mapping from y to x is not. In the forward-only CPN, only the x vectors are used to form the clusters on the Kohonen units during the first phase of training.

9. Give the outline of adaptive resonance theory 1 networks.

Ans: Adaptive Resonance Theory (ART) networks, as the name suggests, are always open to new learning (adaptive) without losing the old patterns (resonance).
The adaptive resonance model was developed to solve the problem of instability occurring in feed-forward systems.
▸ The term "resonance" refers to the resonant state of a neural network, in which a category prototype vector matches the current input vector closely enough. ART matching leads to this resonant state, which permits learning; the network learns only in its resonant state.
▸ There are two types of ART: ART 1 and ART 2.

The Adaptive Resonance Theory 1 (ART 1) network is designed for binary input vectors.
▸ The ART 1 net consists of two fields of units, the input units (F1 units) and the output units (F2 units), along with a reset control unit for controlling the degree of similarity of patterns placed on the same cluster unit.
▸ There exist two sets of weighted interconnection paths between the F1 and F2 layers.
▸ The supplemental units present in the net provide efficient neural control of the learning process.
▸ Carpenter and Grossberg designed the ART 1 network as a real-time system.
▸ In an ART 1 network, it is not necessary to present the input patterns in a particular order; they can be presented in any order.
▸ An ART 1 network can be practically implemented by analog circuits governing the differential equations, i.e., the bottom-up and top-down weights are controlled by differential equations.
▸ The ART 1 network runs autonomously throughout. It does not require any external control signals and can run stably with an infinite stream of input patterns.
▸ The ART 1 network is trained using the fast learning method, in which the weights reach equilibrium during each learning trial. During this resonance phase, the activations of the F1 units do not change; hence the equilibrium weights can be determined exactly.
▸ The ART 1 network performs well with perfect binary input patterns, but it is sensitive to noise in the input data, so care should be taken to handle the noise.

Computational units
▸ The computational unit for ART 1 consists of the following:
▸ 1. Input units (F1 unit - both input portion and interface portion).
▸ 2. Cluster units (F2 unit - output unit).
▸ 3. Reset control unit (controls degree of similarity of patterns placed on same cluster).
