FINAL PROJECT (GROUP A) - IT (5th SEMESTER) DOCUMENT
We hereby declare that the project work being presented in the project proposal entitled “HAND
DRAWN GEOMETRIC SHAPES (TRIANGLE AND SQUARE) CLASSIFICATION USING
DEEP LEARNING” in partial fulfilment of the requirements for the award of the degree of
BACHELOR OF INFORMATION TECHNOLOGY at GURU NANAK INSTITUTE OF
TECHNOLOGY, SODEPUR, KOLKATA, WEST BENGAL, is an authentic work carried out
under the guidance of MR. TRIDIP CHAKRABORTY. To the best of our knowledge and belief, the
matter embodied in this project work has not been submitted elsewhere for the award of any degree.
December 2020
This is to certify that this proposal of minor project entitled “HAND DRAWN GEOMETRIC
SHAPES (TRIANGLE AND SQUARE) CLASSIFICATION USING DEEP LEARNING” is
a record of bonafide work, carried out by Debayan Roy, Sayani Sengupta, Sriparna Ghosh,
Pallab Nath, Gourav Nandy and Roumita Singha under my guidance at GURU NANAK
INSTITUTE OF TECHNOLOGY. In my opinion, the report in its present form is in partial
fulfilment of the requirements for the award of the degree of BACHELOR OF INFORMATION
TECHNOLOGY and as per the regulations of the institution. To the best of my knowledge, the
results embodied in this report are original in nature and worthy of incorporation in the present
version of the report.
--------------------------------------
HEAD OF THE DEPARTMENT
Mr. Sudeep Ghosh
Head of the Department
Department of Information Technology
GNIT, Kolkata
-------------------------------------------
PROJECT SUPERVISOR
Mr. Tridip Chakraborty
Assistant Professor
Department of Information Technology
GNIT, Kolkata
Invigilator :- ___________________________
Acknowledgement
The success of any project depends largely on the encouragement and guidance of many others. We take
this opportunity to express our sincere gratitude to the people who have been instrumental in the
successful completion of this project work.
We would like to express our greatest appreciation to Mr. Tridip Chakraborty, Assistant Professor,
Department of Information Technology at Guru Nanak Institute of Technology, Kolkata. We have always
felt motivated and encouraged by his valuable advice and constant inspiration; without
his encouragement and guidance this project would not have materialized.
Words are inadequate to thank our classmates, teachers and the other members of
GNIT, Kolkata for their encouragement and cooperation in carrying out this project work. The
guidance and support received from everyone who contributed to this project was
vital to its success.
Index :-
8. Review Work
A. Detailed Designing
B. Project Code
C. Dataset Representation
D. Output Screen
Classification of hand drawn geometric shapes (triangle and square) using deep learning
Introduction
Image classification refers to the task of extracting information classes from a multiband raster
image. The resulting raster from image classification can be used to create thematic maps.
Depending on the interaction between the analyst and the computer during classification,
there are two types of classification: supervised and unsupervised.
Supervised Classification
Supervised classification uses the spectral signatures obtained from training samples to classify an
image. With the assistance of the Image Classification toolbar, you can easily create training
samples to represent the classes you want to extract. You can also easily create a signature file
from the training samples, which is then used by the multivariate classification tools to classify the
image.
Unsupervised Classification
Unsupervised classification finds spectral classes (or clusters) in a multiband image without the
analyst’s intervention. The Image Classification toolbar aids in unsupervised classification by
providing access to the tools to create the clusters, capability to analyze the quality of the clusters,
and access to classification tools.
The project discusses an approach that combines digital image processing of hand-drawn images with
geometric shape analysis to recognize two-dimensional shapes such as squares and triangles, as well
as the color of the object. This approach can be extended to applications like robotic vision and
computer intelligence. The methods involved are conversion of a three-dimensional RGB image to a
two-dimensional black-and-white image, color pixel classification for object-background
separation, area-based filtering, and use of the bounding box and its properties for calculating
object metrics.
We try to find optimal ways of recognizing hand-drawn geometric shapes (squares and triangles)
in files of different formats. Shape recognition is a field of artificial intelligence that includes
all representation and decision techniques used to automate the process of identifying similarities
between objects or phenomena. A shape recognition application requires the definition of
descriptors and the choice of a distance measure, and it works in two phases:
learning and recognition.
The detection is carried out on a dataset that is split into two sets, training and validation.
After the image data format is checked, the data generators are set up.
The model is then created, compiled and trained to produce the output.
Glossary
Machine learning implementations are classified into three major categories, depending on the
nature of the learning “signal” or “response” available to a learning system which are as follows:-
Supervised learning : An algorithm learns from example data and associated target
responses, which can be numeric values or string labels such as classes or tags, in order to
later predict the correct response when posed with new examples.
This approach is similar to human learning under the supervision of a
teacher. The teacher provides good examples for the student to memorize, and the student then
derives general rules from these specific examples.
Unsupervised learning : An algorithm learns from plain examples without any
associated response, leaving the algorithm to determine the data patterns on its own. This type
of algorithm tends to restructure the data into something else, such as new features that may
represent a class or a new series of uncorrelated values. Such algorithms are quite useful in providing
humans with insights into the meaning of data and new useful inputs to supervised machine learning
algorithms. As a kind of learning, it resembles the methods humans use to figure out that certain
objects or events are from the same class, such as by observing the degree of similarity between
objects. Some recommendation systems that you find on the web in the form of marketing
automation are based on this type of learning.
Reinforcement learning : The algorithm is presented with examples that lack labels, as
in unsupervised learning, but each example can be accompanied by positive or negative
feedback according to the solution the algorithm proposes.
Reinforcement learning is connected to
applications for which the algorithm must make decisions (so the product is prescriptive, not just
descriptive, as in unsupervised learning), and the decisions
bear consequences. In the human world, it is just like learning by trial and error. Errors help you
learn because they have a penalty added (cost, loss of time, regret, pain, and so on), teaching you
that a certain course of action is less likely to succeed than others. An interesting example of
reinforcement learning occurs when computers learn to play video games by themselves. In this
case, an application presents the algorithm with examples of specific situations, such as having the
gamer stuck in a maze while avoiding an enemy. The application lets the algorithm know the
outcome of actions it takes, and learning occurs while trying to avoid what it discovers to be
dangerous and to pursue survival.
Semi-supervised learning : An incomplete training signal is given: a training set with
some (often many) of the target outputs missing. A special case of this principle is
transduction, where the entire set of problem instances is known at learning time, except that part
of the targets are missing.
Classification : Inputs are divided into two or more classes, and the learner must produce
a model that assigns unseen inputs to one or more (multi-label classification) of these classes. This
is typically tackled in a supervised way. Spam filtering is an example of classification, where the
inputs are email (or other) messages and the classes are “spam” and “not spam”.
1. Regression : Also a supervised problem, in which the outputs are continuous
rather than discrete.
2. Clustering : A set of inputs is to be divided into groups. Unlike in classification, the
groups are not known beforehand, making this typically an unsupervised task.
Problem Definition
Technology Used
Proposed Scheme
Project Planning
The first layer of a neural network takes in all the pixels within an image. After all the data has
been fed into the network, different filters are applied to the image, which forms representations
of different parts of the image. This is feature extraction and it creates "feature maps". This process
of extracting features from an image is accomplished with a "convolutional layer", and convolution
is simply forming a representation of part of an image. It is from this convolution concept that we
get the term Convolutional Neural Network (CNN), the type of neural network most commonly
used in image classification.
Digital images are represented by a height, a width and RGB values that define each pixel's color,
so the "depth" that is being tracked is the number of color channels the image has. Grayscale
(non-color) images have only 1 color channel, while color images have 3. All of this
means that for a filter of size 3 applied to a full-color image, the dimensions of that filter will be 3
x 3 x 3. At each position, the network multiplies the filter values with the values of the pixels
the filter covers and sums the products to get a single numerical value for that position. This
process is then repeated across the entire image to achieve a complete representation. The filter is
moved across the image according to a parameter called "stride", which defines how many pixels the
filter is moved by after it calculates the value at its current position. The end result of all this
calculation is a feature map. This process is typically done with more than one filter, which helps
preserve the complexity of the image.
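As an illustration of the sliding-filter computation described above, the following minimal NumPy
sketch applies a single 3 x 3 x 3 filter to a small random RGB image without padding. The function
name conv2d_single_filter, the 6 x 6 image size and the all-ones filter are assumptions chosen for
this example; they are not part of the project code, which relies on Keras Conv2D layers instead.

```python
import numpy as np


def conv2d_single_filter(image, kernel, stride=1):
    """Slide one filter over an image (no padding) and return its feature map.

    image  : array of shape (height, width, channels)
    kernel : array of shape (k, k, channels)
    """
    h, w, _ = image.shape
    k = kernel.shape[0]
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + k, left:left + k, :]
            # Multiply filter and pixel values element-wise, then sum the products
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map


rng = np.random.default_rng(0)
image = rng.random((6, 6, 3))      # hypothetical 6 x 6 RGB image
kernel = np.ones((3, 3, 3))        # hypothetical filter, all weights equal to 1

fmap_stride1 = conv2d_single_filter(image, kernel, stride=1)
fmap_stride2 = conv2d_single_filter(image, kernel, stride=2)
print(fmap_stride1.shape)  # (4, 4)
print(fmap_stride2.shape)  # (2, 2)
```

With a stride of 1 the 6 x 6 input gives a 4 x 4 feature map, and with a stride of 2 it gives a
2 x 2 map, following (input size - filter size) // stride + 1.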
After the feature map of the image has been created, the values that represent the image are passed
through an activation function or activation layer. Convolution is a linear operation, so its output
is only a linear combination of pixel values; the activation function introduces non-linearity,
which the network needs because the patterns in images are themselves non-linear. The
typical activation function used to accomplish this is the Rectified Linear Unit (ReLU), although
other activation functions are occasionally used.
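A minimal sketch of ReLU, assuming a small hand-written feature map (the values are illustrative
only): negative entries are replaced by zero and positive entries pass through unchanged.

```python
import numpy as np


def relu(x):
    """Rectified Linear Unit: max(0, x) applied element-wise."""
    return np.maximum(0, x)


# Hypothetical feature-map values, chosen only for illustration
feature_map = np.array([[-2.0, 0.5],
                        [3.0, -0.1]])
activated = relu(feature_map)
print(activated.tolist())  # [[0.0, 0.5], [3.0, 0.0]]
```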
After the data is activated, it is sent through a pooling layer. Pooling "down-samples" an image,
meaning that it takes the information which represents the image and compresses it, making it
smaller. The pooling process makes the network more flexible and more adept at recognizing
objects/images based on the relevant features.
When we look at an image, we typically aren't concerned with all the information in the
background of the image, only the features we care about, such as people or animals. Similarly, a
pooling layer in a CNN will abstract away the unnecessary parts of the image, keeping only the
parts of the image it thinks are relevant, as controlled by the specified size of the pooling layer.
Because it has to make decisions about the most relevant parts of the image, the hope is that the
network will learn only the parts of the image that truly represent the object in question. This helps
prevent overfitting, where the network learns aspects of the training case too well and fails to
generalize to new data.
There are various ways to pool values, but max pooling is most commonly used. Max pooling
obtains the maximum value of the pixels within a single filter (within a single spot in the image).
This drops 3/4ths of information, assuming 2 x 2 filters are being used. The maximum values of
the pixels are used in order to account for possible image distortions, and the parameters/size of
the image are reduced in order to control for overfitting. There are other pooling types such as
average pooling or sum pooling, but these aren't used as frequently because max pooling tends to
yield better accuracy.
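The 2 x 2 max pooling step can be sketched in NumPy as follows. The function name max_pool_2x2 and
the 4 x 4 input values are assumptions made for this example; the sketch only handles a 2-D feature
map whose height and width are even.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2 on a 2-D map with even height and width."""
    h, w = feature_map.shape
    # Split the map into non-overlapping 2 x 2 blocks and keep the maximum of each
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))


# Hypothetical 4 x 4 feature map
feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
])
pooled = max_pool_2x2(feature_map)
print(pooled.tolist())                  # [[4, 5], [6, 7]]
print(pooled.size / feature_map.size)   # 0.25
```

Only one value out of every four survives, which is the three-quarters reduction mentioned above.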
➢ Block diagram
➢ Model Implementation
This section describes how the model is trained from scratch to classify images of triangles and squares.
Train Data : The training data contains 160 images each of triangles and squares, i.e. 320 images
in total.
Test Data : The test data contains 40 images each of triangles and squares, i.e. 80 images
in total.
Triangle-2.jpg
…………..
if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)
This part checks the image data format, i.e. whether the RGB channel axis comes first or last,
and sets the input shape of the model accordingly.
Next, the data generators are set up as follows:
# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
# this is the augmentation configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')
The training ImageDataGenerator rescales the image, applies a random shear and zoom within the given
ranges, and randomly flips the image horizontally, so the model sees varied versions of each
training image. train_datagen.flow_from_directory prepares batches of data from the training
directory, and target_size specifies the size to which each image is resized.
test_datagen.flow_from_directory prepares the validation/test data for the model in the same way,
but with rescaling only. fit_generator (model.fit in newer Keras releases) is used to fit the model
to the data. steps_per_epoch is the number of batches drawn from the training generator in one epoch.
epochs is the number of complete passes over the training data, each step consisting of a forward
and a backward pass. validation_data feeds the validation/test data into the model, and
validation_steps is the number of validation batches evaluated at the end of each epoch.
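To make the generator settings concrete, here is a small NumPy sketch of what rescaling and a
horizontal flip do to pixel values, followed by the step arithmetic for the sample counts and batch
size stated in this report. The three-pixel "image" is an assumption for illustration, and the flip
is applied unconditionally here, whereas ImageDataGenerator applies it at random.

```python
import numpy as np

# Hypothetical one-row image with 8-bit pixel values
pixels = np.array([[0, 128, 255]], dtype=np.float32)

# rescale=1. / 255 maps the range 0..255 to 0..1
rescaled = pixels * (1.0 / 255)

# horizontal_flip mirrors the image left to right
flipped = pixels[:, ::-1]

# Step arithmetic with the values used in the project code
nb_train_samples = 320
nb_validation_samples = 80
batch_size = 1
steps_per_epoch = nb_train_samples // batch_size
validation_steps = nb_validation_samples // batch_size

print(flipped.tolist())                    # [[255.0, 128.0, 0.0]]
print(steps_per_epoch, validation_steps)   # 320 80
```

With a batch size of 1, every image is its own batch, so one epoch takes 320 training steps and 80
validation steps.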
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.1))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.summary()
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
The compile function configures the loss, optimizer and metrics. Here the loss function is
binary_crossentropy, the optimizer is rmsprop, and accuracy is the reported metric.
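For reference, the sigmoid output and the binary cross-entropy loss for a single sample can be
written out by hand. This is a minimal sketch using only the standard library; the function names
and the clipping constant eps are assumptions for this example, and Keras computes the loss over
whole batches with its own implementation.

```python
import math


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: -[y*log(p) + (1 - y)*log(1 - p)]."""
    # Clip the prediction so that log(0) can never occur
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1.0 - y_pred))


p = sigmoid(0.0)                                # 0.5: the model is undecided
loss_uncertain = binary_crossentropy(1, p)      # equals ln(2)
loss_confident = binary_crossentropy(1, 0.99)   # much smaller loss
print(p, loss_uncertain, loss_confident)
```

The loss shrinks as the predicted probability moves toward the true label, which is what training
tries to achieve.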
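# fit_generator is deprecated in newer Keras releases; model.fit accepts generators directly
model.fit(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)
model.save_weights('first_try.h5')
# Load one validation image and prepare it the same way as the training data
img_pred = image.load_img('/content/drive/MyDrive/Dataset/validation/triangle/Triangle-20.jpg',
                          target_size=(150, 150))
img_pred = image.img_to_array(img_pred) / 255.0   # same rescaling as the generators
img_pred = np.expand_dims(img_pred, axis=0)       # add the batch dimension
rslt = model.predict(img_pred)
print(rslt)
# The sigmoid output is a probability, so threshold it instead of comparing with 1.
# flow_from_directory indexes classes alphabetically by folder name, so 'triangle'
# is class 1 if the other folder is named 'square'.
if rslt[0][0] >= 0.5:
    prediction = "Triangle"
else:
    prediction = "Square"
print(prediction)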
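Because the last layer is a sigmoid, model.predict returns a probability between 0 and 1 rather than
a hard label. A minimal sketch of turning that probability into a class name is shown below. The
helper name label_from_probability and the 0.5 threshold are assumptions for this example, as is the
class order (Square = 0, Triangle = 1), which follows from flow_from_directory sorting class folders
alphabetically if the folders are named square and triangle.

```python
def label_from_probability(prob, class_names=("Square", "Triangle"), threshold=0.5):
    """Map a sigmoid output to a class name by thresholding."""
    return class_names[1] if prob >= threshold else class_names[0]


# Hypothetical model outputs
print(label_from_probability(0.93))  # Triangle
print(label_from_probability(0.07))  # Square
```

Comparing the output with exactly 1 would label almost every triangle as a square, since a sigmoid
rarely saturates to exactly 1.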
Review Work
Project Designing
A. Detailed Designing
B. Project Code
#import Libraries
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K
import numpy as np
from keras.preprocessing import image
#Generate Data
img_width, img_height = 150, 150
train_data_dir = '/content/drive/MyDrive/Dataset/train'
validation_data_dir='/content/drive/MyDrive/Dataset/validation'
nb_train_samples = 320
nb_validation_samples= 80
epochs = 2
batch_size = 1
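if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)
# Augmentation configuration used for training
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
# Validation data is only rescaled
test_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')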
#Create Model
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.summary()
model.add(Conv2D(32,(3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64,(3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.1))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.summary()
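model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
# fit_generator is deprecated in newer Keras releases; model.fit accepts generators directly
model.fit(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)
model.save_weights('first_try.h5')
# Load one validation image and prepare it the same way as the training data
img_pred = image.load_img('/content/drive/MyDrive/Dataset/validation/triangle/Triangle-20.jpg',
                          target_size=(img_width, img_height))
img_pred = image.img_to_array(img_pred) / 255.0   # same rescaling as the generators
img_pred = np.expand_dims(img_pred, axis=0)       # add the batch dimension
#run model
rslt = model.predict(img_pred)
print(rslt)
# The sigmoid output is a probability, so threshold it instead of comparing with 1.
# flow_from_directory indexes classes alphabetically by folder name, so 'triangle'
# is class 1 if the other folder is named 'square'.
if rslt[0][0] >= 0.5:
    prediction = "Triangle"
else:
    prediction = "Square"
print(prediction)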
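The spatial size of the data as it passes through the three Conv2D and MaxPooling2D pairs can be
traced with simple arithmetic. The sketch below assumes the Keras defaults that the listing leaves
implicit (no padding and stride 1 for Conv2D, stride equal to the pool size for MaxPooling2D); the
helper names conv_output and pool_output are introduced here only for illustration.

```python
def conv_output(size, kernel=3, stride=1):
    """Output width/height of an unpadded convolution."""
    return (size - kernel) // stride + 1


def pool_output(size, pool=2):
    """Output width/height of non-overlapping pooling."""
    return size // pool


size = 150            # img_width and img_height in the project code
trace = [size]
for _ in range(3):    # three Conv2D + MaxPooling2D pairs
    size = conv_output(size)
    trace.append(size)
    size = pool_output(size)
    trace.append(size)

flattened = size * size * 64   # the last Conv2D layer has 64 filters
print(trace)       # [150, 148, 74, 72, 36, 34, 17]
print(flattened)   # 18496
```

Under these assumptions the Flatten layer passes 18496 values to the Dense(64) layer, which should
match the shapes reported by model.summary().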
C. Dataset Representation
D. Output Screen
❖ Understandability:
A method is understandable if someone other than its creator can understand
the code. Keeping each method small and coherent helps to accomplish this.
❖ Cost-Effectiveness:
The project is within its cost budget. It is desirable to aim for a system with minimum cost, subject to
the condition that it must satisfy all the requirements.
Our goal in this project was to detect geometrically shaped objects in an image, separate them and
then recognize them.
Moreover, day by day people are discovering more and more technologies to reduce human
suffering. Robots have been invented to reduce human effort in the difficult
parts of many fields of invention and research.
Object detection is one such difficult part and is a crucial requirement in robotics.
Without detecting or recognizing objects, a robot
cannot perform any significant role.
Future Scope
This model can be easily applied in various situations. New features can be added
as required, the application can be reused as and when required, and
all the modules are flexible.
➢ Further Extension :
In this paper we proposed a system that uses a convolutional
neural network to extract and select the features of a given image and classify
the image into the appropriate class.
▪ Hence, we conclude that Convolutional Neural Networks are a good choice for image
classification. Further, this system can be extended to applications such as biometric
recognition.
Conclusion
The geometric features have been analyzed using the immediate output of the 2D classification
algorithm, in which the borders of shapes have a toothed form. If line simplification algorithms are
applied, the correlation and importance of features may be different. The geometric feature
“rectilinearity” has not been analyzed together with the other features within the scope of this
study, because it requires line simplification algorithms. Therefore, it must be studied
independently to compare the best combination of algorithms and their input parameters with the
features researched in this study.
The combinations of statistical, spatial and geometric features belong to different groups of
parameters. Therefore, the correlation among them must be minimal, but the clusters are located at a
sufficient distance from one another, providing good conditions for automatic classification.
Detection of objects in an image is an important technological development. Our goal in this
project was to detect geometrically shaped objects in an image, separate them and then recognize
them.
For the input images, 80 squares and 80 triangles were used as training data and 8 squares and 8
triangles were used as validation data. Detailed experiments were carried out with different input
images, giving different detection accuracies. Moreover, day by day people are discovering more
and more technologies to reduce human suffering. Robots have been invented to
reduce human effort in the difficult parts of many fields of invention and research. Object
detection is a crucial requirement in robotics: without detecting or recognizing objects, a robot
cannot perform any significant role. The solution presented in this paper will enhance the capability
of object detection and recognition for industrial robots.
THANK YOU