
A MAJOR PROJECT

ON

OBJECT DETECTION USING MACHINE LEARNING


ALGORITHMS
SUBMITTED TO

P.V.K.N GOVERNMENT COLLEGE (A)


CHITTOOR, A.P
(Autonomous Institution)
(Affiliated to Sri Venkateswara University, Tirupati)
(Re-Accredited by NAAC with ‘A’ Grade)

In partial fulfillment of the requirements for the award of the degree of


MASTER OF SCIENCE
By

B. HANUMANTHA RAO (M210606501)

Under the guidance of

Mr. M. ISMAIL
Lecturer In Computer Science

DEPARTMENT OF COMPUTER SCIENCE


2020-2022
CERTIFICATE
This is to certify that this is the bona fide record of the project entitled
“OBJECT DETECTION USING MACHINE LEARNING ALGORITHMS”,
submitted by B. Hanumantha Rao (M210606501) to P.V.K.N Govt. College,
Vellore Road, Chittoor, Andhra Pradesh, in partial fulfillment of the
requirements for the award of the degree of Master of Science during the year 2020-2022.
INTERNAL GUIDE

S. VINILA KUMARI

HEAD OF THE DEPARTMENT                  PROJECT GUIDE

M. ISMAIL, M.Tech,                      M. ISMAIL, M.Tech,
Head of the Department                  Assistant Professor
Dept. of Computer Science               Dept. of Computer Science
P.V.K.N, Chittoor                       P.V.K.N, Chittoor

External Viva Voce Held On______________________

Internal Examiner External Examiner


P.V.K.N. GOVT. COLLEGE, CHITTOOR - 517001
(Affiliated to S.V. University, TIRUPATHI)

DEPARTMENT OF COMPUTER SCIENCE

DECLARATION

I, B. HANUMANTHA RAO (M210606501),

hereby declare that the project report entitled “OBJECT DETECTION
USING MACHINE LEARNING ALGORITHMS”, done under the esteemed guidance
of M. ISMAIL, M.Tech, Assistant Professor, Department of Computer
Science, P.V.K.N. Govt. College, is submitted in partial fulfillment of the
requirements for the award of the degree of Master of Science.

DATE:
PLACE:
ACKNOWLEDGEMENT
Apart from my own efforts, the success of this project depends largely on
the encouragement and guidance of many others. I take this opportunity to express
my gratitude to the people who have been instrumental in the successful
completion of this project.

I would like to thank our Principal, Dr. G. ANANDA
REDDY, M.Com., Ph.D., who has been a source of inspiration, for his
timely guidance in the conduct of my project work.

I would also like to show my greatest appreciation to Sri M. ISMAIL,
M.Tech., Head of the Department of Computer Science. I cannot thank him
enough for his tremendous support and help. I felt motivated and encouraged
every time I attended his meetings. Without his encouragement and guidance
this project would not have materialized.

I wish to express my sense of gratitude to my internal guide, Sri M.
ISMAIL, M.Tech., Assistant Professor, for his valuable guidance and useful
suggestions, which helped me complete the project report in time.

Finally, yet importantly, I would like to express my heartfelt thanks to
my beloved parents for their blessings, and to my friends and classmates for
their help and wishes for the successful completion of this project.

B. Hanumantha Rao (M210606501)


TABLE OF CONTENTS
S.NO   TITLE

1      INTRODUCTION
       1.1 PURPOSE AND OBJECTIVES
       1.2 EXISTING AND PROPOSED SYSTEM
       1.3 SCOPE OF PROJECT
2      LITERATURE SURVEY
3      SYSTEM ANALYSIS
       3.1 HARDWARE AND SOFTWARE REQUIREMENTS
       3.2 SOFTWARE REQUIREMENTS SPECIFICATION
4      SYSTEM DESIGN
       4.1 DESCRIPTION
       4.2 ARCHITECTURE
       4.3 UML DIAGRAMS
5      METHODOLOGY
       5.1 TECHNOLOGIES USED
       5.2 MODULES DESCRIPTION
       5.3 PROCESS/ALGORITHM
6      IMPLEMENTATION
       6.1 SAMPLE CODE
       6.2 OUTPUT SCREENS
       6.3 TEST CASES
7      CONCLUSION
       BIBLIOGRAPHY
ABSTRACT

Object detection is a key problem in computer vision. Detection can
be difficult since there are all kinds of variations in orientation, lighting,
background, and occlusion that can result in completely different images of the
very same object. With the advances in deep learning and neural networks, we
can now tackle such problems in real time without devising various hand-crafted
heuristics.

The project “Object Detection” detects objects efficiently based on a CNN
algorithm and applies the algorithm to image data. Algorithms like ‘You
Only Look Once’ and other convolutional neural networks help in achieving
detection of objects in different positions and orientations. We start from a
pre-trained convolutional neural network.
1. INTRODUCTION

1.1 PURPOSE AND OBJECTIVES


A few years ago, the creation of software and hardware image
processing systems was mainly limited to the development of the user interface,
which most of the programmers of each firm were engaged in.

However, this had not yet led to cardinal progress in solving typical tasks
of recognizing faces, car numbers, road signs, analyzing remote and medical
images, etc.

In such scenarios, deep learning is typically used, and among the various network
architectures used in deep learning, Convolutional Neural Networks (CNNs) are
widely used in image recognition.

We train the last layer of the network based on the number of classes that
need to be detected, and the model is fed with different types of objects in different
positions, lighting, and orientations, which is highly desirable for better
prediction. Then we get the Region of Interest for each image. Each and every
object that is detected is labelled along with the accuracy of the detection over
the bounding boxes.

In this project, we work with the Single Shot MultiBox
Detector (SSD) algorithm to detect objects, using MobileNet, which is a
convolutional neural network, as the backbone.

1.2 EXISTING SYSTEM

➢ The improved versions of R-CNN, like Fast R-CNN and Faster R-CNN, used
more strategies to reduce the computation of regions, but they did not reach
real-time inference speed.

➢ The YOLO system, however, broke this bottleneck by integrating region
proposal and classification into a single regression problem, straight from image
pixels to bounding box coordinates and class probabilities, evaluating each full
image in a single run.

➢ Since the whole detection pipeline is a single network, it can be optimized
end-to-end directly on detection performance. YOLO was the first framework to
reach the real-time detection standard, with 45 FPS (on GPU) and an mAP (mean
Average Precision) of 63.4% on VOC2007, but it still has drawbacks in detecting
smaller objects.

1.2 PROPOSED SYSTEM

➢ To avoid the above limitations, we propose this model, which includes the
Single Shot MultiBox Detector architecture.
➢ We build a MobileNet-based model with TensorFlow and OpenCV that detects
objects with good accuracy and is robust.
➢ By combining the anchor-box proposal system of Faster R-CNN with
multi-scale features in the detection layers, detecting smaller objects is eased.

1.3 SCOPE OF PROJECT

➢ This application is intended to be used in any working environment accuracy


and precision are highly desired to serve the purpose.
➢ As mentioned, the proposed model is able to detect around 90 objects. As part
of the future enhancements, the model will be custom trained with the other
objects to increase its detection capability.
➢ With the help of transfer learning, the used network will be trained with other
objects to increase the scope of objects the Mobile.Net can detect.


2. LITERATURE SURVEY

➢ To avoid the above limitations, we propose this model, which includes the
Single Shot MultiBox Detector architecture.
➢ We build a MobileNet-based model with TensorFlow and OpenCV that detects
objects with good accuracy and is robust.
➢ By combining the anchor-box proposal system of Faster R-CNN with
multi-scale features in the detection layers, detecting smaller objects is eased.

3. SYSTEM ANALYSIS

3.1 HARDWARE AND SOFTWARE REQUIREMENTS
The development and deployment of the application require the following
general and specific minimum requirements for hardware:

Component       Minimum requirement
RAM capacity    Minimum of 4 GB
Camera          Any desktop/laptop-supported camera
Hard disk       Minimum of 2 GB
Processor       Intel Pentium or higher

The development and deployment of the application require the following
general and specific minimum requirements for software:

Component              Minimum requirement
Operating system       Windows 7/8/10 or Linux
Coding language        Python
Language libraries     OpenCV (version 4.1.0 and above) – pip install opencv-python
                       TensorFlow – pip install tensorflow (TensorFlow as backend)
                       Keras – pip install keras
                       NumPy – pip install numpy
Software environment   Visual Studio

3.2 SOFTWARE REQUIREMENTS SPECIFICATION

Functional Requirements
➢ A real-time image or a video can be fed to the ML model.

➢ The user has to open the application.

➢ The user has to choose either live detection or video detection.

➢ The user will be able to get the detected objects, labelled.

➢ All the detected objects, along with their labels, are displayed within the video
or the image.

Non-Functional Requirements
➢ Performance: The user will get the desired output without interruption.
➢ Scalability: The model can handle large data sets.

➢ Reliability: The probability of accurate object detection is high because the
algorithm is well organized.

4. SYSTEM DESIGN

4.1 DESCRIPTION
Object detection aims to detect all instances of objects from a known class,
such as people, cars, or faces, in an image. Generally, only a small number of
instances of the object are present in the image, but there is a very large number
of possible locations and scales at which they can occur, and these need to
be explored somehow.

Each detection is reported with some form of pose
information. This can be as simple as the location of the object, a location and
scale, or the extent of the object defined in terms of a bounding box.

In some other situations, the pose information is more detailed and
contains the parameters of a linear or non-linear transformation. For example,
a face detector may compute the locations of the eyes, nose, and
mouth, in addition to the bounding box of the face.

4.2 ARCHITECTURE

R-CNN, or Region-based Convolutional Neural Network, determines the
location of multiple objects in an image. An image is split into various regions of
interest to scan for an object; the method scans for the specific regions of interest
that will likely contain an object of value. The input image is processed using the
selective search method, generating about 2000 region proposals. The proposal
regions are run through a CNN, then fed into a classification subnetwork to
determine the object class.

[Figure: R-CNN architecture]

4.3 UML DIAGRAMS

The Unified Modeling Language (UML) is a standard language for
specifying, visualizing, constructing, and documenting a system and its
components. It is a graphical language that provides a vocabulary and a set
of semantics and rules. The UML focuses on the conceptual and physical
representation of the system.
It captures the decisions and understandings about systems that must
be constructed. It is used to understand, design, configure, and control
information about systems.
Depending on the development culture, some of these artifacts are treated
more or less formally than others.
Such artifacts are not only the deliverables of a project; they are also
critical in controlling, measuring, and communicating about a system
during its development and after its deployment.

The UML addresses the documentation of a system's architecture and all of its
details. The UML also provides a language for expressing requirements and for
tests. Finally, the UML provides a language for modeling the activities of project
planning and release management.

4.3.1 Use Case Diagram:

A use case diagram is a graph of actors, a set of use cases enclosed
by a system boundary, communication associations between the actors
and use cases, and generalizations among use cases.
In our project, the actors are the user and the machine (server). The
user starts the application and selects a desired mode of video input for
the network, and the server classifies the objects based on the label map,
then returns the labelled frame or object as the output.

4.3.2 Sequence Diagram:

Sequence diagrams are interaction diagrams that detail how
operations are carried out. They capture the interaction between objects
in the context of a collaboration. There are three lifelines in our project:
user, application, and server. The operations between these lifelines
proceed stepwise, in order.

4.3.3 Activity Diagram:

A UML activity diagram is basically used to document the logic of
a single operation or a single use case, and to follow the communication
process. An activity can be described as an operation of the system. The
control flow is drawn from one operation to another.

4.3.4 Class Diagram:

The objects in this class diagram are the user, the application, and the server.
In this diagram, the server and the application have an association
connection, and the user and the application have a direct association connection.

5. METHODOLOGY

5.1 TECHNOLOGIES USED:

Python:

Python is an interpreted, object-oriented, high-level programming


language with dynamic semantics.

Its high-level built-in data structures, combined with dynamic typing


and dynamic binding, make it very attractive for Rapid Application
Development, as well as for use as a scripting or glue language to connect
existing components together.

Python's simple, easy-to-learn syntax emphasizes readability and therefore
reduces the cost of program maintenance.

OpenCV

OpenCV-Python is a library of Python bindings designed to


solve computer vision problems.

OpenCV-Python makes use of NumPy, which is a highly
optimized library for numerical operations with a MATLAB-style
syntax.
All the OpenCV array structures are converted to and from NumPy
arrays. This also makes it easier to integrate with other libraries that use
NumPy, such as SciPy and Matplotlib.
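As a quick illustration of this interoperability (a sketch, not part of the
project code; the file path is a placeholder), a frame read with OpenCV is
already a NumPy array:

import cv2
import numpy as np

# Frames from OpenCV are plain NumPy arrays, so NumPy operations apply directly.
img = cv2.imread("frame.jpg")          # ndarray of shape (H, W, 3), dtype uint8
print(type(img), img.shape, img.dtype)

# Add a batch axis, as detection models usually expect (used later in Test.py).
batch = np.expand_dims(img, axis=0)
print(batch.shape)                     # (1, H, W, 3)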

TensorFlow – pip install tensorflow-gpu


It is an open-source artificial intelligence library that uses data-flow
graphs to build models. It allows developers to create large-scale
neural networks with many layers. TensorFlow is mainly used
for classification, perception, understanding, discovering,
prediction, and creation.
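A minimal sketch of this data-flow-graph style, in the TF1-compatible
session mode that the project's sample code also uses (illustrative only):

import tensorflow as tf

# Build the graph first; nothing is computed at definition time.
tf.compat.v1.disable_eager_execution()
a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b                    # a node in the graph, not yet evaluated

# Execute the graph inside a session.
with tf.compat.v1.Session() as sess:
    print(sess.run(c))       # 6.0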

Tkinter:

Python provides the standard library Tkinter for creating the graphical user
interface for desktop-based applications.
Developing desktop-based applications with Python Tkinter is not a
complex task.
An empty Tkinter top-level window can be created using the following steps:
1. Import the Tkinter module.
2. Create the main application window.
3. Add widgets like labels, buttons, frames, etc. to the window.
4. Call the main event loop so that the actions can take place on the user's
computer screen.
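A minimal sketch of these four steps (an illustrative empty window, not the
project's full UI):

import tkinter as tk
from tkinter import ttk

# Steps 1-2: import the module and create the main application window.
root = tk.Tk()
root.title("Object Detection")
root.minsize(600, 450)

# Step 3: add widgets (a label and a button) to the window.
frame = ttk.Frame(root, padding=10)
frame.grid(column=0, row=0)
ttk.Label(frame, text="OBJECT DETECTION USING CNN").grid(column=0, row=0)
ttk.Button(frame, text="Quit", command=root.destroy).grid(column=0, row=1)

# Step 4: enter the main event loop.
root.mainloop()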

Keras:

Keras is an open-source high-level neural network library,
written in Python, that is capable of running on Theano, TensorFlow, or
CNTK.

It was developed by a Google engineer, Francois Chollet. It is
made user-friendly, extensible, and modular to facilitate faster
experimentation with deep neural networks. It supports not only Convolutional
Networks and Recurrent Networks individually but also their combination.

It cannot handle low-level computations itself, so it makes use of a backend
library to do so. The backend library acts as a high-level API wrapper for the
low-level API, which lets Keras run on TensorFlow, CNTK, or Theano.
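A small illustrative sketch of Keras's high-level style (not the project's
model; layer sizes are arbitrary):

from tensorflow import keras
from tensorflow.keras import layers

# A tiny CNN assembled with the Sequential API, running on the TensorFlow backend.
model = keras.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()   # prints the layer stack and parameter counts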

Numpy:

NumPy is a general-purpose array-processing package. It provides a high-
performance multidimensional array object and tools for working with these
arrays.

It is the fundamental package for scientific computing with Python.
Besides its obvious scientific uses, NumPy can also be used as an efficient
multi-dimensional container of generic data.

5.2 MODULES DESCRIPTION

Modules:
Various modules used in the project are:
1) GUI

2) Object-Detection

Module I:

This module mainly concentrates on the user interface. A library called
Tkinter is used to develop the application UI. It has text describing the
project and some buttons which let the user choose between
options such as Live Object Detection and Object Detection Using a
Video.
It also lets the user upload an mp4 file in order to detect objects in a
video file.

Module II:

This module lets the user run detection either live
or on an uploaded mp4 file. The project uses a pre-trained Single Shot
MultiBox Detector (SSD) MobileNet to detect various objects. A frozen
inference graph, which holds the pre-trained weights of the objects, is
extracted and used; this helps in faster execution and efficient detection of
objects in the frame.

A TensorFlow session is used in the project, which allows the user to
execute graphs, i.e., frozen inference graphs.

Each and every frame from the live feed or from the video file is used to
detect objects; after successful detection, a rectangular box is
drawn around each detected object, which is also labelled according to
the label map.

5.3 PROCESS/ALGORITHM

MobileNet:
❖ MobileNet is an efficient CNN architecture
designed for mobile and embedded vision applications. This
architecture uses proven depth-wise separable convolutions to
build lightweight deep neural networks.
❖ The core layers of MobileNet are built on depth-wise separable
filters. The first layer, which is a full convolution, is an exception.
❖ Basic operations like reshaping and resizing of images are performed
while feeding the data to the model.
❖ Data preprocessing involves conversion of data from a given format
to a much more user-friendly, desired, and meaningful format.
❖ The proposed method deals with image and video data using NumPy
and OpenCV.

❖ Data visualization is the process of transforming abstract data into
meaningful representations using knowledge communication and
insight discovery through encodings.
❖ The SSD MobileNet model is pre-trained on the COCO (Common
Objects in Context) dataset.
❖ This model covers 90 different labelled classes.

❖ These networks use depth-wise separable convolutions in place
of the standard convolutions used in earlier architectures to build
lighter models.
❖ Each depth-wise separable convolution layer consists of a depth-
wise convolution and a pointwise convolution. Counting depth-
wise and pointwise convolutions as separate layers, a MobileNet
has 28 layers.
❖ A frozen inference graph from the pre-trained model, which holds the
weights of the pre-trained objects, is used in detecting objects.
❖ Freezing is the process of identifying and saving all required
artifacts (graph, weights, etc.) in a single file that can be used easily.

Depth Wise Convolution:

Depth-wise convolution is a type of convolution where we apply a
single convolutional filter for each input channel. In the regular 2D
convolution performed over multiple input channels, the filter is as deep
as the input and lets us freely mix channels to generate each element in the
output. In contrast, depth-wise convolutions keep each channel separate.
In this process, the following procedure is used (see the sketch after this
section):

1. Split the input and filter into channels.

2. Convolve each input channel with the respective filter.

3. Stack the convolved outputs together.

[Figure: pictorial representation of a depth-wise separable convolution]
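The parameter savings can be checked with a short Keras sketch
(illustrative shapes, not the project's network): a standard 3 × 3 convolution
is compared with its depth-wise separable counterpart.

import tensorflow as tf
from tensorflow.keras import layers, models

inputs = tf.keras.Input(shape=(32, 32, 64))

# Standard 3x3 convolution: mixes all 64 input channels for each of 128 filters.
standard = models.Model(inputs, layers.Conv2D(128, 3, padding="same")(inputs))

# Depth-wise separable: one 3x3 filter per channel, then a 1x1 pointwise mix.
x = layers.DepthwiseConv2D(3, padding="same")(inputs)
x = layers.Conv2D(128, 1)(x)
separable = models.Model(inputs, x)

print(standard.count_params())   # 73,856 = 3*3*64*128 + 128
print(separable.count_params())  # 8,960  = (3*3*64 + 64) + (64*128 + 128)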

R-CNN

To circumvent the problem of selecting a huge number of regions, Ross
Girshick et al. proposed a method where selective search is used to extract
just 2000 regions from the image, which he called region proposals.
Therefore, instead of trying to classify a huge number of regions, you can just
work with 2000 regions. These 2000 region proposals are generated using
the selective search algorithm, which is outlined below.
Selective Search:
1. Generate the initial sub-segmentation: we generate many candidate regions.
2. Use a greedy algorithm to recursively combine similar regions into larger
ones.
3. Use the generated regions to produce the final candidate region proposals.
These 2000 candidate region proposals are warped into a square and
fed into a convolutional neural network that produces a 4096-dimensional
feature vector as output. The CNN plays the role of feature extractor: the
output dense layer consists of the features extracted from the image, and the
extracted features are fed into an SVM to classify the presence of the object
within that candidate region proposal. In addition to predicting the presence
of an object within the region proposals, the algorithm also predicts four
values, which are offset values for increasing the precision of the bounding
box. For example, given a region proposal, the algorithm might have predicted
the presence of a person, but the face of that person within that region proposal
could have been cut in half. Therefore, the predicted offset values help in
adjusting the bounding box of the region proposal.
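Selective search itself is available in OpenCV's contrib modules; the
following hedged sketch (requires opencv-contrib-python; the image path is a
placeholder) shows how the ~2000 proposals could be generated:

import cv2

img = cv2.imread("image.jpg")                      # placeholder input image
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()
rects = ss.process()                               # (x, y, w, h) candidate boxes
print(len(rects), "region proposals")

# R-CNN keeps roughly the first 2000 proposals; each region would then be
# warped to a fixed size and fed to the CNN feature extractor.
proposals = rects[:2000]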

Fast R-CNN

The same author as the previous paper (R-CNN) solved some of the
drawbacks of R-CNN to build a faster object detection algorithm, and it was
called Fast R-CNN. The approach is similar to the R-CNN algorithm.
But, instead of feeding the region proposals to the CNN, we feed the
input image to the CNN to generate a convolutional feature map.

From the convolutional feature map, we identify the region
proposals and warp them into squares, and by using an RoI pooling layer we
reshape them into a fixed size so that they can be fed into a fully connected
layer. From the RoI feature vector, we use a softmax layer to predict the class
of the proposed region, and also the offset values for the bounding box.
The reason Fast R-CNN is faster than R-CNN is that you don't have to
feed 2000 region proposals to the convolutional neural network every time.
Instead, the convolution operation is done only once per image, and a
feature map is generated from it.
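RoI pooling is often approximated with TensorFlow's crop-and-resize
operation; a hedged sketch (random feature map, made-up proposal boxes) of
how proposals become fixed-size inputs for the fully connected layers:

import tensorflow as tf

feature_map = tf.random.normal([1, 38, 38, 256])   # conv features for one image
boxes = tf.constant([[0.1, 0.1, 0.5, 0.5],         # proposals as [y1, x1, y2, x2],
                     [0.2, 0.3, 0.9, 0.8]])        # normalized to [0, 1]
box_idx = tf.zeros([2], dtype=tf.int32)            # both boxes come from image 0

# Crop each proposal from the shared feature map and resize it to a fixed 7x7.
rois = tf.image.crop_and_resize(feature_map, boxes, box_idx, crop_size=[7, 7])
print(rois.shape)                                  # (2, 7, 7, 256)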

Faster R-CNN

Both of the above algorithms (R-CNN and Fast R-CNN) use selective
search to find the region proposals. Selective search is a slow and
time-consuming process that affects the performance of the network.

Similar to Fast R-CNN, the image is provided as an input to a
convolutional network, which produces a convolutional feature map. Instead of
using the selective search algorithm on the feature map to identify the region
proposals, a separate network is used to predict the region proposals. The
predicted region proposals are then reshaped using an RoI pooling
layer, which is used to classify the image within the proposed region and
predict the offset values for the bounding boxes.

Speed comparisons show that Faster R-CNN is much faster than its
predecessors; therefore, it can even be used for real-time object detection.

YOLO — You Only Look Once
All the previous object detection algorithms use regions to localize the
object within the image: the network does not look at the complete image,
but only at parts of the image which have high probabilities of containing an
object. YOLO, or You Only Look Once, is an object detection algorithm much
different from the region-based algorithms seen above. In YOLO, a single
convolutional network predicts the bounding boxes and the class probabilities
for these boxes.

YOLO works by taking an image and splitting it into an S×S grid; within each
grid cell we take bounding boxes. For each of the bounding boxes, the
network outputs a class probability and offset values for the bounding
box. The bounding boxes with a class probability above a threshold value are
selected and used to locate the object within the image.
YOLO is orders of magnitude faster (45 frames per second) than other
object detection algorithms. The limitation of the YOLO algorithm is that it
struggles with small objects within the image; for example, it might have
difficulty identifying a flock of birds. This is due to the spatial constraints
of the algorithm.
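For the original YOLO on PASCAL VOC, the grid and box counts give a
fixed-size output tensor; a quick check (S, B, C values from the original
paper):

# S x S grid; B boxes per cell, each with 4 coordinates + 1 confidence;
# plus C class probabilities per cell.
S, B, C = 7, 2, 20
per_cell = B * 5 + C      # 2*5 + 20 = 30 values per grid cell
print(S * S * per_cell)   # 1470 output values per image (a 7x7x30 tensor)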

SSD:

The SSD object detector is composed of two parts:

1. Extract feature maps, and

2. Apply convolution filters to detect objects.

SSD uses VGG16 to extract feature maps. Then it detects objects using the
Conv4_3 layer. For illustration, we draw the Conv4_3 to be 8 × 8 spatially (it
should be 38 × 38). For each cell in the image (also called a location), it makes
4 object predictions.

Each prediction is composed of a boundary box and 21 scores, one per class
(one extra class for no object); we pick the highest score as the class for the
bounded object. Conv4_3 makes a total of 38 × 38 × 4 predictions: four
predictions per cell regardless of the depth of the feature maps. As expected,
many predictions contain no object, so SSD reserves class “0” to indicate that
no object was detected.

SSD does not use a delegated region proposal network. Instead, it
resolves to a very simple method: it computes both the location and class
scores using small convolution filters. After extracting the feature maps, SSD
applies 3 × 3 convolution filters for each cell to make predictions. (These
filters compute the results just like regular CNN filters.) Each filter outputs
25 channels: 21 scores for each class plus one boundary box (4 values).

At the beginning, we described an SSD that detects objects from a single
layer. Actually, it uses multiple layers (multi-scale feature maps), detecting
objects independently at each. As the CNN reduces the spatial dimension
gradually, the resolution of the feature maps also decreases. SSD uses
lower-resolution layers to detect larger-scale objects. For example, the 4 × 4
feature maps are used for larger-scale objects.

SSD adds six more auxiliary convolution layers after VGG16.
Five of these layers are used for object detection, and in three of those
layers we make 6 predictions per cell instead of 4. In total, SSD makes 8732
predictions using six convolution layers.
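The 8732 figure follows from the standard SSD300 feature-map sizes and
boxes per cell, which can be verified quickly:

# Standard SSD300 layers: (feature-map size, boxes predicted per cell).
ssd_layers = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]
total = sum(size * size * boxes for size, boxes in ssd_layers)
print(total)   # 8732 predictions in total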

Multi-scale feature maps enhance accuracy: the accuracy varies with the
number of feature-map layers used for object detection.

MANet:

Target detection is a fundamental and challenging problem that has long
been a hotspot in the area of computer vision. The purpose and objective of
target detection is to determine whether any instances of a specified category
of objects exist in an image. If there is an object to be detected in a specific
image, target detection returns the spatial positions and the spatial extent of
the instances of the objects (based on the use of a bounding box, for example).

As one of the cornerstones of image understanding and computer vision,
target and object detection form the basis for complex and higher-level visual
tasks, such as object tracking, image capture, instance segmentation, and
others. Target detection is also widely used in areas of artificial intelligence
and information technology, including machine vision, autonomous driving
vehicles, and human-computer interaction.

In recent times, methods that automatically learn feature representations
from data based on deep learning have effectively improved the performance of
target detection. Neural networks are the foundation of deep learning.
Therefore, the design of better neural networks has become a key step toward
the improvement of target detection algorithms and performance. Recently
developed object detectors based on convolutional neural networks (CNNs) are
classified into two types: the first is the two-stage detector type, such as
Region-Based CNN (R-CNN), Region-Based Fully Convolutional Networks (R-FCN),
and Feature Pyramid Network (FPN); the other is the single-stage detector,
such as You Only Look Once (YOLO), the Single Shot Detector (SSD), and
RetinaNet. The former type generates a series of candidate frames as data
samples and then classifies the samples with a CNN; the latter type does not
generate candidate frames but directly converts the object-frame positioning
problem into a regression problem.

To maintain real-time speeds without sacrificing precision among the various
object detectors described above, Liu et al. proposed SSD, which is faster than
YOLO and has accuracy comparable to that of the most advanced region-based
target detectors. SSD combines the regression idea of YOLO with the anchor-box
mechanism of Faster R-CNN, predicts the object region based on the feature
maps of the different convolution layers, and outputs discretized multi-scale
and multi-proportional default box coordinates.

The convolution kernel predicts the coordinate offsets of a series of
candidate frames and the confidence of each category. The local feature maps
of the multi-scale area are used to obtain results for each position in the
entire image.
This maintains the fast characteristics of the YOLO algorithm and also
ensures that the frame positioning effect is similar to that of Faster R-CNN.
However, SSD directly and independently uses two layers of the backbone
VGG16 and four extra layers obtained by convolution with stride 2 to construct
a feature pyramid, but it lacks strong contextual connection.

6. IMPLEMENTATION

6.1 SAMPLE CODE
UI.py:
from tkinter import *
from tkinter import ttk
from tkinter import filedialog

from Test import Test  # detection class from Test.py (import missing in the original listing)


class Detection:
    def __init__(self, root):
        root.title("Object Detection")

        # Window configuration
        mainframe = ttk.Frame(root, padding="3 3 12 12")
        mainframe.grid(column=0, row=0, sticky=(N, W, E, S))
        root.columnconfigure(0, weight=1)
        root.rowconfigure(0, weight=1)

        # Adding labels to the window
        label1 = Label(mainframe, text="OBJECT DETECTION USING CNN",
                       anchor=CENTER, font=('Arial', 20, 'bold'),
                       fg='red', padx=10)
        label1.place(x=90, y=50)
        label2 = Label(mainframe,
                       text="Mobile-Net is a CNN based network which is "
                            "currently used in Real-Time Detection",
                       font=('times', 10, 'bold', 'italic'),
                       fg='red', padx=30, pady=30)

        # Adding a button for live detection
        live_det = Button(mainframe, text="Detect Using Live Feed",
                          command=self.live_detection,
                          padx=10, pady=5, borderwidth=0)
        live_det.configure(background='red', foreground='white',
                           activebackground='#896545',
                           font=('arial', 10, 'bold'))
        live_det.place(x=20, y=60)

        # Adding a button for detection on an uploaded video
        vid_det = Button(mainframe, text="Detect Using a Video",
                         command=self.vid_detection,
                         padx=10, pady=5, borderwidth=0)
        vid_det.configure(background='red', foreground='white',
                          activebackground='#896545',
                          font=('arial', 10, 'bold'))
        vid_det.place(relx=0.5, rely=0.5, anchor=CENTER)

        for child in mainframe.winfo_children():
            child.grid_configure(padx=5, pady=15)

        root.bind("<Return>", self.live_detection)

    def live_detection(self, *args):
        d = Test()
        d.od()

    def vid_detection(self, *args):
        filename = filedialog.askopenfilename(
            filetypes=(("mp4 file", ".mp4"), ("All files", ".mp4")))
        d = Test()
        d.od(filename)


app = Tk()
app.minsize(600, 450)
app.resizable(0, 0)
Detection(app)
app.mainloop()

Test.py:

import os
import sys
import tarfile

import numpy as np
import tensorflow as tf
import cv2

from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util


class Test:
    def od(self, path=0):
        # Video source: 0 selects the default camera, otherwise a file path
        cap = cv2.VideoCapture(path)
        sys.path.append("..")

        # Model path
        MODEL_FILE = ('C:\\Users\\DELL\\Desktop\\Project-CNN\\'
                      'ssd_mobilenet_v1_coco_11_06_2017.tar.gz')

        # Model checkpoint path
        PATH_TO_CKPT = ('ssd_mobilenet_v1_coco_11_06_2017'
                        '/frozen_inference_graph.pb')

        # Path to the label map
        PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

        # Number of classes the model can detect
        NUM_CLASSES = 90

        # Extracting the frozen inference graph, which has all the weights
        # of the trained objects
        tar_file = tarfile.open(MODEL_FILE)
        for file in tar_file.getmembers():
            file_name = os.path.basename(file.name)
            if 'frozen_inference_graph.pb' in file_name:
                tar_file.extract(file, os.getcwd())

        # Graph to contain the computational data flow
        detection_graph = tf.Graph()
        with detection_graph.as_default():
            od_graph_def = tf.compat.v1.GraphDef()
            with tf.compat.v2.io.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
                serialized_graph = fid.read()
                od_graph_def.ParseFromString(serialized_graph)
                tf.import_graph_def(od_graph_def, name='')

        # Loading the label map
        label_map = label_map_util.load_labelmap(os.path.join(
            'C:\\Users\\DELL\\Desktop\\Project-CNN\\models\\research\\'
            'object_detection\\data', 'mscoco_label_map.pbtxt'))
        # Defining classes
        categories = label_map_util.convert_label_map_to_categories(
            label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
        category_index = label_map_util.create_category_index(categories)

        PATH_TO_TEST_IMAGES_DIR = 'test_images'
        TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR,
                                         'image{}.jpg'.format(i))
                            for i in range(1, 3)]
        IMAGE_SIZE = (12, 8)

        with detection_graph.as_default():
            # Executing the ops in the graph using a session
            with tf.compat.v1.Session(graph=detection_graph) as sess:
                while True:
                    # Returns status and image if successfully read
                    ret, image_np = cap.read()
                    if not ret:
                        break
                    # Expanding the image array to a batch of size 1
                    image_np_expanded = np.expand_dims(image_np, axis=0)

                    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
                    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
                    # Getting the accuracy (%) from the model
                    scores = detection_graph.get_tensor_by_name('detection_scores:0')
                    # Detecting the classes as mapped from the label map
                    classes = detection_graph.get_tensor_by_name('detection_classes:0')
                    num_detections = detection_graph.get_tensor_by_name('num_detections:0')

                    # Running detection on the current frame
                    (boxes, scores, classes, num_detections) = sess.run(
                        [boxes, scores, classes, num_detections],
                        feed_dict={image_tensor: image_np_expanded})

                    # Presentation of the detected objects along with the
                    # labels and bounding boxes
                    vis_util.visualize_boxes_and_labels_on_image_array(
                        image_np,
                        np.squeeze(boxes),
                        np.squeeze(classes).astype(np.int32),
                        np.squeeze(scores),
                        category_index,
                        use_normalized_coordinates=True,
                        line_thickness=8)

                    cv2.imshow('object detection',
                               cv2.resize(image_np, (800, 600)))
                    if cv2.waitKey(25) & 0xFF == ord('q'):
                        cv2.destroyAllWindows()
                        break

6.2 OUTPUT SCREENS

GUI:
1. Launch window of the application:

2. User prompt for a video file upload:

3. Object Detection using an uploaded video file:

4. Live detection:

6.3 TEST CASES

7. Conclusion

➢ This project is an efficient real-time deep learning-based framework to
automate the process of monitoring and detection. It uses the frozen inference
graph of the network, which holds the calculated weights of the different
classified objects.
➢ A GUI has also been provided for effective and easy use, with labels
that are short, descriptive, and self-explanatory.
➢ The object detector can run either on real-time input or on a video file.
➢ Each and every object that is detected is labelled with its appropriate
name, and the accuracy of the detection is provided around the bounding
boxes.

Future Enhancements
➢ This application is intended to be used in any working environment
where accuracy and precision are highly desired to serve the purpose.
➢ As mentioned, the proposed model is able to detect around 90 objects.
As part of the future enhancements, the model will be custom trained
with other objects to increase its detection capability.
➢ With the help of transfer learning, the network used will be trained with
other objects to increase the scope of objects that MobileNet can detect.

BIBLIOGRAPHY

Website References:

[1] OpenCV API Reference, available at
<https://docs.opencv.org/2.4/modules/gpu/doc/image_processing.html>
[2] TensorFlow Core v2.2.0
<https://www.tensorflow.org/api_docs/python/>
[3] NumPy v1.15 Manual
<https://numpy.org/doc/1.15/>
[4] Tkinter Documentation
<https://docs.python.org/3/library/tk.html>
[5] TensorFlow Model Zoo
<https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md>

Book References :
[1] Agarwal, S., Awan, A., and Roth, D. (2004). Learning to detect
objects in images via a sparse, part-based representation. IEEE Trans.
Pattern Anal. Mach. Intell. 26, 1475–1490.
doi:10.1109/TPAMI.2004.10
[2] Alexe, B., Deselaers, T., and Ferrari, V. (2010). “What is an
object?,” in Computer Vision and Pattern Recognition (CVPR), 2010
IEEE Conference on (San Francisco, CA: IEEE), 73–80.
doi:10.1109/CVPR.2010.5540226

Appendix A
Abbreviations :-

CMD - command prompt

RAM - Random Access Memory

GB - Gigabyte

GPU - Graphics processing unit

UML - Unified Modelling Language

OpenCV - Open Source Computer Vision Library

Appendix B
Software Installation Procedure

Installing Python:

Step 1 :- To download the setup file for Python, go to any search engine and
type "download python 3.6"; it will lead you to the official website, where
you can click download.
(Or)
You can go to the official website directly with the following link:
https://www.python.org/ftp/python/3.6.4/python-3.6.4.exe

Step 2 :- Once downloaded, locate the setup file under the
name python-3.6.4.exe in the downloads folder and run it.
You will see something like:

Step 3 :- Click on Run, you will see something like:

Step 4 :- By default, the "Add Python 3.6 to PATH" option is unchecked; make
sure it is checked, then click on Install Now. If the setup is successful, you
should see a window as below:

Step 5 :- Let's check if Python 3.6 is successfully installed. Open the
command prompt and type "python" in it. If you haven't closed the command
prompt from earlier, you will need to close and reopen it. You will see
something like:

Installing TensorFlow:
Step 1 :- Open CMD from the Windows Start menu.

Step 2 :- Now type "pip install tensorflow".

Installing OpenCV:
Step 1 :- Open CMD from the Windows Start menu.

Step 2 :- Now type "pip install opencv-python".

Appendix C
Software Usage Procedure

Step 1 :- Open the CMD.

Step 2 :- Locate where the program is saved and go to that location using CMD.

Step 3 :- Now run the program by typing the command: python "gui.py"

Step 4 :- A new window will open as follows:

Step 5 :- A real-time camera can be used for object detection, or else a
video can be uploaded to detect objects in the video. Uploading an mp4 file
can be done as follows:

