Face-Mask Detection Using YOLO v3 Architecture
A project report submitted in partial fulfillment of the degree
Of
BACHELOR OF TECHNOLOGY
In
COMPUTER ENGINEERING
PREPARED BY
Nisarg Pethani (IU1641050045)
Harshal Vora (IU1641050063)
UNDER GUIDANCE OF
Internal Guide
Mr. Hiren Mer
Assistant Professor,
Department of Computer Engineering,
I.T.E, Indus University, Ahmedabad
SUBMITTED TO
INSTITUTE OF TECHNOLOGY AND ENGINEERING
INDUS UNIVERSITY CAMPUS, RANCHARDA, VIA-THALTEJ
AHMEDABAD-382115, GUJARAT, INDIA,
WEB: www.indusuni.ac.in
MAY 2020
CANDIDATE’S DECLARATION
I declare that the final semester report entitled “Face-Mask Detection using YOLO v3
Architecture” is my own work, conducted under the supervision of my guide, Mr. Hiren
Mer.
I further declare that, to the best of my knowledge, the report for the B.Tech final
semester does not contain any part of work that has been submitted for the award of a
B.Tech degree, either in this university or in any other university, without proper citation.
___________________________________
Candidate’s Signature
___________________________________
Guide: Mr. Hiren Mer
Assistant Professor
Department of Computer Engineering,
Indus Institute of Technology and Engineering
INDUS UNIVERSITY– Ahmedabad,
State: Gujarat
INDUS INSTITUTE OF TECHNOLOGY AND ENGINEERING
COMPUTER ENGINEERING
2019 -2020
CERTIFICATE
This is to certify that the project work entitled “Face-Mask Detection using YOLO v3
Architecture” has been carried out by Nisarg Pethani and Harshal Vora under my
guidance, in partial fulfillment of the degree of Bachelor of Technology in COMPUTER
ENGINEERING (Final Year) of Indus University, Ahmedabad, during the academic
year 2019 - 2020.
___________________________
Mr. Hiren Mer
Assistant Professor,
Department of Computer Engineering,
I.T.E, Indus University, Ahmedabad

___________________________
Dr. Seema Mahajan
Head of the Department,
Department of Computer Engineering,
I.T.E, Indus University, Ahmedabad
ACKNOWLEDGEMENT
Towards the successful completion of our B.Tech in Computer Engineering final year
project, we feel greatly obliged to certain special people.
I am thankful and would like to express my gratitude to my internal guide, Mr. Hiren Mer,
for his conscientious guidance and for diligently helping me in this endeavor. I am grateful
to him for providing precise milestones to be achieved for my final year project. I also
extend my gratitude to all the teachers who taught me throughout my engineering studies
and thank them for the knowledge they imparted to me, and for providing suggestions on
the existing features of the project and how they could be improved. Finally, I thank all
those who indirectly helped or contributed towards the completion of my final year project.
- Nisarg Pethani
- Harshal Vora
IU/ITE/CE/2020/UDP-006
TABLE OF CONTENTS
Title Page No
ABSTRACT................................................................................................... v
LIST OF FIGURES........................................................................................ vi
LIST OF TABLES......................................................................................... ix
ABBREVIATIONS........................................................................................ x
CHAPTER 1 INTRODUCTION................................................................... 1
1.1 Project Summary.......................................................................... 2
1.2 Project Purpose............................................................................. 2
1.3 Project Scope................................................................................ 3
1.4 Objectives........................................................................ 3
1.5 Technology and Literature Overview.......................................... 4
1.5.1 Python........................................................................... 4
1.5.2 PyTorch......................................................................... 5
1.5.3 PyCharm........................................................................ 5
1.5.4 LabelImg....................................................................... 5
1.5.5 DarkLabel...................................................................... 6
1.6 Synopsis....................................................................................... 6
CHAPTER 2 PROJECT MANAGEMENT................................................... 7
2.1 Project Planning Objectives......................................................... 8
2.1.1 Project Development approach..................................... 8
2.1.2 Resource........................................................................ 8
2.1.2.1 Human Resource............................................ 8
2.1.2.2 Environment Resource................................... 8
2.2 Project Scheduling....................................................................... 8
2.3 Timeline Chart............................................................................. 9
CHAPTER 3 SYSTEM REQUIREMENTS.................................................. 10
3.1 Hardware Requirement................................................................ 11
3.2 Software Requirement.................................................................. 11
3.3 Environment Setup....................................................................... 14
CHAPTER 4 NEURAL NETWORK............................................................ 15
4.1 AI vs ML vs DL........................................................................... 16
4.1.1 Artificial Intelligence.................................................... 16
ABSTRACT
Object detection is one of the fastest-emerging and most widely studied fields in computer
vision. The goal of object detection is to find objects of certain classes in a given image,
along with their locations, and to assign each a respective class label. With the help of
deep learning, the usage and efficiency of object detection systems have increased
tremendously. Our project incorporates state-of-the-art techniques for object detection
that can also be used for real-time object detection.
A major inconvenience in many object detection mechanisms is the dependency on other
computer vision approaches before deep learning is applied, which results in a loss of
performance. In this project we use deep learning to solve the problem of object detection
in an end-to-end manner. The network is trained on a self-developed dataset. The
resulting module is fast and accurate and can also be used for real-time object detection.
LIST OF FIGURES
Figure 4.1 AI vs ML vs DL 16
LIST OF TABLES
ABBREVIATIONS
Abbreviations used throughout this document are:
AI Artificial Intelligence
ML Machine Learning
DL Deep Learning
NLP Natural Language Processing
YOLO You Only Look Once
PIL Python Imaging Library
CNN Convolutional Neural Network
RCNN Region-based Convolutional Neural Network
SSD Single Shot MultiBox Detector
IOU Intersection Over Union
mAP Mean Average Precision
NMS Non-Max Suppression
ReLU Rectified Linear Unit
LReLU Leaky Rectified Linear Unit
MSE Mean Square Error
BCE Binary Cross Entropy
FPS Frames Per Second
IO Input / Output
The most complicated problem in the project is to detect whether a person is wearing a
face mask or not, which involves both classification and localization.
Image classification involves predicting the class of an image.
The more complicated problem is image localization, where the image contains a single
object and the model must predict the class of the object as well as its location, putting a
bounding box around the object.
Here, the input to our model will be an image or a video (mostly real-time) and the output
will be a bounding box corresponding to each person's face in the image/video, along
with an indication of whether that person is wearing a face mask or not.
Face mask detection is an important aspect of the health care industry and cannot be
taken lightly.
This project helps identify face masks as objects in video surveillance cameras across
different places such as hospitals, emergency departments, out-patient facilities,
residential care facilities, emergency medical services, and home health care delivery, to
provide safety to doctors and patients and reduce the outbreak of disease. The detection
of face masks is required to happen in real time, as necessary actions in case of any
non-compliance must be taken on the spot.
Airports:
The Face Mask Detection System can be used at airports to detect travelers
without masks. Face data of travelers can be captured by the system at the
entrance. If a traveler is found without a face mask, their picture is sent to
the airport authorities so that they can take quick action. If the person’s face is
already stored, like the face of an airport worker, the alert can be sent to the
worker’s phone directly.
Hospitals:
Using the Face Mask Detection System, hospitals can monitor whether their staff
are wearing masks during their shifts. If any health worker is found without a
mask, they receive a notification with a reminder to wear one. Also, for
quarantined people who are required to wear a mask, the system can keep watch,
detect whether the mask is present, and send a notification automatically or
report to the authorities.
Offices:
The Face Mask Detection System can be used at office premises to detect whether
employees are maintaining safety standards at work. It monitors employees
without masks and sends them a reminder to wear a mask. Reports can be
downloaded or sent via email at the end of the day to identify people who are not
complying with the regulations or the requirements.
1.4 OBJECTIVES
It is not feasible for a human to detect face masks in real time, as there can be hundreds
of instances in a given frame; it would also be very time-consuming and inefficient for a
human to find every subject with or without a mask.
For this reason we have to build a powerful model that can overcome the problems of
real-time detection and human inefficiency.
The model should also be capable of providing face mask detection on a real-time
surveillance camera feed, any video, or a set of images.
The subsections below present an overview of the technologies used in this project.
1.5.1 Python
Advantages              Disadvantages
Vast library support    Slow speed
Improved productivity   Not memory efficient
IoT opportunities       Weak in mobile computing
1.5.2 PyTorch
1.5.3 PyCharm
1.5.4 LabelImg
1.5.5 DarkLabel
1.6 SYNOPSIS
The project was developed at Rajkot, and the duration for completing it ran from
15th January, 2020 to 5th May, 2020.
During the project development period, we submitted reports and presentations to
the internal guide at regular intervals, whenever required.
2.2 PROJECT SCHEDULING
Project scheduling is one of the most important aspects of any project; any project must
have a precise schedule before development begins.
When developers work on a scheduled project, they are at an advantage compared to an
unscheduled one: the timeline provides motivation to finish each activity. Scheduling
gives an idea of the project's length, its cost, and its expected completion date, and helps
find the shortest way to complete the project at the lowest overall cost.
The project schedule describes the dependencies between activities, the estimated time
required to reach each milestone, and the allocation of people to activities.
The total amount of data that will be processed through this hardware is approximately
10 GB. Table 3.1 denotes the hardware required for the project.

Requirement    Specification
RAM            32 GB DDR4
CPU            Intel Core i9 9th Gen 9900K
GPU            Nvidia GeForce RTX 2080
Memory         ~ 5 GB
CPU Cores      Octa-core
We developed the whole project, including the image processing and machine learning
components, entirely in the Python programming language. Table 3.2 denotes the
software required for the project.
Requirement    Specification
Platform       Python
IDE            PyCharm
Technology     Image and Video Processing, Deep Learning
Libraries      Torch, NumPy, PIL, tqdm, argparse, os, Matplotlib,
               terminaltables, TorchVision, TensorBoard, etc.
One of the advantages of Python is its vast library support, and we used various Python
libraries for this project. Table 3.3 lists the libraries we used and their descriptions.
Library Description
Torch Torch is an open-source machine learning library, a scientific
computing framework, and a scripting language based on the Lua
programming language. It provides a wide range of algorithms for
deep learning and uses the scripting language LuaJIT, and an
underlying C implementation.
The core package of Torch is torch. It provides a flexible N-
dimensional array or Tensor, which supports basic routines for
indexing, slicing, transposing, type-casting, resizing, sharing storage
and cloning. [1]
NumPy NumPy is the fundamental package for scientific computing with
Python. NumPy is a library for the Python programming language,
adding support for large, multi-dimensional arrays and matrices,
along with a large collection of high-level mathematical functions to
operate on these arrays. [2]
PIL Python Imaging Library (abbreviated as PIL) (in newer versions
known as Pillow) is a free and open-source additional library for the
Python programming language that adds support for opening,
manipulating, and saving many different image file formats. It is
available for Windows, Mac OS X and Linux. [3]
tqdm TQDM supports nested progress bars. If you have Keras fit and
predict loops within an outer TQDM loop, the nested loops will
display properly. TQDM supports Jupyter/IPython notebooks. [4]
argparse The argparse module makes it easy to write user-friendly command-
line interfaces. It parses the defined arguments from the sys.argv.
The argparse module also automatically generates help and usage
messages, and issues errors when users give the program invalid
arguments.
A parser is created with ArgumentParser and a new parameter is
added with add_argument(). Arguments can be optional, required,
or positional. [5]
os The os module in Python provides functions for interacting with
the operating system.
Matplotlib Matplotlib is a comprehensive library for creating static, animated,
and interactive visualizations in Python. It is a plotting library for
the Python programming language. [6]
terminaltables Easily draw tables in terminal/console applications from a list of
lists of strings.
Multi-line rows: add newlines to table cells and terminaltables will
handle the rest.
Table titles: show a title embedded in the top border of the table. [7][8]
TorchVision The torchvision package consists of popular datasets, model
architectures, and common image transformations for computer
vision. Some of the popular packages present in TorchVision are
torchvision.datasets, torchvision.io, torchvision.models,
torchvision.ops, torchvision.transforms, torchvision.utils, etc. [9]
TensorBoard TensorBoard provides the visualization and tooling needed for
machine learning experimentation:
- Tracking and visualizing metrics such as loss and accuracy
- Visualizing the model graph (ops and layers)
- Viewing histograms of weights, biases, or other tensors as they change over time
- Projecting embeddings to a lower-dimensional space
- Displaying images, text, and audio data
- Profiling TensorFlow programs [10]
1. Download Anaconda3-2019.03-Windows-x86_64
4.1 AI VS ML VS DL
Fig 4.1 AI vs ML vs DL
Deep Learning works in a layered architecture and uses the artificial neural
network, a concept inspired by the biological neural network.
Deep Learning algorithms are trained to identify patterns and classify
various types of information so as to give the desired output when they
receive an input. [12]
Neural networks are multi-layer networks of neurons that are used to classify things and
make predictions. A single neuron computes its output as follows (a minimal sketch in
code is given after this list):
Firstly, the inputs are fed to the perceptron, the basic artificial neuron.
Then, each input is multiplied by its weight.
Now, the obtained values are summed and a bias is added.
An activation function is then applied to get the output. Some popular
activation functions are the sigmoid, hyperbolic tangent (tanh), rectifier (ReLU), and
more.
At last, the output is triggered as 0 or 1.
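As a minimal sketch of these steps (using the sigmoid activation; the inputs, weights, and bias below are arbitrary illustrative values):

    import numpy as np

    def perceptron(x, w, b):
        """One artificial neuron: weighted sum, bias, activation, threshold."""
        z = np.dot(w, x) + b              # multiply inputs by weights, sum, add bias
        a = 1.0 / (1.0 + np.exp(-z))      # sigmoid activation
        return 1 if a >= 0.5 else 0       # trigger the output as 0 or 1

    x = np.array([0.5, -1.2, 3.0])        # illustrative inputs
    w = np.array([0.4, 0.7, -0.2])        # illustrative weights
    b = 0.1                               # illustrative bias
    print(perceptron(x, w, b))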
Artificial neurons are the elementary units of an artificial neural network. Fig 4.3 shows
an artificial neural network where each circle represents an artificial neuron.
Here,
The first layer represents the input layer.
The last layer represents the output layer (i.e. the prediction).
All layers in between are hidden layers.
Each circle shows an artificial neuron, as described above.
A Convolutional Neural Network (CNN) is a neural network that has one or more
convolutional layers; CNNs are used mainly for image processing, classification,
segmentation, and also for other autocorrelated data. The most common use for CNNs is
image classification. [14]
The role of a convolutional neural network is to transform the images into a format that
is easier to process, without losing the features necessary for a good prediction.
This is important when our goal is to design an architecture that is not only good at
learning features but is also scalable to massive datasets. Fig 4.4 shows a
Convolutional Neural Network.
In Fig. 4.5 the left section is a 5 × 5 × 1 matrix, which is the input image.
In Fig. 4.5 the right section is a 3 × 3 × 1 matrix, which is the kernel,
represented here as K (the kernel/filter values are shown in the figure).
Here, the kernel shifts 9 times because the stride length is 1, each time
performing an element-wise multiplication and sum between K and the
portion P of the image over which the kernel is hovering. The filter keeps
moving to the right with the given stride value until it parses the complete
width. Then it moves down to the left-most beginning of the image, where it
again continues its journey until the complete image is traversed. A naive
sketch of this operation in code is given below.
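A naive sketch of this sliding-kernel operation (the kernel values here are illustrative, not the ones from Fig. 4.5):

    import numpy as np

    def conv2d(image, kernel, stride=1):
        """Naive 2D convolution (no padding): slide the kernel over the image,
        multiplying element-wise and summing at each position."""
        ih, iw = image.shape
        kh, kw = kernel.shape
        oh = (ih - kh) // stride + 1
        ow = (iw - kw) // stride + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
                out[i, j] = np.sum(patch * kernel)   # element-wise multiply and sum
        return out

    image = np.arange(25, dtype=float).reshape(5, 5)   # a 5 x 5 input, as in Fig. 4.5
    kernel = np.array([[1., 0., -1.]] * 3)             # an illustrative 3 x 3 kernel
    print(conv2d(image, kernel).shape)                 # (3, 3): the kernel shifts 9 times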
The function of the pooling layer is to reduce the spatial size of the
convolved feature. This decreases the computational power required to
process the data through dimensionality reduction. Pooling is also useful
for extracting dominant features that are invariant to rotation and
position, thereby helping the model train effectively. A sketch follows.
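A corresponding sketch of max pooling (window size and input are illustrative):

    import numpy as np

    def max_pool2d(feature, size=2, stride=2):
        """Max pooling: keep the dominant value in each window, shrinking the
        spatial size of the convolved feature map."""
        h, w = feature.shape
        oh = (h - size) // stride + 1
        ow = (w - size) // stride + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = feature[i*stride:i*stride+size,
                                    j*stride:j*stride+size].max()
        return out

    feature = np.random.rand(4, 4)      # an illustrative 4 x 4 feature map
    print(max_pool2d(feature).shape)    # (2, 2): spatial size halved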
There has been much work in the field of object detection using computer vision
techniques, including the sliding window algorithm and deformable part models, but all
of them lack the accuracy provided by deep learning methods. There are two main broad
classes of methods:
The major concepts used in these techniques are shown below:
In this method the bounding box is predicted using regression, and the class
present within the bounding box is predicted using classification. An
example of this architecture is shown in Fig. 4.7.
In this method, region proposals are extracted with the help of some
other computer vision technique and then resized to a fixed input size
for the classification network, which works as a feature extractor. An
SVM is then trained to classify between object and background, with
one SVM for each class. A bounding box regressor is also trained to
output corrections for the proposal boxes. The idea is shown in the
image below. This method is very effective, but on the other hand it is
also computationally very expensive.
After extracting feature maps from the base network, another network is run
over these feature maps to predict the class scores and the bounding box
offsets. An overview of this idea is shown in Fig. 4.10.
The most important techniques that follow this strategy are SSD (which uses
different activation maps for predicting classes and bounding boxes) and
YOLO (used in this project), which uses a single activation map for
predicting classes and bounding boxes. Here, multiple scales are used to
achieve a higher mAP (Mean Average Precision) by detecting objects that
vary in size with high accuracy.
5.1 INTRODUCTION
There are currently three versions of the YOLO algorithm in practical use, each with its
own advantages and disadvantages, but YOLO v3 is right now the most popular real-time
object detection algorithm in use around the globe.
YOLO v3 (You Only Look Once) is one of the fastest algorithms currently in use. Even
though it is not the most accurate algorithm out there, it is a very good choice when
real-time object detection is needed without losing too much accuracy.
YOLO v3's backbone consists of 53 layers while YOLO v2's consists of only 19, due to
which the performance and accuracy of YOLO v3 are much higher than those of YOLO v2;
because of the additional layers, however, YOLO v3 is slightly slower than YOLO v2.
Here, we have used the standard YOLO v3 algorithm with a change in the non-max
suppression process.
5.2.1 IOU
A bounding box is a rectangle drawn in such a way that it covers the entire
object and fits it perfectly. There is a bounding box for every instance of an
object in the image, and for each box 4 numbers are predicted: the x, y
center co-ordinates and the width and height of the box.
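Since this subsection concerns IoU, here is a minimal sketch of the computation, assuming boxes in (x1, y1, x2, y2) corner format:

    def iou(box_a, box_b):
        """Intersection over Union of two boxes in (x1, y1, x2, y2) format."""
        x1 = max(box_a[0], box_b[0])
        y1 = max(box_a[1], box_b[1])
        x2 = min(box_a[2], box_b[2])
        y2 = min(box_a[3], box_b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)      # overlap area (0 if disjoint)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)

    print(iou((10, 10, 50, 50), (30, 30, 70, 70)))     # ~0.143: modest overlap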
5.2.3 mAP
5.2.3.1 Recall
Recall is the ratio of true positives (correct predictions) to the total
number of ground-truth positives (the total number of actual objects). [16]
How many relevant items are selected?
Recall = TP / (TP + FN)
5.2.3.2 Precision
Precision is the ratio of true positives (TP) to the total number of
predicted positives (total predictions). [16]
How many selected items are relevant?
Precision = TP / (TP + FP)
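As a quick worked example (the counts are illustrative):

    tp, fp, fn = 8, 2, 4              # illustrative detection counts
    precision = tp / (tp + fp)        # 0.8: how many selected items are relevant
    recall = tp / (tp + fn)           # ~0.667: how many relevant items are selected
    print(precision, recall)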
5.2.3.3 mAP
5.2.4 Threshold
ReLU is the most widely used activation function for deep learning
applications and gives highly accurate results. It is faster compared to
many other activation functions. ReLU is a nearly linear function and
hence preserves the properties of linear functions, which makes it easy
to optimize with gradient descent methods. The ReLU activation
function performs a threshold operation on each input element where
values less than zero are set to zero. [18]
The ReLU is given by Formula 5.2.2:
f(x) = max(0, x) (5.2.2)
The leaky ReLU was introduced to keep the weight updates alive
during the entire propagation process. A parameter named alpha was
introduced as a solution to ReLU’s dead neuron problem, so that the
gradients will not be zero at any time during training.
LReLU uses a very small constant value alpha (around 0.01) for the
negative part of the input; thus LReLU is computed as:
f(x) = x if x > 0; f(x) = αx otherwise, with α ≈ 0.01 (5.2.3)
There are many different types of Loss Functions but the ones that are
used here are
Mean Square Error/Quadratic Loss/ L2 Loss
Binary Cross Entropy
MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)² (5.2.4)
BCE = −(1/n) Σᵢ [yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ)] (5.2.5)
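A minimal sketch of how these two losses are invoked in PyTorch, the framework used in this project (the tensors below are illustrative):

    import torch
    import torch.nn as nn

    mse = nn.MSELoss()   # Mean Square Error, e.g. for box coordinate regression
    bce = nn.BCELoss()   # Binary Cross Entropy, e.g. for objectness/class terms

    pred = torch.tensor([0.9, 0.2, 0.7])     # illustrative predictions in [0, 1]
    target = torch.tensor([1.0, 0.0, 1.0])   # illustrative targets
    print(mse(pred, target).item(), bce(pred, target).item())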
5.3 ARCHITECTURE
The network used in this project is based on YOLO v3, and the architecture is
shown in Fig 5.7.
YOLO v3 uses Darknet-53 as its backbone: 53 layers trained on ImageNet. For the task
of detection another 53 layers are stacked on top, giving YOLO v3 its 106-layer fully
convolutional underlying architecture.
The newer architecture of YOLO includes residual skip connections and upsampling, and
it makes detections at 3 different scales.
YOLO is a fully convolutional network and the output is eventually obtained by applying
a 1×1 kernel on the feature map. In YOLO v3, detection occurs by applying a 1×1
detection kernel on feature maps of three different sizes at three different places in the
network.
In total, 5 types of layers are used as the building blocks of the YOLO v3 algorithm.
They are explained below.
Detection is performed with a 1 × 1 × (B × (5 + C)) kernel,
Where,
B is the number of bounding boxes that can be predicted by a
single cell.
The number “5” is for the 4 bounding box attributes and one object
confidence.
C will determine the number of classes. [21]
The resulting feature map has the same height and width as the previous feature
map, with the detection attributes along the depth as described above.
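For this project, with 3 anchors per scale (B = 3) and the two classes MASK and NO_MASK (C = 2), the kernel depth works out as follows:

    # Detection kernel depth: B boxes per cell, each with 4 box attributes,
    # 1 object confidence, and C class scores.
    B, C = 3, 2
    depth = B * (5 + C)
    print(depth)   # 21, so each detection kernel is 1 x 1 x 21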
A shortcut layer is a skip connection similar to the one used in ResNet.
The output of the shortcut layer is obtained by adding the feature maps of
the previous layer and of the layer indexed by the from parameter (defined
in the configuration file), counted backward from the shortcut layer.
The working of the upsample layer is simple: it upsamples the feature map
from the previous layer by a factor of stride using bilinear upsampling.
Upsampling is needed because, as we go deeper into the network, the spatial
size of the feature maps keeps decreasing; upsampling makes them larger
again so that they can be combined with other layers.
The YOLO layer corresponds to the detection layer discussed before. The
anchors entry in the YOLO layer describes 9 anchors, but only the anchors
indexed by the attributes of the mask tag are used.
Table 5.1 shows the main difference between the standard YOLO approach and the
modified YOLO approach that we have used in this project.
In many object detection settings, there is a chance that one object lies inside the
bounding box of another object, so under standard YOLO both the inner and the
outer object can be detected, because non-max suppression is applied per object
class label.
But in face mask detection the main object is a face, and in a real-life situation it is
impossible for one person's face to be inside another person's face bounding box;
that is why there is no harm in doing non-max suppression irrespective of the
object class label.
5.5 APPROACH
Here, we will discuss the working of the YOLO algorithm and how the algorithm detects
the objects in the image.
For one bounding box, 7 values are used: the 4 box attributes, one object confidence, and
one score for each of the 2 classes. With 3 boxes per grid cell at each of the three scales,
the box counts are as follows (a quick check in code is given after the list):
13 x 13 x 3 = 507
26 x 26 x 3 = 2028
52 x 52 x 3 = 8112
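As a quick check (assuming the standard 416 × 416 input, which yields these three grid sizes):

    # Boxes predicted per scale: grid_h x grid_w x boxes_per_cell
    scales = [(13, 13, 3), (26, 26, 3), (52, 52, 3)]
    total = sum(h * w * b for h, w, b in scales)
    print(total)   # 10647 candidate boxes per image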
The predicted box is obtained from the raw network outputs as:
bx = σ(tx) + cx
by = σ(ty) + cy
bw = pw · e^(tw)
bh = ph · e^(th)
Here,
bx, by are the x, y center co-ordinates,
bw, bh are the width and height of our prediction,
tx, ty, tw, th are what the network outputs,
cx and cy are the top-left co-ordinates of the grid cell, and
pw and ph are the anchor dimensions for the box. [22]
Only one bounding box prior is assigned for each ground truth
object.
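A minimal sketch of this decoding step, following the formulas above (the raw outputs, cell, and anchor values are illustrative):

    import torch

    def decode_box(t, cell, anchor):
        """Decode raw outputs (tx, ty, tw, th) into a box center and size using
        the YOLO v3 formulas above; coordinates are in units of grid cells."""
        tx, ty, tw, th = t
        cx, cy = cell            # top-left co-ordinates of the grid cell
        pw, ph = anchor          # anchor (prior) dimensions
        bx = torch.sigmoid(tx) + cx
        by = torch.sigmoid(ty) + cy
        bw = pw * torch.exp(tw)
        bh = ph * torch.exp(th)
        return bx, by, bw, bh

    t = tuple(torch.tensor(v) for v in (0.2, -0.1, 0.5, 0.3))  # illustrative outputs
    print(decode_box(t, cell=(6, 6), anchor=(3.6, 2.9)))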
5.5.2 Thresholding
Even after such thresholding, we end up with many boxes for each detected
object, but we only need one box per object. This final bounding box is
selected using non-max suppression.
Non-max suppression makes use of a concept called “intersection over
union”, or IoU. It takes two boxes as input and, as the name implies,
calculates the ratio of the intersection and the union of the two boxes.
This type of filtering makes sure that only one bounding box is returned
per detected object.
In the process of non-max suppression we neglect the class label.
To assign the class label, we check whether any one of the bounding boxes
being merged has the class label MASK.
If YES:
The final merged bounding box is labeled MASK.
Otherwise:
All the merged bounding boxes carry the NO_MASK label,
which results in the final merged bounding box being labeled
NO_MASK.
A sketch of this modified non-max suppression is given below.
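A minimal sketch of this modified, class-agnostic non-max suppression (the helper names and signature are illustrative, not the project's actual code):

    def iou(a, b):
        """IoU of two (x1, y1, x2, y2) boxes, as sketched in Section 5.2.1."""
        ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union

    def class_agnostic_nms(boxes, scores, labels, iou_thresh=0.5):
        """Suppress overlapping boxes regardless of class; a kept box is
        labeled MASK if any box it absorbed was labeled MASK."""
        order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
        keep = []
        while order:
            best = order.pop(0)
            label = labels[best]
            remaining = []
            for i in order:
                if iou(boxes[best], boxes[i]) > iou_thresh:
                    if labels[i] == "MASK":
                        label = "MASK"       # any MASK among the merged boxes wins
                else:
                    remaining.append(i)
            keep.append((boxes[best], scores[best], label))
            order = remaining
        return keep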
6.1 DATASET
For the purpose of this project, a dataset with two classes (MASK and
NO_MASK) was obtained in the following manner.
No_Mask dataset:
The No_Mask dataset was taken from WIDER FACE: A Face Detection Benchmark,
which provided us with approximately 14,000 No_Mask images.
6.1.1.1 LabelImg
6.1.1.2 DarkLabel
The name of each image and of its corresponding text file have to be the same.
The data has been shuffled and divided into two parts, 80% and 20%, for
training and validation purposes respectively.
For visualization purposes, bounding boxes are shown in Fig 6.1 and
the respective .txt file containing the bounding box values and label
is shown in Fig 6.2. A parsing example follows.
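For reference, a sketch of reading one line of such a .txt label file (the format, class index followed by normalized box center and size, is the standard YOLO convention; the values are illustrative):

    # One line of a YOLO-format label file: "class x_center y_center width height",
    # with coordinates normalized to [0, 1] (values below are illustrative).
    line = "0 0.512 0.437 0.210 0.305"
    cls, x, y, w, h = line.split()
    print(int(cls), float(x), float(y), float(w), float(h))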
6.2.1.1 Description
6.2.1.2 Parsing
Fig 6.x shows how the yolov3.cfg file is parsed to collect the YOLO
architecture information into the module_def list, where each list element
contains one layer's information as a dictionary. A sketch of this parsing
pattern is given below.
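A sketch of this parsing pattern (modeled on the common PyTorch-YOLOv3 parsing approach [30]; names follow the description above):

    def parse_model_config(path):
        """Parse a Darknet .cfg file into a list of layer dicts."""
        module_def = []
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith('#'):
                    continue                       # skip blanks and comments
                if line.startswith('['):           # a new block, e.g. [convolutional]
                    module_def.append({'type': line[1:-1].strip()})
                else:
                    key, value = line.split('=', 1)
                    module_def[-1][key.strip()] = value.strip()
        return module_def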
Fig 6.10 shows, in code, how the YOLO architecture is built from the
module_def list.
Fig 6.x shows the YOLO architecture information stored in the form of a
ModuleList, where the ModuleList contains a list of modules; these modules
are the layers of the YOLO architecture. A sketch follows.
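A sketch of building the ModuleList from the parsed definitions (only the convolutional case is shown; other layer types are handled similarly, and the helper is illustrative):

    import torch.nn as nn

    def create_modules(module_def):
        """Build an nn.ModuleList from the parsed layer dicts."""
        module_list = nn.ModuleList()
        in_channels = 3                            # RGB input
        for mdef in module_def:
            modules = nn.Sequential()
            if mdef['type'] == 'convolutional':
                out_channels = int(mdef['filters'])
                modules.add_module('conv', nn.Conv2d(
                    in_channels, out_channels,
                    kernel_size=int(mdef['size']),
                    stride=int(mdef['stride']),
                    padding=int(mdef['size']) // 2))
                if mdef.get('activation') == 'leaky':
                    modules.add_module('leaky', nn.LeakyReLU(0.1))
                in_channels = out_channels
            module_list.append(modules)
        return module_list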
6.3 TRAINING
6.4 DETECTION
Frames per second (FPS) and the real-time output are displayed during
real-time face-mask detection, as shown in Fig 5.24 and 5.25.
Black box testing treats the system as a “black box”, so it does not explicitly use
knowledge of the internal structure or code; in other words, the test engineer need not
know the internal workings of the “black box” or application. The main focus in black
box testing is the functionality of the system as a whole. The term “behavioral testing” is
also used for black box testing, and white box testing is sometimes called “structural
testing”. Behavioral test design is slightly different from black-box test design because
the use of internal knowledge isn’t strictly forbidden, but it’s still discouraged.
Each testing method has its own advantages and disadvantages; there are some bugs that
cannot be found using only black box or only white box testing. The majority of
applications are tested by the black box method. We need to cover the majority of test
cases so that most bugs are discovered by black box testing. Black box testing occurs
throughout the software development and testing life cycle, i.e. in the unit, integration,
system, acceptance, and regression testing stages.
White box testing is also called structural or glass box testing. It involves looking at the
structure of the code: when you know the internal structure of a product, tests can be
conducted to ensure that the internal operations are performed according to the
specification and that all internal components have been adequately exercised.
We divided the testing of the project, using the above-mentioned plans, into small tasks.
Of the two methods of testing, namely black box testing and white box testing, we
use:
White Box Testing for,
Unit Testing
Module Testing
Sub-System Testing.
and Black Box Testing for,
System Testing
Acceptance Testing.
As mentioned in our project scheduling and planning, there are a total of four test cases in
the training phases.
Detecting face masks in images with the following characteristics:
1. Side face
2. Vertically front half face
3. Subject wearing a cap, spectacles, or goggles
4. Masks made of handkerchiefs or other types of fancy or designer
masks.
Solution:
1. Gathered training images which had above-defined characteristics.
System Utilization and Optimization
1. Reducing training time
2. Increasing GPU utilization
3. Maintaining high GPU memory usage
4. Maintaining a high FPS rate of 25 – 30
Solution:
1. Enhanced the code to perform most of the numerical calculations on the GPU,
to take maximum advantage of parallel processing.
2. Avoided writing unnecessary code that reported the current status to I/O
peripherals.
Another big challenge was to maintain the input and output clarity of the image.
To overcome this challenge, we normalized the bounding boxes to [0, 1] and
expanded them according to the output image size.
8.1 LIMITATIONS
The first step towards future enhancement would be to improve accuracy when
detecting uncommonly used and fancy masks.
Next, overcome the limitation on horizontal and inverted face detection, as well
as the inefficiency in detecting half-worn masks.
We could also design a software application which provides various alerts (SMS,
email, or notification) when the software detects a face without a mask.
9.1 CONCLUSION
An accurate and efficient face-mask detection system has been developed using recent
techniques in the fields of computer vision and deep learning. A custom dataset was
created using LabelImg and DarkLabel. The system can be used in real-time face-mask
detection applications in airports, hospitals, offices, etc.
BIBLIOGRAPHY
REFERENCES
1. https://en.wikipedia.org/wiki/Torch_(machine_learning)
2. https://numpy.org/
3. https://en.wikipedia.org/wiki/Python_Imaging_Library
4. https://pythonhosted.org/keras-tqdm/
5. https://docs.python.org/3/library/argparse.html
6. https://matplotlib.org/
7. https://github.com/Robpol86/terminaltables
8. https://robpol86.github.io/terminaltables/
9. https://github.com/pytorch/vision
10. https://www.tensorflow.org/tensorboard
11. https://academic.microsoft.com/topic/119857082
12. https://intellipaat.com/blog/tutorial/artificial-intelligence-tutorial/ai-vs-ml-vs-dl/
13. https://en.wikipedia.org/wiki/Artificial_neuron
14. https://towardsdatascience.com/an-introduction-to-convolutional-neural-networks-
eb0b60b58fd7
15. https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-
networks-the-eli5-way-3bd2b1164a53
16. https://medium.com/@amrokamal_47691/yolo-yolov2-and-yolov3-all-you-want-
to-know-7e3e92dc4899
17. http://www.thresh.net/
18. https://deepai.org/publication/activation-functions-comparison-of-trends-in-
practice-and-research-for-deep-learning
19. https://towardsdatascience.com/calculating-loss-of-yolo-v3-layer-8878bfaaf1ff
20. https://towardsdatascience.com/understanding-different-loss-functions-for-neural-
networks-dd1ed0274718
21. https://towardsdatascience.com/yolo-v3-object-detection-53fb7d3bfe6b
22. https://towardsdatascience.com/review-yolov3-you-only-look-once-object-
detection-eab75d7a1ba6
23. https://pjreddie.com/darknet/yolo/
24. https://arxiv.org/pdf/1506.02640.pdf
25. https://arxiv.org/pdf/1612.08242.pdf
26. https://pjreddie.com/media/files/papers/YOLOv3.pdf
27. https://towardsdatascience.com/deep-learning-in-science-fd614bb3f3ce
28. https://github.com/X-zhangyang/Real-World-Masked-Face-Dataset
29. http://shuoyang1213.me/WIDERFACE/
30. https://github.com/eriklindernoren/PyTorch-YOLOv3
31. https://medium.com/@jonathan_hui/real-time-object-detection-with-yolo-yolov2-
28b1b93e2088
32. https://blog.paperspace.com/how-to-implement-a-yolo-object-detector-in-pytorch/
33. https://cs231n.github.io/convolutional-networks/
34. https://arxiv.org/pdf/1311.2524.pdf
35. https://towardsdatascience.com/setup-an-environment-for-machine-learning-and-
deep-learning-with-anaconda-in-windows-5d7134a3db10
COURSES