Distracted Driver Detection Using Deep Learning Algorithms

A DISSERTATION SUBMITTED TO COVENTRY UNIVERSITY FOR THE DEGREE


OF BACHELOR OF ENGINEERING IN THE FACULTY OF ENGINEERING,
ENVIRONMENT AND COMPUTING

By
Abstract

Distracted driving is one of the leading causes of traffic accidents today. The development of driver assistance
systems that detect drivers' actions and help them drive safely has therefore received considerable attention in
recent years. These studies draw on several different types of data, e.g., physical characteristics of the driver,
audio and visual signals, and vehicle information. Automatically interpreting a driver's behaviour is one of the
open challenges of intelligent transportation systems. This study investigates driver posture recognition within a
framework for human action recognition. The number of road accidents has grown steadily over the past few years;
according to a study by the National Highway Traffic Safety Administration (NHTSA), an estimated one in five motor
vehicle crashes is caused by a distracted driver. Our aim is to build a robust and accurate system that detects
distracted drivers and alerts them to the danger. A dashboard camera is used to observe the driver, and a two-stage
system detects ten driving behaviours: safe driving, texting, talking on a mobile phone, drinking, operating the
dashboard radio, reaching behind, adjusting hair and makeup, and conversing with other passengers. For this purpose
we use deep convolutional neural networks (CNNs). Inspired by the performance of CNNs in computer vision, our
CNN-based system not only detects distracted driving but also identifies the cause of the distraction. We modify the
VGG-16 architecture for this particular task and incorporate various regularization techniques to improve
performance. We evaluate VGG16, a fine-tuned VGG16, and a modified VGG16, and analyze several activation functions
(Leaky ReLU, DReLU, SELU). The State Farm Distracted Driver Detection challenge on Kaggle is used as a benchmark
for classifying distracted drivers. The solution is implemented with Keras, a deep learning library built on
TensorFlow. In our experiments, the CNN trained from scratch achieves 99% accuracy, outperforming previous methods
in the literature, while our VGG16 model achieves 94% accuracy and processes forty-two images per second.
Additionally, we explore how dropout, L2 regularization, and batch normalization affect performance. Our modified
VGG-16 reaches 98.5% to 99.75% classification accuracy with far fewer parameters than the original VGG-16, which
has about 140M. The proposed method produces a very good set of results. After applying the CNN and VGG16 models,
we find the best accuracy with the CNN model trained from scratch; its weights are therefore used for the final
testing and for predictions on images and videos.

Keywords: distracted driver, deep learning, classification, CNN, VGG16, activation functions, Keras, prediction,
images and video.
Table of Contents

PART I - INTRODUCTION

PART II - LITERATURE REVIEW

PART III – PROJECT TOOLS

PART IV – PROJECT TEST

PART V: CONCLUSION
List of Figures
Fig-1 Training and validation accuracy of the CNN model from scratch

Fig-2 VGG16 architecture

Fig-3 Training and validation accuracy of the VGG16 model

Fig-4 Training and validation accuracy of VGG16 fine-tuned

Fig-5 VGG16 modified architecture

Fig-6 Training and validation accuracy of VGG16 modified

Fig-7 Confusion matrix using the CNN scratch model

List of Tables
Table-1 Summary of deep learning methods

Abbreviations
CNN Convolutional Neural Network
WHO World Health Organization
NHTSA National Highway Traffic Safety Administration
CDC Centers for Disease Control and Prevention
NCRB National Crime Records Bureau
INS National Institute of Statistics
SVM Support Vector Machine
VGG Visual Geometry Group

List of Code
APPENDIX A IMPLEMENTATION CODE LINK

APPENDIX B IMPLEMENTATION CODE


PART I - INTRODUCTION

1. Introduction

Recent years have seen an increase in the number of accidents caused by distracted driving, and as traffic density
grows, car crashes are expected to increase further. The World Health Organization (WHO) reports that approximately
1.25 million deaths each year are attributable to car accidents worldwide (1, 2). Over a million people die each
year due to dangerous driving, and more than 50 million serious injuries result from it (3, 4). Distracted driving
killed 3,477 people and injured 391,000 in 2015, as reported by the National Highway Traffic Safety Administration
(NHTSA) (5). Most reported car accidents are caused by drivers texting or speaking on a mobile phone while driving.
The NHTSA broadly defines distracted driving as any activity that diverts attention away from driving. Although
driving distractions can be categorised into three distinct types, these types do not always occur separately: when
talking on the phone, for instance, two types of distraction occur at the same time, manual and cognitive. There
are many sources of distraction. Although distractions usually originate inside the vehicle, they can also be
caused by outside factors. Many major manufacturers, including Toyota, Nissan, Ford, and Mercedes-Benz, have
introduced innovative infotainment systems, control panels, and display technologies, and adjusting these
in-vehicle devices while driving can take attention off the road and cause an accident. Phone use is another source
that degrades driving performance. The NHTSA lists eating, drinking, conversing with passengers, talking on the
phone, adjusting the stereo, navigating, watching television, and driving while drowsy among the main distractions
(5, 6). Talking on the phone while driving can reduce the brain activity devoted to driving by 37%. Texting while
driving is even more distracting because it occupies not only the driver's attention but also their hands and eyes.
In a recent study, 78% of surveyed drivers admitted to using cell phones while behind the wheel, with a
corresponding negative impact on traffic accidents. Research interest has therefore grown in systems that can
classify distracted driving, as a way to reduce vehicle accidents and improve transportation safety. The goal of
this study is to develop such a distraction detection system that can be installed in real cars. Our research was
conducted on a custom-designed assisted-driving test bed rather than on actual roads, due to the lack of hardware
facilities and the safety concerns associated with real road tests. As defined by the Centers for Disease Control
and Prevention (CDC), distraction can be visual (taking one's eyes off the road), cognitive (taking one's mind off
driving), or manual (taking one's hands off the wheel) (2). Concerned about the number of car accidents caused by
distracted driving, governments and car manufacturers have partnered to develop smart vehicles with
distracted-driver posture detectors to improve car safety. Distraction detection devices could also help police
officers and radar cameras identify lawbreakers.

Modern cars implement a growing number of advanced driver assistance systems (ADAS), including stability control,
anti-lock brakes, traction control, adaptive cruise control, and blind-spot warning. These safety systems warn
drivers and passengers about potential problems and protect them in the event of an accident. Even the most
advanced autonomous vehicles are not fully self-driving, and the driver needs to remain vigilant in emergency
situations. Of the five levels of driving automation, only the top two are considered autonomous. Almost all
current self-driving cars are level 2, meaning they require a human driver to be on standby in case of an emergency
and not be distracted. Waymo, one of the self-driving cab services currently under development, operates at
level 4. In May 2016, a Tesla on Autopilot crashed into a white truck-trailer in Williston, Florida, killing the
driver, and in March 2018, an Uber test vehicle operating in self-driving mode hit and killed a pedestrian in
Arizona. Both of these fatal crashes could have been avoided had the drivers been paying attention instead of being
distracted. Distracted driving detection is therefore becoming a necessary part of any car and can take the form of
new ADAS features. Detecting driver inattention also enables further preventive measures: if the vehicle can report
distractions to company headquarters or to an insurance company, accidents can be reduced, driving routines can be
reviewed by professionals, customized insurance schemes can be designed, and warnings can be transmitted to the
driver. We are interested in distractions from the manual category, such as talking on the phone, drinking,
reaching behind, adjusting the stereo, texting, using entertainment systems, eating, and grooming. Detecting driver
distraction is the focus of this paper. We propose CNN models with different hyperparameters and activation
functions to tackle this specific problem, and we optimize distracted driving detection using the VGG16
convolutional architecture. We also propose an attention module that maintains good accuracy without adding
computational complexity or memory overhead.

2. Background Context
This research aims to detect driver distraction in order to prevent accidents. Car crashes will further increase as
the number of vehicles and traffic density grow. World Health Organization (WHO) statistics rank traffic accidents
as the eighth leading cause of death, killing 1.3 million people worldwide every year; an additional 20-50 million
suffer physical or mental disabilities. Reports from the National Crime Records Bureau (NCRB) indicate that India
has more road deaths than anywhere else in the world, and the number of road accident deaths there has been
increasing since 2006. Traffic fatalities in India in 2015 stood at 1.46 lakh, and the most common cause was driver
error. Globally, more than 3,700 people are killed on the roads every day, according to a World Health Organization
report, and road traffic injuries are the leading cause of death for people aged five to twenty-nine. This is among
the most heartbreaking statistics in the report, which also shows that the number of deaths rises each year and
that driver distraction is a primary culprit in these accidents. Inexperienced and young drivers use their mobile
devices while driving, posing a high risk of accident and death. Using a cell phone while driving increases the
chances of causing an accident four-fold, and texting while driving increases the risk of a crash 23 times.
Furthermore, with a telephone in hand, drivers' reaction times are reduced by 50%. The National Institute of
Statistics (INS) reported 1,951 road accident fatalities and 40,211 injuries related to distracted driving (7);
nine times more Romanian fatalities involving distracted drivers were reported than in the USA. Early research
detected driver distraction with a support vector machine model, using a dataset containing only the driver's front
face, collected by highway-mounted cameras that capture images of drivers' faces. That face-and-hand dataset
features only two actions: drivers with cell phones and drivers without. Driver distraction detection from images
has since become a popular topic in machine learning and computer vision, and many algorithms and models have been
proposed and analyzed, with image preprocessing techniques and classification model selection as the main research
topics. Among preprocessing methods, HaQ et al. suggest that flattening images with features rather than extracting
them directly might give better prediction accuracy. Convolutional neural networks (CNNs) are a commonly used
classification approach. Liu et al. combined static and dynamic features in a two-stream convolutional network to
detect distraction activities, and showed that a weighted ensemble of classifiers created by a genetic algorithm
yields higher classification accuracy than a plain ensemble of convolutional neural networks. CNN-based models
generally achieve the peak accuracy scores. The support vector machine is another important strategy: according to
Osman et al., neither linear nor nonlinear SVM classifiers display very high accuracy. Compared to CNN-based
models, SVMs are less accurate but faster to train and computationally cheaper. In addition to supervised learning,
semi-supervised learning is also an important method. Since the datasets for this problem contain many unlabeled
images, semi-supervised learning augments the available training data and may increase prediction accuracy; Liu et
al. showed that semi-supervised methods improve detection performance over traditional supervised methods because
the unlabeled data improve the model's discriminative ability. Aside from these three main methodologies, shallow
neural networks with attribute extraction, k-nearest neighbors, and random forests have also been considered, and
SVMs have been used for cell phone detection. In another study, an author simulated the same activity, talking on a
cell phone while seated in a chair, with notable results. An RGB-D dataset was analyzed using AdaBoost and hidden
Markov model methods; that model has two limitations, the first being the lighting effect and the second the
distance between the driver and the Kinect device. Many earlier distraction datasets are not publicly available and
cover only a limited set of distractions. StateFarm defined a distracted driving competition on Kaggle as detecting
ten postures, the first comprehensive, publicly available list covering a wide variety of distractions. Alongside
traditional hand-crafted feature extractors, researchers have proposed several approaches involving classifiers
such as SVMs, BoW, and neural networks. Although CNNs are the most effective technique for achieving high accuracy,
they require much more computation. According to the rules of the contest, however, the dataset may only be used
for purposes related to the competition itself. A dataset similar to StateFarm's for detecting driver distraction
was created by Abouelnaga et al. in 2017; their solution used a weighted ensemble of five convolutional neural
networks. The system's classification accuracy is good, but it is too complex for real-time detection. Baheti et
al. achieved 95.54% accuracy while significantly reducing the number of parameters.

3. Research Aim and Objectives


The distracted driver behaviours addressed in this work include changing the radio station, eating, calling on a
mobile phone, talking to fellow passengers, and drinking, all of which are causes of distraction. Accident rates
should be monitored and checked for reduction, and this problem is being thoroughly researched by a number of
researchers. A major objective of this paper is to address the problem on the driver's side with technology. HOG
and SIFT descriptors are treated here as traditional hand-crafted features: HOG features are passed to an SVM
pipeline, alongside a clustered SIFT pipeline. HOG counts the occurrences of gradient orientations on a dense grid
of uniformly spaced cells, while SIFT identifies interesting features using differences of Gaussians; SIFT features
capture local surface gradients and withstand modest geometric transformations. By histogramming SIFT features over
a set of quantized buckets, regardless of their location in the image, the Bag of Words technique can characterize
images (a hedged code sketch of this hand-crafted pipeline is given at the end of this section). Many smart cities
hope to use such techniques in the future to detect distracted driving and alert drivers so as to prevent
accidents, and the techniques can also assist law enforcement authorities in identifying and monitoring distracted
driving using radar or cameras. A cell phone might also be detected through its sensors and sounds using techniques
running on the driver's phone, and drivers' movements can likewise be measured. Detection frames can be captured
easily by a camera mounted in front of the driver or above the roof of the car; the captured images are transmitted
to a classifier so that the driver's actions can be detected. Researchers note that drivers can be monitored in
several ways. There are many distractions when driving a vehicle, including using electronic devices, eating,
drinking, talking with a passenger, reading text messages on a cell phone, and looking at advertisements or
billboards. Our objective is to develop a system that monitors distracted driving, integrated into the vehicle
information system; to work properly, it must identify the driver's state. When distracted driving is detected,
distracted drivers can be penalized.
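
To make the hand-crafted baseline concrete, here is a minimal sketch of dense-grid HOG extraction, assuming
scikit-image is available; the file name and parameter values are illustrative, not taken from this study.

```python
# Illustrative HOG extraction on a dense grid of uniformly spaced cells.
from skimage.feature import hog
from skimage.io import imread
from skimage.transform import resize

image = imread("driver.jpg", as_gray=True)   # hypothetical input image
image = resize(image, (227, 227))            # match the downscaled size used later

features = hog(image,
               orientations=9,               # gradient-orientation bins per cell
               pixels_per_cell=(16, 16),     # one of several cell sizes one could try
               cells_per_block=(2, 2),
               block_norm="L2-Hys")
print(features.shape)                        # one flat descriptor vector per image
```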

PART II - LITERATURE REVIEW

4. Literature Review
This section reviews the relevant and significant literature on distracted driving detection. Driving safety is a
critical issue in North America, Japan, and Europe, and Australian authorities have recently recognized the dangers
of driver distraction on the road. This research focuses on vehicle driver distraction, especially that associated
with cell phones (14), which calls for devices that help detect driver distraction and maintain attention to the
road while driving. Such a device measures driver performance; for safety, the software identifies the driver's
performance issues and addresses distracted driving. Computer vision algorithms are used to build this device, as
they detect distracted drivers most effectively: they detect distracted driving automatically and alert the driver
for safety purposes. To prevent distracted driving accidents, the device is installed in the car. Research on
driver distraction detection over the last seven years falls into four categories. Cellphone use is one of the
major causes of manual distraction (10), and several researchers have investigated detecting cell phone use while
driving. Zhang et al. (11) used a Hidden Conditional Random Fields model to detect mobile phone usage from a camera
mounted above the dashboard; in essence, it combines facial, mouth, and hand features. Nikhil et al. (12) used
Aggregate Channel Features (ACF) in 2015 for hand detection in the automotive environment. Seshadri et al. (13)
compiled their own dataset for detecting cell phone usage; using the Supervised Descent Method, an AdaBoost
classifier, and Histograms of Gradients (HoG), they obtained a classification accuracy of 93.9%, with the system
operating in near real time (7.75 frames per second). Distracted driving detection is essential for minimizing road
accidents.

Driving requires the driver's full attention. An estimated one-quarter of all traffic fatalities in the U.S. are
caused by distracted or inattentive drivers, a risk compounded as vehicles increasingly integrate wireless
communication, entertainment systems, and driving assistance. One line of work detects mobile phone usage during
driving using support vector machines (SVMs). In this section, we highlight relevant and significant works on the
detection of distracted drivers in the literature. The use of mobile phones is the primary distraction for drivers,
according to NHTSA, and this has led several researchers to detect when a person is using a mobile phone while
driving. Zhang et al. used a Hidden Conditional Random Field model based on hand, face, and mouth features in 2011
to detect mobile phone use. Nikhil et al. achieved an average precision of 70.09% using the Aggregate Channel
Features (ACF) object detector in a vehicle environment in 2015. Seshadri et al. used a Histogram of Gradients
(HoG) method with an AdaBoost classifier to detect mobile phone usage and achieved 93.9% accuracy in classifying
phone users. Le et al. used the Faster R-CNN deep learning model to achieve 94.2% accuracy on the same dataset;
however, the system is slow, and it attempts to detect hands on the steering wheel using face and hand
segmentation. The Laboratory of Intelligent and Safe Automobiles at the University of California San Diego has made
significant contributions to this field, although it has addressed only three types of distraction: radio tuning,
operating the gear shift, and mirror adjustment. Martin et al. used two Kinect cameras to recognize in-vehicle
activities with a vision-based analysis framework. Using three regions of the image (steering wheel, gearbox, and
dashboard), Ohn-Bar et al. proposed a fusion of classifiers to determine activity in real time. They presented a
region-based classification approach to determine the presence of hands in predetermined areas of an image and
incorporated eye cues into their research; however, the focus remained on only three types of distraction. Zhang et
al. created a more comprehensive dataset based on driver posture: driving safely, interacting with the shift lever,
eating, and making calls. Their study used random forests and contourlet transforms to achieve 90.5% accuracy, and
they also built a system using PHOGs and multilayer perceptrons that reaches 94.75% accuracy. A convolutional
neural network solution presented by Yan et al. in 2016 achieved 99.78% classification accuracy, and other CNNs
based on different datasets have been presented in the literature. Approaches to classifying distracted drivers
fall into two main groups. The first uses wearable devices (e.g., Fitbit) to measure physiological information and
biomedical signals such as brain activity, arterial and musculoskeletal movements, and heart rate; a disadvantage
of these methods is that they require user involvement and are expensive. The second uses cameras: it monitors
distracted driving behaviours with vision-based techniques such as head-pose and gaze detection, fatigue cue
extraction from the driver's face, and body posture analysis (e.g., arm, foot, and hand postures). In most
vision-based approaches, features are obtained from the raw data by hand-crafted methods and classifiers are fitted
to the extracted features. This two-step architecture poses a trade-off between robustness and the distinctive,
quality-oriented characteristics of the classifier. Vision-based methods relying on support vector machines and
decision trees dominated distracted-driver research over the last two decades. Recently, deep convolutional neural
networks (CNNs) have become the dominant method for tackling distracted driving, owing to the great success of deep
learning in computer vision, natural language processing, and speech recognition. Yan et al. proposed a deep-CNN
approach for recognizing and detecting driving posture that exploits local neighbourhood behaviour and trains
filters so that meaningful features are selected automatically; meaningful features can thus be learned with
minimal domain knowledge, which can yield a more accurate model than previous handcrafted features. To accelerate
filter training and achieve faster convergence and better generalization, the authors also pre-trained the filters
with sparse filtering. They tested several activation functions and pooling behaviours and found that the rectified
linear unit (ReLU) with max pooling produced the best results. Abouelnaga et al. identified taking a phone call,
using the phone, drinking, reaching behind, eating, adjusting the radio, and fiddling with hair and makeup as cases
of manual driver distraction. The authors propose a novel technique that adapts two well-known CNN architectures,
AlexNet and Inception V3, into a genetically weighted ensemble.

Table 1. Summary of deep learning methods currently available.



The ensemble's inputs include raw images, face images, arm images, hand images, and skin-segment images. The model
was then trained on these inputs to recognize distraction behaviour. For faster convergence, the authors used
transfer learning (i.e., an ImageNet-pretrained model). The final distribution was computed as a weighted sum of
all networks' outputs, with the weights evaluated by a genetic algorithm. Table 1 summarizes the state-of-the-art
methods, the dataset used with each method, and the overall accuracy. For this prediction task, the dataset
contains front-face pictures of the driver taken from a variety of angles, which makes it possible to identify
accurately whether the driver is distracted. Our deep learning (computer vision) model can detect and recognize
driver actions, such as looking to the left or right, and alert the driver. We adopt and benchmark a large number
of parameters in our deep learning model so that the system can run in real time in the real world. Another author
designed a new distracted-driver dataset in 2017 that is similar to StateFarm's; a weighted ensemble of five
convolutional neural networks over skin, hand, and face segments was used to solve the problem. In autonomous
driving it is imperative to have systems that operate in real time, which is difficult for the system that achieved
the highest classification accuracy.

5. Research Question
Distracted driving is a major cause of serious car accidents. Between 2014 and 2018 there was a statistical
increase in fatal crashes, and distraction has been cited as a contributing factor; there is currently strong
public concern about it (8). Strong evidence associates different distraction activities with varying accident
risk (9). It is therefore important to recognize and categorize drivers' distraction activities from images taken
while they drive. Using images and different types of machine learning algorithms, this project investigates the
detection of driver distraction activities. We attempt to create an accurate model that identifies whether a driver
is driving properly or is distracted; images of drivers from our dataset are provided as input to the model. We
address the following research questions:

● Is it possible to study driver distraction in normal driving safely and rigorously? Do lab or field studies,
test tracks, simulators, or other methods provide valid results?
● How do driver distraction measures (dependent variables) differ from each other, and which safety outcomes do
they predict?
● Can distraction be measured by technology (e.g., physiological monitoring), devices (e.g., eye trackers), or
analytical techniques (e.g., steering control inputs)?
● What models can be used to understand the distracting effects of a particular activity, or how it may affect
the likelihood of a crash?
● Are there tools that can support investigations into technology-related crashes?

6. Significance of this Study


This section outlines the structure of the rest of the paper. Related research and literature are reviewed in the
second section. Section three presents the research methodology, including an introduction to the methodology, the
practical research plan, the implementation, the analysis of the research findings, and the evaluation and
discussion. Section four presents the research conclusions and recommendations.
PART III – PROJECT TOOLS

7. Introduction to Research Methodology


This section presents the research methodology for distracted driver detection. The previous section reviewed
related machine learning and deep learning techniques, but those older techniques do not meet our accuracy
requirements. We evaluate the classification performance of two convolutional networks, each with a different base
architecture and activation functions. To support a range of architectures, a familiar deep neural network was used
as the root architecture, with the same parameters for the activation settings and layer-order variations. We then
develop a reliable and accurate system for detecting drivers who are distracted and warning them. Convolutional
neural networks perform well in computer vision, and we present a computer vision method that both detects
distraction and identifies its cause. As a hand-crafted baseline, HOG features and clustered SIFT descriptors
encoded with BoW were concatenated into a single feature vector. To compare fairly with deep CNN methods that use
similar input sizes, images were first downscaled to 227 × 227. HOG features were extracted with MATLAB's
"extractHOGFeatures" function, and we investigated how different cell sizes (8×8, 16×16, 24×24, and 32×32) affect
optimization and performance; all other parameters were left at their default values. SIFT descriptors were
extracted with the VLFeat library, and BoW vocabularies were derived by k-means clustering of descriptors from the
training set images. For this classification problem, we examined different vocabulary sizes (500 to 2000 words) to
identify the best BoW vocabulary, and SVM classifiers were trained on the resulting features (a hedged sketch
follows at the end of this section). We then turned to deep architectures: AlexNet uses an 8-weight-layer
architecture, VGG uses 19 weight layers, and the residual networks (ResNet) reach 152 layers. The notation
conv(filter size)-(number of filters) indicates the convolutional filters; rows of layers across the architecture
table represent layers shared between models, and horizontally corresponding blocks produce outputs of the same
dimensions. Deep convolutional neural networks are a type of artificial neural network (ANN) influenced by the
visual cortex found in animals. Over the past few years, CNNs have made impressive progress on various tasks,
including image classification, object prediction, action recognition, and natural language processing. A CNN is
formed by stacking convolutional filters/layers with activation functions, pooling operations, and fully connected
(FC) layers one after another. Since 2012, the availability of large amounts of labeled data and abundant computing
power has enabled rapid advances in CNNs, and several architectures, such as ResNet, ZFNet, GoogleNet, and VGGNet,
have established benchmarks in computer vision. To detect distracted drivers, we modify the VGG-16 architecture
proposed by Simonyan and Zisserman, applying several regularization techniques to optimize the original VGG-16 and
improve its performance on this particular task. Using these models, we achieve the maximum prediction accuracy.
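
The BoW-over-descriptors step described above can be sketched as follows, using scikit-learn in place of
VLFeat/MATLAB; function and variable names are our own, and the vocabulary size shown is just one of the explored
values.

```python
# Hedged sketch of the BoW + SVM baseline; all names and values are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

# descriptors_per_image: list of (n_i, 128) arrays of SIFT descriptors,
# extracted beforehand; labels: one class label per image.
def build_bow_features(descriptors_per_image, vocab_size=1000):
    # Cluster all training descriptors into a visual vocabulary (k-means).
    all_desc = np.vstack(descriptors_per_image)
    kmeans = KMeans(n_clusters=vocab_size, n_init=4).fit(all_desc)
    # Histogram each image's descriptors over the vocabulary, ignoring location.
    feats = np.array([
        np.bincount(kmeans.predict(d), minlength=vocab_size)
        for d in descriptors_per_image
    ], dtype=np.float64)
    return feats / feats.sum(axis=1, keepdims=True), kmeans  # normalized histograms

# features, vocab = build_bow_features(train_descriptors)
# clf = LinearSVC().fit(features, labels)   # SVM classifier on BoW vectors
```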

8. Practical Work Plan


This part explains the practical work plan for distracted driver detection. We use a dataset from a public
challenge hosted two years ago by the State Farm insurance company on Kaggle (16). The dataset includes 22,400
training images and 79,727 test images, each 640 × 480 pixels. The training images carry labels, each representing
one of ten classes. We apply several deep CNN models trained on ImageNet, including fine-tuned and modified
versions of VGG-16. Some published solutions are based on genetically weighted ensembles of convolutional neural
networks. During data preprocessing, overfitting can often be reduced by enriching the dataset through zooming,
rotating, or shearing (a sketch follows below). In a few prior cases, only one model was applied to the dataset
with no ensembling; dropout, a regularization technique that reduces overfitting, was not always used; and batch
normalization, which normalizes the output of a previous activation layer, was missing from some implementations,
even though it speeds up learning and improves accuracy. We train our CNN model to decrease the loss and increase
the accuracy, save the weights, and then test the results, keeping the best weights for the final prediction.
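
As an illustration of the augmentation step just described, here is a minimal Keras sketch; the parameter values
are examples, not the tuned values used in our experiments.

```python
# Minimal augmentation sketch (zoom, rotation, shear) with Keras.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,    # normalize pixels to [0, 1]
    rotation_range=10,    # random rotations
    zoom_range=0.1,       # random zooming
    shear_range=0.1,      # random shearing
)

# train_generator = train_datagen.flow_from_directory(
#     "train/", target_size=(224, 224), batch_size=40, class_mode="categorical")
```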

PART IV – PROJECT TEST

9. Implementation
The code is given in Appendix A.

Dataset Exploration

The dataset we analyze in this article was created by Abouelnaga et al. The StateFarm dataset was one of the first
available in the field of driver distraction classification; a dataset of this type was made available on Kaggle in
2016. There are ten classes in the dataset, with images of drivers doing something in their cars (eating, safe
driving, talking on the phone, applying makeup, texting, reaching behind, etc.). A few sample images from each
class are shown below. Thirty-one participants from seven different countries took part in the study, which
incorporates several variations in driving conditions and driver characteristics: the lighting conditions drivers
face vary, for example, from sunlight to shadow. In order to make true performance comparisons, we keep the same
data placement as in the original work. The data was provided by Kaggle (the State Farm Distracted Driver Detection
competition).

Predict the following 10 classes:

● c0: safe driving
● c1: talking on phone (right)
● c2: talking to passenger
● c3: texting (right)
● c4: drinking
● c5: texting (left)
● c6: operating the radio
● c7: talking on phone (left)
● c8: reaching behind
● c9: hair and makeup
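
For later use in the prediction code, the list above can be written as a simple lookup table (names paraphrased
from the class list):

```python
# The ten posture classes as a Python lookup table, used to turn model
# output indices into human-readable labels.
CLASS_NAMES = {
    "c0": "safe driving",
    "c1": "talking on phone (right)",
    "c2": "talking to passenger",
    "c3": "texting (right)",
    "c4": "drinking",
    "c5": "texting (left)",
    "c6": "operating the radio",
    "c7": "talking on phone (left)",
    "c8": "reaching behind",
    "c9": "hair and makeup",
}
```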
Our method is evaluated only on the data in the training folder, since the testing folder lacks labels. The
original published dataset has two folders, containing train and test data. There are 27,608 images in total:
17,462 training images divided into 10 classes and 10,146 testing images. Another research dataset, from the
American University in Cairo (AUC), was also used to determine the strength and generalisation ability of our
approach.

The AUC dataset contains 44 individuals (29 men and 15 women) from seven nations: Morocco, Uganda, USA, Germany,
Egypt, Palestine, and Canada. Each video involves a different time of day, a different car, different driving
conditions, and different clothes worn by the drivers. This ensures the same driver does not appear in both the
train and the test set, which matters because the images within a drive are highly correlated. From a random
selection of 150 images per class, we first compiled a validation set of 1,500 images; the high validation accuracy
that resulted from these highly correlated images was a misleading indicator of quality. We therefore selected
specific images for the validation set so that it contains no drivers who appear in the testing set.

In this manner, the training set was split into training and validation portions with no driver shared between
them. A multiclass logarithmic loss was used for the StateFarm competition submission; we use categorical
cross-entropy (log loss) in all models and evaluations, and we report the log-loss metric to assess model
efficiency. The model predicts a set of class probabilities for each image. The images are in color at 320 × 240
pixels each, as shown below:
texting_right

talking to passenger

Data preprocessing

Data is preprocessed before the model is built and training begins. The following steps are taken during
preprocessing:

● The training images are loaded.
● The images are resized to square images of 224 × 224 pixels.
● All three color channels are used during training, since these are color images.
● Each image is normalized by dividing each pixel value by 255.
● A value of 0.5 is subtracted so that the mean is zero.
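
A minimal sketch of these preprocessing steps in Python (the function name is ours):

```python
# Resize, scale to [0, 1], and shift to zero mean, as listed above.
import numpy as np
from PIL import Image

def preprocess(path):
    img = Image.open(path).convert("RGB").resize((224, 224))  # square resize, 3 channels
    x = np.asarray(img, dtype=np.float32) / 255.0             # scale pixels to [0, 1]
    return x - 0.5                                            # shift to zero mean
```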
Data point adjustment

Neural networks are implemented as layered architectures with neurons embedded within each layer. During network
building, data is fed to the layers, which are connected to one another and hold a set of weights. The weights
change as we train the network, and so the model learns the attributes of our dataset. The same is true of
convolutional neural networks, which process images. Deep Convolutional Neural Networks (Deep CNNs) are ANNs
inspired by the mammalian visual cortex. Convolutional networks are composed of convolutional filters, pooling
operations, non-linear activation layers, and fully-connected (FC) layers, together with an objective (loss)
function. Such networks have been applied to computational photography, object recognition, object detection, and
natural language processing. Krizhevsky and Hinton won the ImageNet Large-Scale Visual Recognition Challenge
(ILSVRC) in 2012 by a 10% margin. They used a powerful regularization scheme called dropout and an eight-layer deep
CNN architecture, which they called AlexNet. AlexNet also leveraged a simple yet effective activation function, the
Rectified Linear Unit (ReLU). The methodology was so effective that winners of the prestigious ILSVRC competition
have used CNNs ever since: AlexNet, VGG, and ResNet won ILSVRC2012, ILSVRC2014, and ILSVRC2015, respectively.
Improvements in regularization methods and deeper architectures have driven recent gains in classification
accuracy. Zeiler and Fergus employed random cropping and improved parameter tuning to improve classification
results. Simonyan and Zisserman examined the effect of network depth, while Szegedy et al. employed convolutional
filters of smaller sizes to increase accuracy while decreasing the number of parameters. Rectified Linear Units
(ReLUs), nonlinear activation functions, have also helped improve performance; He and colleagues used a
parameterized version of ReLU whose slope parameters are learned by backpropagation together with the weights. To
address the difficulty of training very deep networks, He et al. introduced skip connections in a residual learning
framework to simplify learning. The CNN model receives images as input. Initially, a standard CNN architecture was
created and trained. For our model, we created four convolutional layers with four pooling layers between them; the
number of filters increases from 64 to 512 across the convolutional layers. Before the fully connected part, a
flattening layer and a dropout layer were used. The network ends with two fully connected layers; softmax is
applied on the last fully connected layer, which contains one node per class, and ReLU is used as the activation in
all other layers. Each layer was initialized with Xavier initialization. The network consists of an input layer,
hidden layers, and an output layer; the convolution layer, pooling layer, rectified linear unit layer, dropout
layer, and fully connected layer are the hidden layers. All layers of the convolutional model are explained below,
followed by a Keras sketch of the full model after the layer descriptions.

● Input layer

The input layer holds the images as raw pixel values. Color images are used at a training size of 224 × 224; the
original 640 × 480 images are downscaled to reduce training time.

● Conv2D layer

The Conv layer holds a number of small learnable filters. At each location, a dot product is taken between the
filter weights and the small image region beneath the filter, and the filters are slid across the entire image. We
use 12 filters in this illustration, so the output dimensions are 224 × 224 × 12.

● Pooling layer

Pooling layers reduce the 2D dimensions of the input volume to avoid over-fitting and inefficient computation. This
is achieved by applying a small filter to each depth slice. Pooling filters come in several types, including max
pooling, which selects the maximum value, and average pooling.

● ReLU layer

Non-linearity is introduced into the model by applying an activation function element-wise, illustrated by the
activation function max(0, x).

● Dropout layer

The Dropout layer prevents the model from fitting the training data too closely. This regularization method
randomly sets activation values to zero, removing some feature detectors. A dropout rate of 0.5 is used in our
models.

● FC layer

This layer connects every neuron to all outputs of the previous layer and gives the predictions for every class.
The FC layer in our project has ten neurons, one per class.
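
Putting these layers together, here is a minimal Keras sketch of the scratch CNN described above; the hidden
dense-layer width is our assumption, as the text does not state it.

```python
# Scratch CNN: four 3x3 conv blocks (64-512 filters) with 2x2 max pooling,
# flatten, dropout 0.5, two dense layers, 10-way softmax, Xavier init.
from tensorflow.keras import Sequential, layers

model = Sequential()
model.add(layers.InputLayer(input_shape=(224, 224, 3)))
for filters in (64, 128, 256, 512):
    model.add(layers.Conv2D(filters, (3, 3), activation="relu", padding="same",
                            kernel_initializer="glorot_normal"))  # Xavier init
    model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512, activation="relu"))    # hidden width assumed
model.add(layers.Dense(10, activation="softmax"))  # one node per class
```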

Model training modification

A simple CNN architecture was built and evaluated to get initial results, yielding a decent loss; the public score
for this initial unoptimized model was 2.67118. To further improve the loss, transfer learning was applied to VGG16
along with three types of architectures for the fully connected layers. Model architecture 1 showed good results
and was further improved using the following techniques (a configuration sketch follows this list):

● A dropout layer was added to counter overfitting.
● Weights were initialized with Xavier initialization instead of random initialization.
● During pre-processing, 0.5 was subtracted to give a zero mean.
● Training ran for 400 epochs with a batch size of 16.
● VGG16 and the chosen model architecture were then fine-tuned and modified to further improve the loss metric.
We used the SGD optimiser with a very slow learning rate of 1e-4 and a momentum of 0.9.
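
A sketch of this training configuration in Keras; `model` is the network built above, and the generator names are
placeholders.

```python
# SGD with lr 1e-4 and momentum 0.9, categorical cross-entropy loss.
from tensorflow.keras.optimizers import SGD

model.compile(optimizer=SGD(learning_rate=1e-4, momentum=0.9),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_generator, validation_data=val_generator, epochs=400)
```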

10. Critical Appraisal


To improve the model results, I applied the following four approaches one by one:

❖ CNN model from scratch

We first built a CNN model from scratch. This model comprises four convolution blocks: the CNN layers use 64, 128,
256, and 512 filters with a 3×3 filter size in the successive blocks, and each CNN layer is followed by a
MaxPooling layer with a 2×2 pool size. The final block is followed by two dense layers with a dropout layer
(dropout value 0.5). The model is compiled with the RMSprop optimizer, with a momentum of 0.9 and a learning rate
of 0.01, and a batch size of 40. Image augmentation is used to prevent overfitting. The model runs for 25 epochs
and produces the following output:

After these 25 epochs the training accuracy reached 99.35% and the validation accuracy reached 99.28%. The training
and validation losses are 0.0302 and 0.0621, respectively. The training and validation accuracy graphs are plotted
below:
Figure# 01: Training and validation accuracy

Since the training accuracy is not much greater than the validation accuracy, the model fits well. We then compute
the accuracy metrics, including recall, precision, and F1 score, from the predictions on the testing data (a sketch
of this computation follows the scores below). The results for accuracy, precision, recall, and F1 score are:

Accuracy: 0.992843

Precision: 0.992864
Recall: 0.992843

F1 score: 0.992843
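
These scores can be reproduced from the predictions with scikit-learn, for example; variable names are
placeholders, and weighted averaging is our assumption for the multi-class setting.

```python
# Accuracy/precision/recall/F1 from model predictions on the test set.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_pred = model.predict(x_test).argmax(axis=1)   # class index per test image
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="weighted"))
print("Recall   :", recall_score(y_true, y_pred, average="weighted"))
print("F1 score :", f1_score(y_true, y_pred, average="weighted"))
```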

❖ VGG16 Architecture

To improve model accuracy we use the pretrained VGG16 model with transfer learning, employing it for feature
extraction. VGGNet is a landmark CNN architecture, one of the most influential in the literature, and its design
emphasizes deep yet simple networks. The architecture is shown in fig. 2.

Figure# 02: The original VGG-16 architecture, with 3×3 convolutions and fully connected layers of 4096 scalar
dimensions.

VGG performed well at both classification and localization. It uses 3×3 filters and the ReLU activation function in
each of its thirteen convolutional layers, 2×2 pooling with stride 2, and a categorical cross-entropy loss. The
CNN's initial layers extract features and the fully connected layers classify the input images into predefined
categories. We initialize the network with pre-trained ImageNet weights in all layers and fine-tune them on our
dataset. As a preprocessing step, all images are resized to 224 × 224 and the per-channel means of the RGB planes
are subtracted from each pixel; geometrically, this centers the data cloud around each axis. Feature extraction is
performed by the earlier CNN layers, and classification by the last layer, a softmax classifier. In the original
model, the 1000 output channels correspond to the 1000 ImageNet object classes; consequently, the last layer is
removed and replaced with a 10-class softmax layer. Overfitting during training is then countered with dropout in
the fully connected layers: randomly dropping neurons during training is an efficient way to reduce overfitting
(17), as it reduces interdependence among neurons. Linearly increasing dropout is used in a few convolutional
layers and in the fully connected layers. We then compile and fit our pretrained VGG16 model with the training and
validation datasets, using the RMSprop optimizer with a momentum of 0.9 and a learning rate of 0.01 and a batch
size of 16. Image augmentation prevents overfitting. The model runs for 400 epochs, with performance evaluated by
the cross-entropy loss (a transfer-learning sketch follows the scores below). Our model produces the following
results:

After these 400 epochs the training accuracy reached 97.54% and the validation accuracy reached 93.70%. The
training and validation losses are 0.0950 and 0.9370, respectively. The training and validation accuracy graphs are
plotted below:

Figure# 03: Training and validation accuracy

We then compute the accuracy metrics, including recall, precision, and F1 score, from the predictions on the
testing data. The results for accuracy, precision, recall, and F1 score are as follows:

Accuracy: 0.937017
Precision: 0.937256

Recall: 0.937017

F1 score: 0.937033
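
A hedged sketch of this transfer-learning setup in Keras: ImageNet weights, the original 1000-way classifier
removed, and a 10-class softmax head added; the head layer sizes are assumptions.

```python
# VGG16 base with a new 10-class classification head.
from tensorflow.keras.applications import VGG16
from tensorflow.keras import Model, layers

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
x = layers.Flatten()(base.output)
x = layers.Dense(4096, activation="relu")(x)     # FC size mirrors original VGG
x = layers.Dropout(0.5)(x)                       # dropout in the FC layers
out = layers.Dense(10, activation="softmax")(x)  # 10 driver-posture classes
model = Model(base.input, out)
```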

❖ VGG16 fine-tuned

After using the VGG16 architecture for feature extraction, we fine-tune VGG16 to improve performance and increase
accuracy. We fine-tune the VGG16 architecture while keeping the first 15 layers fixed; since our dataset is small
and not similar to the VGG-16 training sample, it is wise to train some layers and freeze others. This step
assesses the effect of fine-tuning the VGG-16 model pre-trained on the ImageNet dataset on model accuracy. We
compile and fit our pretrained, fine-tuned VGG16 model using the SGD optimizer with a momentum of 0.9 and a
learning rate of 1e-4, with a batch size of 16. The images are augmented to prevent overfitting. Our model runs for
25 epochs (a freezing sketch follows the scores below) and obtains the following results:

After these 25 epochs the training accuracy reached 99.99% and the validation accuracy reached 99.06%. The training
and validation losses are 0.0042 and 0.0410, respectively. The training and validation accuracy graphs are plotted
below:
Figure# 04: Training and validation accuracy

We then compute the accuracy metrics, including precision, recall, and F1 score, from the predictions on the
testing data. The results for accuracy, recall, precision, and F1 score are as follows:

Accuracy: 0.990553

Precision: 0.990580

Recall: 0.990553

F1 score: 0.990552
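
Freezing the first 15 layers can be sketched as follows; `model` is the transfer-learning model from the previous
sketch, whose early layers are VGG16's convolutional layers.

```python
# Keep early convolutional features fixed; fine-tune the deeper layers.
from tensorflow.keras.optimizers import SGD

for layer in model.layers[:15]:
    layer.trainable = False   # frozen
for layer in model.layers[15:]:
    layer.trainable = True    # fine-tuned on our dataset

model.compile(optimizer=SGD(learning_rate=1e-4, momentum=0.9),
              loss="categorical_crossentropy", metrics=["accuracy"])
```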
❖ VGG16 modified

VGG-16 suffers from one major disadvantage: it contains nearly 140M parameters. The fully connected layers consume
most of these parameters and are computationally expensive; moreover, a network with fully connected layers can
only accept inputs of a fixed size. We therefore replace the fully connected layers of the VGG16 architecture with
convolutional layers, which can handle varying input sizes (18). The network is made fully convolutional by
replacing the dense layers with 1×1 convolutions. The modified network architecture is shown below:

Figure 05: VGG16 Modified architecture

All Conv and FC-replacement layers are subjected to batch normalization and L2 regularization with λ = 0.001, and
the dropout rate increases linearly from the 3rd max-pooling layer towards the end of the network (a code sketch
follows).
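
A hedged sketch of such a fully convolutional head in Keras; the intermediate filter count is our assumption, and
the regularization settings follow the description above.

```python
# Dense layers replaced by 1x1 convolutions, with batch norm and L2 (λ = 0.001).
from tensorflow.keras import Model, layers, regularizers
from tensorflow.keras.applications import VGG16

l2 = regularizers.l2(0.001)
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
x = base.output                                             # 7 x 7 x 512 feature map
x = layers.Conv2D(1024, (1, 1), kernel_regularizer=l2)(x)   # 1x1 conv replaces a dense layer
x = layers.BatchNormalization()(x)
x = layers.Activation("relu")(x)
x = layers.Dropout(0.5)(x)
x = layers.Conv2D(10, (1, 1), kernel_regularizer=l2)(x)     # 10-class scores
x = layers.GlobalAveragePooling2D()(x)                      # collapses spatial dims for any input size
out = layers.Activation("softmax")(x)
model = Model(base.input, out)
```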

The number of parameters is reduced to 15M, roughly 11% of the original VGG-16. The regularization parameters
remain the same as in previous sections. We compile and fit our pretrained, modified VGG16 model using the RMSprop
optimizer with a momentum of 0.9 and a learning rate of 0.1, a batch size of 16, and image augmentation to prevent
overfitting. After 75 epochs, the model produces the following output:

After these 75 training epochs the validation accuracy reached 98.57% and the training accuracy reached 98.08%. The
training and validation losses are 0.1137 and 0.1520, respectively. The training and validation accuracy graphs are
plotted below:
Figure#06: Training and validation accuracy

The modified VGG16 model is the best of our VGG16 variants; its prediction results are accurate and fulfil all
requirements. We then compute the accuracy metrics, including precision, recall, and F1 score, from the predictions
on the testing data. The prediction results for accuracy, precision, recall, and F1 score are as follows:

Accuracy: 0.985686

Precision: 0.985768

Recall: 0.985686
F1 score: 0.985695

PART V: CONCLUSION

11. Evaluation

In the evaluation and discussion portion we present the final prediction results. A Convolutional Neural
Network-based distracted driving detection system is designed from scratch: a small convolutional neural network
with four Conv2D blocks of 64, 128, 256, and 512 filters (filter size 3×3), max pooling with a 2×2 size, and two
dense layers with a dropout of 0.5 in the fully connected part.

We compile the model with the RMSprop optimizer, with a momentum of 0.9 and a learning rate of 0.1, and use
cross-entropy loss to drive performance. We fit our scratch CNN model on the training and validation datasets with
a batch size of 40. Augmented images are used to prevent overfitting, and the model is run for 25 epochs,
generating the following results:

After these 25 epochs the training accuracy reached 99.35% and the validation accuracy reached 99.28%. The training
and validation losses are 0.0302 and 0.0621, respectively.

After this training process we predict the classes and save the weights of the scratch CNN model. We then analyse
the model with a confusion matrix: class names label the columns, the matrix values are drawn as an integer heatmap
over the x and y axes, true labels are placed on the y-axis and predicted labels on the x-axis, and the figure is
saved to the model path in PNG format. The heatmap over y_test, y_predict, and classes_names is displayed below:
Figure#07: Confusion Matrix using CNN scratch model
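
The heatmap just described can be produced with scikit-learn and seaborn, for example; the variable names follow
the description above.

```python
# Confusion-matrix heatmap: true labels on the y-axis, predictions on the x-axis.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_predict)
sns.heatmap(cm, annot=True, fmt="d",
            xticklabels=classes_names, yticklabels=classes_names)
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.savefig("confusion_matrix.png")   # saved to the model path in PNG format
```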

After finding the confusion matrix for model analysis we make the model prediction. We will predict the results of
the competition based on the best models generated from each of the following approaches:

● Self-designing CNN from scratch.


● By using the Vgg16 architecture as a base and tuning the last layer.

For the model prediction results, we mount Google Drive, unzip the data stored there, and import the libraries needed for prediction. The trained model is located by its path and saved in the base_model_path folder. We create a CSV file storing the test data in the test_dir folder, a predict_dir folder for the test prediction outputs, a pickle file of the test images in pickle_dir, and a JSON file of the test data in json_dir.

We then use the weights stored in the self-trained folder of the CNN scratch model, load our base model containing the CNN, and save it in the model folder. After loading the weights into the built model, we read the CSV in test_dir and store the data in data_test. Testing is limited to 10,000 images, since loading all images would require more than 8 GB of RAM. The images are labelled in pickle format and assigned label IDs.

The RGB images are loaded with the PIL library, converted to 3D tensors of shape (224, 224, 3), then expanded to 4D tensors of shape (1, 224, 224, 3) and returned; the pixel values are scaled by 255. We also create a human-readable class dictionary, class_name, containing the ten subclasses, and assign each class an ID; the test data in CSV format is labelled according to the driver's activity and stored in the testing json_dir. Finally, we generate the classification predictions for the images, move each predicted image to another folder with the file renamed to include the class name predicted by the model, and save the prediction results to a CSV file.
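The image-to-tensor step described above can be sketched as follows; the helper name is an assumption.

# Loading an RGB image with PIL and converting it to the 4D tensor shape
# (1, 224, 224, 3) expected by the model, with pixel values scaled by 255.
import numpy as np
from PIL import Image

def path_to_tensor(img_path):
    img = Image.open(img_path).convert('RGB').resize((224, 224))
    x = np.asarray(img, dtype='float32') / 255.0   # 3D tensor (224, 224, 3)
    return np.expand_dims(x, axis=0)               # 4D tensor (1, 224, 224, 3)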

After this we make the final predictions on images and videos. For image prediction, we create a directory of images read in different formats with the OpenCV tools, resize each image, predict its class against our prediction classes, show the result in a window, and save the predicted output; a sketch of this loop is given after the list below. The predicted results for images of each activity class are as follows:

● Talking on the phone (right)
● Texting (right)
● Safe driving
● Texting (left)
● Talking on the phone (left)
● Operating the radio
● Talking to a passenger
● Drinking
● Reaching behind
● Hair and makeup
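A sketch of the image-prediction loop, assuming the directories, the class_name dictionary, and the trained model from the text above:

# Read each image with OpenCV, classify it, display it, and save the result
# under a file name containing the predicted class. Paths are assumptions.
import os
import cv2
import numpy as np

for fname in os.listdir(image_dir):
    img = cv2.imread(os.path.join(image_dir, fname))
    if img is None:
        continue                                    # skip unreadable files
    x = cv2.cvtColor(cv2.resize(img, (224, 224)), cv2.COLOR_BGR2RGB)
    probs = model.predict(np.expand_dims(x.astype('float32') / 255.0, axis=0))
    label = class_name[int(np.argmax(probs))]       # map class id to activity name
    cv2.putText(img, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow('prediction', img)
    cv2.waitKey(1)
    cv2.imwrite(os.path.join(predict_dir, label + '_' + fname), img)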

For video prediction, we create a directory using the OpenCV library into which we import any video related to distracted driving. The input video, which may arrive in a different format, is resized, each frame is classified against our distraction classes, the prediction is shown in a window, and the predicted results are saved. The resulting predicted videos are provided on our Google Drive; the link is given in Appendix A.
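The video-prediction loop can be sketched in the same way; the file names and frame size below are assumptions.

# Classify each frame of a distracted-driver video, show it in a window,
# and write the annotated frames to an output video.
import cv2
import numpy as np

cap = cv2.VideoCapture('driver_video.mp4')
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('predicted_video.mp4', fourcc, 25.0, (640, 480))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (640, 480))
    x = cv2.cvtColor(cv2.resize(frame, (224, 224)), cv2.COLOR_BGR2RGB)
    probs = model.predict(np.expand_dims(x.astype('float32') / 255.0, axis=0))
    label = class_name[int(np.argmax(probs))]
    cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow('prediction', frame)
    out.write(frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
out.release()
cv2.destroyAllWindows()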

12. Research conclusion


Driver distraction is a serious problem worldwide and a major cause of road crashes. Detecting distracted drivers is an essential capability for self-driving cars, so ADAS systems must implement a distracted-driver detection system. Automatically recognizing a driver's behavior remains a challenging problem in ITS. Distracted driving has been a contributing factor in many car accidents over the past decade, especially as smartphones have become increasingly prevalent. To recognize distracted behavior, we investigate distracted driver postures within a human action recognition framework, with the aim of improving the accuracy of driver-posture classification. Our system uses a Convolutional Neural Network both to detect distracted drivers and to identify the cause of the distraction. Several regularization techniques are applied to the VGG-16 architecture to help prevent overfitting to the training data, and the modified VGG-16 additionally applies several activation functions and an attention mechanism. A thinned version of the proposed system still achieves 95.82% accuracy. The proposed system detects distracted drivers more accurately than previously published approaches, reaching an accuracy of 98.57 percent on datasets of two-dimensional images of distracted drivers taken from a dashboard camera. On an NVIDIA P5000 GPU with 16 GB of memory, the system processes 42 images per second. Moreover, we propose an adapted version of VGG-16 with only 15M parameters, which still achieves satisfactory classification accuracy, albeit with a thinner model. Methodologically, the proposed approach shows promise. A continuation of this work aims at further reducing the number of parameters and the calculation time. Incorporating temporal context could reduce misclassification errors and thus increase accuracy, and introducing new features such as eye position and head orientation, correlated with signals received from the car, could help detect cognitive distractions as well as optical distractions such as sleepiness. In the future, our goal is to build an algorithm that detects manual, visual, and cognitive distractions alike.

13. Recommendation
Based on the detection accuracy achieved, the model predicts distracted-driver behavior reliably. We recommend that car-modification and ride-hailing companies such as Uber and Careem fit our dashboard camera at the front of the car, so that it can detect distracted-driver activities and alert the driver based on the predictions made on the dashboard-camera video. In this way, it can greatly help to decrease the rate of car accidents.
References:
1. World Health Organization, Management of Substance Abuse Unit: Global Status Report on Alcohol and Health, 2014. World Health Organization, Geneva (2014).
2. Abouelnaga, Y., Eraqi, H.M., Moustafa, M.N.: Real-time distracted driver posture classification. arXiv preprint arXiv:1706.09498 (2018).
3. Peden, M.: World Report on Road Traffic Injury Prevention. World Health Organization, Geneva (2004).
4. Yan, C., Coenen, F., Zhang, B.: Driving posture recognition by convolutional neural networks. IET Computer Vision 10(2), 103-114 (2016).
5. National Highway Traffic Safety Administration: 2015 motor vehicle crashes. Traffic Safety Facts Research Note, pp. I-IX (2016).
6. Resalat, S.N., Saba, V.: A practical method for driver sleepiness detection by processing EEG signals stimulated by external flickering light. Signal, Image and Video Processing 9, 1751-1757 (2015).
7. Institutul Naţional de Statistică (2017): Vehicles registered in circulation and road traffic accidents [online]. http://www.insse.ro/cms/sites/default/files/field/publicatii/vehicule_inmatriculate_in_circulatie_si_accidente_circulatie_rutiera_2017.pdf.
8. Kidd, D.G., Chaudhary, N.K.: Changes in the sources of distracted driving among Northern Virginia drivers: a comparison of results from two roadside observation surveys. Journal of Safety Research 69, 131-138 (2019).
9. Olson, R.L., Hanowski, R.J., Hickman, J.S., Bocanegra, J.: Driver distraction in commercial vehicle operations. Department of Transportation, Washington, DC (2009). https://www.fmcsa.dot.gov/sites/fmcsa.dot.gov/files/docs/FMCSA-RRR-09-042.pdf.
10. National Highway Traffic Safety Administration: Traffic safety facts, distracted driving. https://www.nhtsa.gov/risky-driving/distracted-driving.
11. Zhang, X., Zheng, N., Wang, F., He, Y.: Visual recognition of driver hand-held cell phone use based on hidden CRF. In: Proceedings of the 2011 IEEE International Conference on Vehicular Electronics and Safety, pp. 248-251 (July 2011).
12. Das, N., Ohn-Bar, E., Trivedi, M.M.: On performance evaluation of driver hand detection algorithms: challenges, datasets, and metrics. In: IEEE 18th International Conference on Intelligent Transportation Systems, pp. 2953-2958 (September 2015).
13. Seshadri, K., Juefei-Xu, F., Pal, D.K., Savvides, M., Thor, C.P.: Driver cell phone usage detection on Strategic Highway Research Program (SHRP2) face view videos. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 35-43 (June 2015).
14. Briem, V., Hedman, L.R.: Behavioural effects of mobile telephone use during simulated driving. Ergonomics 38, 2536-2562 (1995).
15. Abouelnaga, Y., Eraqi, H.M., Moustafa, M.N.: Real-time distracted driver posture classification. CoRR abs/1706.09498 (2017).
16. Kaggle: State Farm Distracted Driver Detection. https://www.kaggle.com/c/state-farm-distracted-driver-detection.
17. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15(1), 1929-1958 (2014).
18. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(4), 640-651 (2017).
Appendices

Appendix A (Implementation Code Link)

Google Drive Link of Implementation code:

https://drive.google.com/file/d/1IKLSxhDMVAHRHb6xldjKSDzNDXOYpyty/view?usp=sharing

Appendix B
