Black Book Final Word
OF
BACHELOR OF ENGINEERING
(COMPUTER ENGINEERING)
SUBMITTED BY
2023-2024
is a bonafide student of this institute, and the work has been carried out by
him/her under the supervision of Prof. Rohini Hanchate. It is approved for the
partial fulfillment of the requirements of Savitribai Phule Pune University for the
award of the degree of Bachelor of Engineering (Computer Engineering).
Prof. Rohini Hanchate    Prof. Pritam Ahire    Dr. Saurabh Saoji    Dr. Vilas Deotare
Internal Guide    Project Coordinator    H.O.D.    Principal
Dept. of Computer Engg.    Dept. of Computer Engg.    Dept. of Computer Engg.    NMIET, Pune
External Examiner
Sign
ABSTRACT
TABLE OF CONTENTS
1 Introduction 12
1.1 Motivation 13
1.2 Problem Definition 14
2 Literature Survey 15
3.4 Methodology 24
3.4.1 Dataset 24
3.4.2 Preprocessing of Data 25
3.4.3 Neural Network using CNN 25
3.5 System Requirements 27
3.5.1 Requirements 27
3.5.2 Software Requirements 28
3.5.3 Hardware Requirements 29
3.6 Analysis Models: SDLC Model to be Applied 29
3.7 System Implementation Plan 33
4 System Design 38
4.1 System Architecture 38
4.2 Data Flow Diagrams 41
5 Other Specifications 52
5.1 Advantages 53
5.2 Applications 53
6 Implementation 55
6.1 GUI Design 55
6.2 Database 56
References 58
ABBREVIATION ILLUSTRATION
1 System Design 39
2 Data flow Diagram 42
3 ER Diagram 43
INTRODUCTION
Sign Language (SL) is the principal means by which deaf and dumb individuals
communicate with one another and with their own community through hand and body
gestures. Its vocabulary, meaning, and grammar are all distinct from those of spoken
or written language. Spoken language is made up of articulate sounds that are mapped
onto particular words and grammatical combinations to convey meaning, whereas sign
language is a visual language that communicates meaning through hand and body
gestures; it is the primary language for an estimated 7 million deaf people.
There are not many sign language interpreters working today, so it is difficult to
teach sign language to the deaf and dumb. The goal of sign language recognition is
to translate these hand gestures into the appropriate spoken or written language.
There is currently a great deal of interest in Computer Vision and Deep Learning,
and it is possible to create several State of the Art (SOTA) models. With Deep
Learning algorithms and Image Processing, these hand gestures can be classified and
the matching text can be generated; for example, the gesture for the English letter
"A" can be mapped to the letter "A" as it is used in speech or writing.
Convolutional neural networks (CNNs) are the most widely used neural network
technique in deep learning and are frequently employed for image and video
tasks. We can use cutting-edge CNN architectures, such as LeNet-5 and MobileNetV2,
to reach State of the Art (SOTA) performance. All of these architectures can be
used, and we can combine them using neural network ensemble techniques. By doing
this, we are able to create a model that can recognize hand gestures with nearly
100% accuracy. This approach will be implemented in standalone apps, embedded
devices, and web frameworks such as Django, where hand gestures captured by a live
camera will be translated into text. This technology will facilitate easy
communication for deaf and dumb people.
• Our main goal is to empower the communities of the deaf and hard of
hearing by giving them access to effective communication tools.
• Due to the subtleties and changes in hand positions, illumination, and
signing gestures, sign language detection is a challenging undertaking.
• Our goal is to close the communication gap between the hearing
community and the hearing-impaired by creating real-time sign
language recognition.
• Sign language hand-gesture-to-text/speech translation systems or
dialog systems can be used in specific public domains such as
airports, post offices, or hospitals.
• Sign Language Recognition (SLR) can translate video to text or
speech, enabling communication between hearing and deaf people.
• Additionally, we hope that our findings will stimulate more research in
the area of sign language recognition.
• We want to make a difference in creating a more inclusive society in a
world where technology is used more and more. Our research aims to
develop solutions that allow hard of hearing people to participate
in several spheres of life, such as social relationships, work, and
education.
Sign language uses many gestures, giving the impression of a language
of movement made up of various hand and arm motions. Different nations
have different sign languages and hand gestures. It should be noted
that terms which are not well known can be conveyed simply by making
the gesture for each letter in the word. Additionally, each letter of
the English alphabet and every number from 0 to 9 has a specific
gesture in sign language.
LITERATURE SURVEY
Paper-1: Indian Sign Language Recognition Using Neural Networks and KNN
Classifiers
• Publication Year: 8 August 2017
• Author: Madhuri Sharma, Ranjna Pal and Ashok Kumar Sahoo.
• Journal Name: ARPN Journal of Engineering and Applied Sciences.
• Summary: In this paper, using KNN classifiers, the gesture recognition system is
capable of recognizing only numerical ISL static signs, with 97.10% accuracy. The
experimental results show that the system can be used as a "working system" for
Indian Sign Language numeral recognition.
Paper-6: A Review Paper on Sign Language Recognition for The Deaf and Dumb
• Publication Year: 10 October 2021
• Author: R Rumana, Reddygari Sandhya Rani, Mrs. R. Prema.
• Journal Name: International Journal of Engineering Research & Technology (IJERT)
• Summary: In this report, a functional real-time vision-based American Sign
Language recognition system for deaf and dumb people was developed for the ASL
alphabet. The authors achieved a final accuracy of 92.0% on their dataset and
improved the prediction by implementing two layers of algorithms that verify and
predict symbols which are more similar to each other. In this way, almost all of
the symbols can be detected, provided they are shown properly, there is no noise
in the background, and the lighting is adequate.
CHAPTER 3
SOFTWARE REQUIREMENTS SPECIFICATION
3.1 Introduction:
• Millions of people with hearing loss worldwide communicate vitally
and expressively with one another through sign language. It is an
intricate visual language with a large vocabulary of expressions and
gestures that lets people communicate with one another and convey
meaning.
• Although sign language is an amazing form of communication,
both those who use it and those who try to comprehend and
interpret it face particular difficulties.
• The advancement of sign language recognition systems, namely
those utilizing Convolutional Neural Networks (CNNs), has
become a crucial step in improving communication
accessibility and inclusion for the deaf and hard of hearing
populations.
• This project's primary goals are to advance the fields of automatic sign
language recognition and voice or text translation. We are
concentrating on hand movements in static sign language for our
project.
• This work focuses on using Deep Neural Networks (DNNs) to recognize
hand gestures covering the 10 numerals (0-9) and the 26 English alphabets
(A-Z).
• We developed a convolutional neural network classifier that can identify
English numerals and alphabets from hand gestures.
• The neural network has been trained using a variety of setups and
designs, including LeNet-5, MobileNetV2, and our own design.
• To get the highest possible model accuracy, we employed the
horizontal voting ensemble technique (a minimal sketch of this idea follows this list).
• Additionally, we used the Django REST Framework to build a web
application to test the results on a live camera feed.
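The horizontal voting ensemble mentioned above averages the predictions of model checkpoints saved over the last few training epochs. The following is only a minimal illustrative sketch; the checkpoint file names, helper function, and test data are assumptions, not our exact implementation.

import numpy as np
import tensorflow as tf

def horizontal_voting_predict(checkpoint_paths, images):
    """Average the class probabilities of several saved epoch checkpoints."""
    probs = []
    for path in checkpoint_paths:
        model = tf.keras.models.load_model(path)       # one checkpoint per late epoch
        probs.append(model.predict(images, verbose=0))
    mean_probs = np.mean(probs, axis=0)                 # the "voting" step
    return np.argmax(mean_probs, axis=1)                # final class per image

# Hypothetical usage with checkpoints saved during the last three epochs:
# labels = horizontal_voting_predict(
#     ["ckpt_epoch_18.h5", "ckpt_epoch_19.h5", "ckpt_epoch_20.h5"], test_images)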
3.2.1 Description:
Dataset
Since the datasets we identified were only available as RGB values, we were unable
to find any pre-existing raw image datasets that would meet our needs for project
implementation. As a result, we decided to produce our own dataset. These are the
steps we followed: to create our dataset, we used the OpenCV (Open Source Computer
Vision) library. For training purposes, we first took about 800 pictures of
every American Sign Language (ASL) symbol, and for testing, we took about 200
pictures of each symbol. We captured every frame shown by our computer's webcam
and identified a region of interest (ROI), denoted by a blue square, within each
frame.[1]
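A minimal sketch of this OpenCV capture loop is shown below; the key bindings, ROI coordinates, and output folder are illustrative assumptions rather than the exact values we used.

import cv2

cap = cv2.VideoCapture(0)                  # default webcam
count = 0
while count < 800:                         # roughly 800 training images per symbol
    ok, frame = cap.read()
    if not ok:
        break
    x1, y1, x2, y2 = 100, 100, 300, 300    # region of interest (ROI)
    cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)   # blue square around the ROI
    cv2.imshow("capture", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("c"):                    # press 'c' to save the current ROI
        cv2.imwrite(f"dataset/A/img_{count}.png", frame[y1:y2, x1:x2])  # assumes the folder exists
        count += 1
    elif key == ord("q"):                  # press 'q' to stop early
        break
cap.release()
cv2.destroyAllWindows()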
Alphabet in ASL
The data consists of a set of photographs depicting the alphabet in American Sign
Language, arranged into 29 folders, each of which represents a different class.
Twenty-six of these classes correspond to the letters A to Z; the remaining 3 classes
are DELETE, SPACE, and NOTHING. These three classes are highly significant and
practical for real-time applications.[10]
Pre-processing of Data
1. Read the images.
2. Reshape or resize the images to the same size.
3. Eliminate noise.
All pixel values of the images are scaled to the range 0 to 1 by dividing each
image array by 255.
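A short sketch of these steps, assuming OpenCV, a Gaussian blur for noise removal, and a 128 x 128 target size (the exact size and kernel are assumptions):

import cv2
import numpy as np

def preprocess(path, size=(128, 128)):
    img = cv2.imread(path)                    # 1. read the image
    img = cv2.resize(img, size)               # 2. resize to a common size
    img = cv2.GaussianBlur(img, (5, 5), 0)    # 3. suppress noise
    return img.astype(np.float32) / 255.0     # scale pixel values to [0, 1]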
Within the field of artificial intelligence, computer vision addresses image- and
video-related problems. CNNs can handle complicated tasks when paired with computer vision.
• Feature extraction and classification are the two main stages of a convolutional
neural network.
• A number of pooling and convolution operations are carried out to extract the
features from the images.
• The resulting matrix becomes smaller as more filters are applied over it.
• With a stride of 1 and no padding, the size of the new matrix is
(size of old matrix - filter size) + 1; for example, convolving a 128 x 128 input
with a 3 x 3 filter gives a (128 - 3) + 1 = 126 x 126 output.
• The classifier in a convolutional neural network is the fully connected layer.
Convolution
Pooling
Flattening
The final matrix that is produced will have multiple dimensions. The data must be
flattened into a 1-dimensional array before it can be fed into the layer that
follows. Flattening the convolution layers produces a single feature vector.
Full Connection
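The fully connected layer takes the flattened feature vector and produces the final class scores. Putting the convolution, pooling, flattening, and fully connected stages together, a small Keras model might look like the sketch below; the filter counts and layer sizes are illustrative assumptions, not our final architecture.

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 1)),  # convolution
    layers.MaxPooling2D((2, 2)),                                              # pooling
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                          # flatten the feature maps into one vector
    layers.Dense(128, activation="relu"),      # fully connected (full connection) layer
    layers.Dense(36, activation="softmax"),    # one output per class (26 letters + 10 digits)
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])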
Autocorrect Functionality:
The Hunspell suggest function, accessed through its Python binding, is used to
propose suitable corrections for each wrongly spelled input word. It displays a
list of words that resemble the incorrect word, allowing the user to choose the
most appropriate option to include in their sentence. This feature helps minimize
spelling errors and facilitates the prediction of difficult words.[11]
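A minimal sketch of how such suggestions can be obtained through the Python Hunspell binding (the dictionary and affix file paths are system-dependent assumptions):

import hunspell   # pyhunspell binding to the Hunspell spell checker

# Dictionary and affix file locations vary by system; these are typical Linux paths.
checker = hunspell.HunSpell("/usr/share/hunspell/en_US.dic",
                            "/usr/share/hunspell/en_US.aff")

word = "helo"                        # a misspelled word produced by the recognizer
if not checker.spell(word):          # True only if the word is spelled correctly
    print(checker.suggest(word))     # e.g. ['hello', 'helot', ...] for the user to pick from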
For our process, we first convert the RGB images to grayscale and apply a Gaussian
blur to eliminate any superfluous noise. We then use an adaptive thresholding
technique that separates the hand portion from the background, and resize the images
to 128 x 128 pixels. These preprocessed images are afterwards fed to our model for
both training and testing, as described. We measure performance using cross-entropy,
a loss function that is minimized when the predicted value matches the actual label.
Our goal is to drive this function towards zero by optimizing the neural network's
weights. TensorFlow offers a built-in function to compute cross-entropy, and we
further refine our model's performance using gradient descent, specifically with the
Adam optimizer.[12]
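The segmentation part of this pipeline can be sketched with OpenCV as below; the blur kernel and adaptive-threshold parameters are illustrative assumptions.

import cv2

def segment_hand(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)     # convert the colour frame to grayscale
    blur = cv2.GaussianBlur(gray, (5, 5), 0)           # remove high-frequency noise
    thresh = cv2.adaptiveThreshold(                    # separate the hand from the background
        blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
    return cv2.resize(thresh, (128, 128))              # final 128 x 128 model input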
Methodology:
Dataset
We have used multiple datasets and trained multiple models to achieve good
accuracy.
Alphabet in ASL
The data consists of a set of photographs depicting the alphabet in American Sign
Language, arranged into 29 folders, each of which represents a different class.
There are 87,000 images of 200 x 200 pixels in the training dataset. There are 29
classes in total; 26 of them contain the English alphabet from A to Z. The remaining
3 classes are DELETE, SPACE, and NOTHING. These three classes are highly significant
and practical for real-time applications.
The 37 hand-sign gestures in the second dataset include the A-Z alphabet, the 0-9
number gestures, and a gesture for space, which indicates how deaf or hard of hearing
individuals convey the space between two letters or words in communication. There are
37 gestures in all, and each gesture has 1,500 images of 50 x 50 pixels, for a total
of 55,500 images across all gestures. This dataset is well suited for training
convolutional neural networks (CNNs) for gesture recognition.
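Because both datasets are arranged as one folder per class, they can be loaded directly with Keras; the sketch below assumes a hypothetical folder name and the 200 x 200 ASL alphabet images.

import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "asl_alphabet_train",       # 29 sub-folders, one per class
    image_size=(200, 200),      # the ASL alphabet images are 200 x 200 pixels
    batch_size=32,
    label_mode="categorical",   # one-hot labels for categorical cross-entropy
)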
Pre-processing of Data
Convolution
Pooling
After the convolution operation, the pooling layer is applied. The pooling layer is
used to reduce the size of the image. There are two types of pooling (a short sketch
comparing them follows):
1. Max Pooling
2. Average Pooling
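The difference between the two can be seen in the following sketch; a dummy 4 x 4 feature map and 2 x 2 pooling windows are assumed.

import tensorflow as tf
from tensorflow.keras import layers

max_pool = layers.MaxPooling2D(pool_size=(2, 2))       # keeps the largest value in each window
avg_pool = layers.AveragePooling2D(pool_size=(2, 2))   # keeps the average value in each window

x = tf.random.uniform((1, 4, 4, 1))                    # dummy 4 x 4 single-channel feature map
print(max_pool(x).shape, avg_pool(x).shape)            # both reduce it to (1, 2, 2, 1)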
Flattening
The resulting matrix will have several dimensions. Before it can be passed into the
following layer, the data must be flattened into a 1-dimensional array. The
convolution layers are flattened to produce a single feature vector.
Full Connection
3.5.1 Requirements:
Back-End: Python 3.9
• Server:
• Operating System: 64-bit Windows 7 or above.
• RAM: 8 GB or higher.
• Client:
• RAM: 128 MB
• Hard Disk: 10 GB
Phases:
Requirement gathering and analysis
Design the requirements
Construction/ iteration
Deployment
Testing
Feedback
1. Requirement gathering and analysis:
At this stage, you need to define the requirements. You should explain
the business opportunities and plan the time and effort required to
build the project. Based on this information, you can evaluate the
technical and economic feasibility.
2. Design requirements:
3. Construction iteration:
4. Deployment:
In this phase, the team will release the product for the user's work
environment.
5. Testing:
6. Feedback:
After releasing the product, the last step is feedback. In this step, the
team receives feedback about the product and processes the
feedback.
Coding:
Testing:
The main objective of the testing phase is to bring together all the
programs that the system contains and confirm that they work as
required. For the purposes of this project, and to make testing more
effective, the testing phase is divided into two parts. The first part
involves testing the entire system as a whole; a report is then
generated and the errors found in the system are corrected as
necessary. Once this is done, the tests are run again to check whether
any errors remain in the system. In the second part, every component of
the system is tested separately and then all the modules are integrated
together. This is repeated until all errors are cleared from the system.
The testing phase usually takes a lot of time, but it is a very critical
part of the implementation phase, as it ensures that the system is working
as intended.
Installation:
Once testing is complete, all test components used in the system are
removed from the server and the system is built completely from
scratch. At the end of the installation of any system component, a test
was run. This was to ensure that the system worked as required and if
an
Documentation:
Training:
Training should be conducted by people who are familiar with the system,
and it is necessary to involve professionals from the different stages
of the implementation phase, as this helps build a good understanding of
the system. It makes even more sense if the training is delivered from
within, especially by the staff who participated in the implementation
phase. The training is primarily aimed at equipping system users with
operational and troubleshooting information for the newly designed
system. The system can then be put into operation, but its operation
must overlap with that of the old system. This helps users become more
familiar with the new system while the various files are transferred to
it.
While data flow diagrams work well for modelling data flow in software
and systems, they are currently not well suited for visualizing
interactive, real-time, or database-oriented software and systems.
4.4.1 LeNet-5
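For reference, a minimal Keras sketch of the classic LeNet-5 layout is given below, adapted here to a 128 x 128 grayscale input and 36 output classes; these adaptations are assumptions rather than the exact configuration we trained.

from tensorflow.keras import layers, models

lenet5 = models.Sequential([
    layers.Conv2D(6, (5, 5), activation="tanh", input_shape=(128, 128, 1)),  # C1
    layers.AveragePooling2D((2, 2)),                                         # S2
    layers.Conv2D(16, (5, 5), activation="tanh"),                            # C3
    layers.AveragePooling2D((2, 2)),                                         # S4
    layers.Flatten(),
    layers.Dense(120, activation="tanh"),                                    # C5
    layers.Dense(84, activation="tanh"),                                     # F6
    layers.Dense(36, activation="softmax"),                                  # output layer
])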
Class Diagram: A class diagram models the static structure of the system using
classes, packages, and objects. Each class is made up of three things: a name,
properties (attributes), and operations. Class diagrams also display relationships
such as inheritance, composition, and association. Association is the most common
relationship in a class diagram and refers to the relationship between instances
of classes.
Applications
Limitations
• Limited Dataset Diversity: Acquiring a diverse dataset representing various sign languages,
dialects, and gestures may be challenging. This limitation could affect the model's ability to
generalize across different sign language variations accurately.
• Complexity of Gesture Interpretation: Sign language gestures can be intricate and context-
dependent, leading to challenges in accurately interpreting their meaning. Variability in
hand movements, facial expressions, and body language adds complexity to the detection
process, potentially affecting the model's performance.
The field of sign language recognition is continually evolving, and there is still much
work to be done to improve the technology and expand its applications. Directions for
future work include:
• Enhancing the accuracy and robustness of recognition systems by developing more
sophisticated machine learning models, including deep learning architectures.
• Developing models that can recognize sign language gestures from different signers,
accounting for individual variations in signing styles.
• Investigating sign language generation technology, which can convert spoken or
written language into sign language gestures, further enhancing accessibility for
Deaf individuals.
• Researching and implementing advanced privacy and security mechanisms to protect
the sensitive data involved in sign language recognition, especially in applications
that involve personal or medical information.
• Continuing to collaborate with Deaf and Hard of Hearing individuals in the
development process to ensure that the technology meets their specific needs and
cultural considerations.
• Investigating the ethical implications of sign language recognition technology,
including issues related to consent, data protection, and potential misuse.
The future of sign language recognition holds great promise for bridging
communication gaps and making the world more inclusive. As technology advances and
more research is conducted, sign language recognition systems will become more
accurate, versatile, and integrated into various aspects of daily life.