
BT4344 PPT


SUPERVISED MACHINE LEARNING ALGORITHM

FOR HANDWRITING RECOGNITION

Team Members Detail:
1. Rishav - 19SCSE1010450
2. Akash Singh - 19SCSE1180010

Guide details:
Mr. Pradeep Bedi, Assistant Professor
Problem Formulation

During the COVID-19 pandemic, universities and schools conducted examinations online. Some let students type their
answers, upload a handwritten answer script, or both. Students were used to writing answers on paper, and many were not
comfortable typing them, since typing on an electronic device needs practice and the examinations were time-constrained.
As a result, many students chose to write their answers by hand and upload them, which is a reasonable approach.

The main problem our software tackles is the difficulty teachers face while checking the answers that students have
uploaded. In some cases the answer sheet was not clearly visible, because it was scanned in poor lighting or because its
quality was degraded by the limit on the size of the uploaded file. To tackle these problems, our software will
integrate with the LMS or whatever examination software an institution uses and provide the appropriate text for the
given handwriting.
INTRODUCTION

• Written text and symbols are among the most common means of communication. Writing symbols and text dates back to
the Stone Age, when it was used to express views or convey meaningful information.
• In handwriting recognition, a machine automatically recognizes the characters and patterns written by the user. It
is the ability of a machine to recognise and analyse a person's handwriting.
• Our handwriting recognition software will be based on a supervised machine learning technique called the support
vector machine.

Fig 1:- Recognition of the scanned handwritten document[1]
Flow chart Diagram

Fig 2:- The flow chart of recognition using Support Vector Machine
Tools and Technologies Used

For the development of the system, we will require a computer with an IDE installed to write the Python code. To run
the code, we will additionally require a Python interpreter, either installed on the system or provided as an
extension of the IDE we will be using.

Fig 3:- Various IDE for writing python code[2]


The physical devices required during the development phase are a connected scanner for scanning the handwritten
samples and a physical storage device, as there can be thousands of handwritten samples and the scanned images can
therefore add up to terabytes.
Architecture

Trainer Database

Input Device Shape Extractor Filter Recognizer

Result

Fig 4:- Architecture diagram for the proposed system


Architecture

• Input device: A scanner, stylus or mouse can be used as the input device.
• Extractor: This part of the software converts the curves in handwriting into right angles and lines, and thus
represents each letter of the alphabet using basic shapes.
• Filter: This part of the software filters out vestigial shapes generated by the extractor.
• Recognizer: It matches the sequence of basic shapes it receives against the database and displays the result.
• Trainer: It trains the system for various handwritings, writing styles and writing speeds (in the case of stylus
input).
• Result: The output text, placed at the appropriate position.
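
The stages above can be sketched as a small pipeline. This is only a toy illustration under stated assumptions, not the project's implementation: strokes are assumed to be lists of (x, y) points, basic shapes are reduced to horizontal ("H") and vertical ("V") segments, and the names `extract`, `filter_shapes`, `train` and `recognize`, as well as the `min_length` threshold, are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[str, float]  # (direction, length); direction is "H" or "V"


def extract(stroke: List[Point]) -> List[Segment]:
    """Extractor: approximate a stroke by axis-aligned segments (lines and right angles)."""
    segments = []
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        dx, dy = abs(x1 - x0), abs(y1 - y0)
        segments.append(("H", dx) if dx >= dy else ("V", dy))
    return segments


def filter_shapes(segments: List[Segment], min_length: float = 1.0) -> str:
    """Filter: drop very short (vestigial) segments and merge consecutive repeats."""
    kept = [d for d, length in segments if length >= min_length]
    merged = [d for i, d in enumerate(kept) if i == 0 or d != kept[i - 1]]
    return "".join(merged)


def train(database: Dict[str, str], stroke: List[Point], label: str) -> None:
    """Trainer: store the shape sequence of a labelled sample in the database."""
    database[filter_shapes(extract(stroke))] = label


def recognize(database: Dict[str, str], stroke: List[Point]) -> str:
    """Recognizer: look the shape sequence up in the database ("?" if unknown)."""
    return database.get(filter_shapes(extract(stroke)), "?")


database: Dict[str, str] = {}
train(database, [(0, 0), (0, 10), (8, 10)], "L")  # one clean sample of "L"
# A wobbly "L" drawn with a stylus still reduces to the same shape sequence.
result = recognize(database, [(0, 0), (0.2, 5), (0.1, 11), (0.3, 11.4), (9, 11)])
print(result)
```

A dictionary lookup stands in for the database here; the proposed system instead uses a trained classifier as the recognizer.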
Proposed Design

Database :-
The IAM Handwriting Database contains forms of handwritten English text which can be used to train
and test handwritten text recognizers and to perform writer identification and verification experiments.
The database was first published in [4] at the ICDAR 1999. Using this database an HMM based
recognition system for handwritten sentences was developed and published in [5] at the ICPR 2000.

Fig 5:- Image of word (taken from IAM) and its transcription into digital text[6]

The segmentation scheme used in the second version of the database is documented in [7] and was published at
ICPR 2002. The IAM-database as of October 2002 is described in [8]. We use the database extensively in our own
research. The database contains forms of unconstrained handwritten text, which were scanned at a resolution of
300 dpi and saved as PNG images with 256 gray levels. The figure below provides samples of a complete form, a text
line and some extracted words.
Steps Involved (for training of the software)

At present there are many machine learning libraries in Python, of which scikit-learn is among the best known. It
provides simple and efficient tools for data mining and data analysis, is accessible to everybody, and is reusable in
various contexts. In scikit-learn, an estimator for classification is a Python object that implements the methods
fit(X, y) and predict(T); some prediction results are shown in fig 2.
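
The fit/predict interface can be illustrated with a minimal sketch. It assumes scikit-learn's bundled `load_digits` dataset (8x8 digit images) as a stand-in for the project data, and the choice to hold out the last 10 samples is arbitrary.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()  # small bundled digit set, used here as a stand-in
X = digits.data                  # one row of 64 pixel values per image
y = digits.target                # the digit (0-9) each image shows

clf = svm.SVC()                  # a classification estimator
clf.fit(X[:-10], y[:-10])        # fit(X, y): learn from all but the last 10 samples
predicted = clf.predict(X[-10:])  # predict(T): classify the 10 held-out samples

print("predicted:", predicted)
print("actual:   ", y[-10:])
```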

Fig 6:- Left: an image from the dataset with an arbitrary size. It is scaled to fit the target image of the size 128x32, the
empty part of the target image is filled with white color.
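
The scaling step in Fig 6 can be sketched with NumPy alone. This is an assumption-laden illustration: the function name `fit_to_target` is hypothetical, nearest-neighbour resampling is chosen only for simplicity, and the scaled image is placed in the top-left corner of the target.

```python
import numpy as np


def fit_to_target(img, target_w=128, target_h=32, fill=255):
    """Scale a grayscale image to fit inside target_w x target_h while keeping
    its aspect ratio, and fill the unused part of the target with white."""
    h, w = img.shape
    scale = min(target_w / w, target_h / h)
    new_w = max(1, min(target_w, round(w * scale)))
    new_h = max(1, min(target_h, round(h * scale)))

    # Nearest-neighbour resampling: pick a source row/column for each target pixel.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = img[rows][:, cols]

    target = np.full((target_h, target_w), fill, dtype=img.dtype)
    target[:new_h, :new_w] = resized
    return target


word = np.zeros((50, 300), dtype=np.uint8)  # stand-in for a dark word image
out = fit_to_target(word)
print(out.shape)
```

A production pipeline would more likely use an image library with proper interpolation; the sketch only shows the fit-and-pad logic.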

We are given samples of each of the 10 possible classes (the digits 0 through 9), on which we fit an estimator so
that it can predict the classes to which unseen samples belong. The cache_size parameter is set to 500 (MB). When
training an SVM with the RBF kernel function, the parameters C and gamma must be considered.
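
A minimal sketch of this training setup is given below. The dataset (`load_digits`), the 50/50 split and the values `C=10` and `gamma=0.001` are illustrative assumptions; only `cache_size=500` and the RBF kernel come from the text above.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()  # 10 classes: the digits 0 through 9
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.5, random_state=0
)

# RBF-kernel SVM; C and gamma are the two parameters to tune, cache_size is in MB.
clf = svm.SVC(kernel="rbf", C=10, gamma=0.001, cache_size=500)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"test accuracy: {accuracy:.3f}")
```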
Experimental Analysis

As the number of training samples increases, the accuracy of the model increases significantly. However, once the
model reaches a certain precision, increasing the number of training samples no longer improves its accuracy. With
300, 500 and 800 prediction samples, as the number of training samples increases from 100 to 600, the accuracy rises
from less than 95% to more than 99.9%.

When training an SVM with the RBF kernel, C and gamma are considered. The recognition rate increases continuously as
C increases. Once C reaches a certain value, the recognition rate changes little; the maximum accuracy is 0.97. The
recognition rate first increases as gamma increases, and when gamma reaches a certain value (about 0.001-0.002),
precision begins to decline (fig. 7).

Fig 7:- The relation of training number and precision
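
An experiment of this kind could be reproduced along the following lines. This is a sketch, not the code behind the reported figures: it assumes the `load_digits` dataset, a fixed test set of 500 samples, and illustrative grids for gamma and C, so its numbers need not match those quoted above.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=500, random_state=0
)


def score(n_train, C=10, gamma=0.001):
    """Train an RBF SVM on the first n_train samples and return test accuracy."""
    clf = svm.SVC(kernel="rbf", C=C, gamma=gamma, cache_size=500)
    clf.fit(X_train[:n_train], y_train[:n_train])
    return clf.score(X_test, y_test)


# Accuracy as the number of training samples grows from 100 to 600.
size_scores = {n: score(n) for n in range(100, 700, 100)}

# Accuracy as gamma and C vary, with 600 training samples.
gamma_scores = {g: score(600, gamma=g) for g in (0.0001, 0.0005, 0.001, 0.002, 0.01, 0.1)}
c_scores = {c: score(600, C=c) for c in (0.1, 1, 10, 100)}

for name, scores in (("train size", size_scores), ("gamma", gamma_scores), ("C", c_scores)):
    for value, acc in scores.items():
        print(f"{name}={value}: accuracy={acc:.3f}")
```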
GUI Implementation

Fig 8:- Initial GUI implementation of the project


Project Plan/ Duration (Time line)

Sr. No.  Activity / Objective                                                 Duration

1.       Thorough analysis of the idea                                        1 week
2.       Visualization of the idea and flowchart preparation                  1 week
3.       Identifying the equipment required and installing it                 1 week
4.       Gathering the initial database for training and testing              1-2 weeks
5.       Getting the coding part done using PyCharm IDE                       3 weeks
6.       Training our model with partial instances of the database created    2 weeks
7.       Testing our model with the complete database                         1 week
8.       Gathering handwritten samples from the real world and training       5 weeks
         our model with them for real-world application
Proposed Budget

The software could be open source, as its development will cost us nothing.

Sr. No.  Item                                                            Quantity  Rate   Amount

1.       Getting the publicly available database                        -         -      -
2.       Setting up the IDEs on personal computers (open source)        -         -      -
3.       Publication of the proposed research paper in Scopus journal   1         7500   7500

Grand Total 7,500/- plus


Conclusion and Future Prospects

• Training samples have a significant impact on the model. As the number of training samples increases, the accuracy
of the model increases significantly. Once the model reaches its best precision, increasing the number of training
samples cannot improve its accuracy further, so there is an acceptable number of training samples.

• Different kernel functions have different effects on the accuracy of the model. The experimental results show that
RBF performs best: RBF has the strongest learning ability, the linear and polynomial kernels have comparatively strong
learning ability, and sigmoid has the weakest.

• It can be used with Notepad or MS Office, or added as an extension to the browser. It can be integrated with the
LMS or other examination software that an institution uses.

• It can serve as dedicated software for a specific task: handwriting-to-text conversion and handwriting matching.
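
The kernel comparison mentioned in the conclusion could be checked with a sketch like the one below. It assumes the `load_digits` dataset, scikit-learn's default hyperparameters and 5-fold cross-validation, so the ranking it prints on this data need not match the one reported above.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

digits = datasets.load_digits()

kernel_scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = svm.SVC(kernel=kernel, cache_size=500)
    # Mean accuracy over 5 cross-validation folds for this kernel.
    kernel_scores[kernel] = cross_val_score(clf, digits.data, digits.target, cv=5).mean()

for kernel, acc in sorted(kernel_scores.items(), key=lambda item: -item[1]):
    print(f"{kernel}: {acc:.3f}")
```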
References

1. Alur, D., Crupi, J., Malks, D.: Core J2EE Patterns, p. 460. Sun Microsystems, Inc. (2001)

2. https://learntocodewith.me/posts/backend-development/

3. http://civil.utm.my/postgraduate-office/files/2013/10/Problem-Formulation-16032015.pdf

4. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6344399/

5. https://libguides.uwf.edu/c.php?g=215199&p=1420828

6. Apache Jakarta Project, STRUTS home page (2003), http://jakarta.apache.org/struts/

7. https://99designs.com/blog/web-digital/how-to-design-anapp/

8. Yin, R.K.: Case Study Research, Design and Methods, 3rd edn. Sage Publications, Thousand Oaks
THANK YOU
