Abstract
This paper presents OHWR-Gurmukhi, a benchmark dataset of online handwriting for the Gurmukhi script. TIET, Patiala has released two unconstrained online handwriting databases, OHWR-GNumerals and OHWR-GScript, which contain isolated stroke samples produced by 190 writers. OHWR-GNumerals covers 10 stroke classes and OHWR-GScript covers 95 stroke classes to represent the Gurmukhi character set. For data collection, two sets of Gurmukhi words were finalized in consultation with language experts so that the collected stroke samples would be balanced. The preprocessing methods used to prepare these datasets are size normalization, removal of duplicate points, interpolation of missing points, and re-sampling. The purpose of this benchmark is to provide a common platform and to make the dataset publicly available for research in online handwriting recognition. The dataset is available as a supplement at https://sites.google.com/view/ohwr-gurmukhi-script/.
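The four preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the algorithms or parameters, so the function names, the order of the steps, the linear interpolation rule, the gap threshold max_gap, and the point count n are all assumptions made for illustration. A stroke is assumed to be a list of (x, y) pen coordinates.

```python
import math

def normalize_size(points, size=1.0):
    """Shift the stroke to the origin and scale its longer side to `size`."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    extent = max(max(xs) - min_x, max(ys) - min_y)
    if extent == 0:  # a single dot: nothing to scale
        return [(0.0, 0.0) for _ in points]
    scale = size / extent
    return [((x - min_x) * scale, (y - min_y) * scale) for x, y in points]

def remove_duplicates(points):
    """Drop points that repeat the immediately preceding point."""
    out = []
    for p in points:
        if not out or p != out[-1]:
            out.append(p)
    return out

def interpolate_missing(points, max_gap):
    """Fill gaps longer than `max_gap` with linearly interpolated points."""
    out = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dist = math.hypot(x1 - x0, y1 - y0)
        steps = math.ceil(dist / max_gap) if dist > max_gap else 1
        for i in range(1, steps + 1):
            t = i / steps
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def resample(points, n):
    """Return n (>= 2) points equally spaced along the stroke's arc length."""
    cum = [0.0]  # cumulative arc length at each input point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0:
        return [points[0]] * n
    out, j = [], 0
    for i in range(n):
        target = total * i / (n - 1)
        while j < len(points) - 2 and cum[j + 1] < target:
            j += 1
        seg = cum[j + 1] - cum[j]
        t = min((target - cum[j]) / seg, 1.0) if seg > 0 else 0.0
        (x0, y0), (x1, y1) = points[j], points[j + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def preprocess(points, n=5, max_gap=0.25):
    """Apply the four steps in the order the abstract lists them (assumed)."""
    points = normalize_size(points)
    points = remove_duplicates(points)
    points = interpolate_missing(points, max_gap)
    return resample(points, n)

# Hypothetical L-shaped stroke with repeated points and large gaps.
stroke = [(0, 0), (0, 0), (10, 0), (10, 0), (10, 10)]
result = preprocess(stroke)
```

Resampling by arc length rather than by point index is one common choice for online handwriting, because it removes the effect of writing speed on point density; whether the dataset uses this variant is not stated in the abstract.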
Acknowledgment
The authors thank the Technology Development for Indian Languages (TDIL) Programme, Department of Information Technology, Government of India, for funding this work.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Singh, H., Sharma, R.K., Kumar, R., Verma, K., Kumar, R., Kumar, M. (2020). A Benchmark Dataset of Online Handwritten Gurmukhi Script Words and Numerals. In: Nain, N., Vipparthi, S., Raman, B. (eds) Computer Vision and Image Processing. CVIP 2019. Communications in Computer and Information Science, vol 1148. Springer, Singapore. https://doi.org/10.1007/978-981-15-4018-9_41
DOI: https://doi.org/10.1007/978-981-15-4018-9_41
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-4017-2
Online ISBN: 978-981-15-4018-9
eBook Packages: Computer Science (R0)