Abstract
The cursive nature of the script, the Nastaliq writing style and the large number of distinct ligatures make ligature recognition in Urdu very difficult. In this paper, we present a segmentation-free approach that recognizes Urdu ligatures holistically. First, we generate a rich dataset containing 17,010 ligatures with varying orientations and degrees of noise. Second, the ligatures are clustered (categorized) to reduce the search space and make learning more robust. Finally, we employ a deep neural network with dropout regularization to classify the ligatures. Detailed experiments show that combining ligature clustering with a dropout-regularized deep neural network significantly improves classification accuracy.
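The abstract says the dataset covers ligatures at different orientations and degrees of noise, but it does not say how those variants were produced. The following is a minimal sketch that assumes synthetic augmentation of clean binary ligature images. Each image is rotated by a few small angles using nearest-neighbour resampling, and is then corrupted with salt-and-pepper noise at several densities. The angle list, the noise densities and the helper names (`rotate_nearest`, `add_salt_pepper`, `augment`) are hypothetical illustrations, not the authors' settings.

```python
import numpy as np


def rotate_nearest(img, angle_deg):
    """Rotate a 2-D image about its centre with nearest-neighbour inverse
    mapping; output pixels whose source lies outside the image become 0."""
    h, w = img.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    y0, x0 = ys - cy, xs - cx
    # Inverse rotation: find the source pixel for every output pixel.
    sx = np.rint(np.cos(theta) * x0 + np.sin(theta) * y0 + cx).astype(int)
    sy = np.rint(-np.sin(theta) * x0 + np.cos(theta) * y0 + cy).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out


def add_salt_pepper(img, density, rng):
    """Replace a fraction `density` of pixels in a binary image with random 0/1 values."""
    out = img.copy()
    mask = rng.random(img.shape) < density
    out[mask] = rng.integers(0, 2, size=int(mask.sum())).astype(img.dtype)
    return out


def augment(img, angles=(-10, -5, 0, 5, 10), densities=(0.0, 0.02, 0.05), seed=0):
    """Hypothetical augmentation grid: every angle combined with every noise density."""
    rng = np.random.default_rng(seed)
    variants = []
    for angle in angles:
        rotated = rotate_nearest(img, angle)
        for density in densities:
            variants.append(add_salt_pepper(rotated, density, rng))
    return variants
```

With the default grid, each clean image yields 5 × 3 = 15 variants. The unrotated, noise-free variant (`augment(img)[6]`) is identical to the input.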
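The abstract presents clustering as the way to shrink the search space, but it names neither the clustering algorithm nor the features. This sketch substitutes plain k-means, using a deterministic farthest-point initialisation, on generic feature vectors. Inside the winning cluster, a nearest-neighbour lookup serves as a hypothetical stand-in for a per-cluster classifier. It only illustrates why searching one cluster is cheaper than searching every class. The function names `kmeans` and `two_stage_nn` are illustrative.

```python
import numpy as np


def _sq_dists(X, C):
    # Pairwise squared Euclidean distances, shape (len(X), len(C)).
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, iters=100):
    """Plain k-means (a stand-in; the paper's clustering method is not stated)."""
    X = np.asarray(X, dtype=float)
    # Farthest-point initialisation: start from X[0], then repeatedly add
    # the point farthest from all centroids chosen so far.
    idx = [0]
    for _ in range(1, k):
        idx.append(int(_sq_dists(X, X[idx]).min(axis=1).argmax()))
    C = X[idx].copy()
    for _ in range(iters):
        labels = _sq_dists(X, C).argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if empty).
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    labels = _sq_dists(X, C).argmin(axis=1)
    return C, labels


def two_stage_nn(query, X, y, centroids, labels):
    """Hypothetical two-stage lookup: pick the nearest non-empty cluster, then
    return the label of the nearest training sample inside that cluster only."""
    nonempty = np.unique(labels)
    c = nonempty[((centroids[nonempty] - query) ** 2).sum(axis=1).argmin()]
    members = np.flatnonzero(labels == c)
    best = members[((X[members] - query) ** 2).sum(axis=1).argmin()]
    return y[best], len(members)
```

Suppose 9 samples fall into 3 clusters of 3. A query is then compared against 3 centroids and 3 cluster members instead of all 9 samples. This is the kind of search-space reduction the abstract refers to, although the paper's actual mechanism may differ.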
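Dropout regularization randomly silences hidden units during training, which discourages co-adaptation between features. The following is a minimal NumPy sketch that assumes inverted dropout: surviving activations are scaled by 1/(1-p) at training time, so inference uses the weights unchanged. It applies dropout in a one-hidden-layer softmax classifier trained by full-batch gradient descent. The layer size, p, learning rate and epoch count are illustrative, not the paper's configuration. A deep network would stack several such hidden layers, each with its own dropout.

```python
import numpy as np


def dropout(h, p, training, rng):
    """Inverted dropout: during training zero each unit with probability p and
    scale survivors by 1/(1-p); at inference return h unchanged."""
    if not training or p == 0.0:
        return h, np.ones_like(h)
    mask = (rng.random(h.shape) >= p) / (1.0 - p)
    return h * mask, mask


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_mlp(X, y, n_classes, hidden=16, p=0.5, lr=0.05, epochs=500, seed=0):
    """Full-batch gradient descent on mean cross-entropy for a one-hidden-layer
    ReLU network with dropout on the hidden layer (illustrative settings)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, np.sqrt(2.0 / hidden), (hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        z1 = X @ W1 + b1
        h = np.maximum(z1, 0.0)
        h_drop, mask = dropout(h, p, True, rng)
        P = softmax(h_drop @ W2 + b2)
        g = (P - Y) / len(X)  # d(mean cross-entropy)/d(logits)
        dW2, db2 = h_drop.T @ g, g.sum(axis=0)
        # Back-propagate through the same dropout mask, then the ReLU.
        dh = (g @ W2.T) * mask * (z1 > 0)
        dW1, db1 = X.T @ dh, dh.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2


def predict(params, X):
    W1, b1, W2, b2 = params
    h = np.maximum(X @ W1 + b1, 0.0)  # no dropout mask at inference
    return (h @ W2 + b2).argmax(axis=1)
```

`predict` applies no mask. This is the purpose of the inverted scaling: the expected hidden activation during training already matches the deterministic activation at test time.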
Cite this article
Rafeeq, M.J., ur Rehman, Z., Khan, A. et al. Ligature categorization based Nastaliq Urdu recognition using deep neural networks. Comput Math Organ Theory 25, 184–195 (2019). https://doi.org/10.1007/s10588-018-9271-y