Abstract
This paper presents a document restoration and segmentation algorithm for historic Middle Persian (Pahlavi) manuscripts. The proposed algorithm uses mathematical morphology and connected-component analysis to segment the overlapping lines, words, and characters of Middle Persian documents in preparation for OCR. To evaluate the restoration algorithm, 200 pages of Pahlavi documents are used as experimental data. Numerical results indicate that the proposed algorithm can remove noise and other degradation effects. The results also show 99.14% accuracy for baseline detection, 97.35% accuracy for text line extraction and removal of overlaps from neighboring lines, and 99.5% accuracy for segmenting the extracted text lines into their components.
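The abstract does not give the structuring elements or the exact processing steps, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a binarized page stored as a list of 0/1 rows, a 3x3 square structuring element, 8-connectivity, and a plain row-projection rule for finding text-line bands. All function names and the toy `page` are hypothetical.

```python
from collections import deque

OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def _morph(img, combine):
    """Apply a 3x3 binary operator; pixels outside the image count as background."""
    h, w = len(img), len(img[0])
    return [[int(combine(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx] == 1
                         for dy, dx in OFFSETS))
             for x in range(w)] for y in range(h)]

def erode(img):
    return _morph(img, all)   # pixel survives only if its whole 3x3 neighborhood is ink

def dilate(img):
    return _morph(img, any)   # pixel is set if any 3x3 neighbor is ink

def opening(img):
    """Morphological opening (erosion then dilation): removes specks smaller than 3x3."""
    return dilate(erode(img))

def connected_components(img):
    """Return bounding boxes (top, left, bottom, right) of 8-connected ink regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:  # breadth-first flood fill of one component
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for dy, dx in OFFSETS:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

def line_bands(img):
    """Return (first_row, last_row) for each run of consecutive rows containing ink."""
    bands, start = [], None
    for y, row in enumerate(img):
        if any(row) and start is None:
            start = y
        elif not any(row) and start is not None:
            bands.append((start, y - 1))
            start = None
    if start is not None:
        bands.append((start, len(img) - 1))
    return bands

# Toy page (hypothetical data): two glyph blocks on one line, one block on the
# next line, and a single noise pixel in the gap between the two lines.
page = [[0] * 12 for _ in range(9)]
for top, left in [(1, 1), (1, 6), (5, 2)]:
    for y in range(top, top + 3):
        for x in range(left, left + 3):
            page[y][x] = 1
page[4][10] = 1  # isolated speck that bridges the inter-line gap

cleaned = opening(page)
print(line_bands(page))               # the speck merges both lines into one band
print(line_bands(cleaned))            # after opening, the two lines separate
print(connected_components(cleaned))  # one bounding box per glyph block
```

On real manuscripts a 3x3 opening would also erase strokes thinner than three pixels, so the structuring element would need to be chosen relative to the stroke width; the sketch only illustrates how noise removal and component grouping interact.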
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alirezaee, S., Fard, A.S., Aghaeinia, H., Faez, K. (2005). A Restoration and Segmentation Unit for the Historic Persian Documents. In: Blanc-Talon, J., Philips, W., Popescu, D., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2005. Lecture Notes in Computer Science, vol 3708. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558484_85
DOI: https://doi.org/10.1007/11558484_85
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29032-2
Online ISBN: 978-3-540-32046-3
eBook Packages: Computer Science, Computer Science (R0)