Abstract
The basic idea of re-flowable document understanding and automatic typesetting is to generate a logical document by identifying the logical tag of each paragraph in a re-flowable document and then determining the hierarchical relationships between physical units and logical tags. To overcome the shortcomings of conventional logical structure reconstruction methods, this paper proposes a novel directed-graph-based method for reconstructing the logical structure of re-flowable documents. The method extracts the logical structure from a template document and then applies a single-source shortest path algorithm on a directed graph to filter out redundant logical tags, thereby solving the logical structure reconstruction problem. Experimental results show that the algorithm effectively improves the accuracy of logical structure recognition.
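The filtering step can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the graph construction or which shortest path algorithm is used. The sketch assumes that each paragraph has several candidate logical tags with a cost (lower means more plausible), that the template document yields a set of permitted tag-to-tag transitions, and that Bellman-Ford serves as the single-source shortest path algorithm. The names `filter_tags`, `template_transitions`, and `candidates`, along with all tags and costs, are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Node = Tuple[int, str]            # (paragraph index, logical tag)
Edge = Tuple[Node, Node, float]   # (from, to, weight)


def bellman_ford(nodes: List[Node], edges: List[Edge], source: Node):
    """Standard Bellman-Ford: distances and predecessors from `source`."""
    dist = {n: float("inf") for n in nodes}
    pred: Dict[Node, Optional[Node]] = {n: None for n in nodes}
    dist[source] = 0.0
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
                changed = True
        if not changed:  # early exit once no edge can be relaxed
            break
    return dist, pred


def filter_tags(candidates: List[Dict[str, float]],
                allowed: Set[Tuple[str, str]]) -> Optional[List[str]]:
    """Keep one tag per paragraph along the cheapest template-consistent path.

    Assumption: `candidates[i]` maps each candidate tag of paragraph i to a
    cost, and `allowed` holds the tag transitions seen in the template.
    Returns None when no tag sequence is consistent with the template.
    """
    source: Node = (-1, "START")
    sink: Node = (len(candidates), "END")
    nodes: List[Node] = [source, sink]
    edges: List[Edge] = []
    for i, cands in enumerate(candidates):
        for tag, cost in cands.items():
            nodes.append((i, tag))
            if i == 0:
                edges.append((source, (i, tag), cost))
            else:
                for prev_tag in candidates[i - 1]:
                    if (prev_tag, tag) in allowed:
                        edges.append(((i - 1, prev_tag), (i, tag), cost))
            if i == len(candidates) - 1:
                edges.append(((i, tag), sink, 0.0))
    dist, pred = bellman_ford(nodes, edges, source)
    if dist[sink] == float("inf"):
        return None
    path: List[str] = []
    node = pred[sink]
    while node != source:  # walk predecessors back to the source
        path.append(node[1])
        node = pred[node]
    return path[::-1]


# Hypothetical template transitions and per-paragraph candidate tags.
template_transitions = {("title", "heading"), ("heading", "body"),
                        ("body", "body"), ("body", "heading")}
candidates = [
    {"title": 0.1},
    {"title": 0.2, "heading": 0.3},   # "title" is locally cheaper here
    {"body": 0.2, "heading": 0.25},
]
print(filter_tags(candidates, template_transitions))
# prints ['title', 'heading', 'body']
```

In this toy input the second paragraph's cheapest candidate, "title", is discarded because the assumed template never allows a title to follow a title. Every tag that does not lie on the shortest path is treated as redundant.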
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhao, L., Li, N., Peng, X., Liang, Q. (2015). An Improved Algorithm of Logical Structure Reconstruction for Re-flowable Document Understanding. In: Li, J., Ji, H., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2015. Lecture Notes in Computer Science, vol 9362. Springer, Cham. https://doi.org/10.1007/978-3-319-25207-0_28
DOI: https://doi.org/10.1007/978-3-319-25207-0_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25206-3
Online ISBN: 978-3-319-25207-0
eBook Packages: Computer Science, Computer Science (R0)