Abstract
This paper presents the first unsupervised and automatic system that can recognize the logical structure of business documents without any models or prior information about that structure. Our solution can process entirely new, previously unseen document models. We treat the recognition of logical structures as a detection problem, because we have to localize blocks of text and recognize their logical function simultaneously. We assume that any document is composed of parts taken from several other document models. We propose a part-based spatial model suited for partial voting. The model introduces the concept of Spatial Context (SC), a spatial feature that locally measures the distribution of spatial information around a point of reference. Our method is based on a Gaussian voting process, which provides a robust mechanism for detecting the elements of any logical structure. Our solution is suited for non-rigid structures and works well with a reduced number of images. This property is not shared by supervised approaches, especially methods based on neural networks.
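The Gaussian voting idea mentioned above can be illustrated with a generic Hough-style accumulator. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not define the Spatial Context feature or the voting parameters, so the function names `gaussian_vote` and `detect`, the part labels, the offset table, and the value of `sigma` are all hypothetical. Each known part casts a vote for where a target logical element should lie, each vote is spread with a Gaussian kernel, and the accumulator peak is taken as the detected position.

```python
import math


def gaussian_vote(accumulator, cx, cy, sigma, weight=1.0):
    """Add one Gaussian-shaped vote centred at (cx, cy) to a 2D grid."""
    height, width = len(accumulator), len(accumulator[0])
    radius = int(3 * sigma)  # truncate the kernel at 3 sigma
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            accumulator[y][x] += weight * math.exp(-d2 / (2.0 * sigma ** 2))


def detect(parts, offsets, width, height, sigma=2.0):
    """Return (x, y, score) of the accumulator peak.

    parts   : list of (label, x, y) for text blocks found on the page
    offsets : hypothetical table mapping a label to the (dx, dy)
              displacements at which the target element was observed
    """
    acc = [[0.0] * width for _ in range(height)]
    for label, x, y in parts:
        for dx, dy in offsets.get(label, []):
            gaussian_vote(acc, x + dx, y + dy, sigma)
    score, bx, by = max(
        (acc[y][x], x, y) for y in range(height) for x in range(width)
    )
    return bx, by, score


# Illustrative data only: two parts agree on one location, one does not.
parts = [("Total", 10, 20), ("VAT", 10, 15), ("Date", 5, 5)]
offsets = {"Total": [(15, 0)], "VAT": [(15, 5)], "Date": [(2, 2)]}
x, y, score = detect(parts, offsets, width=40, height=30)
```

Because votes are smoothed rather than counted in hard bins, parts whose predicted positions differ by a few cells still reinforce each other, which is one plausible reading of why such a scheme tolerates non-rigid layouts.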
Acknowledgement
This work was funded by ITESOFT and the LIRIS Lab of INSA-LYON as part of the DOD project.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Kessi, L., Lebourgeois, F., Garcia, C. (2021). Unsupervised Recognition of the Logical Structure of Business Documents Based on Spatial Relationships. In: Tsapatsoulis, N., Panayides, A., Theocharides, T., Lanitis, A., Pattichis, C., Vento, M. (eds) Computer Analysis of Images and Patterns. CAIP 2021. Lecture Notes in Computer Science, vol 13053. Springer, Cham. https://doi.org/10.1007/978-3-030-89131-2_6
DOI: https://doi.org/10.1007/978-3-030-89131-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89130-5
Online ISBN: 978-3-030-89131-2
eBook Packages: Computer Science, Computer Science (R0)