Abstract
We present a methodology for document processing that exploits logic-based machine learning techniques. We claim that information capture and indexing benefit from identifying both the class of a document and the specific function of each of its layout components. Indeed, applying incremental and multistrategy machine learning techniques, rather than classical ones, yields an efficient solution to the problem of information capture.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Semeraro, G., Ferilli, S., Fanizzi, N., Esposito, F. (2001). Document Classification and Interpretation through the Inference of Logic-Based Models. In: Constantopoulos, P., Sølvberg, I.T. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2001. Lecture Notes in Computer Science, vol 2163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44796-2_6
DOI: https://doi.org/10.1007/3-540-44796-2_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42537-3
Online ISBN: 978-3-540-44796-2
eBook Packages: Springer Book Archive