Abstract
This paper deals with automatic structuring and sentence boundary labelling in natural language texts. We describe the implemented structure tagging algorithm and the heuristic rules used for automatic or semiautomatic labelling. Within each detected sentence, the algorithm performs a decomposition into clauses; it then marks the parts of the text that do not form a sentence, such as headings, signatures, tables and other structured data. We also pay attention to the processing of matched symbols in the text, especially to the analysis of direct speech notation.
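The abstract does not give the rules themselves, so the following is only a minimal sketch of the general idea of rule-based sentence boundary labelling that respects matched symbols. It is not the authors' algorithm. The abbreviation list, the set of symbol pairs, the capitalisation rule and the function name `split_sentences` are all assumptions made for illustration.

```python
# Hypothetical abbreviation list; a real system would rely on a lexicon.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "prof.", "e.g.", "i.e.", "etc."}

# Illustrative matched symbol pairs: parentheses and Czech-style
# quotation marks (opening U+201E, closing U+201C) for direct speech.
PAIRS = {"(": ")", "\u201e": "\u201c"}
CLOSERS = "".join(PAIRS.values())


def split_sentences(text):
    """Split text into sentences using three simple heuristics (assumed):
    1. a candidate boundary is a token ending in '.', '!' or '?';
    2. no boundary is placed while a matched symbol is still open;
    3. known abbreviations and a lowercase next token block the boundary.
    """
    tokens = text.split()
    sentences, current, stack = [], [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        # Track open matched symbols with a stack of expected closers.
        for ch in tok:
            if ch in PAIRS:
                stack.append(PAIRS[ch])
            elif stack and ch == stack[-1]:
                stack.pop()
        # Ignore trailing closing symbols when looking for end punctuation.
        core = tok.rstrip(CLOSERS)
        ends = core.endswith((".", "!", "?"))
        is_abbrev = tok.lower() in ABBREVIATIONS
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        next_ok = not nxt or nxt[0].isupper() or nxt[0] in PAIRS
        if ends and not stack and not is_abbrev and next_ok:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


sample = (
    "Dr. Novak arrived (he was late. Very late.) and sat down. "
    "\u201eCome in. Sit down,\u201c she said. Then they left."
)
for sentence in split_sentences(sample):
    print(sentence)
```

In this sketch the full stops inside the parentheses and inside the direct speech do not end a sentence, because the stack of expected closing symbols is not empty at those points. Clause decomposition and the marking of non-sentence material (headings, signatures, tables) are not covered here.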
The research is sponsored by the Czech Ministry of Education under the grant VS 97028.
© 1999 Springer-Verlag Berlin Heidelberg
Cite this paper
Veber, M., Horák, A., Julinek, R., Smrž, P. (1999). Automatic Structuring of Written Texts. In: Matousek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48239-3_18
DOI: https://doi.org/10.1007/3-540-48239-3_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66494-9
Online ISBN: 978-3-540-48239-0
eBook Packages: Springer Book Archive