Automatic Structuring of Written Texts

Marek Veber, Aleš Horák, Rostislav Julinek & Pavel Smrž

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1692)

Included in the following conference series:

International Workshop on Text, Speech and Dialogue

Abstract

This paper deals with automatic structuring and sentence boundary labelling in natural language texts. We describe the implemented structure tagging algorithm and the heuristic rules that are used for automatic or semiautomatic labelling. Within each detected sentence the algorithm performs a decomposition into clauses; it then marks the parts of the text that do not form a sentence, i.e. headings, signatures, tables and other structured data. We also pay attention to the processing of matched symbols in the text, especially to the analysis of direct speech notation.

The research is sponsored by the Czech Ministry of Education under the grant VS 97028.
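To make the idea of heuristic sentence boundary labelling with matched symbols more concrete, the following is a minimal sketch in Python. It is not the algorithm described in the paper: the abbreviation list, the set of paired symbols and the function name `split_sentences` are all assumptions chosen for illustration. The sketch treats `.`, `!` or `?` as a boundary only when it lies outside any open bracket or quotation, is followed by whitespace and a capital letter (or ends the text), and does not terminate a known abbreviation.

```python
# Hypothetical abbreviation lexicon; a real system would need a much larger one.
ABBREVIATIONS = {"dr.", "prof.", "e.g.", "i.e.", "etc."}

# Assumed matched symbols: opening symbol -> expected closing symbol.
PAIRS = {"(": ")", "[": "]", "„": "“"}


def split_sentences(text):
    """Split text into sentences using simple heuristic rules (illustrative only)."""
    sentences = []
    stack = []   # closing symbols that are still expected
    start = 0    # index where the current sentence begins
    for i, ch in enumerate(text):
        if stack and ch == stack[-1]:
            stack.pop()                  # a matched symbol is closed
        elif ch in PAIRS:
            stack.append(PAIRS[ch])      # a matched symbol is opened
        elif ch == '"':
            stack.append('"')            # straight quote: opens direct speech
        elif ch in ".!?" and not stack:
            nxt = text[i + 1:i + 3]
            at_end = i + 1 == len(text)
            before_capital = len(nxt) == 2 and nxt[0].isspace() and nxt[1].isupper()
            last_token = text[start:i + 1].split()[-1].lower()
            if (at_end or before_capital) and last_token not in ABBREVIATIONS:
                sentences.append(text[start:i + 1].strip())
                start = i + 1
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)           # trailing text without final punctuation
    return sentences


if __name__ == "__main__":
    sample = ('She asked "Is it late? Really?" and left. '
              'Dr. Novak agreed (see the note. It is short). Done!')
    for sentence in split_sentences(sample):
        print(sentence)
```

One known limitation of this sketch: punctuation that closes a sentence inside direct speech (for example a period directly before the closing quotation mark) is never treated as a boundary, so a sentence ending inside quotes is merged with the text that follows. Handling such cases is exactly the kind of problem that a dedicated analysis of direct speech notation would have to address.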

Author information

Authors and Affiliations

Faculty of Informatics, Masaryk University, Botanická 68a, 60200, Brno, Czech Republic
Marek Veber, Aleš Horák, Rostislav Julinek & Pavel Smrž

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia in Plzeň, Universitní 22, 306 14, Plzeň, Czech Republic
Václav Matousek, Pavel Mautner & Jana Ocelíková
Department of Programming Systems and Communication, Faculty of Informatics, Masaryk University Brno, Botanická 68a, 602 00, Brno, Czech Republic
Petr Sojka

About this paper

Cite this paper

Veber, M., Horák, A., Julinek, R., Smrž, P. (1999). Automatic Structuring of Written Texts. In: Matousek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48239-3_18

DOI: https://doi.org/10.1007/3-540-48239-3_18
Published: 01 October 1999
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66494-9
Online ISBN: 978-3-540-48239-0
eBook Packages: Springer Book Archive
