DOI: 10.1007/978-3-031-70242-6_14
Coherence Graphs: Bridging the Gap in Text Segmentation with Unsupervised Learning

Published: 20 September 2024

Abstract

Text Segmentation (TS) has advanced substantially with the introduction of new supervised learning methods, which in recent years have surpassed traditional unsupervised techniques in effectiveness. This improved performance, however, requires extensive training datasets and significant initial training time. Our research instead takes an unsupervised approach to TS, built on a graph-based keyword storage mechanism. This architecture improves performance on a variety of datasets compared with current state-of-the-art unsupervised TS approaches, as evaluated by the Pk and WindowDiff metrics. The study also examines how contemporary Large Language Models (LLMs), including GPT-4, perform on TS tasks.
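
For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal sketch of how Pk and WindowDiff are commonly computed. It is not the authors' evaluation code. The helper names (`to_labels`, `default_k`, `pk`, `window_diff`) and the input format (a list of segment lengths in sentences) are assumptions made for illustration, and the window size follows the usual convention of half the mean reference segment length.

```python
def to_labels(segment_lengths):
    """Expand segment lengths, e.g. [3, 2], into per-sentence ids [0, 0, 0, 1, 1]."""
    return [seg_id for seg_id, n in enumerate(segment_lengths) for _ in range(n)]


def default_k(ref_labels):
    """Window size: half the mean reference segment length (common convention)."""
    n_segments = ref_labels[-1] + 1
    return max(1, round(len(ref_labels) / n_segments / 2))


def _check(ref, hyp, k):
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must cover the same sentences")
    if k >= len(ref):
        raise ValueError("window size k must be smaller than the document length")


def pk(ref, hyp, k=None):
    """Fraction of windows where ref and hyp disagree on whether the two
    window ends fall in the same segment."""
    k = default_k(ref) if k is None else k
    _check(ref, hyp, k)
    n_windows = len(ref) - k
    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k]) for i in range(n_windows)
    )
    return errors / n_windows


def window_diff(ref, hyp, k=None):
    """Fraction of windows where ref and hyp contain a different number of
    boundaries. Ids from to_labels() grow by one per boundary, so the id
    difference across a window equals its boundary count."""
    k = default_k(ref) if k is None else k
    _check(ref, hyp, k)
    n_windows = len(ref) - k
    errors = sum(
        (ref[i + k] - ref[i]) != (hyp[i + k] - hyp[i]) for i in range(n_windows)
    )
    return errors / n_windows


# Hypothetical example: a reference with two 4-sentence segments and a
# hypothesis that places the boundary one sentence too early.
reference = to_labels([4, 4])
hypothesis = to_labels([3, 5])
print(pk(reference, hypothesis), window_diff(reference, hypothesis))
```

Both metrics are error rates: 0 means the hypothesis matches the reference in every window, and lower values are better. WindowDiff counts boundaries rather than testing same-segment membership, so it also penalizes a hypothesis that inserts extra boundaries inside a window that already contains one.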

    Published In

    Natural Language Processing and Information Systems: 29th International Conference on Applications of Natural Language to Information Systems, NLDB 2024, Turin, Italy, June 25–27, 2024, Proceedings, Part II
    Jun 2024
    428 pages
    ISBN: 978-3-031-70241-9
    DOI: 10.1007/978-3-031-70242-6
    Editors: Amon Rapp, Luigi Di Caro, Farid Meziane, Vijayan Sugumaran

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Author Tags

    1. Natural Language Processing
    2. Text Segmentation
    3. LLM

    Qualifiers

    • Article
