Coherence Graphs: Bridging the Gap in Text Segmentation with Unsupervised Learning

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2024)

Abstract

Text Segmentation (TS) has advanced substantially with the introduction of new supervised learning methods, which in recent years have surpassed traditional unsupervised techniques in effectiveness. This improved performance, however, comes at the cost of extensive training datasets and significant upfront training time. Our research takes an unsupervised approach to TS, built on a graph-based keyword storage mechanism. This architecture improves performance on a variety of datasets compared with current state-of-the-art unsupervised TS approaches, as evaluated by the \(P_k\) and WindowDiff metrics. The study also examines the use of contemporary Large Language Models (LLMs), including GPT-4, for TS tasks.
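For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal sketch of how \(P_k\) and WindowDiff are commonly computed. It is not taken from the paper. The function names `pk` and `window_diff`, the 0/1 boundary-list encoding (a 1 at position i means a segment boundary follows sentence i), and the default window size (half the mean reference segment length) are assumptions made here for illustration. Both metrics are error rates, so lower is better.

```python
from typing import Optional, Sequence


def _default_window(ref: Sequence[int]) -> int:
    """Assumed default: half the mean reference segment length, at least 1."""
    num_segments = sum(ref) + 1
    return max(1, round(len(ref) / (2 * num_segments)))


def _check(ref: Sequence[int], hyp: Sequence[int], k: int) -> int:
    """Validate inputs and return the number of sliding windows."""
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must have the same length")
    windows = len(ref) - k + 1
    if windows <= 0:
        raise ValueError("window size k is larger than the sequence")
    return windows


def pk(ref: Sequence[int], hyp: Sequence[int], k: Optional[int] = None) -> float:
    """P_k: fraction of windows where ref and hyp disagree on whether
    the window contains any boundary at all."""
    k = k or _default_window(ref)
    windows = _check(ref, hyp, k)
    errors = 0
    for i in range(windows):
        ref_has_boundary = any(ref[i:i + k])
        hyp_has_boundary = any(hyp[i:i + k])
        errors += ref_has_boundary != hyp_has_boundary
    return errors / windows


def window_diff(ref: Sequence[int], hyp: Sequence[int], k: Optional[int] = None) -> float:
    """WindowDiff: fraction of windows where ref and hyp disagree on the
    number of boundaries inside the window."""
    k = k or _default_window(ref)
    windows = _check(ref, hyp, k)
    errors = 0
    for i in range(windows):
        errors += sum(ref[i:i + k]) != sum(hyp[i:i + k])
    return errors / windows


if __name__ == "__main__":
    reference = [0, 0, 1, 0, 0, 0]   # one boundary after the third sentence
    hypothesis = [0, 1, 1, 0, 0, 0]  # same boundary plus a spurious neighbour
    print("P_k:", pk(reference, hypothesis, k=3))
    print("WindowDiff:", window_diff(reference, hypothesis, k=3))
```

In the small example at the bottom, the hypothesis adds a spurious boundary right next to a correct one. \(P_k\) only asks whether a window contains a boundary, so it does not register this error, whereas WindowDiff compares boundary counts and does. This difference is one reason the two metrics are usually reported together.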

Notes

  1. https://github.com/HumanMachineLab/CoherenceGraph

  2. https://github.com/MaartenGr/KeyBERT

Author information

Corresponding author

Correspondence to Amit Maraj.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Maraj, A., Vargas Martin, M., Makrehchi, M. (2024). Coherence Graphs: Bridging the Gap in Text Segmentation with Unsupervised Learning. In: Rapp, A., Di Caro, L., Meziane, F., Sugumaran, V. (eds) Natural Language Processing and Information Systems. NLDB 2024. Lecture Notes in Computer Science, vol 14763. Springer, Cham. https://doi.org/10.1007/978-3-031-70242-6_14

  • DOI: https://doi.org/10.1007/978-3-031-70242-6_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70241-9

  • Online ISBN: 978-3-031-70242-6

  • eBook Packages: Computer Science, Computer Science (R0)
