DOI: 10.5555/1857999.1858141

Evaluating hierarchical discourse segmentation

Published: 02 June 2010

Abstract

Hierarchical discourse segmentation is a useful technology, but it is difficult to evaluate. I propose an error measure based on the word error rate of Beeferman et al. (1999). I then show that this new measure not only reliably distinguishes baseline segmentations from lexically-informed hierarchical segmentations and more informed segmentations from less informed segmentations, but it also offers an improvement over previous linear error measures.
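
For orientation, the linear error measure of Beeferman et al. (1999) that the abstract builds on can be sketched as follows. This is a minimal sketch, assuming the measure meant is the windowed metric commonly called P_k in the segmentation literature; it is not the hierarchical measure proposed in the paper. The function names, the label-per-unit input format, and the default window size (half the mean reference segment length) are illustrative assumptions.

```python
def count_segments(labels):
    """Count contiguous runs of identical labels."""
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def pk(reference, hypothesis, k=None):
    """Windowed segmentation error in the style of P_k (illustrative sketch).

    Both arguments are sequences with one segment label per text unit,
    e.g. [0, 0, 0, 1, 1, 1]. Labels are assumed to be unique per segment.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same number of units")
    n = len(reference)
    if k is None:
        # Assumed convention: half the mean reference segment length.
        k = max(1, n // (2 * count_segments(reference)))
    if not 0 < k < n:
        raise ValueError("window size k must satisfy 0 < k < n")

    disagreements = 0
    for i in range(n - k):
        # Are the two window ends in the same segment, per each segmentation?
        same_in_ref = reference[i] == reference[i + k]
        same_in_hyp = hypothesis[i] == hypothesis[i + k]
        if same_in_ref != same_in_hyp:
            disagreements += 1
    return disagreements / (n - k)
```

A score of 0 means the two segmentations agree at every window position; higher values mean more disagreement. Because the inputs are flat label sequences, this sketch says nothing about nesting, which is the gap a hierarchical measure is meant to address.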

References

[1] Roxana Angheluta, Rik De Busser, and Marie-Francine Moens. 2002. The use of topic segmentation for automatic summarization. In DUC 2002.
[2] Nicholas Asher and Alex Lascarides. 2003. Logics of Conversation. Cambridge University Press.
[3] Guy Aston and Lou Burnard. 1998. The BNC Handbook: Exploring the British National Corpus with SARA. Edinburgh University Press.
[4] Jason Baldridge, Nicholas Asher, and Julie Hunter. 2007. Annotation for and robust parsing of discourse structure of unrestricted texts. Zeitschrift für Sprachwissenschaft, 26(213):239.
[5] Doug Beeferman, Adam Berger, and John D. Lafferty. 1999. Statistical models for text segmentation. Machine Learning, 34(1--3):177--210.
[6] Douglas Biber, Eniko Csomay, James K. Jones, and Casey Keck. 2004. A corpus linguistic investigation of vocabulary-based discourse units in university registers. Language and Computers, 20:53--72.
[7] Branimir Boguraev and Mary S. Neff. 2000. Discourse segmentation in aid of document summarization. In 33rd HICSS.
[8] Marco Carbone, Ya'akov Gal, Stuart Shieber, and Barbara Grosz. 2004. Unifying annotated discourse hierarchies to create a gold standard. In Proceedings of 4th SIGDIAL Workshop on Discourse and Dialogue.
[9] Joyce Y. Chai and Rong Jin. 2004. Discourse structure for context question answering. In HLT-NAACL 2004 Workshop on Pragmatics of Question Answering, pages 23--30.
[10] Freddy Choi, Peter Wiemer-Hastings, and Johanna Moore. 2001. Latent semantic analysis for text segmentation. In Proceedings of 6th EMNLP, pages 109--117.
[11] Freddy Choi. 2000. Advances in domain independent linear text segmentation. In Proceedings of NAACL-00, pages 26--33.
[12] Laurence Danlos. 2004. Discourse dependency structures as constrained DAGs. In Proceedings of 5th SIGDIAL Workshop on Discourse and Dialogue, pages 127--135.
[13] Jacob Eisenstein and Regina Barzilay. 2008. Bayesian unsupervised topic segmentation. In Proceedings of EMNLP 2008.
[14] Jacob Eisenstein. 2009. Hierarchical text segmentation from multi-scale lexical cohesion. In Proceedings of NAACL09.
[15] Katherine Forbes, Eleni Miltsakaki, Rashmi Prasad, Anoop Sarkar, Aravind Joshi, and Bonnie Webber. 2003. D-LTAG system: Discourse parsing with a lexicalized tree-adjoining grammar. Journal of Logic, Language and Information, 12(3):261--279, June.
[16] P. Fragkou, V. Petridis, and Ath. Kehagias. 2004. A dynamic programming algorithm for linear text segmentation. Journal of Int Info Systems, 23:179--197.
[17] W. Nelson Francis and Henry Kucera. 1979. BROWN Corpus Manual. Brown University, third edition.
[18] Michael Galley, Kathleen McKeown, Eric Fosler-Lussier, and Hongyan Jing. 2003. Discourse segmentation of multi-party conversation. In 41st ACL.
[19] Barbara J. Grosz and Candace L. Sidner. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3):175--204.
[20] Marti Hearst. 1994. Multi-paragraph segmentation of expository text. In 32nd ACL, pages 9--16, New Mexico State University, Las Cruces, New Mexico.
[21] Jerry R. Hobbs. 1985. On the coherence and structure of discourse. In CSLI 85-37.
[22] Xiang Ji and Hongyuan Zha. 2003. Domain-independent text segmentation using anisotropic diffusion and dynamic programming. In SIGIR'03.
[23] Marcin Kaszkiel and Justin Zobel. 1997. Passage retrieval revisited. In Proceedings of 20th ACM SIGIR, pages 178--185.
[24] David Kauchak and Francine Chen. 2005. Feature-based segmentation of narrative documents. In Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in NLP.
[25] Andrew Kehler. 2002. Coherence, reference and the theory of grammar. CSLI Publications.
[26] Thomas K. Landauer, Peter W. Foltz, and Darrell Laham. 1998. An introduction to latent semantic analysis. Discourse Processes, 25:259--284.
[27] Igor Malioutov and Regina Barzilay. 2006. Minimum cut model for spoken lecture segmentation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, pages 25--32.
[28] William Mann and Sandra Thompson. 1988. Rhetorical structure theory: Towards a functional theory of text organization. Text, 8(3):243--281.
[29] Daniel Marcu. 2000. The theory and practice of discourse parsing and summarization. MIT Press.
[30] Rebecca J. Passonneau and Diane J. Litman. 1997. Discourse segmentation by human and automated means. Computational Linguistics, 23(1):103--139.
[31] Lev Pevzner and Marti Hearst. 2001. A critique and improvement of an evaluation metric for text segmentation. Computational Linguistics, 16(1).
[32] Livia Polanyi, Chris Culy, Martin van den Berg, Gian Lorenzo Thione, and David Ahn. 2004. A rule based approach to discourse parsing. In Proceedings of SIGDIAL.
[33] Malcolm Slaney and Dulce Ponceleon. 2001. Hierarchical segmentation: Finding changes in a text signal. Proceedings of SIAM 2001 Text Mining Workshop, pages 6--13.
[34] Marilyn A. Walker. 1997. Centering, anaphora resolution, and discourse structure. In Aravind K. Joshi, Marilyn A. Walker, and Ellen F. Prince, editors, Centering in Discourse. Oxford University Press.
[35] Bonnie Webber. 2004. D-LTAG: extending lexicalized TAG to discourse. Cognitive Science, 28:751--779.
[36] Florian Wolf and Edward Gibson. 2004. Representing discourse coherence: A corpus-based analysis. In 20th COLING.
[37] Yaakov Yaari. 1997. Segmentation of expository texts by hierarchical agglomerative clustering. In Proceedings of RANLP'97.

Cited By

  • (2017) QALink. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 1359-1368. DOI: 10.1145/3132847.3132934. Online publication date: 6-Nov-2017
  • (2015) DocRicher. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 901-906. DOI: 10.1145/2723372.2735379. Online publication date: 27-May-2015

Published In

HLT '10: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
June 2010
1070 pages
ISBN:1932432655

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 240 of 768 submissions, 31%
