Evaluating hierarchical discourse segmentation

Author:

Lucien Carroll

HLT '10: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Pages 993 - 1001

Published: 02 June 2010

Abstract

Hierarchical discourse segmentation is a useful technology, but it is difficult to evaluate. I propose an error measure based on the word error rate of Beeferman et al. (1999). I then show that this new measure reliably distinguishes baseline segmentations from lexically informed hierarchical segmentations, and more informed segmentations from less informed ones, and that it also improves on previous linear error measures.
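The abstract does not spell out the proposed hierarchical measure, so the following is only a minimal sketch of the linear measure it builds on: the Beeferman et al. (1999) error rate, commonly written P_k. The function names `segment_ids` and `p_k`, the representation of a segmentation as a list of segment lengths, and the default window of half the mean reference segment length are assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Optional


def segment_ids(segment_lengths: List[int]) -> List[int]:
    """Expand segment lengths, e.g. [2, 3], into per-unit ids [0, 0, 1, 1, 1]."""
    ids: List[int] = []
    for seg_index, length in enumerate(segment_lengths):
        ids.extend([seg_index] * length)
    return ids


def p_k(reference: List[int], hypothesis: List[int], k: Optional[int] = None) -> float:
    """Fraction of probe pairs (i, i + k) on which the two segmentations
    disagree about whether both units lie in the same segment."""
    ref_ids = segment_ids(reference)
    hyp_ids = segment_ids(hypothesis)
    if len(ref_ids) != len(hyp_ids):
        raise ValueError("segmentations must cover the same number of units")
    n = len(ref_ids)
    if k is None:
        # Assumed convention: half the mean reference segment length.
        k = max(1, round(n / (2 * len(reference))))
    if k >= n:
        raise ValueError("window k must be smaller than the text length")
    disagreements = 0
    for i in range(n - k):
        same_in_ref = ref_ids[i] == ref_ids[i + k]
        same_in_hyp = hyp_ids[i] == hyp_ids[i + k]
        disagreements += same_in_ref != same_in_hyp
    return disagreements / (n - k)
```

For example, `p_k([4, 4], [4, 4])` is zero because the two segmentations agree on every probe pair, while `p_k([4, 4], [8])` is penalized for the missed boundary. A hierarchical measure would have to extend this idea to boundaries at several depths; how the paper does so is not stated in this abstract.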

References

[1]

Roxana Angheluta, Rik De Busser, and Marie-Francine Moens. 2002. The use of topic segmentation for automatic summarization. In DUC 2002.

[2]

Nicholas Asher and Alex Lascarides. 2003. Logics of Conversation. Cambridge University Press.

[3]

Guy Aston and Lou Burnard. 1998. The BNC Handbook: Exploring the British National Corpus with SARA. Edinburgh University Press.

[4]

Jason Baldridge, Nicholas Asher, and Julie Hunter. 2007. Annotation for and robust parsing of discourse structure of unrestricted texts. Zeitschrift für Sprachwissenschaft, 26:213--239.

[5]

Doug Beeferman, Adam Berger, and John D. Lafferty. 1999. Statistical models for text segmentation. Machine Learning, 34(1--3):177--210.

[6]

Douglas Biber, Eniko Csomay, James K. Jones, and Casey Keck. 2004. A corpus linguistic investigation of vocabulary-based discourse units in university registers. Language and Computers, 20:53--72.

[7]

Branimir Boguraev and Mary S. Neff. 2000. Discourse segmentation in aid of document summarization. In 33rd HICSS.

[8]

Marco Carbone, Ya'akov Gal, Stuart Shieber, and Barbara Grosz. 2004. Unifying annotated discourse hierarchies to create a gold standard. In Proceedings of 4th SIGDIAL Workshop on Discourse and Dialogue.

[9]

Joyce Y. Chai and Rong Jin. 2004. Discourse structure for context question answering. In HLT-NAACL 2004 Workshop on Pragmatics of Question Answering, pages 23--30.

[10]

Freddy Choi, Peter Wiemer-Hastings, and Johanna Moore. 2001. Latent semantic analysis for text segmentation. In Proceedings of 6th EMNLP, pages 109--117.

[11]

Freddy Choi. 2000. Advances in domain independent linear text segmentation. In Proceedings of NAACL-00, pages 26--33.

[12]

Laurence Danlos. 2004. Discourse dependency structures as constrained DAGs. In Proceedings of 5th SIGDIAL Workshop on Discourse and Dialogue, pages 127--135.

[13]

Jacob Eisenstein and Regina Barzilay. 2008. Bayesian unsupervised topic segmentation. In Proceedings of EMNLP 2008.

[14]

Jacob Eisenstein. 2009. Hierarchical text segmentation from multi-scale lexical cohesion. In Proceedings of NAACL09.

[15]

Katherine Forbes, Eleni Miltsakaki, Rashmi Prasad, Anoop Sarkar, Aravind Joshi, and Bonnie Webber. 2003. D-LTAG system: Discourse parsing with a lexicalized tree-adjoining grammar. Journal of Logic, Language and Information, 12(3):261--279, June.

[16]

P. Fragkou, V. Petridis, and Ath. Kehagias. 2004. A dynamic programming algorithm for linear text segmentation. Journal of Intelligent Information Systems, 23:179--197.

[17]

W. Nelson Francis and Henry Kucera. 1979. BROWN Corpus Manual. Brown University, third edition.

[18]

Michael Galley, Kathleen McKeown, Eric Fosler-Lussier, and Hongyan Jing. 2003. Discourse segmentation of multi-party conversation. In 41st ACL.

[19]

Barbara J. Grosz and Candace L. Sidner. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3):175--204.

[20]

Marti Hearst. 1994. Multi-paragraph segmentation of expository text. In 32nd ACL, pages 9--16, New Mexico State University, Las Cruces, New Mexico.

[21]

Jerry R. Hobbs. 1985. On the coherence and structure of discourse. In CSLI 85-37.

[22]

Xiang Ji and Hongyuan Zha. 2003. Domain-independent text segmentation using anisotropic diffusion and dynamic programming. In SIGIR'03.

[23]

Marcin Kaszkiel and Justin Zobel. 1997. Passage retrieval revisited. In Proceedings of 20th ACM SIGIR, pages 178--185.

[24]

David Kauchak and Francine Chen. 2005. Feature-based segmentation of narrative documents. In Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in NLP.

[25]

Andrew Kehler. 2002. Coherence, reference and the theory of grammar. CSLI Publications.

[26]

Thomas K. Landauer, Peter W. Foltz, and Darrell Laham. 1998. An introduction to latent semantic analysis. Discourse Processes, 25:259--284.

[27]

Igor Malioutov and Regina Barzilay. 2006. Minimum cut model for spoken lecture segmentation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, pages 25--32.

[28]

William Mann and Sandra Thompson. 1988. Rhetorical structure theory: Towards a functional theory of text organization. Text, 8(3):243--281.

[29]

Daniel Marcu. 2000. The theory and practice of discourse parsing and summarization. MIT Press.

[30]

Rebecca J. Passonneau and Diane J. Litman. 1997. Discourse segmentation by human and automated means. Computational Linguistics, 23(1):103--139.

[31]

Lev Pevzner and Marti Hearst. 2001. A critique and improvement of an evaluation metric for text segmentation. Computational Linguistics, 28(1).

[32]

Livia Polanyi, Chris Culy, Martin van den Berg, Gian Lorenzo Thione, and David Ahn. 2004. A rule based approach to discourse parsing. In Proceedings of SIGDIAL.

[33]

Malcolm Slaney and Dulce Ponceleon. 2001. Hierarchical segmentation: Finding changes in a text signal. Proceedings of SIAM 2001 Text Mining Workshop, pages 6--13.

[34]

Marilyn A. Walker. 1997. Centering, anaphora resolution, and discourse structure. In Aravind K. Joshi, Marilyn A. Walker, and Ellen F. Prince, editors, Centering in Discourse. Oxford University Press.

[35]

Bonnie Webber. 2004. D-LTAG: extending lexicalized TAG to discourse. Cognitive Science, 28:751--779.

[36]

Florian Wolf and Edward Gibson. 2004. Representing discourse coherence: A corpus-based analysis. In 20th COLING.

[37]

Yaakov Yaari. 1997. Segmentation of expository texts by hierarchical agglomerative clustering. In Proceedings of RANLP'97.

Cited By

Tang Y, Huang W, Liu Q, Tung A, Wang X, Yang J, Zhang B, Lim E, Winslett M, Sanderson M, Fu A, Sun J, Culpepper S, Lo E, Ho J, Donato D, Agrawal R, Zheng Y, Castillo C, Sun A, Tseng V, Li C. 2017. QALink. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 1359--1368. Online publication date: 6-Nov-2017. https://dl.acm.org/doi/10.1145/3132847.3132934

Hu Q, Liu Q, Wang X, Tung A, Goyal S, Yang J, Sellis T, Davidson S, Ives Z. 2015. DocRicher. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 901--906. Online publication date: 27-May-2015. https://dl.acm.org/doi/10.1145/2723372.2735379

Published In

HLT '10: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

June 2010

1070 pages

ISBN:1932432655

General Chair:
Ronald M. Kaplan
Microsoft Bing

Publisher

Association for Computational Linguistics

United States

Qualifiers

Research-article

Acceptance Rates

Overall Acceptance Rate 240 of 768 submissions, 31%
