DOI: 10.3115/1073012.1073076

A statistical model for domain-independent text segmentation

Published: 06 July 2001

Abstract

We propose a statistical method that finds the maximum-probability segmentation of a given text. This method does not require training data because it estimates probabilities from the given text. Therefore, it can be applied to any text in any domain. An experiment showed that the method is more accurate than or at least as accurate as a state-of-the-art text segmentation system.
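
The abstract does not spell out the probability model, so the following is only a minimal sketch of what a maximum-probability segmentation can look like when every probability is estimated from the input text itself. Everything specific here is an assumption made for illustration, not the authors' formulation: the function name `segment`, the Laplace-smoothed word probabilities estimated within each candidate segment, the per-segment penalty of `log(total_words)`, and the restriction of boundaries to sentence breaks.

```python
import math
from collections import Counter


def segment(sentences, penalty=None):
    """Return the start indices of the lowest-cost segments.

    `sentences` is a list of token lists. The cost of a segment is the
    negative log-probability of its words under a Laplace-smoothed
    unigram model estimated from that segment alone (an assumption),
    plus a fixed per-segment penalty (also an assumption).
    """
    if not sentences:
        return []
    vocab_size = len({w for s in sentences for w in s})
    total_words = sum(len(s) for s in sentences)
    if penalty is None:
        # Hypothetical prior cost that discourages over-segmentation.
        penalty = math.log(total_words) if total_words > 0 else 0.0

    n = len(sentences)
    best = [math.inf] * (n + 1)  # best[j]: minimum cost of sentences[:j]
    back = [0] * (n + 1)         # back[j]: start of the last segment ending at j
    best[0] = 0.0

    for start in range(n):
        counts = Counter()
        length = 0
        for end in range(start + 1, n + 1):
            # Grow the candidate segment one sentence at a time.
            counts.update(sentences[end - 1])
            length += len(sentences[end - 1])
            # -log P(words | segment) with add-one smoothing.
            cost = sum(
                c * math.log((length + vocab_size) / (c + 1))
                for c in counts.values()
            )
            total = best[start] + cost + penalty
            if total < best[end]:
                best[end] = total
                back[end] = start

    # Walk the back-pointers from the end of the text to recover boundaries.
    starts = []
    end = n
    while end > 0:
        starts.append(back[end])
        end = back[end]
    return sorted(starts)


if __name__ == "__main__":
    pets = [["cat", "dog", "cat", "dog"]] * 3
    finance = [["stock", "bond", "stock", "bond"]] * 3
    print(segment(pets + finance))
```

Because the word probabilities come from the text being segmented, the sketch needs no training corpus, which is the property the abstract emphasizes. The dynamic program considers every pair of sentence boundaries, so it is quadratic in the number of sentences.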

Published In

ACL '01: Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
July 2001
562 pages

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%

