Automatic summarisation of discussion fora

Authors: Almer S. Tigelaar, Rieks op den Akker, and Djoerd Hiemstra

Natural Language Engineering, Volume 16, Issue 2

Pages 161 - 192

https://doi.org/10.1017/S135132491000001X

Published: 01 April 2010

Abstract

Web-based discussion fora proliferate on the Internet. These fora consist of threads about specific matters. Existing forum search facilities make it easy to find threads of interest. However, understanding the content of a thread is not always trivial, and this problem becomes more pressing as threads grow longer. It frustrates users who are looking for specific information and also makes it harder to contribute valuably to a discussion. We postulate that a concise summary of a thread would greatly help forum users. But how can such summaries best be created? In this paper, we present an automated method for summarising threads in discussion fora. Compared with unstructured texts and spoken dialogues, threads have structural characteristics that give summarisation important advantages, and we studied how best to exploit them. Messages in threads are structured and contain both explicit and implicit references to each other; we therefore term threads hierarchical dialogues. Our proposed summarisation algorithm produces one summary of a hierarchical dialogue by ‘cherry-picking’ sentences out of the original messages that make up a thread, aiming to select sentences that give an overview of the discussion. Our method is built around a set of heuristics based on observations of real forum discussions. The data used for this research were in Dutch, but the method applies equally to other languages. We evaluated our approach using a prototype. Users judged our summariser to be very useful, with half of them indicating they would use it regularly or always when visiting fora.
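The abstract describes the method only at a high level. As a minimal sketch of what extractive ‘cherry-picking’ over a hierarchical dialogue can look like, the Python below models a thread as a tree of messages, scores every sentence, and returns the best ones in their original order. The scoring function (word overlap with the opening post, a bonus for messages that received replies, a penalty for nesting depth) and the regex sentence splitter are hypothetical stand-ins chosen for illustration; they are not the authors' heuristics, which are derived from observations of real forum discussions and are not reproduced here. The names `Message`, `walk`, `split_sentences`, `words` and `summarise` are likewise illustrative.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Message:
    """One post in a thread; its replies form the hierarchy."""
    author: str
    text: str
    replies: list[Message] = field(default_factory=list)


def walk(msg: Message, depth: int = 0) -> Iterator[tuple[Message, int]]:
    """Yield (message, depth) pairs depth-first, i.e. in reading order."""
    yield msg, depth
    for reply in msg.replies:
        yield from walk(reply, depth + 1)


def split_sentences(text: str) -> list[str]:
    """Naive splitter on '.', '!' or '?' followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence: str) -> set[str]:
    """Lower-cased word tokens of a sentence."""
    return set(re.findall(r"\w+", sentence.lower()))


def summarise(root: Message, max_sentences: int = 3) -> list[str]:
    """Extract up to max_sentences sentences, returned in thread order."""
    topic = words(root.text)  # hypothetical: the opening post sets the topic
    candidates = []
    for pos, (msg, depth) in enumerate(walk(root)):
        for idx, sentence in enumerate(split_sentences(msg.text)):
            score = (
                len(words(sentence) & topic)  # overlap with the opening post
                + len(msg.replies)            # bonus: the message drew replies
                - 0.5 * depth                 # penalty: deeply nested side-tracks
            )
            candidates.append((score, pos, idx, sentence))
    # Keep the highest-scoring sentences (ties broken by thread position) ...
    chosen = sorted(candidates, key=lambda c: (-c[0], c[1], c[2]))[:max_sentences]
    # ... then restore the original thread order so the summary reads naturally.
    chosen.sort(key=lambda c: (c[1], c[2]))
    return [sentence for _, _, _, sentence in chosen]
```

Returning the chosen sentences in thread order rather than score order is a deliberate choice in this sketch: the output then reads as a condensed walk through the discussion rather than as a ranked list of fragments.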


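Cited By

Khan, A., Shah, Q., Uddin, M., Ullah, F., Alharbi, A., Alyami, H., Gul, M., and Aziz, F. 2020. Sentence Embedding Based Semantic Clustering Approach for Discussion Thread Summarization. Complexity 2020. Online publication date: 1 January 2020. https://dl.acm.org/doi/10.1155/2020/4750871

Hoogeveen, D., Wang, L., Baldwin, T., and Verspoor, K. 2018. Web Forum Retrieval and Text Analytics. Foundations and Trends in Information Retrieval 12(1): 1-163. Online publication date: 3 January 2018. https://dl.acm.org/doi/10.1561/1500000062

Verberne, S., Krahmer, E., Hendrickx, I., Wubben, S., and Bosch, A. 2018. Creating a reference data set for the summarization of discussion forum threads. Language Resources and Evaluation 52(2): 461-483. Online publication date: 1 June 2018. https://dl.acm.org/doi/10.1007/s10579-017-9389-4

Verberne, S., van den Bosch, A., Wubben, S., and Krahmer, E. 2017. Automatic Summarization of Domain-specific Forum Threads. In Nordlie, R., Pharo, N., Freund, L., Larsen, B., and Russel, D. (eds.), Proceedings of the 2017 Conference on Human Information Interaction and Retrieval, pp. 253-256. Online publication date: 7 March 2017. https://dl.acm.org/doi/10.1145/3020165.3022127

Wanner, F., Ramm, T., and Keim, D. 2011. ForAVis. In Akerkar, R. (ed.), Proceedings of the International Conference on Web Intelligence, Mining and Semantics, pp. 1-10. Online publication date: 25 May 2011. https://dl.acm.org/doi/10.1145/1988688.1988705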


Information

Published In: Natural Language Engineering, Volume 16, Issue 2, April 2010 (98 pages)

ISSN: 1351-3249

Publisher: Cambridge University Press, United States
