Automatic summarisation of discussion fora

Published: 01 April 2010

Abstract

Web-based discussion fora proliferate on the Internet. These fora consist of threads about specific matters. Existing forum search facilities provide an easy way to find threads of interest. However, understanding the content of a thread is not always trivial, and the problem becomes more pressing as threads grow longer. It frustrates users who are looking for specific information and makes it more difficult to contribute something valuable to a discussion. We postulate that a concise summary of a thread would greatly help forum users. But how would we best create such summaries? In this paper, we present an automated method of summarising threads in discussion fora. Compared with the summarisation of unstructured texts and spoken dialogues, the structural characteristics of threads offer important advantages, and we studied how best to exploit them. Messages in a thread are structured and contain both explicit and implicit references to each other. We therefore term threads hierarchical dialogues. Our proposed summarisation algorithm produces one summary of a hierarchical dialogue by 'cherry-picking' sentences from the original messages that make up the thread. We try to select sentences that give an overview of the discussion. Our method is built around a set of heuristics based on observations of real forum discussions. The data used for this research was in Dutch, but the developed method applies equally to other languages. We evaluated our approach using a prototype. Users judged our summariser as very useful, with half of them indicating they would use it regularly or always when visiting fora.
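The abstract does not spell out the heuristics, so the following is only a minimal sketch of what extractive, 'cherry-picking' summarisation of a hierarchical dialogue could look like. Everything in it is an assumption made for illustration and not the paper's method: the Message class, the naive sentence splitter, and the three scoring heuristics (word overlap with the opening post, a bonus for the first sentence of a message, a penalty for deep nesting) are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Message:
    """One post in a thread; nested replies form the hierarchy (hypothetical type)."""
    author: str
    text: str
    replies: list[Message] = field(default_factory=list)


def split_sentences(text: str) -> list[str]:
    # Naive split after '.', '!' or '?' followed by whitespace. This is an
    # assumption; real forum text needs a proper sentence-boundary detector.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(text: str) -> set[str]:
    """Lower-cased set of word tokens in the text."""
    return set(re.findall(r"\w+", text.lower()))


def flatten(msg: Message, depth: int = 0):
    """Yield (depth, message) pairs in depth-first reading order."""
    yield depth, msg
    for reply in msg.replies:
        yield from flatten(reply, depth + 1)


def summarise(root: Message, max_sentences: int = 3) -> list[str]:
    """Pick the highest-scoring sentences and return them in thread order."""
    topic = words(root.text)
    candidates = []
    for position, (depth, msg) in enumerate(flatten(root)):
        for index, sentence in enumerate(split_sentences(msg.text)):
            # Illustrative heuristics only (not taken from the paper):
            # overlap with the opening post, a bonus for a message's first
            # sentence, and a small penalty per level of nesting.
            score = len(words(sentence) & topic)
            score += 1.0 if index == 0 else 0.0
            score -= 0.25 * depth
            candidates.append((score, position, index, sentence))
    best = sorted(candidates, key=lambda c: -c[0])[:max_sentences]
    best.sort(key=lambda c: (c[1], c[2]))  # restore original reading order
    return [sentence for _, _, _, sentence in best]
```

Calling summarise(root, max_sentences=3) on a thread built from nested Message objects returns at most three of the original sentences, unchanged and in reading order. That is the defining property of an extractive summary: nothing is paraphrased, only selected.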


Published In

Natural Language Engineering, Volume 16, Issue 2
April 2010
98 pages

Publisher

Cambridge University Press
United States

Cited By

• (2020) Sentence Embedding Based Semantic Clustering Approach for Discussion Thread Summarization. Complexity 2020. doi:10.1155/2020/4750871. Online publication date: 1-Jan-2020
• (2018) Web Forum Retrieval and Text Analytics. Foundations and Trends in Information Retrieval 12:1 (1-163). doi:10.1561/1500000062. Online publication date: 3-Jan-2018
• (2018) Creating a reference data set for the summarization of discussion forum threads. Language Resources and Evaluation 52:2 (461-483). doi:10.1007/s10579-017-9389-4. Online publication date: 1-Jun-2018
• (2017) Automatic Summarization of Domain-specific Forum Threads. Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval (253-256). doi:10.1145/3020165.3022127. Online publication date: 7-Mar-2017
• (2011) ForAVis. Proceedings of the International Conference on Web Intelligence, Mining and Semantics (1-10). doi:10.1145/1988688.1988705. Online publication date: 25-May-2011
