Discovering emerging topics in unlabelled text collections

Article · DOI: 10.1007/11827252_27
Published: 03 September 2006

Abstract

As document collections accumulate over time, some of the subjects discussed in them go out of fashion while new ones emerge, so old classification schemes need to be updated. In this paper, we address the challenge of finding emerging and persistent “themes”, i.e. subjects that live long enough to be incorporated into a taxonomy or ontology describing the document collection. We focus on identifying cluster labels that “survive” changes in the composition of the underlying document population, including changes in the feature space of dominant words, because the terminology of the document archive also changes over time. We have conducted a set of promising experiments on identifying the themes that manifested themselves in section H.2.8 of the ACM digital library, and we juxtapose them with the classes foreseen in the ACM taxonomy for this section.
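
The idea of a cluster label that "survives" from one period to the next can be illustrated with a minimal sketch. This is not the algorithm from the paper, which the abstract does not specify. It assumes that each time period has already been clustered, that every cluster is labelled by a set of dominant words, and that a label survives into the next period if some cluster there has a sufficiently similar label. The Jaccard measure, the `min_overlap` threshold, the function names, and the sample data are all hypothetical choices made for illustration.

```python
from typing import FrozenSet, List

Label = FrozenSet[str]  # a cluster label: the set of its dominant words


def jaccard(a: Label, b: Label) -> float:
    """Word overlap between two labels, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def label_lifetime(label: Label,
                   later_periods: List[List[Label]],
                   min_overlap: float = 0.5) -> int:
    """Count how many consecutive periods a label survives.

    The label's own period counts as 1. In each later period the label
    is matched to the most similar cluster label; if that similarity
    falls below min_overlap (an assumed threshold), the label is
    considered to have died out.
    """
    lifetime = 1
    current = label
    for labels in later_periods:
        best = max(labels, key=lambda c: jaccard(current, c), default=None)
        if best is None or jaccard(current, best) < min_overlap:
            break
        # Follow the matched label so gradual vocabulary drift is tolerated.
        current = best
        lifetime += 1
    return lifetime


# Hypothetical cluster labels for three consecutive periods.
periods = [
    [frozenset({"data", "mining", "stream"}), frozenset({"xml", "query"})],
    [frozenset({"data", "mining", "stream", "cluster"}),
     frozenset({"web", "service"})],
    [frozenset({"mining", "stream", "cluster"})],
]

for label in periods[0]:
    print(sorted(label), label_lifetime(label, periods[1:]))
```

Matching each label against its most recent successor, rather than against the original label, is one way to allow for the terminology drift the abstract mentions. A label whose lifetime exceeds some chosen number of periods would then be a candidate persistent theme.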



Published In

ADBIS'06: Proceedings of the 10th East European conference on Advances in Databases and Information Systems
September 2006
447 pages
ISBN:3540378995
Editors: Yannis Manolopoulos, Jaroslav Pokorný, Timos K. Sellis

Sponsors

  • Altec
  • Ministry of National Education and Religious Affairs
  • G-net

Publisher

Springer-Verlag, Berlin, Heidelberg


