DOI: 10.1007/11611257_48
Article

Improved ROCK for text clustering using asymmetric proximity

Published: 21 January 2006

Abstract

The ROCK algorithm can be applied to text clustering in large databases. Its effectiveness, however, is limited by the high dimensionality of textual data and by the traditional proximity measure between documents. In this paper, we propose an improved approach that uses asymmetric proximity to strengthen the discriminative features of text documents. Instead of the link count used in ROCK, we propose a novel concept, the link weight overlap, to measure the proximity between two clusters. The IROCK (Improved ROCK) algorithm performs clustering analysis based on the overlap information of asymmetric proximities between text objects. We carry out the clustering process in an agglomerative hierarchical manner. To demonstrate the effectiveness of IROCK, we perform an experimental evaluation on real textual data. A comparison with ROCK and classical algorithms indicates the superiority of our approach.
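
For orientation, the baseline that the abstract contrasts with can be sketched as follows. This is a minimal illustration of link counting and agglomerative merging in the style of the original ROCK algorithm (Guha, Rastogi, and Shim), not the IROCK method: the abstract does not define the asymmetric proximity or the link weight overlap, so neither is reproduced here. The function names, the use of Jaccard similarity on term sets, the choice to count each document as its own neighbor, and the toy documents are all assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Symmetric Jaccard similarity between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_counts(docs, theta):
    """Number of common neighbors for every pair of documents.

    Two documents are neighbors when their similarity is >= theta.
    Assumption: each document counts as its own neighbor.
    """
    n = len(docs)
    nbrs = [
        {j for j in range(n) if jaccard(docs[i], docs[j]) >= theta}
        for i in range(n)
    ]
    return {(i, j): len(nbrs[i] & nbrs[j]) for i, j in combinations(range(n), 2)}


def goodness(cross_links, n_i, n_j, theta):
    """ROCK-style merge criterion: cross-cluster links, normalized by the
    number of links expected for clusters of these sizes."""
    e = 1 + 2 * (1 - theta) / (1 + theta)
    return cross_links / ((n_i + n_j) ** e - n_i ** e - n_j ** e)


def rock_cluster(docs, theta, k):
    """Merge clusters greedily until k remain or no pair shares a link."""
    if not 0 <= theta < 1:
        raise ValueError("theta must satisfy 0 <= theta < 1")
    links = link_counts(docs, theta)
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > k:
        best, best_pair = 0.0, None
        for a, b in combinations(range(len(clusters)), 2):
            cross = sum(
                links[min(p, q), max(p, q)]
                for p in clusters[a]
                for q in clusters[b]
            )
            g = goodness(cross, len(clusters[a]), len(clusters[b]), theta)
            if g > best:
                best, best_pair = g, (a, b)
        if best_pair is None:  # remaining clusters share no links
            break
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical toy corpus: two topics, three documents each.
docs = [
    {"apple", "banana", "cherry"},
    {"apple", "banana", "date"},
    {"apple", "cherry", "date"},
    {"router", "switch", "packet"},
    {"router", "switch", "cable"},
    {"router", "packet", "cable"},
]
clusters = rock_cluster(docs, theta=0.4, k=2)
```

The sketch recomputes cross-cluster links on every pass for readability; the original ROCK algorithm maintains heaps to avoid this. Because the similarity here is symmetric, it also shows exactly the point the abstract criticizes: every common neighbor contributes the same unit to the link count, regardless of how strongly it is related to either document.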


Published In

SOFSEM'06: Proceedings of the 32nd conference on Current Trends in Theory and Practice of Computer Science
January 2006
574 pages
ISBN: 354031198X
Editors: Jiří Wiedermann, Gerard Tel, Jaroslav Pokorný, Mária Bieliková, Július Štuller

Sponsors

  • ERCIM: European Research Consortium for Informatics & Mathematics

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. data mining
  2. text clustering

Qualifiers

  • Article
