Semantic Correlation Network Based Text Clustering

  • Conference paper
AI 2005: Advances in Artificial Intelligence (AI 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3809)

Abstract

Text documents occupy sparse data spaces, so a document's nearest neighbors may belong to different classes when existing proximity measures are used to describe the correlation between documents. In this paper, we propose an asymmetric similarity measure that strengthens the discriminative features of document objects. We construct a semantic correlation network from the asymmetric similarity between documents and conjecture that its connection distribution follows a power law. The hub points of the semantic correlation network are grouped by an agglomerative hierarchical clustering approach named SCN, in which the proximity of hub points is defined in terms of both object similarity and neighbor similarity. Finally, we assign the remaining text objects to their nearest hub points. Experimental evaluation on textual data sets demonstrates the validity and efficiency of SCN, and a comparison with other clustering algorithms shows the superiority of our approach.
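
The pipeline outlined in the abstract (asymmetric similarity, a directed correlation network, hub selection, agglomerative clustering of hubs, nearest-hub assignment) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration only: the term-overlap similarity in `asym_sim`, the in-degree hub criterion, the equal weighting of object and neighbor similarity in `hub_proximity`, the average-link merging, and the parameters `threshold`, `num_hubs`, and `k`. It is not the authors' implementation.

```python
from collections import Counter


def asym_sim(a: Counter, b: Counter) -> float:
    """Assumed measure: share of a's term mass that also occurs in b.
    In general asym_sim(a, b) != asym_sim(b, a)."""
    total = sum(a.values())
    if total == 0:
        return 0.0
    return sum(min(cnt, b[term]) for term, cnt in a.items()) / total


def build_network(docs, threshold):
    """Directed edge i -> j whenever asym_sim(docs[i], docs[j]) >= threshold."""
    n = len(docs)
    return {
        i: {j for j in range(n)
            if j != i and asym_sim(docs[i], docs[j]) >= threshold}
        for i in range(n)
    }


def pick_hubs(edges, num_hubs):
    """Assumed hub criterion: the nodes with the highest in-degree."""
    in_deg = {i: 0 for i in edges}
    for targets in edges.values():
        for j in targets:
            in_deg[j] += 1
    return sorted(in_deg, key=lambda i: (-in_deg[i], i))[:num_hubs]


def hub_proximity(i, j, docs, edges):
    """Equal-weight mix of object similarity and neighbor-set overlap."""
    obj = 0.5 * (asym_sim(docs[i], docs[j]) + asym_sim(docs[j], docs[i]))
    union = edges[i] | edges[j]
    nbr = len(edges[i] & edges[j]) / len(union) if union else 0.0
    return 0.5 * (obj + nbr)


def cluster_hubs(hubs, docs, edges, k):
    """Average-link agglomerative merging of hubs until k clusters remain."""
    clusters = [[h] for h in hubs]

    def link(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(hub_proximity(a, b, docs, edges) for a, b in pairs) / len(pairs)

    while len(clusters) > k:
        _, x, y = max(
            ((link(clusters[x], clusters[y]), x, y)
             for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda t: t[0],
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def assign_rest(docs, clusters):
    """Label hubs by cluster, then give every other document its nearest hub's label."""
    hub_label = {h: c for c, members in enumerate(clusters) for h in members}
    labels = dict(hub_label)
    for i in range(len(docs)):
        if i not in labels:
            nearest = max(hub_label, key=lambda h: asym_sim(docs[i], docs[h]))
            labels[i] = hub_label[nearest]
    return labels


def scn_sketch(texts, threshold, num_hubs, k):
    """Run the whole sketch on whitespace-tokenized texts; returns {doc index: cluster id}."""
    docs = [Counter(t.lower().split()) for t in texts]
    edges = build_network(docs, threshold)
    hubs = pick_hubs(edges, num_hubs)
    clusters = cluster_hubs(hubs, docs, edges, k)
    return assign_rest(docs, clusters)
```

Calling `scn_sketch(texts, threshold=0.5, num_hubs=4, k=2)` on a list of strings returns a dictionary that maps each document index to a cluster id. The directed edges are what make the network asymmetric: a short document that is fully contained in a longer one points to it with similarity 1, while the reverse edge is weaker.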

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Song, S., Li, C. (2005). Semantic Correlation Network Based Text Clustering. In: Zhang, S., Jarvis, R. (eds) AI 2005: Advances in Artificial Intelligence. AI 2005. Lecture Notes in Computer Science, vol 3809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11589990_63

  • DOI: https://doi.org/10.1007/11589990_63

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30462-3

  • Online ISBN: 978-3-540-31652-7

  • eBook Packages: Computer Science, Computer Science (R0)
