Semantic Correlation Network Based Text Clustering

Shaoxu Song & Chunping Li

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3809)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

Text documents occupy sparse data spaces, so under existing proximity measures a document's nearest neighbors may belong to different classes. In this paper, we propose an asymmetric similarity measure that strengthens the discriminative features of document objects. We construct a semantic correlation network from the asymmetric similarity between documents and conjecture that its connection distribution follows a power law. Hub points in the semantic correlation network are grouped by an agglomerative hierarchical clustering approach named SCN. The proximity between hub points takes into account both the similarity of the objects themselves and the similarity of their neighbors. Finally, we assign the remaining text objects to their nearest hub points. Experimental evaluation on textual data sets demonstrates the validity and efficiency of SCN, and comparison with other clustering algorithms shows the superiority of our approach.
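
The pipeline outlined in the abstract can be illustrated with a small Python sketch. The abstract does not give the actual formulas, so every concrete choice below is an assumption made for illustration only: the asymmetric similarity is taken to be the share of one document's terms found in the other, hubs are taken to be the nodes with the highest in-degree, neighbor similarity is a Jaccard overlap, and the function names, the `threshold`, `alpha`, and `n_hubs` parameters, and the average-link merging rule are all hypothetical. It should not be read as the SCN algorithm from the paper.

```python
from itertools import combinations

def asym_sim(a, b):
    """Assumed asymmetric similarity: fraction of a's terms that also occur in b."""
    return len(a & b) / len(a) if a else 0.0

def build_network(docs, threshold):
    """Directed edge i -> j whenever asym_sim(docs[i], docs[j]) >= threshold."""
    n = len(docs)
    return {i: {j for j in range(n)
                if j != i and asym_sim(docs[i], docs[j]) >= threshold}
            for i in range(n)}

def in_degrees(out_edges):
    """Count incoming edges per node; high in-degree nodes serve as hubs here."""
    deg = {i: 0 for i in out_edges}
    for targets in out_edges.values():
        for j in targets:
            deg[j] += 1
    return deg

def hub_proximity(i, j, docs, out_edges, alpha=0.5):
    """Blend object similarity with neighbor-set similarity (both assumed forms)."""
    obj = 0.5 * (asym_sim(docs[i], docs[j]) + asym_sim(docs[j], docs[i]))
    ni, nj = out_edges[i] | {i}, out_edges[j] | {j}
    nbr = len(ni & nj) / len(ni | nj)  # Jaccard overlap of neighborhoods
    return alpha * obj + (1 - alpha) * nbr

def cluster(docs, k, n_hubs, threshold=0.5):
    out_edges = build_network(docs, threshold)
    deg = in_degrees(out_edges)
    # Pick the n_hubs nodes with the highest in-degree (ties broken by index).
    hubs = sorted(deg, key=lambda i: (-deg[i], i))[:n_hubs]
    clusters = [[h] for h in hubs]

    def link(c1, c2):
        # Average-link proximity between two groups of hubs.
        total = sum(hub_proximity(i, j, docs, out_edges) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    # Agglomerative step: repeatedly merge the two closest hub groups.
    while len(clusters) > k:
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters[b]
        del clusters[b]

    labels = {h: c for c, members in enumerate(clusters) for h in members}
    # Assign every remaining document to the cluster of its nearest hub.
    for i in range(len(docs)):
        if i not in labels:
            nearest = max(hubs, key=lambda h: asym_sim(docs[i], docs[h]))
            labels[i] = labels[nearest]
    return [labels[i] for i in range(len(docs))]

# Toy documents represented as term sets (made-up data for illustration).
docs = [
    {"cat", "dog", "pet"},
    {"cat", "pet"},
    {"dog", "pet", "vet"},
    {"stock", "bond", "market"},
    {"stock", "market"},
    {"bond", "market", "rate"},
]
labels = cluster(docs, k=2, n_hubs=4)
print(labels)
```

On this toy input the three pet-related documents and the three finance-related documents end up in separate clusters. Representing documents as plain term sets keeps the sketch dependency-free; a weighted term vector would be a natural substitute if term frequencies matter.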

Author information

Authors and Affiliations

School of Software, Tsinghua University, Beijing, 100084, China
Shaoxu Song & Chunping Li

Editor information

Editors and Affiliations

Guangxi Normal University, College of CS and IT, Guilin, China, and University of Technology, Faculty of Engineering and Information Technology, Sydney, Australia
Shichao Zhang
Department of Electrical and Computer Systems Engineering, Monash University, 3800, Melbourne, Victoria, Australia
Ray Jarvis

About this paper

Cite this paper

Song, S., Li, C. (2005). Semantic Correlation Network Based Text Clustering. In: Zhang, S., Jarvis, R. (eds) AI 2005: Advances in Artificial Intelligence. AI 2005. Lecture Notes in Computer Science (LNAI), vol 3809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11589990_63

DOI: https://doi.org/10.1007/11589990_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30462-3
Online ISBN: 978-3-540-31652-7
eBook Packages: Computer Science, Computer Science (R0)
