DOI: 10.1109/ICMLA.2012.84
Article

Scalable Overlapping Co-clustering of Word-Document Data

Published: 12 December 2012

Abstract

Text clustering is used in a variety of applications, such as content-based recommendation, categorization, summarization, information retrieval and automatic topic extraction. Since most pairs of documents share only a small percentage of their words, the dataset representation tends to be very sparse, hence the need for a similarity metric capable of partially matching sets of features. The technique known as co-clustering can find several clusters inside a dataset, each composed of just a subset of the objects and a subset of the features. In word-document data, this can help identify clusters of documents about the same topic even when they share only a small fraction of their words. In this paper, a scalable co-clustering algorithm is proposed that uses the Locality-Sensitive Hashing (LSH) technique to find co-clusters of documents. The proposed algorithm is tested against other co-clustering and traditional algorithms on well-known datasets. The results show that this algorithm finds clusters more accurately than the other approaches while maintaining linear complexity.
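The abstract does not spell out the algorithm, so the following Python sketch is only an illustration of one common way LSH can be applied to word-document data, not the authors' method. It assumes MinHash signatures split into bands: documents whose signatures agree on an entire band fall into the same bucket, which happens more often the higher their Jaccard similarity (a partial-matching measure over word sets). Each bucket is then turned into a co-cluster made of its documents plus the words present in at least a fraction of them. The function names, the parameters `num_hashes`, `bands`, `min_word_share` and `seed`, and the word-selection rule are all hypothetical choices made for this example.

```python
import random
import zlib
from collections import Counter, defaultdict

PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus of the hash family


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for the hash family h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def minhash_signature(words, params):
    """MinHash signature: for each hash function, the minimum hash over the word set.

    Assumes the word set is non-empty.
    """
    base = [zlib.crc32(w.encode("utf-8")) for w in words]
    return [min((a * x + b) % PRIME for x in base) for a, b in params]


def lsh_cocluster(docs, num_hashes=20, bands=10, min_word_share=0.5, seed=0):
    """Hypothetical LSH-based co-clustering sketch.

    docs maps a document id to its set of words. Documents whose signatures
    agree on at least one whole band become candidate groups; each group is
    returned with the words present in at least `min_word_share` of its
    members, giving a (documents, words) co-cluster.
    """
    if num_hashes % bands != 0:
        raise ValueError("num_hashes must be divisible by bands")
    rows = num_hashes // bands
    params = make_hash_params(num_hashes, seed)

    buckets = defaultdict(set)
    for doc_id, words in docs.items():
        sig = minhash_signature(words, params)
        for b in range(bands):
            # Two documents share this bucket only if every row of band b matches.
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(doc_id)

    coclusters, seen = [], set()
    for members in buckets.values():
        key = frozenset(members)
        # Skip singleton buckets and groups already reported from another band.
        if len(members) < 2 or key in seen:
            continue
        seen.add(key)
        counts = Counter(w for d in members for w in docs[d])
        features = {w for w, c in counts.items() if c >= min_word_share * len(members)}
        coclusters.append((sorted(members), sorted(features)))
    return coclusters


if __name__ == "__main__":
    corpus = {
        "d1": {"goal", "match", "team", "score"},
        "d2": {"goal", "match", "team", "score", "league"},
        "d3": {"stock", "market", "price", "trade"},
        "d4": {"stock", "market", "price", "dividend"},
    }
    for members, words in lsh_cocluster(corpus):
        print(members, words)
```

In this sketch each document is hashed a fixed number of times and placed in at most `bands` buckets, so the work grows roughly linearly with the number of (document, word) entries, which is in the spirit of the linear complexity the abstract reports. Raising `bands` (with fewer rows per band) makes less similar documents more likely to be grouped together.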

Cited By

  • (2016) A hash-based co-clustering algorithm for categorical data. Expert Systems with Applications: An International Journal, 64:C (24-35). doi:10.1016/j.eswa.2016.07.024. Online publication date: 1-Dec-2016
  • (2015) A biclustering approach for classification with mislabeled data. Expert Systems with Applications: An International Journal, 42:12 (5065-5075). doi:10.5555/2781921.2782485. Online publication date: 15-Jul-2015

Information

Published In

Guide Proceedings
ICMLA '12: Proceedings of the 2012 11th International Conference on Machine Learning and Applications - Volume 01
December 2012
706 pages
ISBN:9780769549132

Publisher

IEEE Computer Society

United States

Author Tags

  1. co-clustering
  2. hashing
  3. text clustering

Qualifiers

  • Article

Bibliometrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 24 Nov 2024
