Authors:
Elnaz Bigdeli 1; Mahdi Mohammadi 2; Bijan Raahemi 2 and Stan Matwin 3
Affiliations:
1 Ottawa University, Canada; 2 University of Ottawa, Canada; 3 Dalhousie, Canada
Keyword(s):
Density-based Clustering, Cluster Summarization, Gaussian Mixture Model.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Clustering and Classification Methods; Computational Intelligence; Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Pre-Processing and Post-Processing for Data Mining; Soft Computing; Symbolic Systems
Abstract:
One of the main concerns in arbitrary-shape clustering is how to summarize clusters. The most accurate representation of an arbitrarily shaped cluster is the full set of its members, but this approach is neither practical nor efficient. In many applications, such as stream data mining, preserving all samples for a long period of time in the presence of thousands of incoming samples is not practical. Moreover, in the absence of labelled data, clusters serve as representatives of the classes, and for arbitrary-shape clusters, finding the closest cluster to a new incoming sample using all objects of every cluster is neither accurate nor efficient. In this paper, we present a new algorithm to summarize arbitrary-shape clusters. Our proposed method, called SGMM, summarizes a cluster using a set of core objects and then represents each cluster with a corresponding Gaussian Mixture Model (GMM). Using the GMM, the closest cluster to a new test sample is identified with low computational cost. We compared the proposed method with ABACUS, a well-known algorithm, in terms of time, space, and accuracy for both categorization and summarization purposes. The experimental results confirm that the proposed method outperforms ABACUS on various datasets, including synthetic and real datasets.
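The summarize-then-classify idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' SGMM implementation: the core-object selection rule (farthest-point sampling), the spherical per-component variances, the count-based weights, and every function name below are assumptions chosen only to keep the example short. It shows how a cluster might be reduced to a handful of core objects, turned into a small Gaussian mixture, and then used to find the most likely cluster for a new sample without revisiting all cluster members.

```python
import numpy as np

def pick_core_objects(points, n_cores):
    """Farthest-point sampling: a hypothetical stand-in for core-object selection."""
    chosen = [0]
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(n_cores - 1):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen]

def summarize_cluster(points, n_cores=8):
    """Summarize one cluster as (weights, means, variances) of a spherical GMM.

    Assumes n_cores does not exceed the number of distinct points.
    """
    means = pick_core_objects(points, n_cores)
    # Squared distance from every point to every core object.
    d2 = ((points[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    owner = d2.argmin(axis=1)  # nearest core object for each point
    dim = points.shape[1]
    weights = np.empty(n_cores)
    variances = np.empty(n_cores)
    for k in range(n_cores):
        mask = owner == k
        weights[k] = mask.mean()  # share of points owned by core k
        variances[k] = max(d2[mask, k].mean() / dim, 1e-6)
    return weights, means, variances

def log_likelihood(x, summary):
    """Log-density of sample x under the mixture that summarizes a cluster."""
    weights, means, variances = summary
    dim = means.shape[1]
    d2 = ((means - x) ** 2).sum(axis=1)
    log_comp = (np.log(weights)
                - 0.5 * dim * np.log(2 * np.pi * variances)
                - 0.5 * d2 / variances)
    m = log_comp.max()  # log-sum-exp for numerical stability
    return m + np.log(np.exp(log_comp - m).sum())

def closest_cluster(x, summaries):
    """Index of the cluster whose mixture gives x the highest likelihood."""
    return int(np.argmax([log_likelihood(x, s) for s in summaries]))

# Two non-convex clusters: concentric rings of radius 1 and 4.
angles = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
unit_ring = np.column_stack([np.cos(angles), np.sin(angles)])
summaries = [summarize_cluster(1.0 * unit_ring), summarize_cluster(4.0 * unit_ring)]
label = closest_cluster(np.array([0.0, 3.9]), summaries)
```

Each summary here stores 8 core objects instead of 200 points, so assigning a new sample costs one density evaluation per component rather than one distance computation per cluster member.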