Multiobjective clustering with automatic k-determination for large-scale data

N Matake, T Hiroyasu, M Miki, T Senda
Proceedings of the 9th annual conference on Genetic and evolutionary computation, 2007
Web mining, i.e., data mining for web data, is a key component of web technologies. Web behavior mining in particular has attracted a great deal of attention recently. Behavior mining involves analyzing the behavior of users, finding patterns in that behavior, and predicting their subsequent behavior or interests. It is used in web advertising systems and content recommendation systems. Huge data sets, such as web data, are usually analyzed with data-clustering techniques. Data clustering separates data into groups according to similarity and is usually the first step of data mining. In the present study, we developed a scalable data-clustering algorithm for web mining based on an existing evolutionary multiobjective clustering algorithm. To derive clusters, we applied multiobjective clustering with automatic k-determination (MOCK). MOCK has been reported to outperform k-means, agglomerative methods, and other evolutionary clustering algorithms. It can also find the appropriate number of clusters using the information in the trade-off curve. The k-determination scheme of MOCK is powerful and strict. However, its computational cost is too high for clustering huge data sets. In this paper, we propose a scalable automatic k-determination scheme. The proposed scheme reduces the size of the Pareto set, and the appropriate number of clusters can usually be determined.
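The trade-off curve that MOCK uses for k-determination can be illustrated with a small sketch. It assumes the two objectives of the original MOCK formulation by Handl and Knowles (overall deviation for compactness, connectivity for neighborhood preservation), which this abstract does not spell out. The toy data, the function names, and the neighbor count are hypothetical. The sketch only scores a few hand-written partitions and filters the non-dominated ones; it is not the evolutionary search and not the scalable scheme proposed in the paper.

```python
import math

def overall_deviation(points, labels):
    """Sum of distances from each point to its cluster centroid (lower = more compact)."""
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum(math.dist(m, centroid) for m in members)
    return total

def connectivity(points, labels, n_neighbors=2):
    """Add a penalty of 1/j whenever a point's j-th nearest neighbor is in another cluster."""
    penalty = 0.0
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for rank, j in enumerate(others[:n_neighbors], start=1):
            if labels[j] != labels[i]:
                penalty += 1.0 / rank
    return penalty

def pareto_front(candidates):
    """Return names of candidates whose objective pair is not dominated by another."""
    front = []
    for name, obj in candidates:
        dominated = any(
            other[0] <= obj[0] and other[1] <= obj[1] and other != obj
            for _, other in candidates
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
partitions = {
    "k=1": [0, 0, 0, 0],
    "k=2 compact": [0, 0, 1, 1],
    "k=2 split": [0, 1, 0, 1],
    "k=4": [0, 1, 2, 3],
}
candidates = [
    (name, (overall_deviation(points, labels), connectivity(points, labels)))
    for name, labels in partitions.items()
]
front = pareto_front(candidates)
print(front)  # ['k=1', 'k=2 compact', 'k=4']
```

Both objectives are minimized, and they pull in opposite directions as k grows: deviation falls while connectivity rises. The poorly chosen two-cluster split is dominated and drops out, leaving one non-dominated solution per sensible k. In a real run the Pareto set can be large, which is the cost the proposed scheme aims to reduce.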