Abstract
The last decade has seen a growing body of research illustrating the advantages of the evolutionary multiobjective approach to data clustering. The scalability of such an approach, however, is a topic which merits more attention given the unprecedented volumes of data generated nowadays. This paper proposes a reduced-length representation for evolutionary multiobjective clustering. The new encoding explicitly prunes the solution space and allows the search method to focus on its most promising regions. Moreover, it allows us to precompute information in order to alleviate the computational overhead caused by the processing of candidate individuals during optimisation. We investigate the suitability of this proposal in the context of a representative algorithm from the literature: MOCK. Our results indicate that the new reduced-length representation significantly improves the effectiveness and computational efficiency of MOCK specifically, and can be seen as a further step towards a better scalability of evolutionary multiobjective clustering in general.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
Whereas the clustering phase is responsible for generating a PFA comprising high-quality partitions, the model-selection phase is concerned with selecting and reporting one (or more) candidate partition(s) from this PFA as the problem’s solution.
- 2.
It should be noted that changing the optimisation criteria is the only adaptation of MOCK required by the representation scheme proposed in this paper. Such an adaptation, however, is only intended to exploit the advantages that the new representation can provide in terms of computational efficiency; this change is not found to affect MOCK’s behaviour and performance as discussed at the end of Sect. 3.2.
- 3.
Notice that when parameter \(\delta \) is set to \({\delta =0}\), the encoding scheme proposed here is equivalent to the original (full-length) locus-based adjacency representation.
- 4.
Since the creation of new solutions relies mainly on recombination (due to the low mutation rates used), the genetic material introduced during initialisation plays a key role in delimiting the extent of the solution space that is reached by the method.
- 5.
During the evolutionary process, the clusters defined by the partial solution are combined, which can only lead to the decrease of k and the increase of VAR.
References
Chan, T., Golub, G., LeVeque, R.: Algorithms for computing the sample variance: analysis and recommendations. Am. Stat. 37(3), 242–247 (1983)
Corne, D.W., Jerram, N.R., Knowles, J.D., Oates, M.J.: PESA-II: region-based selection in evolutionary multiobjective optimization. In: Genetic and Evolutionary Computation Conference, pp. 283–290. Morgan Kaufmann Publishers, San Francisco (2001)
Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)
Grunert da Fonseca, V., Fonseca, C.M., Hall, A.O.: Inferential performance assessment of stochastic optimisers and the attainment function. In: Zitzler, E., Thiele, L., Deb, K., Coello Coello, C.A., Corne, D. (eds.) EMO 2001. LNCS, vol. 1993, pp. 213–225. Springer, Heidelberg (2001). doi:10.1007/3-540-44719-9_15
Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. J. Intell. Inf. Syst. 17(2), 107–145 (2001)
Handl, J., Knowles, J.: An evolutionary approach to multiobjective clustering. IEEE Trans. Evol. Comput. 11(1), 56–76 (2007)
Handl, J., Knowles, J.: An investigation of representations and operators for evolutionary data clustering with a variable number of clusters. In: Runarsson, T.P., Beyer, H.-G., Burke, E., Merelo-Guervós, J.J., Whitley, L.D., Yao, X. (eds.) PPSN 2006. LNCS, vol. 4193, pp. 839–849. Springer, Heidelberg (2006). doi:10.1007/11844297_85
Hruschka, E.R., Campello, R.J.G.B., Freitas, A.A., de Carvalho, A.C.P.L.F.: A survey of evolutionary algorithms for clustering. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 39(2), 133–155 (2009)
Ishibuchi, H., Masuda, H., Tanigaki, Y., Nojima, Y.: Modified distance calculation in generational distance and inverted generational distance. In: Gaspar-Cunha, A., Henggeler Antunes, C., Coello, C.C. (eds.) EMO 2015. LNCS, vol. 9019, pp. 110–125. Springer, Cham (2015). doi:10.1007/978-3-319-15892-1_8
Jain, A.K.: Data clustering: 50 years beyond K-means. Pattern Recogn. Lett. 31(8), 651–666 (2010)
López-Ibáñez, M., Paquete, L., Stützle, T.: Exploratory analysis of stochastic local search algorithms in biobjective optimization. In: Bartz-Beielstein, T., Chiarandini, M., Paquete, L., Preuss, M. (eds.) Experimental Methods for the Analysis of Optimization Algorithms, pp. 209–222. Springer, Heidelberg (2010)
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press, Berkeley (1967)
Mukhopadhyay, A., Maulik, U., Bandyopadhyay, S.: A survey of multiobjective evolutionary clustering. ACM Comput. Surv. 47(4), 61:1–61:46 (2015)
Park, Y.J., Song, M.S.: A genetic algorithm for clustering problems. In: Genetic Programming, pp. 568–575. Morgan Kaufmann, Madison, July 1998
Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)
Rothlauf, F., Goldberg, D.E.: Redundant representations in evolutionary computation. Evol. Comput. 11(4), 381–415 (2003)
Syswerda, G.: Uniform crossover in genetic algorithms. In: International Conference on Genetic Algorithms, pp. 2–9. Morgan Kaufmann Publishers Inc., San Francisco (1989)
Zitzler, E., Thiele, L., Laumanns, M., Fonseca, C.M., Grunert da Fonseca, V.: Performance assessment of multiobjective optimizers: an analysis and review. IEEE Trans. Evol. Comput. 7(2), 117–132 (2003)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Garza-Fabre, M., Handl, J., Knowles, J. (2017). A New Reduced-Length Genetic Representation for Evolutionary Multiobjective Clustering. In: Trautmann, H., et al. Evolutionary Multi-Criterion Optimization. EMO 2017. Lecture Notes in Computer Science(), vol 10173. Springer, Cham. https://doi.org/10.1007/978-3-319-54157-0_17
Download citation
DOI: https://doi.org/10.1007/978-3-319-54157-0_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54156-3
Online ISBN: 978-3-319-54157-0
eBook Packages: Computer ScienceComputer Science (R0)