Abstract
Cloud computing has recently emerged as a new paradigm for providing computing services through large data centers, where customers run their applications in a virtualized environment. The advantages of the cloud in terms of flexibility and economy encourage many enterprises to migrate from local data centers to cloud platforms, thus contributing to the success of such infrastructures. However, as the size and complexity of cloud infrastructures grow, scalability issues arise in monitoring and management processes. These issues are exacerbated because available solutions typically treat each virtual machine (VM) as a black box with independent characteristics, monitored at a fine granularity for management purposes, which generates huge amounts of data to handle. We claim that scalability issues can be addressed by leveraging the similarity between VMs in terms of resource usage patterns. In this paper, we propose an automated methodology to cluster similar VMs starting from their resource usage information, assuming no knowledge of the software executed on them. This innovative methodology combines the Bhattacharyya distance and ensemble techniques to provide a stable evaluation of similarity between the probability distributions of multiple VM resources, considering both system- and network-related data. We evaluate the methodology through a set of experiments on data from an enterprise data center. We show that our proposal achieves high and stable performance in automatic VM clustering, with a significant reduction in the amount of data collected, which lightens the monitoring requirements of a cloud data center.
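To make the notion of similarity between resource usage distributions concrete, the following minimal Python sketch computes the Bhattacharyya distance between normalized histograms of one resource for a few VMs. It illustrates the distance only and is not the implementation evaluated in the paper: the helper names, the bin count, the value range and the sample data are all assumptions chosen for illustration, and the ensemble step that combines multiple resources is not shown.

```python
import math


def histogram(samples, bins, lo, hi):
    """Normalized histogram of samples over [lo, hi] (hypothetical helper).

    Assumes every sample lies in [lo, hi]; returns probabilities summing to 1.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in samples:
        # A sample equal to hi is placed in the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return [c / len(samples) for c in counts]


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions p and q."""
    # Bhattacharyya coefficient: overlap between the two distributions.
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    if bc == 0.0:
        return math.inf  # no overlapping bins at all
    # Clamp to 1 so rounding errors cannot yield a negative distance.
    return -math.log(min(bc, 1.0))


# Illustrative (made-up) CPU utilization samples, in percent, for three VMs.
vm_a = [10, 12, 11, 13, 12, 11]
vm_b = [10, 12, 22, 13, 25, 11]
vm_c = [80, 85, 90, 82, 88, 86]

p_a, p_b, p_c = (histogram(v, bins=10, lo=0, hi=100) for v in (vm_a, vm_b, vm_c))

d_ab = bhattacharyya_distance(p_a, p_b)  # partially overlapping histograms
d_ac = bhattacharyya_distance(p_a, p_c)  # disjoint histograms
```

In this toy data, `vm_a` and `vm_b` share most of their probability mass and so end up closer to each other than `vm_a` and `vm_c`, whose histograms do not overlap. A clustering step would operate on such pairwise distances, computed separately for each monitored resource.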
Notes
R project home page: http://www.r-project.org/.
Python home page: http://www.python.org/.
Bash (GNU Bourne-Again shell) home page: http://www.gnu.org/software/bash/.
Cacti home page: http://www.cacti.net.
Munin home page: http://munin-monitoring.org/.
Ganglia Monitoring System home page: http://ganglia.sourceforge.net/.
Cite this article
Canali, C., Lancellotti, R. Exploiting ensemble techniques for automatic virtual machine clustering in cloud systems. Autom Softw Eng 21, 319–344 (2014). https://doi.org/10.1007/s10515-013-0134-y
DOI: https://doi.org/10.1007/s10515-013-0134-y