Exploiting ensemble techniques for automatic virtual machine clustering in cloud systems

Claudia Canali¹ &
Riccardo Lancellotti¹


Abstract

Cloud computing has recently emerged as a new paradigm for providing computing services through large data centers, where customers run their applications in a virtualized environment. The flexibility and economic advantages of the cloud encourage many enterprises to migrate from local data centers to cloud platforms, contributing to the success of such infrastructures. However, as the size and complexity of cloud infrastructures grow, scalability issues arise in monitoring and management processes. These issues are exacerbated because available solutions typically treat each virtual machine (VM) as a black box with independent characteristics and monitor it at a fine granularity for management purposes, generating huge amounts of data to handle. We claim that scalability issues can be addressed by leveraging the similarity between VMs in terms of resource usage patterns. In this paper, we propose an automated methodology to cluster similar VMs starting from their resource usage information, assuming no knowledge of the software executed on them. This innovative methodology combines the Bhattacharyya distance and ensemble techniques to provide a stable evaluation of the similarity between the probability distributions of multiple VM resource usage metrics, considering both system- and network-related data. We evaluate the methodology through a set of experiments on data from an enterprise data center. We show that our proposal achieves high and stable performance in automatic VM clustering, with a significant reduction in the amount of data collected, which lightens the monitoring requirements of a cloud data center.
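As an illustration of the similarity measure named in the abstract, the following minimal sketch computes the Bhattacharyya distance between the usage histograms of two VMs for a single metric. It is not the authors' implementation: the function names, the fixed equal-width binning, the value range, and the sample values are all assumptions made for this example, and the ensemble step that combines several metrics is not shown.

```python
import math


def histogram(samples, bins, lo, hi):
    """Normalized histogram of samples over equal-width bins in [lo, hi].

    The binning scheme is an assumption of this sketch, not taken from the paper.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in samples:
        # Clamp the index so that values at or beyond the range edges
        # fall into the first or last bin.
        idx = int((x - lo) / width)
        idx = max(0, min(idx, bins - 1))
        counts[idx] += 1
    total = len(samples)
    return [c / total for c in counts]


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance D_B = -ln(BC), with BC = sum_i sqrt(p_i * q_i)."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    if bc == 0.0:
        # No overlapping bins: the distributions are maximally distant.
        return math.inf
    # Guard against rounding pushing BC slightly above 1.
    return max(0.0, -math.log(min(bc, 1.0)))


# Hypothetical CPU utilization samples (percent) for three VMs.
vm_a = [10, 12, 11, 13, 50, 52]
vm_b = [11, 12, 10, 14, 51, 49]
vm_c = [80, 85, 90, 88, 92, 95]

h_a = histogram(vm_a, bins=10, lo=0, hi=100)
h_b = histogram(vm_b, bins=10, lo=0, hi=100)
h_c = histogram(vm_c, bins=10, lo=0, hi=100)

d_ab = bhattacharyya_distance(h_a, h_b)  # similar usage patterns: small distance
d_ac = bhattacharyya_distance(h_a, h_c)  # disjoint usage patterns: infinite distance
```

In this sketch, VMs with overlapping usage histograms obtain a small distance, while VMs whose histograms share no bin are infinitely distant. A matrix of such pairwise distances, computed per metric, is the kind of input a clustering algorithm could consume.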


Notes

R project home page: http://www.r-project.org/.
Python home page: http://www.python.org/.
Bourne shell home page: http://www.gnu.org/software/bash/.
Cacti home page: http://www.cacti.net.
Munin home page: http://munin-monitoring.org/.
Ganglia Monitoring System home page: http://ganglia.sourceforge.net/.


Author information

Authors and Affiliations

Department of Engineering “Enzo Ferrari”, University of Modena and Reggio Emilia, Modena, Italy
Claudia Canali & Riccardo Lancellotti


Corresponding author

Correspondence to Claudia Canali.


About this article

Cite this article

Canali, C., Lancellotti, R. Exploiting ensemble techniques for automatic virtual machine clustering in cloud systems. Autom Softw Eng 21, 319–344 (2014). https://doi.org/10.1007/s10515-013-0134-y


Received: 30 October 2012
Accepted: 06 September 2013
Published: 28 September 2013
Issue Date: September 2014
DOI: https://doi.org/10.1007/s10515-013-0134-y
