Abstract
Cloud computing has recently emerged as a leading paradigm that allows customers to run their applications in virtualized large-scale data centers. Existing solutions for monitoring and managing these infrastructures treat virtual machines (VMs) as independent entities, each with its own characteristics. However, these approaches suffer from scalability issues due to the increasing number of VMs in modern cloud data centers. We claim that scalability issues can be addressed by leveraging the similarity in VM behavior in terms of resource usage patterns. In this paper we propose an automated methodology to cluster VMs starting from the usage of multiple resources, assuming no knowledge of the services executed on them. The innovative contribution of the proposed methodology is the use of the statistical technique known as principal component analysis (PCA) to automatically select the most relevant information for clustering similar VMs. We apply the methodology to two case studies: a virtualized testbed and a real enterprise data center. In both case studies, the automatic data selection based on PCA allows us to achieve high performance, with a percentage of correctly clustered VMs between 80% and 100% even for short time series (1 day) of monitored data. Furthermore, we estimate the potential reduction in the amount of collected data to demonstrate how our proposal may address the scalability issues related to monitoring and management in cloud computing data centers.
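As a rough illustration of the idea summarized above, the following sketch clusters VMs from multi-resource usage data by reducing per-VM features with PCA and then grouping the VMs. It is a minimal sketch under stated assumptions, not the pipeline of the paper: the feature choice (mean and standard deviation of each metric), the use of k-means, the 5-minute sampling interval, the synthetic data, and the "web"/"db" labels are all hypothetical choices made for the example. The `purity` function computes one common way to express a "percentage of correctly clustered VMs".

```python
import math
import random

def features(vm):
    """Mean and standard deviation of each metric; vm maps metric name -> samples.
    This feature choice is an assumption of the sketch."""
    out = []
    for metric in sorted(vm):
        xs = vm[metric]
        mu = sum(xs) / len(xs)
        out += [mu, math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))]
    return out

def pca_project(rows, k, iters=200):
    """Project rows onto their top-k principal components (power iteration with deflation)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in X) / n for b in range(d)] for a in range(d)]
    start = random.Random(0)  # seeded so the sketch is reproducible
    comps = []
    for _ in range(k):
        v = [start.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm < 1e-12:
                break
            v = [c / norm for c in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        comps.append(v)
        # Deflate: remove the component just found from the covariance matrix.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(x[j] * c[j] for j in range(d)) for c in comps] for x in X]

def kmeans(points, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def purity(labels, truth):
    """Fraction of VMs that fall in the majority class of their cluster."""
    correct = 0
    for c in set(labels):
        classes = [t for l, t in zip(labels, truth) if l == c]
        correct += max(classes.count(t) for t in set(classes))
    return correct / len(labels)

# Synthetic one-day traces (288 samples, assuming one sample every 5 minutes).
rng = random.Random(0)

def synth(cpu, disk, n=288):
    return {"cpu": [rng.gauss(cpu, 5) for _ in range(n)],
            "disk": [rng.gauss(disk, 5) for _ in range(n)]}

vms = [synth(80, 10) for _ in range(5)] + [synth(20, 50) for _ in range(5)]
truth = ["web"] * 5 + ["db"] * 5  # hypothetical ground-truth classes
labels = kmeans(pca_project([features(vm) for vm in vms], k=2), k=2)
print(purity(labels, truth))
```

Because the clustering works only on resource usage, the ground-truth labels are needed solely to score the result, which mirrors the assumption of no knowledge of the services running on the VMs.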
Cite this article
Canali, C., Lancellotti, R. Improving Scalability of Cloud Monitoring Through PCA-Based Clustering of Virtual Machines. J. Comput. Sci. Technol. 29, 38–52 (2014). https://doi.org/10.1007/s11390-013-1410-9
DOI: https://doi.org/10.1007/s11390-013-1410-9