Abstract
Computer system resilience is the ability of a computer system to continue functioning in the face of unexpected events or disruptions. These disruptions can be caused by a variety of factors, such as hardware failures, software glitches, cyber attacks, or natural disasters. Modern computational environments need applications that can recover quickly from major disruptions while also being environmentally sustainable. Balancing system resilience with energy efficiency is challenging, as efforts to improve one can harm the other. This paper presents a method to enhance disaster survivability in microservice architectures, particularly those using Kubernetes in cloud-based environments, with a focus on optimizing electrical energy use. To save energy, our work adopts a consolidation strategy, that is, grouping multiple microservices on a single host. Our approach uses a widely adopted analytical model, the Generalized Stochastic Petri Net (GSPN). GSPNs are a powerful modeling technique used in various fields, including engineering, computer science, and operations research. One of their primary advantages is the ability to model complex systems with a high degree of accuracy. Additionally, GSPNs capture both logical and stochastic behavior, making them well suited to systems that combine the two. Our GSPN models compute a number of metrics, such as recovery time, system availability, reliability, and Mean Time to Failure, as well as the configuration of cloud-based microservices. We compared our approach against others focusing on survivability or efficiency. Our approach aligns with Recovery Time Objectives during sudden disasters and offers the fastest recovery, requiring 9% less warning time to fully recover in disasters preceded by an alert when compared to strategies with similar electrical consumption. It also saves about 27% energy compared to low-consolidation strategies and 5% compared to high-consolidation strategies under static conditions.
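The tension between consolidation and resilience described above can be illustrated with a small numerical sketch. The code below is not the GSPN model of this paper. It assumes a two-state (up/down) component with exponentially distributed failure and repair times, for which steady-state availability is MTTF / (MTTF + MTTR), and a simple linear server power model. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def steady_state_availability(mttf_h: float, mttr_h: float) -> float:
    """Availability of a two-state (up/down) component: MTTF / (MTTF + MTTR)."""
    return mttf_h / (mttf_h + mttr_h)


def host_power_w(utilization: float, idle_w: float, peak_w: float) -> float:
    """Assumed linear power model: idle power plus a load-proportional share."""
    u = min(1.0, max(0.0, utilization))  # clamp utilization to [0, 1]
    return idle_w + (peak_w - idle_w) * u


def cluster_power_w(n_services: int, per_host: int, load_per_service: float,
                    idle_w: float, peak_w: float) -> tuple[int, float]:
    """Pack services onto hosts (per_host each) and sum the host power draw."""
    hosts = math.ceil(n_services / per_host)
    remaining = n_services
    total = 0.0
    for _ in range(hosts):
        placed = min(per_host, remaining)
        remaining -= placed
        total += host_power_w(placed * load_per_service, idle_w, peak_w)
    return hosts, total


if __name__ == "__main__":
    # Hypothetical values: 12 microservices, each using 10% of a host,
    # hosts drawing 100 W idle and 200 W at full load.
    for per_host in (2, 6):
        hosts, power = cluster_power_w(12, per_host, 0.1, 100.0, 200.0)
        print(f"{per_host} services/host: {hosts} hosts, {power:.1f} W")
    # Hypothetical host with MTTF = 1000 h and MTTR = 10 h.
    print(f"availability: {steady_state_availability(1000.0, 10.0):.4f}")
```

Under these assumed numbers, higher consolidation needs fewer powered-on hosts and therefore less total power, but each host failure then affects more microservices at once. Quantifying that trade-off is what motivates a model that evaluates survivability and energy consumption together.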
Availability of data and materials
Data sharing not applicable.
Funding
This research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2020R1A6A1A03046811). This research was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2021R1A2C2094943).
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethical approval
Not applicable.
About this article
Cite this article
Fé, I., Nguyen, T.A., Mauro, M.D. et al. Energy-aware dynamic response and efficient consolidation strategies for disaster survivability of cloud microservices architecture. Computing 106, 2737–2783 (2024). https://doi.org/10.1007/s00607-024-01305-x
DOI: https://doi.org/10.1007/s00607-024-01305-x