Energy-aware dynamic response and efficient consolidation strategies for disaster survivability of cloud microservices architecture

  • Regular Paper
  • Computing

Abstract

Computer system resilience is the ability of a system to keep functioning in the face of unexpected events or disruptions, such as hardware failures, software glitches, cyber attacks, or natural disasters. Modern computational environments need applications that recover quickly from major disruptions while also being environmentally sustainable. Balancing resilience with energy efficiency is challenging, because efforts to improve one can harm the other. This paper presents a method to enhance disaster survivability in microservice architectures, particularly those using Kubernetes in cloud-based environments, with a focus on optimizing electrical energy use. To save energy, our work adopts a consolidation strategy, which groups multiple microservices on a single host. Our approach uses a widely adopted analytical model, the Generalized Stochastic Petri Net (GSPN). GSPNs are a powerful modeling technique used in fields including engineering, computer science, and operations research. One of their primary advantages is the ability to model complex systems with a high degree of accuracy. GSPNs also capture both logical and stochastic behavior, which makes them well suited to systems that combine the two. Our GSPN models compute metrics such as recovery time, system availability, reliability, and Mean Time to Failure, as well as the configuration of cloud-based microservices. We compared our approach against others focusing on survivability or efficiency. Our approach aligns with Recovery Time Objectives during sudden disasters and offers the fastest recovery: in disasters preceded by an alert, it requires 9% less warning time to fully recover than strategies with similar electrical consumption. It also saves about 27% energy compared to low-consolidation strategies and 5% compared to high-consolidation strategies under static conditions.
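To make the quantities named in the abstract concrete, the following is a minimal sketch, not the paper's GSPN models. It assumes a single component with exponentially distributed failure and repair times (the two-state Markov chain that the simplest timed Petri net reduces to) and a linear host power model. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def steady_state_availability(mttf_h: float, mttr_h: float) -> float:
    """Long-run fraction of time a single repairable component is up."""
    return mttf_h / (mttf_h + mttr_h)


def reliability(t_h: float, mttf_h: float) -> float:
    """Probability of no failure up to time t, assuming exponential failures."""
    return math.exp(-t_h / mttf_h)


def hosts_power_w(n_hosts: int, total_load: float,
                  p_idle_w: float, p_max_w: float) -> float:
    """Total power of n identical hosts under an assumed linear power model.

    total_load is expressed in units of one host's capacity and is spread
    evenly, so each host runs at utilization total_load / n_hosts.
    """
    utilization = total_load / n_hosts
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("load does not fit on the given number of hosts")
    return n_hosts * (p_idle_w + (p_max_w - p_idle_w) * utilization)


if __name__ == "__main__":
    # Hypothetical parameters: MTTF of 1000 h and MTTR of 10 h.
    print(f"availability: {steady_state_availability(1000, 10):.6f}")
    print(f"reliability at 100 h: {reliability(100, 1000):.4f}")

    # Same workload (two hosts' worth) placed on 4 hosts versus 2 hosts.
    spread = hosts_power_w(4, 2.0, p_idle_w=100, p_max_w=200)
    packed = hosts_power_w(2, 2.0, p_idle_w=100, p_max_w=200)
    print(f"low consolidation: {spread:.0f} W, high consolidation: {packed:.0f} W")
```

In this toy setting, packing the same load onto fewer hosts lowers total power because idle power is paid per host, while a host failure then affects more microservices at once. That is the tension between efficiency and survivability the abstract describes; the figures here are illustrative and are not the paper's results.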


Availability of data and materials

Data sharing not applicable.

Notes

  1. https://gitlab.com/iuresf/disaster_validator.

  2. https://www.modcs.org/

Funding

This research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2020R1A6A1A03046811). This research was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2021R1A2C2094943).

Author information

Corresponding authors

Correspondence to Iure Fé or Tuan Anh Nguyen.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fé, I., Nguyen, T.A., Mauro, M.D. et al. Energy-aware dynamic response and efficient consolidation strategies for disaster survivability of cloud microservices architecture. Computing 106, 2737–2783 (2024). https://doi.org/10.1007/s00607-024-01305-x

  • DOI: https://doi.org/10.1007/s00607-024-01305-x
