Open access

On Resilience in Cloud Computing: A Survey of Techniques across the Cloud Domain

Published: 28 May 2020


Cloud infrastructures are highly favoured as a computing delivery model worldwide, creating a strong societal dependence. It is therefore vital to enhance their resilience, providing persistent service delivery under a variety of conditions. Cloud environments are highly complex and continuously evolving. Additionally, the wide range of use cases means that requirements for persistent service delivery vary. As a contribution to knowledge, this work surveys resilience techniques for cloud environments. We apply a novel perspective using a layered model of traditional and emerging cloud paradigms. Works are then classified according to the ResiliNets model. For each layer, we derive the most common techniques and their limitations, along with the strength of each actor’s influence on cloud resilience through each technique. We conclude with some future challenges to the field of resilient cloud computing.






Published In

ACM Computing Surveys  Volume 53, Issue 3
May 2021
787 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 May 2020
Online AM: 07 May 2020
Accepted: 01 March 2020
Revised: 01 February 2020
Received: 01 September 2019
Published in CSUR Volume 53, Issue 3



Author Tags

  1. Resilience
  2. cloud
  3. edge
  4. fog
  5. survey


  • Survey
  • Research
  • Refereed

Funding Sources

  • Science Foundation Ireland




Article Metrics

  • Downloads (Last 12 months): 1,201
  • Downloads (Last 6 weeks): 86
Reflects downloads up to 12 Feb 2025



Cited By

  • (2025) Correlating node centrality metrics with node resilience in self-healing systems with limited neighbourhood information. Future Generation Computer Systems 163 (107553). DOI: 10.1016/j.future.2024.107553. Online publication date: Feb-2025
  • (2025) Defining and measuring the resilience of network services. Computer Networks 258 (111036). DOI: 10.1016/j.comnet.2025.111036. Online publication date: Feb-2025
  • (2025) Analysis of sensor reliability in IoT solutions. Reliability Assessment and Optimization of Complex Systems (435-453). DOI: 10.1016/B978-0-443-29112-8.00014-1. Online publication date: 2025
  • (2024) TextRefine: A Novel approach to improve the accuracy of LLM Models. Data and Metadata 3 (331). DOI: 10.56294/dm2024331. Online publication date: 20-May-2024
  • (2024) A Survey on Resilience in Information Sharing on Networks: Taxonomy and Applied Techniques. ACM Computing Surveys 56:12 (1-36). DOI: 10.1145/3659944. Online publication date: 20-Apr-2024
  • (2024) Cyber Resilience, Risk Management, and Security Challenges in Enterprise-Scale Cloud Systems: Comprehensive Review. 2024 13th Mediterranean Conference on Embedded Computing (MECO) (1-8). DOI: 10.1109/MECO62516.2024.10577956. Online publication date: 11-Jun-2024
  • (2024) Resilient Virtualization. Computer 57:2 (70-78). DOI: 10.1109/MC.2023.3306617. Online publication date: 31-Jan-2024
  • (2024) Edge Computing as an Enabler of Energy and Water System Resilience. IEEE Engineering Management Review 52:1 (28-42). DOI: 10.1109/EMR.2023.3320876. Online publication date: Feb-2024
  • (2024) Towards antifragility of cloud systems. Information and Software Technology 174:C. DOI: 10.1016/j.infsof.2024.107519. Online publication date: 1-Oct-2024
  • (2024) Towards a Cyber Resilience Quantification Framework (CRQF) for IT infrastructure. Computer Networks: The International Journal of Computer and Telecommunications Networking 247:C. DOI: 10.1016/j.comnet.2024.110446. Online publication date: 18-Jul-2024
