DOI: 10.1145/3281411.3281441
research-article
Public Access

REINFORCE: achieving efficient failure resiliency for network function virtualization based services

Published: 04 December 2018

Abstract

Ensuring high availability (HA) for software-based networks is a critical design feature that will help drive the adoption of software-based network functions (NFs) in production networks. It is important for NFs to avoid outages and maintain mission-critical operations. However, HA support for NFs on the critical data path can result in unacceptable performance degradation. We present REINFORCE, an integrated framework that supports efficient resiliency for NFs and NF service chains. REINFORCE includes timely failure detection and consistent failover mechanisms. It replicates state to standby NFs (local and remote) while enforcing correctness, minimizes the number of state transfers by exploiting the concept of external synchrony, and leverages opportunistic batching and multi-buffering to optimize performance. Experimental results show that, even at line-rate packet processing (10 Gbps), REINFORCE achieves chain-level failover across servers in a LAN (or within the same node) within 10 ms (100 μs), incurs less than 10% (1%) performance overhead, and adds an average latency of only ~400 μs (5 μs), with a worst-case latency of less than 1 ms (10 μs).
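The interplay of external synchrony, opportunistic batching, and multi-buffering described in the abstract can be illustrated with a small in-process simulation. This is a minimal sketch under stated assumptions, not REINFORCE's implementation: the class names `StandbyReplica` and `ExternallySynchronousNF`, the per-flow byte counter used as NF state, the `num_buffers` parameter, and the synchronous `deliver_one()` call standing in for an asynchronous replication acknowledgement are all hypothetical. It shows the core rule that an output packet is released only after the state it depends on has been acknowledged by the standby, while packets processed in the meantime are grouped so that one state transfer covers a whole batch.

```python
from collections import deque


class StandbyReplica:
    """Hypothetical standby NF: applies state deltas in arrival order."""

    def __init__(self):
        self.state = {}

    def apply(self, epoch, delta):
        self.state.update(delta)
        return epoch  # the acknowledgement names the batch that was applied


class ExternallySynchronousNF:
    """Hypothetical primary NF that holds output until its state is replicated."""

    def __init__(self, replica, num_buffers=2):
        self.replica = replica
        self.num_buffers = num_buffers  # max batches awaiting acknowledgement
        self.state = {}                 # example NF state: bytes seen per flow
        self.delta = {}                 # state changes not yet sent to the standby
        self.open_batch = []            # packets processed since the last flush
        self.in_flight = deque()        # (epoch, delta, packets) sent but unacked
        self.next_epoch = 0
        self.released = []              # packets actually emitted downstream

    def process(self, flow, nbytes):
        # Update local state right away; the packet is held, not forwarded.
        self.state[flow] = self.state.get(flow, 0) + nbytes
        self.delta[flow] = self.state[flow]
        self.open_batch.append((flow, nbytes))

    def flush(self):
        # Opportunistic batching: a single state transfer covers every packet
        # in the open batch. Multi-buffering: if all buffers are still waiting
        # for acks, keep growing the open batch instead of stalling processing.
        if not self.open_batch or len(self.in_flight) >= self.num_buffers:
            return False
        self.in_flight.append((self.next_epoch, self.delta, self.open_batch))
        self.next_epoch += 1
        self.delta, self.open_batch = {}, []
        return True

    def deliver_one(self):
        # Simulates the standby applying the oldest in-flight batch and its
        # acknowledgement reaching the primary (in reality this is asynchronous).
        if not self.in_flight:
            return None
        epoch, delta, packets = self.in_flight.popleft()
        acked = self.replica.apply(epoch, delta)
        # External synchrony: packets become externally visible only once the
        # state they depend on is held by the standby.
        self.released.extend(packets)
        return acked


primary = ExternallySynchronousNF(StandbyReplica(), num_buffers=2)
for flow, nbytes in [("a", 100), ("b", 60), ("a", 40)]:
    primary.process(flow, nbytes)
primary.flush()                # batch 0: three packets, one state transfer
primary.process("c", 10)
primary.flush()                # batch 1
primary.process("b", 5)
print(primary.flush())         # False: both buffers are awaiting acks
print(primary.released)        # []: nothing leaves before an ack
primary.deliver_one()          # ack for batch 0 arrives
print(primary.released)        # [('a', 100), ('b', 60), ('a', 40)]
print(primary.replica.state)   # {'a': 140, 'b': 60}
```

In this sketch, holding output rather than blocking processing lets the primary keep handling packets while replication is in progress: the cost of replication appears as added output latency instead of stalled packet processing. Because no packet leaves the NF before its state reaches the standby, a failover to the standby cannot contradict anything the rest of the network has already observed.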

Supplementary Material

ZIP File (p41-kulkarni-1.zip)
Supplemental material.
ZIP File (p41-kulkarni.zip)
Supplemental material.
MP4 File (p41-kulkarni.mp4)

Information

Published In

CoNEXT '18: Proceedings of the 14th International Conference on emerging Networking EXperiments and Technologies
December 2018
408 pages
ISBN:9781450360807
DOI:10.1145/3281411
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. availability
  2. fault-tolerance
  3. network functions (NF)
  4. resiliency
  5. service function chains (SFC)

Qualifiers

  • Research-article

Conference

CoNEXT '18

Acceptance Rates

Overall Acceptance Rate 198 of 789 submissions, 25%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 203
  • Downloads (last 6 weeks): 37
Reflects downloads up to 19 Nov 2024

Cited By

  • (2024) L26GC: Evolving the Low-Latency Core for Future Cellular Networks. IEEE Internet Computing 28:2, 29-36. DOI: 10.1109/MIC.2024.3376655. Online publication date: 18-Mar-2024.
  • (2023) DEFT: Distributed, Elastic, and Fault-Tolerant State Management of Network Functions. In 2023 19th International Conference on Network and Service Management (CNSM), 1-7. DOI: 10.23919/CNSM59352.2023.10327813. Online publication date: 30-Oct-2023.
  • (2023) Serpens: A High Performance FaaS Platform for Network Functions. IEEE Transactions on Parallel and Distributed Systems 34:8, 2448-2463. DOI: 10.1109/TPDS.2023.3263272. Online publication date: Aug-2023.
  • (2023) SAFE: Service Availability via Failure Elimination Through VNF Scaling. IEEE/ACM Transactions on Networking 31:5, 2042-2057. DOI: 10.1109/TNET.2022.3233488. Online publication date: Oct-2023.
  • (2022) Kernel-Based Container File Access Control Architecture to Protect Important Application Information. Electronics 12:1, 52. DOI: 10.3390/electronics12010052. Online publication date: 23-Dec-2022.
  • (2022) NHAM: An NFV High Availability Architecture for Building Fault-Tolerant Stateful Virtual Functions and Services. In Proceedings of the 11th Latin-American Symposium on Dependable Computing, 35-44. DOI: 10.1145/3569902.3569907. Online publication date: 21-Nov-2022.
  • (2022) Multi-Resource VNF Deployment in a Heterogeneous Cloud. IEEE Transactions on Computers 71:1, 81-91. DOI: 10.1109/TC.2020.3042247. Online publication date: 1-Jan-2022.
  • (2022) On the Performance Benefits of Heterogeneous Virtual Network Function Execution Frameworks. In 2022 IEEE 8th International Conference on Network Softwarization (NetSoft), 109-114. DOI: 10.1109/NetSoft54395.2022.9844115. Online publication date: 27-Jun-2022.
  • (2022) AIDA-DB: A Data Management Architecture for the Edge and Cloud Continuum. In 2022 IEEE 19th Annual Consumer Communications & Networking Conference (CCNC), 1-6. DOI: 10.1109/CCNC49033.2022.9700692. Online publication date: 8-Jan-2022.
  • (2022) Summarization and Future Work. In Resource Allocation in Network Function Virtualization, 129-135. DOI: 10.1007/978-981-19-4815-2_7. Online publication date: 30-Aug-2022.
