DOI: 10.1145/3627703.3650062

Halflife: An Adaptive Flowlet-based Load Balancer with Fading Timeout in Data Center Networks

Published: 22 April 2024

Abstract

Modern data centers (DCs) employ various traffic load balancers to achieve high bisection bandwidth. Among them, flowlet switching has shown remarkable performance in both load balancing and friendliness to upper-layer protocols (e.g., TCP). However, flowlet-based load balancers suffer from an inflexible flowlet timeout value (FTV), which results in sub-optimal performance under various application workloads. To this end, we propose Halflife, a novel flowlet-based load balancer that leverages fading FTVs to reroute traffic promptly under different workloads without any prior knowledge. Halflife not only balances traffic better, but also avoids the performance degradation caused by frequent oscillation or shifting of flows between paths. Furthermore, Halflife's fading mechanism is compatible with most flowlet-based load balancers, such as CONGA and LetFlow, and also improves their performance when flowlet switching is used in RDMA networks. Through testbed experiments and simulations, we show that Halflife improves the performance of CONGA and LetFlow by 10% to 150%, and that it outperforms other load balancers by 30% to 200% across most application workloads.
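
The abstract does not spell out how Halflife fades its flowlet timeout, so the following is only a rough illustration of the general idea, not the paper's algorithm. It sketches a generic flowlet switch in which the timeout starts at an initial value and, as an assumption made here, halves every `half_life` time units while a flowlet stays alive. The class name `FadingFlowletSwitch`, its parameters, the exponential decay rule, and the random path choice are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class FlowletEntry:
    path: int         # path currently assigned to this flow
    started: float    # time at which the current flowlet began
    last_seen: float  # arrival time of the flow's most recent packet


class FadingFlowletSwitch:
    """Toy flowlet switch whose timeout fades as a flowlet ages.

    Illustrative only: the exponential decay below is an assumption,
    not the mechanism described in the Halflife paper.
    """

    def __init__(self, num_paths, initial_ftv, half_life, min_ftv=0.0, rng=None):
        self.num_paths = num_paths
        self.initial_ftv = initial_ftv  # timeout applied to a fresh flowlet
        self.half_life = half_life      # flowlet age after which the timeout halves
        self.min_ftv = min_ftv          # floor so the timeout never drops below this
        self.rng = rng or random.Random()
        self.table = {}                 # flow id -> FlowletEntry

    def current_ftv(self, entry, now):
        """Timeout for this flow right now, faded by the flowlet's age."""
        age = now - entry.started
        faded = self.initial_ftv * 0.5 ** (age / self.half_life)
        return max(self.min_ftv, faded)

    def route(self, flow_id, now):
        """Return the path for a packet of `flow_id` arriving at time `now`."""
        entry = self.table.get(flow_id)
        if entry is None or now - entry.last_seen > self.current_ftv(entry, now):
            # Gap exceeded the (faded) timeout: start a new flowlet on a
            # randomly chosen path and reset the fading clock.
            path = self.rng.randrange(self.num_paths)
            self.table[flow_id] = FlowletEntry(path, now, now)
            return path
        # Same flowlet: keep the packet on its current path.
        entry.last_seen = now
        return entry.path
```

With a fixed timeout, a steady flow whose inter-packet gaps stay below the threshold is never rerouted. In this sketch the same gap eventually exceeds the faded timeout, so even a long-lived flow gets a chance to move, while a freshly started flowlet is protected by the full initial timeout.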

    Published In

    EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems
    April 2024
    1245 pages
    ISBN:9798400704376
    DOI:10.1145/3627703
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    EuroSys '24
    Acceptance Rates

    Overall Acceptance Rate 241 of 1,308 submissions, 18%
