DOI: 10.1145/3472716.3472855
poster

A cloud-scale per-flow backpressure system via FPGA-based heavy hitter detection

Published: 23 August 2021

Abstract

Virtual private clouds share resources among a massive number of tenants to achieve economies of scale. In such clouds, off-the-shelf x86 boxes are widely deployed as intermediate network nodes. However, because cloud traffic has grown rapidly while CPU improvement has slowed significantly in recent years, CPU overload and packet loss caused by heavy hitters are still occasionally observed in production environments, even though horizontal scaling is used, and they seriously damage tenants' SLAs. To address this, we propose a cloud-scale per-flow backpressure system designed at Alibaba Cloud. The basic idea is to (1) trigger heavy-hitter flow acquisition at the intermediate node on demand, only when CPU utilization exceeds a predefined threshold, and (2) backpressure the identified heavy-hitter flow to the traffic source via rate limiting at the sender's NIC or hypervisor. To handle the extremely high rate of cloud traffic, we use a high-speed FPGA to accelerate heavy hitter detection. To accommodate the highly concurrent flows in the cloud, we design a hierarchical memory system for accurate heavy hitter counting over a large time window. Under the per-flow backpressure mechanism, the rate of the heavy-hitter flow is accurately throttled while the rates of mice flows are completely unaffected during backpressure.
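
The two-step idea in the abstract can be illustrated with a small software sketch. This is not the authors' FPGA design: the abstract does not say which counting structure the hardware uses, so the example swaps in a count-min sketch as a stand-in, and every name and constant here (`CPU_THRESHOLD`, `heavy_bytes`, `limit_fraction`, `backpressure_decisions`) is a hypothetical chosen for illustration.

```python
import hashlib

CPU_THRESHOLD = 0.8  # hypothetical trigger level; the abstract only says "predefined"


class CountMinSketch:
    """Approximate per-flow byte counter; estimates never undercount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One independent hash per row, derived by salting with the row number.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=bytes([row])
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, amount):
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += amount

    def estimate(self, key):
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))


def detect_heavy_hitters(packets, heavy_bytes):
    """Return (sketch, flows whose estimated bytes reach heavy_bytes).

    packets is an iterable of (flow_id, size_bytes) for one time window.
    """
    sketch = CountMinSketch()
    heavy = set()
    for flow_id, size in packets:
        sketch.add(flow_id, size)
        if sketch.estimate(flow_id) >= heavy_bytes:
            heavy.add(flow_id)
    return sketch, heavy


def backpressure_decisions(cpu_utilization, packets, heavy_bytes,
                           limit_fraction=0.5):
    """Step (1): run detection only when the CPU is above the threshold.
    Step (2): return a per-flow byte budget for heavy hitters only;
    flows that are not flagged get no limit at all.
    """
    if cpu_utilization <= CPU_THRESHOLD:
        return {}
    sketch, heavy = detect_heavy_hitters(packets, heavy_bytes)
    return {f: int(sketch.estimate(f) * limit_fraction) for f in heavy}
```

For example, calling `backpressure_decisions(0.95, packets, heavy_bytes=100_000)` on a window with one large flow and many small ones would return a limit only for the large flow, while the same call with a CPU utilization of 0.5 returns an empty mapping. In a real deployment the returned limits would be pushed to the sender's NIC or hypervisor; that signalling path is outside the scope of this sketch.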

Cited By

  • (2023) RateSheriff: Multipath Flow-aware and Resource Efficient Rate Limiter Placement for Data Center Networks. 2023 IEEE/ACM 31st International Symposium on Quality of Service (IWQoS), 01-10. DOI: 10.1109/IWQoS57198.2023.10188742. Online publication date: 19-Jun-2023
  • (2021) Bridging Network and Parallel I/O Research for Improving Data-Intensive Distributed Applications. 2021 IEEE Workshop on Innovating the Network for Data-Intensive Science (INDIS), 50-56. DOI: 10.1109/INDIS54524.2021.00011. Online publication date: Nov-2021

Published In

SIGCOMM '21: Proceedings of the SIGCOMM '21 Poster and Demo Sessions
August 2021
94 pages
ISBN: 9781450386296
DOI: 10.1145/3472716
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGA
  2. heavy hitter detection
  3. per-flow backpressure

Qualifiers

  • Poster

Conference

SIGCOMM '21: ACM SIGCOMM 2021 Conference
August 23 - 27, 2021
Virtual Event

Acceptance Rates

SIGCOMM '21 Paper Acceptance Rate 30 of 56 submissions, 54%;
Overall Acceptance Rate 92 of 158 submissions, 58%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 39
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 20 Nov 2024
