Unleashing SmartNIC Packet Processing Performance in P4

Research article, public access
DOI: 10.1145/3603269.3604882
Published: 01 September 2023

Abstract

SmartNICs are on the rise as a packet processing platform, with the trend towards a uniform P4 programming model. However, unleashing SmartNIC packet processing performance in P4 is a formidable task. Traditional SmartNIC optimizations rely on low-level program tuning, but P4 abstractions operate one level above. At the same time, today's P4 optimizations primarily focus on resource packing rather than performance tuning. We develop Pipeleon, an automated performance optimization framework for P4 programmable SmartNICs. We introduce techniques that are tailored to the performance characteristics of SmartNICs, and further leverage dynamic workload patterns for profile-guided optimization. Pipeleon pinpoints program hotspots at the P4 level and computes runtime optimization plans to specialize the program layout based on the latest profile. We have prototyped Pipeleon and applied it to optimize two popular P4 SmartNICs (Nvidia BlueField2 and Netronome Agilio CX), as well as a software SmartNIC emulator built by extending BMv2. Our results show that Pipeleon significantly improves SmartNIC packet processing performance in realistic scenarios.



Information & Contributors

Information

Published In

ACM SIGCOMM '23: Proceedings of the ACM SIGCOMM 2023 Conference
September 2023
1217 pages
ISBN: 9798400702365
DOI: 10.1145/3603269
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. SmartNICs
  2. P4
  3. runtime program optimization

Qualifiers

  • Research-article

Conference

ACM SIGCOMM '23: ACM SIGCOMM 2023 Conference
September 10, 2023
New York, NY, USA

Acceptance Rates

Overall Acceptance Rate 462 of 3,389 submissions, 14%


Article Metrics

  • Downloads (last 12 months): 1,103
  • Downloads (last 6 weeks): 64
Reflects downloads up to 21 Sep 2024.

Cited By

  • (2024) DDS: DPU-Optimized Disaggregated Storage. Proceedings of the VLDB Endowment 17:11 (3304-3317). DOI: 10.14778/3681954.3682002. Online publication date: 30-Aug-2024
  • (2024) P4CGO: Control Plane Guided P4 Program Optimization. Proceedings of the 2024 SIGCOMM Workshop on Formal Methods Aided Network Operation (1-7). DOI: 10.1145/3672199.3673892. Online publication date: 4-Aug-2024
  • (2024) An Integrated Solution for High-efficiency In-band Network Telemetry. Proceedings of the 8th Asia-Pacific Workshop on Networking (115-121). DOI: 10.1145/3663408.3663425. Online publication date: 3-Aug-2024
  • (2024) Performance Modeling and Analysis of P4 Programmable Devices With General Service Times. IEEE Transactions on Network and Service Management 21:4 (4543-4562). DOI: 10.1109/TNSM.2024.3404813. Online publication date: Aug-2024
  • (2024) A Technique for Secure Variant Calling on Human Genome Sequences Using SmartNICs. 2024 IEEE 17th International Conference on Cloud Computing (CLOUD) (328-335). DOI: 10.1109/CLOUD62652.2024.00044. Online publication date: 7-Jul-2024
  • (2024) A Comprehensive Survey on SmartNICs: Architectures, Development Models, Applications, and Research Directions. IEEE Access 12 (107297-107336). DOI: 10.1109/ACCESS.2024.3437203. Online publication date: 2024
  • (2023) NVIDIA's Resource Transmutable Network Processing ASIC. 2023 IEEE Hot Chips 35 Symposium (HCS) (1-14). DOI: 10.1109/HCS59251.2023.10254697. Online publication date: 27-Aug-2023
