DOI: 10.1145/3578338.3593571

SplitRPC: A {Control + Data} Path Splitting RPC Stack for ML Inference Serving

Published: 19 June 2023

Abstract

The growing adoption of hardware accelerators, driven by their intelligent compiler and runtime system counterparts, has democratized ML services and precipitously reduced their execution times. This motivates us to shift our attention to characterizing the overheads imposed by the RPC mechanism (the "RPC tax") when serving them on accelerators. Conventional RPC implementations implicitly assume that the host CPU services the requests, and we focus on extending such work to accelerator-based services. While SmartNIC-based solutions work well for simple applications, serving complex ML models requires a more nuanced view to optimize both the data path and the control/orchestration of these accelerators. We program commodity network interface cards (NICs) to split the control and data paths, transferring control effectively while efficiently transferring the payload to the accelerator. As opposed to unified approaches that bundle these paths together, limiting the flexibility of each, we design and implement SplitRPC, a {control + data} path optimizing RPC mechanism for ML inference serving. SplitRPC allows us to optimize the data path to the accelerator while simultaneously allowing the CPU to maintain full orchestration capabilities. We implement SplitRPC on both commodity NICs and SmartNICs and demonstrate that it is effective in minimizing the RPC tax while providing significant gains in throughput and latency.

Published In

SIGMETRICS '23: Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems
June 2023
123 pages
ISBN: 9798400700743
DOI: 10.1145/3578338
  • ACM SIGMETRICS Performance Evaluation Review, Volume 51, Issue 1
    SIGMETRICS '23
    June 2023
    108 pages
    ISSN: 0163-5999
    DOI: 10.1145/3606376
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data path
  2. ML inference
  3. orchestration
  4. remote procedure call
  5. SmartNIC

Qualifiers

  • Abstract

Conference

SIGMETRICS '23
Acceptance Rates

Overall Acceptance Rate 459 of 2,691 submissions, 17%
