research-article

Public Access

Cerebros: Evading the RPC Tax in Datacenters

Authors:

Arash Pourhabibi,

Mark Sutherland,

Alexandros Daglis,

Babak Falsafi

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 407–420

https://doi.org/10.1145/3466752.3480055

Published: 17 October 2021


Abstract

The emerging paradigm of microservices decomposes online services into fine-grained software modules that communicate frequently over the datacenter network, often using Remote Procedure Calls (RPCs). Ongoing advancements in the network stack have exposed the RPC layer itself as a bottleneck, which we show accounts for 40–90% of a microservice’s total execution cycles. We break down the underlying modules that comprise production RPC layers and demonstrate, based on prior evidence, that CPUs can only expect limited improvements for such tasks, mandating a shift to hardware to remove the RPC layer as a limiter of microservice performance. Although recently proposed accelerators can efficiently handle a portion of the RPC layer, their overall benefit is limited by unnecessary CPU involvement, which occurs because the accelerators are architected as co-processors under the CPU’s control. Instead, we show that conclusively removing the RPC layer bottleneck requires all of the RPC layer’s modules to be executed by a NIC-attached hardware accelerator. We introduce Cerebros, a dedicated RPC processor that executes the Apache Thrift RPC layer and acts as an intermediary stage between the NIC and the microservice running on the CPU. Our evaluation using the DeathStarBench microservice suite shows that Cerebros reduces the CPU cycles spent in the RPC layer by 37–64×, yielding a 1.8–14× reduction in total cycles expended per microservice request.
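To make the abstract's "RPC layer" concrete, the sketch below hand-encodes and decodes one request in the Apache Thrift binary protocol (strict mode), which is one part of the per-request work that layer performs. It is an illustration only, not the Cerebros design and not Thrift's generated code: the helpers `encode_call` and `decode_call`, the method name `compose`, and its fields `user_id` (i32) and `text` (string) are hypothetical, and only `i32` and `string` fields are handled.

```python
import struct

# Type and message identifiers from the Apache Thrift binary protocol
# (only the subset this sketch uses).
T_STOP, T_I32, T_STRING = 0, 8, 11
T_CALL = 1                      # message type for a request
VERSION_1 = 0x80010000          # strict-mode version marker
VERSION_MASK = 0xFFFF0000


def _write_string(buf: bytearray, s: str) -> None:
    """Strings are a big-endian i32 byte length followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    buf.extend(struct.pack(">i", len(data)))
    buf.extend(data)


def encode_call(method: str, seqid: int, user_id: int, text: str) -> bytes:
    """Serialize a call to a hypothetical method(1: i32 user_id, 2: string text)."""
    buf = bytearray()
    # Message header: version|type word, method name, sequence id.
    buf.extend(struct.pack(">I", VERSION_1 | T_CALL))
    _write_string(buf, method)
    buf.extend(struct.pack(">i", seqid))
    # Argument struct: each field is (i8 type, i16 field id, value).
    buf.extend(struct.pack(">bhi", T_I32, 1, user_id))
    buf.extend(struct.pack(">bh", T_STRING, 2))
    _write_string(buf, text)
    # A T_STOP byte terminates the struct.
    buf.append(T_STOP)
    return bytes(buf)


def decode_call(data: bytes):
    """Parse what encode_call produces; returns (method, mtype, seqid, fields)."""
    off = 0

    def take(fmt: str):
        # Read one big-endian value and advance the cursor.
        nonlocal off
        (value,) = struct.unpack_from(fmt, data, off)
        off += struct.calcsize(fmt)
        return value

    def take_string() -> str:
        nonlocal off
        n = take(">i")
        s = data[off:off + n].decode("utf-8")
        off += n
        return s

    word = take(">I")
    if (word & VERSION_MASK) != VERSION_1:
        raise ValueError("not a strict binary-protocol message")
    method = take_string()
    seqid = take(">i")
    fields = {}
    # Walk fields until the T_STOP byte, dispatching on the type tag.
    while (ttype := take(">b")) != T_STOP:
        fid = take(">h")
        if ttype == T_I32:
            fields[fid] = take(">i")
        elif ttype == T_STRING:
            fields[fid] = take_string()
        else:
            raise ValueError(f"unsupported field type {ttype}")
    return method, word & 0xFF, seqid, fields
```

For example, `decode_call(encode_call("compose", 7, 42, "hi"))` recovers the method name, message type, sequence id, and both fields from a 36-byte buffer. Thrift's own libraries perform these steps through protocol calls such as `writeMessageBegin` and `writeFieldBegin`; spelling them out by hand simply exposes the byte-level parsing and branching that the RPC layer executes for every request.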



Information

Published In

ACM Conferences

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 2021

1322 pages

ISBN:9781450385572

DOI:10.1145/3466752

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 October 2021


Qualifiers

Research-article
Research
Refereed limited

Conference

MICRO '21

Sponsor:

SIGMICRO

MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture

October 18–22, 2021

Virtual Event, Greece

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

