Cowbird: Freeing CPUs to Compute by Offloading the Disaggregation of Memory

Published: 01 September 2023

Abstract

Memory disaggregation allows applications running on compute servers to expand their pool of available memory capacity by leveraging remote resources through low-latency networks. Unfortunately, in existing software-level disaggregation frameworks, the simple act of issuing requests to remote memory, a cost paid on every access, can consume many CPU cycles. This overhead represents a direct cost of disaggregation, not only on the throughput of remote memory access but also on application logic, which must contend with the framework's CPU overheads.
In this paper, we present Cowbird, a memory disaggregation architecture that frees compute servers to fulfill their stated purpose by removing disaggregation-related logic from their CPUs. Our experimental evaluation shows that Cowbird eliminates disaggregation overhead on compute-server CPUs and can improve end-to-end application performance by up to 3.5× compared to RDMA-only communication.

Cited By

  • (2024) DDS: DPU-Optimized Disaggregated Storage. Proceedings of the VLDB Endowment 17:11 (3304-3317). DOI: 10.14778/3681954.3682002. Online publication date: 30-Aug-2024
  • (2024) SepHash: A Write-Optimized Hash Index On Disaggregated Memory via Separate Segment Structure. Proceedings of the VLDB Endowment 17:5 (1091-1104). DOI: 10.14778/3641204.3641218. Online publication date: 2-May-2024
  • (2024) Scythe: A Low-latency RDMA-enabled Distributed Transaction System for Disaggregated Memory. ACM Transactions on Architecture and Code Optimization 21:3 (1-26). DOI: 10.1145/3666004. Online publication date: 27-May-2024

Published In

ACM SIGCOMM '23: Proceedings of the ACM SIGCOMM 2023 Conference
September 2023
1217 pages
ISBN:9798400702365
DOI:10.1145/3603269

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. memory disaggregation
  2. RDMA
  3. compute offload
  4. programmable networks
  5. spot VMs
  6. P4 programmable switches
  7. SmartNICs

Qualifiers

  • Research-article

Funding Sources

  • SAMSUNG
  • GOOGLE
  • NSF

Conference

ACM SIGCOMM '23
ACM SIGCOMM '23: ACM SIGCOMM 2023 Conference
September 10, 2023
New York, NY, USA

Acceptance Rates

Overall Acceptance Rate 462 of 3,389 submissions, 14%

Article Metrics

  • Downloads (Last 12 months): 606
  • Downloads (Last 6 weeks): 36
Reflects downloads up to 10 Nov 2024
