DOI: 10.1145/2658260.2658262
research-article

Marlin: a memory-based rack area network

Published: 20 October 2014

Abstract

Disaggregation of hardware resources that are traditionally embedded within individual servers into separate resource pools is an emerging architectural trend in hyperscale data center design, as exemplified by Facebook's disaggregated rack architecture. This paper presents the design, implementation and evaluation of a PCIe-based rack area network system called Marlin, which is designed to support the communications needs of disaggregated racks. By virtue of being based on PCIe, Marlin presents a memory-based addressing model for both I/O device sharing among multiple hosts and inter-host communications, and as a result offers hardware-based remote direct memory access (HRDMA) as a first-class communications primitive between servers within a rack. Marlin supports socket-based communications for legacy network applications and cross-machine zero memory copying for applications designed specifically to take full advantage of Marlin. Empirical measurements on a fully operational Marlin prototype based on 4-lane Gen3 PCIe technology show that the one-way kernel-to-kernel latency is 8.5 μs and the end-to-end sustainable TCP throughput is 19.6 Gbps.
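
To put the reported throughput in context, the following is a rough sketch of the link budget for a 4-lane Gen3 PCIe link. It assumes the standard Gen3 figures of 8 GT/s per lane and 128b/130b line encoding; these constants and the helper names are not taken from the paper, and the calculation ignores packet header and flow-control overhead, so the usable payload rate is lower than the figure computed here.

```python
# Back-of-the-envelope link budget for a 4-lane Gen3 PCIe link.
# The constants are standard PCIe Gen3 figures (an assumption of this
# sketch), not values reported in the paper.
GT_PER_LANE = 8.0        # Gen3 signalling rate per lane, in GT/s
ENCODING = 128 / 130     # Gen3 128b/130b line-encoding efficiency
LANES = 4                # link width of the prototype, from the abstract


def raw_link_gbps(lanes: int = LANES) -> float:
    """Per-direction bandwidth after line encoding, before packet overhead."""
    return GT_PER_LANE * ENCODING * lanes


def link_utilisation(measured_gbps: float, lanes: int = LANES) -> float:
    """Fraction of the encoded link rate occupied by a measured throughput."""
    return measured_gbps / raw_link_gbps(lanes)


if __name__ == "__main__":
    print(f"encoded link rate: {raw_link_gbps():.1f} Gbps")
    print(f"TCP utilisation:   {link_utilisation(19.6):.0%}")
```

Under these assumptions the encoded link rate is roughly 31.5 Gbps per direction, so the reported 19.6 Gbps TCP throughput corresponds to a little over 60% of it.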

    Published In

    ANCS '14: Proceedings of the tenth ACM/IEEE symposium on Architectures for networking and communications systems
    October 2014
    274 pages
    ISBN: 9781450328395
    DOI: 10.1145/2658260
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. mr-iov
    2. non transparent bridge
    3. pcie fabric
    4. rack disaggregation
    5. rack-area network
    6. rdma
    7. sr-iov

    Qualifiers

    • Research-article

    Conference

    ANCS '14

    Acceptance Rates

    ANCS '14 Paper Acceptance Rate 19 of 57 submissions, 33%
    Overall Acceptance Rate 88 of 314 submissions, 28%

    Article Metrics

    • Downloads (Last 12 months): 24
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 25 Nov 2024

    Cited By

    • (2023) Understanding the Performance Impact of Queue-Based Resource Allocation in Scalable Disaggregated Memory Systems. 2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), pages 317-324. DOI: 10.1109/MCSoC60832.2023.00054. Online publication date: 18-Dec-2023
    • (2022) An ultra-low latency and compatible PCIe interconnect for rack-scale communication. Proceedings of the 18th International Conference on emerging Networking EXperiments and Technologies, pages 232-244. DOI: 10.1145/3555050.3569128. Online publication date: 30-Nov-2022
    • (2022) Design and Evaluation of a Rack-Scale Disaggregated Memory Architecture For Data Centers. 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), pages 212-217. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00060. Online publication date: Dec-2022
    • (2019) Evaluation of a Disaggregated Rack System. Proceedings of the 10th ACM SIGOPS Asia-Pacific Workshop on Systems, pages 107-113. DOI: 10.1145/3343737.3343752. Online publication date: 19-Aug-2019
    • (2019) Dynamic Guest Memory Resizing - Paravirtualized Approach. 2019 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pages 181-186. DOI: 10.1109/EMPDP.2019.8671611. Online publication date: Feb-2019
    • (2019) EMF: Disaggregated GPUs in Datacenters for Efficiency, Modularity and Flexibility. 2019 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), pages 1-8. DOI: 10.1109/CCEM48484.2019.000-5. Online publication date: Sep-2019
    • (2018) dReDBox: Materializing a full-stack rack-scale system prototype of a next-generation disaggregated datacenter. 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1093-1098. DOI: 10.23919/DATE.2018.8342174. Online publication date: Mar-2018
    • (2018) Seamless Fail-over for PCIe Switched Networks. Proceedings of the 11th ACM International Systems and Storage Conference, pages 101-111. DOI: 10.1145/3211890.3211895. Online publication date: 4-Jun-2018
    • (2018) Software-Defined “Hardware” Infrastructures: A Survey on Enabling Technologies and Open Research Directions. IEEE Communications Surveys & Tutorials, 20:3, pages 2454-2485. DOI: 10.1109/COMST.2018.2834731. Online publication date: Nov-2019
    • (2017) Diluting the Scalability Boundaries: Exploring the Use of Disaggregated Architectures for High-Level Network Data Analysis. 2017 IEEE 19th International Conference on High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pages 340-347. DOI: 10.1109/HPCC-SmartCity-DSS.2017.45. Online publication date: Dec-2017
