DCS: a fast and scalable device-centric server architecture

Published: 05 December 2015

Abstract

Conventional servers achieve high performance by employing fast CPUs to run compute-intensive workloads while having the operating system manage relatively slow I/O devices through memory accesses and interrupts. However, as emerging workloads become heavily data-intensive and emerging devices (e.g., NVM storage, high-bandwidth NICs, and GPUs) enable low-latency, high-bandwidth device operations, traditional host-centric server architectures fail to deliver high performance due to their inefficient device-handling mechanisms. Moreover, unless this architectural inefficiency is resolved, the performance loss will only grow as these devices become faster.
In this paper, we propose DCS, a novel device-centric server architecture that fully exploits the potential of emerging devices so that server performance scales with device performance. The key idea of DCS is to orchestrate devices to communicate directly with one another while selectively bypassing the host; the host remains responsible for only a few device-related operations (e.g., filesystem lookup). In this way, DCS achieves high I/O performance through direct inter-device communication and high computation performance by fully utilizing host-side resources. To implement DCS, we introduce the DCS Engine, a custom hardware device that orchestrates devices via standard I/O protocols (i.e., PCIe and NVMe), along with its device driver and a user-level library. We show that our FPGA-based DCS prototype significantly improves the performance of emerging server workloads and that the architecture scales with the performance of the devices.
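
To make the device-centric flow concrete, the sketch below separates the control path, which stays on the host, from the data path, which moves directly between devices. It is a minimal illustration only: the names dcs_open, dcs_gpu_alloc, and dcs_d2d_read, and the stub bodies that merely print what each step would do, are hypothetical and are not the paper's actual user-level library API. The point is the division of labor the abstract describes: the host performs the filesystem lookup and issues a single command, while the payload travels from the NVMe SSD to the GPU over PCIe without passing through host memory.

/*
 * Hypothetical sketch of a device-centric I/O call path.  The names
 * (dcs_open, dcs_gpu_alloc, dcs_d2d_read) are invented for illustration and
 * are NOT the paper's real user-level library; the stub bodies below only
 * print what each step would do so the example compiles and runs standalone.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct { const char *path; } dcs_file_t;   /* host-resolved file handle  */
typedef struct { size_t bytes; }     dcs_gpubuf_t; /* GPU-resident buffer handle */

/* Control path (host): filesystem lookup only, no data movement. */
static dcs_file_t *dcs_open(const char *path) {
    printf("[host]   filesystem lookup for %s\n", path);
    dcs_file_t *f = malloc(sizeof *f);
    f->path = path;
    return f;
}

/* Allocate a PCIe-visible buffer in GPU memory (simulated here). */
static dcs_gpubuf_t *dcs_gpu_alloc(size_t bytes) {
    dcs_gpubuf_t *b = malloc(sizeof *b);
    b->bytes = bytes;
    return b;
}

/* Data path: the orchestrating engine would move the payload
 * SSD -> GPU over PCIe/NVMe, bypassing host DRAM entirely. */
static int dcs_d2d_read(dcs_file_t *f, dcs_gpubuf_t *dst, size_t off, size_t len) {
    printf("[engine] SSD -> GPU: %zu bytes of %s at offset %zu (host DRAM bypassed)\n",
           len, f->path, off);
    (void)dst;
    return 0;
}

int main(void) {
    dcs_file_t   *f   = dcs_open("/data/input.bin");     /* control plane on host */
    dcs_gpubuf_t *buf = dcs_gpu_alloc(64u << 20);         /* 64 MiB in GPU memory  */

    /* One command from the host; the payload never enters host memory. */
    if (dcs_d2d_read(f, buf, 0, 64u << 20) != 0) {
        fprintf(stderr, "direct device-to-device read failed\n");
        return 1;
    }
    free(buf);
    free(f);
    return 0;
}

Keeping only the control-plane work (name resolution, command issue) on the host is what lets overall throughput track device performance rather than CPU-side I/O handling overhead, which is the scaling argument the abstract makes.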

Published In

MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
December 2015
787 pages
ISBN: 9781450340342
DOI: 10.1145/2830772
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. I/O optimizations
  2. device-to-device communications
  3. server architecture
  4. storage systems

Qualifiers

  • Research-article

Conference

MICRO-48

Acceptance Rates

MICRO-48 paper acceptance rate: 61 of 283 submissions (22%)
Overall acceptance rate: 484 of 2,242 submissions (22%)

Cited By

  • (2024) Performance Characterization of SmartNIC NVMe-over-Fabrics Target Offloading. Proceedings of the 17th ACM International Systems and Storage Conference, pp. 14-24. https://doi.org/10.1145/3688351.3689154
  • (2024) Data Motion Acceleration: Chaining Cross-Domain Multi Accelerators. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1043-1062. https://doi.org/10.1109/HPCA57654.2024.00083
  • (2023) GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pp. 325-339. https://doi.org/10.1145/3575693.3575748
  • (2023) BM-Store: A Transparent and High-performance Local Storage Architecture for Bare-metal Clouds Enabling Large-scale Deployment. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1031-1044. https://doi.org/10.1109/HPCA56546.2023.10071029
  • (2022) Survey on storage-accelerator data movement. CCF Transactions on High Performance Computing. https://doi.org/10.1007/s42514-022-00112-0
  • (2021) Libpubl. Proceedings of the 13th ACM Workshop on Hot Topics in Storage and File Systems, pp. 64-70. https://doi.org/10.1145/3465332.3470874
  • (2020) TrainBox: An Extreme-Scale Neural Network Training Server Architecture by Systematically Balancing Operations. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 825-838. https://doi.org/10.1109/MICRO50266.2020.00072
  • (2020) DRAM-Less: Hardware Acceleration of Data Processing with New Memory. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 287-302. https://doi.org/10.1109/HPCA47549.2020.00032
  • (2019) FIDR. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 239-252. https://doi.org/10.1145/3352460.3358303
  • (2019) FlashGPU. Proceedings of the 56th Annual Design Automation Conference 2019, pp. 1-6. https://doi.org/10.1145/3316781.3317827