research-article

A cloud-scale acceleration architecture

Authors:

Adrian M. Caulfield,

Michael Haselman,

Todd Massengill,

Kalin Ovtcharov,

Michael Papamichael,

Doug Burger

MICRO-49: The 49th Annual IEEE/ACM International Symposium on Microarchitecture

Article No.: 7, Pages 1 - 13

Published: 15 October 2016

Abstract

Hyperscale datacenter providers have struggled to balance the growing need for specialized hardware (efficiency) with the economic benefits of homogeneity (manageability). In this paper, we propose a new cloud architecture that uses reconfigurable logic to accelerate both network plane functions and applications. This Configurable Cloud architecture places a layer of reconfigurable logic (FPGAs) between the network switches and the servers, enabling network flows to be programmably transformed at line rate, enabling acceleration of local applications running on the server, and enabling the FPGAs to communicate directly, at datacenter scale, to harvest remote FPGAs unused by their local servers. We deployed this design over a production server bed, and show how it can be used for both service acceleration (Web search ranking) and network acceleration (encryption of data in transit at high speeds). This architecture is much more scalable than prior work, which used secondary rack-scale networks for inter-FPGA communication. By coupling to the network plane, direct FPGA-to-FPGA messages can be achieved at latency comparable to previous work, without the secondary network. Additionally, the scale of direct inter-FPGA messaging is much larger. The average round-trip latencies observed in our measurements among 24, 1000, and 250,000 machines are under 3, 9, and 20 microseconds, respectively. The Configurable Cloud architecture has been deployed at hyperscale in Microsoft's production datacenters worldwide.
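The placement described in the abstract, with reconfigurable logic sitting between each server and its network switch, can be illustrated with a toy software model. The sketch below is not the authors' implementation: the class `BumpInTheWireFpga`, the port constant `FPGA_MSG_PORT`, and the XOR transform are all hypothetical names chosen for illustration, and the XOR step is only a stand-in for a real line-rate transform such as encryption. It shows the three roles the abstract lists: transforming host flows, passing ordinary traffic through, and receiving messages sent directly from remote FPGAs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical port number marking FPGA-to-FPGA traffic in this toy model only.
FPGA_MSG_PORT = 9999


@dataclass
class Packet:
    src: str
    dst: str
    port: int
    payload: bytes


@dataclass
class BumpInTheWireFpga:
    """Toy model of reconfigurable logic placed between a server and its switch."""
    host: str
    # Optional transform applied to host traffic on its way to the network.
    transform: Optional[Callable[[bytes], bytes]] = None
    # Payloads received directly from remote FPGAs.
    inbox: List[bytes] = field(default_factory=list)

    def from_host(self, pkt: Packet) -> Packet:
        """Host -> network: apply the loaded transform, if any."""
        if self.transform is None:
            return pkt
        return Packet(pkt.src, pkt.dst, pkt.port, self.transform(pkt.payload))

    def from_network(self, pkt: Packet) -> Optional[Packet]:
        """Network -> host: consume FPGA messages, pass everything else up."""
        if pkt.port == FPGA_MSG_PORT:
            self.inbox.append(pkt.payload)
            return None  # in this model the host CPU never sees the packet
        return pkt


def xor_placeholder(data: bytes) -> bytes:
    """Stand-in for a line-rate transform; NOT real encryption."""
    return bytes(b ^ 0x5A for b in data)


fpga = BumpInTheWireFpga(host="server-a", transform=xor_placeholder)
outgoing = fpga.from_host(Packet("server-a", "server-b", 443, b"hello"))
remote_msg = fpga.from_network(
    Packet("server-c", "server-a", FPGA_MSG_PORT, b"rank-request")
)
host_pkt = fpga.from_network(Packet("server-b", "server-a", 443, b"reply"))
```

In this model a message from a remote FPGA is absorbed into `inbox` without involving the local host, which loosely mirrors the idea of harvesting remote FPGAs over the ordinary datacenter network rather than a secondary one.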

Information & Contributors

Information

Published In

MICRO-49: The 49th Annual IEEE/ACM International Symposium on Microarchitecture

October 2016

816 pages

General Chairs: Wei-Chung Hsu (NTU, Taiwan), Chia-Lin Yang (NTU, Taiwan)
Program Chairs: Mikko Lipasti (Univ. Wisconsin), Hsien-Hsin Lee (TSMC, Taiwan)

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE-CS\DATC: IEEE Computer Society

Publisher

IEEE Press

Conference

MICRO-49

October 15 - 19, 2016

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
