DOI: 10.5555/3195638.3195647
research-article

A cloud-scale acceleration architecture

Published: 15 October 2016

Abstract

Hyperscale datacenter providers have struggled to balance the growing need for specialized hardware (efficiency) with the economic benefits of homogeneity (manageability). In this paper, we propose a new cloud architecture that uses reconfigurable logic to accelerate both network plane functions and applications. This Configurable Cloud architecture places a layer of reconfigurable logic (FPGAs) between the network switches and the servers. The placement enables network flows to be programmably transformed at line rate, enables acceleration of local applications running on the server, and enables the FPGAs to communicate directly, at datacenter scale, to harvest remote FPGAs unused by their local servers. We deployed this design over a production server bed and show how it can be used for both service acceleration (Web search ranking) and network acceleration (encryption of data in transit at high speeds). This architecture is much more scalable than prior work, which used secondary rack-scale networks for inter-FPGA communication. By coupling to the network plane, direct FPGA-to-FPGA messages can be achieved at latency comparable to previous work, without the secondary network. Additionally, the scale of direct inter-FPGA messaging is much larger. The average round-trip latencies observed in our measurements among 24, 1000, and 250,000 machines are under 3, 9, and 20 microseconds, respectively. The Configurable Cloud architecture has been deployed at hyperscale in Microsoft's production datacenters worldwide.
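
The scaling claim in the abstract can be checked with a little arithmetic. The sketch below is illustrative only: it takes the three reported figures (24, 1000, and 250,000 machines at under 3, 9, and 20 microseconds) and compares how fast the machine count grows against how fast the latency bound grows. The figures are upper bounds rather than exact measurements, so the ratios are ratios of bounds. The names `REPORTED_BOUNDS` and `growth_factors` are hypothetical and do not come from the paper.

```python
# Upper bounds quoted in the abstract: (machines reachable, round-trip bound in microseconds).
# These are "under N us" figures, not exact measurements.
REPORTED_BOUNDS = [(24, 3.0), (1_000, 9.0), (250_000, 20.0)]


def growth_factors(bounds):
    """Return (machine_factor, latency_bound_factor) per tier, relative to the smallest tier."""
    base_machines, base_latency = bounds[0]
    return [(machines / base_machines, latency / base_latency)
            for machines, latency in bounds]


factors = growth_factors(REPORTED_BOUNDS)

for (machines, bound), (machine_factor, latency_factor) in zip(REPORTED_BOUNDS, factors):
    print(f"{machines:>7} machines: under {bound:g} us "
          f"({machine_factor:,.0f}x machines, {latency_factor:.1f}x latency bound)")
```

Under these figures, the number of reachable machines grows by roughly four orders of magnitude between the smallest and largest tier, while the latency bound grows by less than one.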

Information

Published In

MICRO-49: The 49th Annual IEEE/ACM International Symposium on Microarchitecture
October 2016
816 pages

Publisher

IEEE Press

Acceptance Rates

Overall acceptance rate: 484 of 2,242 submissions (22%)
