research-article

A reconfigurable fabric for accelerating large-scale datacenter services

Published: 28 October 2016

Abstract

Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we designed and built a composable, reconfigurable hardware fabric based on field-programmable gate arrays (FPGAs). Each server in the fabric contains one FPGA, and all FPGAs within a 48-server rack are interconnected over a low-latency, high-bandwidth network.
We describe a medium-scale deployment of this fabric on a bed of 1,632 servers and measure its effectiveness in accelerating the ranking component of the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by 95% at a desirable latency distribution, or reduces tail latency by 29% at a fixed throughput. In other words, the reconfigurable fabric enables the same throughput using only about half the number of servers.
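The "half the number of servers" statement is arithmetic on the 95% throughput figure: if each accelerated server does 1.95 times the ranking work, matching a fixed aggregate throughput takes 1/1.95 ≈ 0.51 of the original servers. The Python sketch below works through that figure and the rack count implied by the deployment. It rests on two simplifying assumptions that are not taken from the paper: aggregate throughput scales linearly with server count, and the 1,632 servers fill 48-server racks exactly. The constant and function names are hypothetical.

```python
import math

# Figures taken from the abstract.
SERVERS_IN_BED = 1632    # servers in the deployment
SERVERS_PER_RACK = 48    # FPGAs interconnected within each 48-server rack
THROUGHPUT_GAIN = 0.95   # +95% ranking throughput per server under high load


def racks_needed(servers: int, per_rack: int) -> int:
    """Racks required to hold `servers`, assuming each rack holds `per_rack` servers."""
    return math.ceil(servers / per_rack)


def server_fraction(gain: float) -> float:
    """Fraction of the original server count that delivers the same aggregate
    throughput once each server's throughput is multiplied by (1 + gain).
    Assumption: aggregate throughput scales linearly with server count."""
    return 1.0 / (1.0 + gain)


def equivalent_servers(baseline: int, gain: float) -> int:
    """Accelerated servers needed to match `baseline` unaccelerated servers,
    rounded up to whole servers, under the same linear-scaling assumption."""
    return math.ceil(baseline / (1.0 + gain))


if __name__ == "__main__":
    print(racks_needed(SERVERS_IN_BED, SERVERS_PER_RACK))      # 34
    print(f"{server_fraction(THROUGHPUT_GAIN):.3f}")            # 0.513
    print(equivalent_servers(SERVERS_IN_BED, THROUGHPUT_GAIN))  # 837
```

Under these assumptions the bed spans 34 racks, and roughly 837 accelerated servers would match the ranking throughput of 1,632 unaccelerated ones. Real provisioning would also have to account for the latency distribution and the failure handling that the abstract mentions, which this linear model ignores.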





Information

Published In

Communications of the ACM, Volume 59, Issue 11
November 2016
118 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3013530
  • Editor: Moshe Y. Vardi
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in CACM Volume 59, Issue 11


Qualifiers

  • Research-article
  • Research
  • Refereed


Bibliometrics

Article Metrics

  • Downloads (last 12 months): 320
  • Downloads (last 6 weeks): 26
Reflects downloads up to 24 Sep 2024


Citations

Cited By

  • (2024) Towards Efficient Reconfiguration through Lightweight Input Inversion for MLC NVFPGAs. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1-6. DOI: 10.23919/DATE58400.2024.10546880. Online publication date: 25-Mar-2024.
  • (2024) In-Network Address Caching for Virtual Networks. Proceedings of the ACM SIGCOMM 2024 Conference, 735-749. DOI: 10.1145/3651890.3672213. Online publication date: 4-Aug-2024.
  • (2024) A Survey on Scheduling Techniques in Computing and Network Convergence. IEEE Communications Surveys & Tutorials 26:1, 160-195. DOI: 10.1109/COMST.2023.3329027. Online publication date: 1-Jan-2024.
  • (2023) Leveraging Hardware Probes and Optimizations for Accelerating Fuzz Testing of Heterogeneous Applications. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1101-1113. DOI: 10.1145/3611643.3616318. Online publication date: 30-Nov-2023.
  • (2023) Reconfigurable Virtual Memory for FPGA-Driven I/O. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 556-571. DOI: 10.1145/3582016.3582048. Online publication date: 25-Mar-2023.
  • (2022) An OpenMP Runtime for Transparent Work Sharing across Cache-Incoherent Heterogeneous Nodes. ACM Transactions on Computer Systems 39:1-4, 1-30. DOI: 10.1145/3505224. Online publication date: 5-Jul-2022.
  • (2022) FlexDriver: a network driver for your accelerator. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 1115-1129. DOI: 10.1145/3503222.3507776. Online publication date: 28-Feb-2022.
  • (2022) Coarse Grained FPGA Overlay for Rapid Just-In-Time Accelerator Compilation. IEEE Transactions on Parallel and Distributed Systems 33:6, 1478-1490. DOI: 10.1109/TPDS.2021.3116859. Online publication date: 1-Jun-2022.
  • (2022) Adaptive Mode Transformation for Wear Leveling in Nonvolatile FPGAs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41:11, 3591-3601. DOI: 10.1109/TCAD.2022.3197685. Online publication date: 1-Nov-2022.
  • (2022) Mapping Boolean Functions onto Lookup-Tables on FPGAs. 2022 RIVF International Conference on Computing and Communication Technologies (RIVF), 508-512. DOI: 10.1109/RIVF55975.2022.10013797. Online publication date: 20-Dec-2022.
