
Automated parallel execution of distributed task graphs with FPGA clusters

Published: 18 October 2024 Publication History

Abstract

Over the years, Field Programmable Gate Arrays (FPGAs) have been gaining popularity in the High Performance Computing (HPC) field, because their reconfigurability enables very fine-grained optimizations at a low energy cost. However, the differing characteristics, architectures, and network topologies of FPGA clusters have hindered the use of FPGAs at large scale. In this work, we present an evolution of OmpSs@FPGA, a high-level task-based programming model and extension of OmpSs-2, that aims to unify FPGA clusters through a message-passing interface compatible with FPGA accelerators. These accelerators are programmed with C/C++ pragmas and synthesized with High-Level Synthesis tools. The new framework includes a custom protocol to exchange messages between FPGAs, agnostic of the architecture and network type. On top of that, we present a new communication paradigm called Implicit Message Passing (IMP), in which the user does not call any message-passing API; instead, the runtime automatically infers data movement between nodes. We test classic message passing and IMP with three benchmarks on two different FPGA clusters. One is cloudFPGA, a disaggregated platform with AMD FPGAs that are connected to the network only through UDP/TCP/IP. The other is ESSPER, composed of CPU-attached Intel FPGAs with a private network at the Ethernet level. In both cases, we demonstrate that IMP with OmpSs@FPGA can increase the productivity of FPGA programmers at large scale by simplifying communication between nodes, without limiting application scalability. We implement the N-body, Heat simulation, and Cholesky decomposition benchmarks, and show that for N-body and Heat the FPGA clusters achieve 2.6x and 2.4x better performance per watt, respectively, than a CPU-only supercomputer.

Highlights

High-level task-based programming model for FPGA clusters with MPI-like communication.
High performance computing applications can be easily adapted to FPGA clusters.
Automatic MPI communication inferred by the runtime; users do not write MPI API calls.
Easily portable code between AMD (Xilinx) and Intel FPGAs, applications tested on both.
N-body, Heat, and Cholesky implementations on cloudFPGA and ESSPER, written in C.



          Published In

          Future Generation Computer Systems  Volume 160, Issue C
          Nov 2024
          966 pages

          Publisher

          Elsevier Science Publishers B. V.

          Netherlands


          Author Tags

          1. FPGA
          2. MPI
          3. Task graphs
          4. Heterogeneous computing
          5. High performance computing
          6. Programming models
          7. Distributed computing

          Qualifiers

          • Review-article
