short-paper

Exploring the Versal AI Engines for Accelerating Stencil-based Atmospheric Advection Simulation

Author:

Nick Brown

FPGA '23: Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays

Pages 91 - 97

https://doi.org/10.1145/3543622.3573047

Published: 12 February 2023

Abstract

AMD Xilinx's new Versal Adaptive Compute Acceleration Platform (ACAP) is an FPGA architecture combining reconfigurable fabric with other on-chip hardened compute resources. AI engines are one of these and, by operating in a highly vectorized manner, they provide significant raw compute that is potentially beneficial for a range of workloads, including HPC simulation. However, this technology is still at an early stage and as yet unproven for accelerating HPC codes, with a lack of benchmarking and established best practice.

This paper presents an experience report exploring the porting of the Piacsek and Williams (PW) advection scheme onto the Versal ACAP, using the chip's AI engines to accelerate the compute. A stencil-based algorithm, advection is commonplace in atmospheric modelling, including in several codes from the Met Office, which initially developed this scheme. Using this algorithm as a vehicle, we explore optimal approaches for structuring AI engine compute kernels and how best to interface the AI engines with programmable logic. We evaluate performance on a VCK5000 against non-AI engine FPGA configurations on the VCK5000 and Alveo U280, as well as a 24-core Xeon Platinum Cascade Lake CPU and an Nvidia V100 GPU. We found that, whilst the number of channels between the fabric and AI engines is a limitation, by leveraging the ACAP we can double performance compared to an Alveo U280.
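To illustrate what "stencil-based" means in this context, the following is a minimal Python sketch of a one-dimensional advection-style stencil, in which each output point is computed only from its immediate neighbours. It is a simplified illustration and not the paper's implementation: the function name `advect_1d`, the coefficient name `tcx`, and the reduction to a single dimension are assumptions made for this example, whereas the PW scheme in the paper operates on three-dimensional fields.

```python
def advect_1d(u, tcx):
    """Apply a three-point advection-style stencil to a 1D field.

    u   : list of floats, field values on a regular grid (hypothetical input)
    tcx : float, grid-spacing-dependent coefficient (hypothetical name)

    Returns a list of source terms the same length as u. Boundary points
    are left at zero because they lack a neighbour on one side.
    """
    n = len(u)
    su = [0.0] * n
    for i in range(1, n - 1):
        # Difference of the products formed with the left and right neighbours.
        left = u[i - 1] * (u[i] + u[i - 1])
        right = u[i + 1] * (u[i] + u[i + 1])
        su[i] = tcx * (left - right)
    return su


if __name__ == "__main__":
    # Interior value: 0.5 * (1*(2+1) - 3*(2+3)) = -6.0
    print(advect_1d([1.0, 2.0, 3.0], 0.5))
```

Because every interior point reads a fixed neighbourhood and points are independent of one another, loops of this shape are natural candidates for the vectorized execution that the abstract attributes to the AI engines.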

Recommendations

CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture
FPGA '23: Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays

Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning applications. To cope with the high computation demands of these applications, heterogeneous architectures featuring both FPGA and dedicated ASIC accelerators have ...

Enabling FPGA and AI Engine Tasks in the HPX Programming Framework for Heterogeneous High-Performance Computing
Applied Reconfigurable Computing. Architectures, Tools, and Applications

The increasing complexity of modern exascale computers, with a growing number of cores per node, poses a challenge to traditional programming models. To address this challenge, Asynchronous Many-Task (AMT) runtimes such as the C++-based HPX, ...

Solving the Global Atmospheric Equations through Heterogeneous Reconfigurable Platforms
Special Section on FPL 2013

One of the most essential and challenging components in climate modeling is the atmospheric model. To solve multiphysical atmospheric equations, developers have to face extremely complex stencil kernels that are costly in terms of both computing and ...

Published In

FPGA '23: Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays

February 2023

283 pages

ISBN:9781450394178

DOI:10.1145/3543622

General Chair: Paolo Ienne, EPFL, Switzerland
Program Chair: Zhiru Zhang, Cornell University, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

EPSRC
STFC

Conference

FPGA '23

Sponsor:

SIGDA

FPGA '23: The 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays

February 12 - 14, 2023

Monterey, CA, USA

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%

Bibliometrics

Total Citations: 4
Total Downloads: 409
Downloads (Last 12 months): 164
Downloads (Last 6 weeks): 25

Reflects downloads up to 14 Nov 2024

Cited By

Heinz C, Kalkhof T, Lavan Y, Koch A (2024). TaPaS Co-AIE: An Open-Source Framework for Streaming-Based Heterogeneous Acceleration Using AMD AI Engines. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 155-161. Online publication date: 27-May-2024. https://doi.org/10.1109/IPDPSW63119.2024.00041

Taka E, Gourounas D, Gerstlauer A, Marculescu D, Arora A (2024). Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs. 2024 IEEE 32nd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 54-65. Online publication date: 5-May-2024. https://doi.org/10.1109/FCCM60383.2024.00015

Taka E, Arora A, Wu K, Marculescu D (2023). MaxEVA: Maximizing the Efficiency of Matrix Multiplication on Versal AI Engine. 2023 International Conference on Field Programmable Technology (ICFPT), 96-105. Online publication date: 12-Dec-2023. https://doi.org/10.1109/ICFPT59805.2023.00016

Yang Z, Zhuang J, Yin J, Yu C, Jones A, Zhou P (2023). AIM: Accelerating Arbitrary-Precision Integer Multiplication on Heterogeneous Reconfigurable Computing Platform Versal ACAP. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), 1-9. Online publication date: 28-Oct-2023. https://doi.org/10.1109/ICCAD57390.2023.10323754

Zhang W, Liu Y, Zang T, Bao Z. EA4RCA: Efficient AIE accelerator design framework for regular Communication-Avoiding Algorithm. ACM Transactions on Architecture and Code Optimization. https://dl.acm.org/doi/10.1145/3678010
