Exploiting Parallelism on GPUs and FPGAs with OmpSs

Authors: Jaume Bosch, Antonio Filgueras, Miquel Vidal, Daniel Jimenez-Gonzalez, Carlos Alvarez, Xavier Martorell

ANDARE '17: Proceedings of the 1st Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems

Article No.: 4, Pages 1 - 5

https://doi.org/10.1145/3152821.3152880

Published: 09 September 2017

Abstract

This paper presents the OmpSs approach to heterogeneous programming on GPU and FPGA accelerators. The OmpSs programming model is built on the Mercurium compiler and the Nanos++ runtime. Applications are annotated with compiler directives that specify task-based parallelism. The Mercurium compiler transforms the code to exploit parallelism on the SMP host cores and to spawn work on CUDA/OpenCL devices and FPGA accelerators. For CUDA/OpenCL devices, the programmer only needs to insert the annotations and provide the kernel function, which is compiled by the native CUDA/OpenCL compiler. For FPGAs, OmpSs uses the High-Level Synthesis tools from FPGA vendors to generate the IP configurations for the FPGA. We present the performance obtained with OmpSs on the matrix multiply benchmark on the Xilinx Zynq Ultrascale+.
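As a rough illustration of the annotation style the abstract describes, the sketch below marks a blocked matrix-multiply kernel as an OmpSs task targeted at an FPGA. It is not taken from the paper: the block size `BS`, the function names `matmul_block` and `matmul`, the block-contiguous matrix layout, and the exact clause choice (`device(fpga)`, `copy_deps`) are assumptions made for illustration. A standard C compiler without OpenMP support ignores the pragmas and runs the code sequentially on the host.

```c
#include <assert.h>

#define BS 4 /* hypothetical block size, chosen only for illustration */

/* Assumed OmpSs annotations: every call to matmul_block becomes a task that
   reads blocks a and b and updates block c. The target directive asks the
   runtime to run it on an FPGA accelerator and copy the dependences. */
#pragma omp target device(fpga) copy_deps
#pragma omp task in([BS*BS]a, [BS*BS]b) inout([BS*BS]c)
void matmul_block(const float *a, const float *b, float *c)
{
    for (int i = 0; i < BS; i++) {
        for (int j = 0; j < BS; j++) {
            float acc = c[i * BS + j];
            for (int k = 0; k < BS; k++)
                acc += a[i * BS + k] * b[k * BS + j];
            c[i * BS + j] = acc;
        }
    }
}

/* C += A * B for matrices stored as an nb x nb grid of contiguous
   BS x BS blocks (an assumed layout). Each call below is one task. */
void matmul(int nb, const float *A, const float *B, float *C)
{
    assert(nb > 0);
    for (int i = 0; i < nb; i++)
        for (int j = 0; j < nb; j++)
            for (int k = 0; k < nb; k++)
                matmul_block(&A[(i * nb + k) * BS * BS],
                             &B[(k * nb + j) * BS * BS],
                             &C[(i * nb + j) * BS * BS]);
    /* Wait for all spawned tasks before returning to the caller. */
    #pragma omp taskwait
}
```

The `in` and `inout` clauses are what would let the runtime order the tasks: calls that update the same block of `C` are serialized, while calls on different blocks can run concurrently on the available devices.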

Published In

ANDARE '17: Proceedings of the 1st Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems

September 2017

35 pages

ISBN:9781450353632

DOI:10.1145/3152821

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Spanish Ministerio de economia y competitividad

Conference

ANDARE '17

ANDARE '17: 1st Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems

September 9 - 13, 2017

Portland, OR, USA

Acceptance Rates

ANDARE '17 Paper Acceptance Rate 3 of 4 submissions, 75%;

Overall Acceptance Rate 3 of 4 submissions, 75%
