On the automatic generation of GPU-oriented software applications from RTL IPs

Authors: Nicola Bombieri, Sara Vinco

CODES+ISSS '13: Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis

Article No.: 12, Pages 1 - 10

Published: 29 September 2013

Abstract

Graphics processing units (GPUs) have been explored as a new computing paradigm for accelerating computation-intensive applications. In particular, combining GPUs with a CPU has proved to be an effective way to accelerate software execution, pairing a few CPU cores optimized for serial processing with many smaller GPU cores designed for massively parallel computation. In addition, driven by the need for low power consumption alongside high performance, a recent trend is to integrate GPUs and CPUs onto a single die (e.g., AMD Fusion, Intel Sandy Bridge, NVIDIA Tegra). Their good trade-off between computing capability and power consumption makes integrated GPUs a promising alternative for accelerating a wide range of software applications for embedded systems. Nevertheless, algorithms must be redesigned to take advantage of these architectures, and manual parallelization often yields unsatisfactory results. This paper presents a methodology to automatically generate software applications for GPUs by reusing existing, pre-verified register-transfer level (RTL) intellectual properties (IPs). The methodology aims to exploit the intrinsic parallelism of RTL IPs (such as process concurrency and pipelined micro-architecture) to generate a parallel software implementation of their functionality. The experimental results show that the performance obtained by running the RTL functionality as a software application on GPUs exceeds that provided by the RTL code mapped onto a hardware accelerator.
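The idea of process concurrency mentioned in the abstract can be illustrated with a small sketch. It is not the paper's generator or its output: the two-stage pipeline, the names `stage1`, `stage2`, `clock_cycle` and `simulate`, and the evaluate/commit scheme are all assumptions chosen for illustration. Each RTL process reads only the current register values and produces next-state values, so within one clock cycle the processes are independent of one another and could, in principle, each be assigned to a separate GPU thread, with the commit step acting as a barrier. The Python below runs them sequentially and only emulates that structure.

```python
from typing import Callable, Dict, List

State = Dict[str, int]

def stage1(cur: State) -> State:
    # Hypothetical RTL process: r1 <= din + 1
    return {"r1": cur["din"] + 1}

def stage2(cur: State) -> State:
    # Hypothetical RTL process: dout <= r1 * 2
    return {"dout": cur["r1"] * 2}

# One entry per concurrent RTL process (illustrative design, not from the paper).
PROCESSES: List[Callable[[State], State]] = [stage1, stage2]

def clock_cycle(cur: State, din: int) -> State:
    """Advance the design by one clock edge."""
    cur = dict(cur, din=din)
    # Evaluate phase: every process reads only `cur`, so the calls do not
    # depend on each other; on a GPU each could run in its own thread.
    updates = [process(cur) for process in PROCESSES]
    # Commit phase: plays the role of a barrier, after which all
    # registers take their new values at once.
    nxt = dict(cur)
    for update in updates:
        nxt.update(update)
    return nxt

def simulate(inputs: List[int]) -> List[int]:
    """Feed one input per cycle and record dout after each clock edge."""
    state: State = {"din": 0, "r1": 0, "dout": 0}
    outputs = []
    for din in inputs:
        state = clock_cycle(state, din)
        outputs.append(state["dout"])
    return outputs

if __name__ == "__main__":
    print(simulate([1, 2, 3]))  # [0, 4, 6]
```

Because each process reads the pre-edge state, the result does not depend on the order in which the processes are evaluated, which is the property that makes a thread-per-process mapping plausible. The first output is 0 because the pipeline register `r1` introduces one cycle of latency.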

Cited By

Schmidt T, Liu G, Dömer R (2017). Exploiting Thread and Data Level Parallelism for Ultimate Parallel SystemC Simulation. Proceedings of the 54th Annual Design Automation Conference 2017, pages 1-6. doi:10.1145/3061639.3062243. Online publication date: 18-Jun-2017
https://dl.acm.org/doi/10.1145/3061639.3062243

Index Terms

On the automatic generation of GPU-oriented software applications from RTL IPs
1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems

Published In

CODES+ISSS '13: Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis

September 2013, 335 pages

ISBN: 9781479914173

Program Chairs: Radu Marculescu (Carnegie Mellon University), Preeti Ranjan Panda (IIT Delhi)

Publisher

IEEE Press

Qualifiers

Research-article

Conference

ESWEEK'13: Ninth Embedded System Week

September 29 - October 4, 2013

Montreal, Quebec, Canada

Acceptance Rates

CODES+ISSS '13 Paper Acceptance Rate 31 of 111 submissions, 28%;

Overall Acceptance Rate 280 of 864 submissions, 32%

Bibliometrics

Total Citations: 1
Total Downloads: 44
Downloads (Last 12 months): 1
Downloads (Last 6 weeks): 0

Reflects downloads up to 13 Nov 2024
