DOI: 10.5555/2555692.2555704
research-article

On the automatic generation of GPU-oriented software applications from RTL IPs

Published: 29 September 2013

Abstract

Graphics processing units (GPUs) have been explored as a new computing paradigm for accelerating computation-intensive applications. In particular, combining GPUs with CPUs has proved to be an effective way to accelerate software execution, pairing a few CPU cores optimized for serial processing with many smaller GPU cores designed for massively parallel computation. In addition, driven by the need for low power consumption as well as high performance, a recent trend is to integrate GPUs and CPUs on a single die (e.g., AMD Fusion, Intel Sandy Bridge, NVIDIA Tegra). Their good trade-off between computing capability and power consumption makes integrated GPUs a promising alternative for accelerating a wide range of software applications for embedded systems. Nevertheless, algorithms must be redesigned to take advantage of these architectures, and manual parallelization often yields unsatisfactory results. This paper presents a methodology to automatically generate software applications for GPUs by reusing existing, pre-verified register-transfer level (RTL) intellectual property (IP) cores. The methodology exploits the intrinsic parallelism of RTL IPs (such as process concurrency and pipelined micro-architectures) to generate a parallel software implementation of their functionality. Experimental results show that the performance obtained by running the RTL functionality as software applications on GPUs exceeds that of the same RTL code mapped onto a hardware accelerator.
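The idea that concurrent RTL processes and pipeline stages expose parallelism that software can exploit can be illustrated with a small sketch. It is not the paper's generation flow: the three-stage pipeline (`stage0`, `stage1`, `stage2`), the 16-bit `MASK`, and the use of a Python `ThreadPoolExecutor` in place of GPU work-items are all hypothetical choices made for illustration. Each clock cycle is split into an evaluate phase, in which every stage reads only a snapshot of the current register values and can therefore run independently, and a commit phase, in which all registers update together as on a clock edge.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical 16-bit datapath width (an assumption, not from the paper).
MASK = 0xFFFF


# Each stage models one clocked RTL process: it computes the *next* value
# of its output register from a snapshot of the *current* register values.
def stage0(regs, inp):
    return (inp + 1) & MASK          # r0 <= in + 1


def stage1(regs, inp):
    return (regs[0] * 3) & MASK      # r1 <= r0 * 3


def stage2(regs, inp):
    return regs[1] ^ 0x00FF          # r2 <= r1 xor 0x00FF


STAGES = [stage0, stage1, stage2]


def clock_cycle(regs, inp, pool):
    """Advance the hypothetical pipeline by one clock cycle."""
    # Evaluate phase: every stage reads the same immutable snapshot, so the
    # calls have no ordering dependence and may run concurrently (one task
    # per process, analogous to one GPU work-item per process).
    snapshot = tuple(regs)
    futures = [pool.submit(stage, snapshot, inp) for stage in STAGES]
    # Commit phase: all registers take their new values together, as on a
    # clock edge.
    return [f.result() for f in futures]


def run(inputs):
    """Feed one input per cycle and record the last register each cycle."""
    regs = [0] * len(STAGES)         # reset state: all registers zero
    outputs = []
    with ThreadPoolExecutor(max_workers=len(STAGES)) as pool:
        for inp in inputs:
            regs = clock_cycle(regs, inp, pool)
            outputs.append(regs[-1])
    return outputs


if __name__ == "__main__":
    print(run([1, 2, 3, 4]))
```

For example, `run([1, 2, 3, 4])` returns `[255, 255, 249, 246]`: the first two outputs come from the zero reset state moving through the pipeline, after which each output equals `((x + 1) * 3) XOR 0x00FF` for the input `x` received two cycles earlier. Under CPython's global interpreter lock the thread pool shows that the stage evaluations are independent rather than delivering a speedup; in an OpenCL-style mapping one might instead assign each stage evaluation to a separate work-item.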

Cited By

  • (2017) Exploiting Thread and Data Level Parallelism for Ultimate Parallel SystemC Simulation. Proceedings of the 54th Annual Design Automation Conference 2017, pages 1-6. DOI: 10.1145/3061639.3062243. Online publication date: 18 June 2017.

      Published In

      CODES+ISSS '13: Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
      September 2013
      335 pages
      ISBN:9781479914173

      Publisher

      IEEE Press

      Author Tags

      1. APU
      2. GP-GPU processing
      3. HW to SW migration
      4. OpenCL
      5. SW generation

      Conference

      ESWEEK'13: Ninth Embedded Systems Week
      September 29 - October 4, 2013
      Montreal, Quebec, Canada

      Acceptance Rates

      • CODES+ISSS '13 paper acceptance rate: 31 of 111 submissions (28%)
      • Overall acceptance rate: 280 of 864 submissions (32%)

      Article Metrics

      • Downloads (last 12 months): 1
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 13 Nov 2024
