Heterogeneous Computing with OpenCL teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units (APUs) such as AMD Fusion technology. Designed to work on multiple platforms and with wide industry support, OpenCL will help you more effectively program for a heterogeneous future. Written by leaders in the parallel computing and OpenCL communities, this book will give you hands-on OpenCL experience to address a range of fundamental parallel algorithms. The authors explore memory spaces, optimization techniques, graphics interoperability, extensions, and debugging and profiling. Intended to support a parallel programming course, Heterogeneous Computing with OpenCL includes detailed examples throughout, plus additional online exercises and other supporting materials.
- Explains principles and strategies to learn parallel programming with OpenCL, from understanding the four abstraction models to thoroughly testing and debugging complete applications.
- Covers image processing, web plugins, particle simulations, video editing, performance optimization, and more.
- Shows how OpenCL maps to an example target architecture and explains some of the tradeoffs associated with mapping to various architectures.
- Addresses a range of fundamental programming techniques, with multiple examples and case studies that demonstrate OpenCL extensions for a variety of hardware platforms.