The raw compute performance of today's graphics processor (GPU) is truly amazing. With peak performance of over 60 GFLOPS at a price of only a few hundred dollars, the compute power of today's GPU dwarfs that of the commodity CPU. As the programmability and performance of modern graphics hardware continue to increase, many researchers are looking to graphics hardware to solve computationally intensive problems previously solved on general-purpose CPUs. The challenge, however, is how to re-target these processors from game rendering to general computation, such as numerical modeling, scientific computing, or signal processing. Traditional graphics APIs abstract the GPU as a rendering device, involving textures, triangles, and pixels. Mapping an algorithm onto these primitives is not straightforward, even for the most advanced graphics developers. In this dissertation, we explore the concept of stream computing with GPUs. We describe the stream processor abstraction and how this abstraction and its corresponding programming model can efficiently represent computation on the GPU. To formalize the model, we present Brook for GPUs, a programming system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming co-processor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present an analysis of the effectiveness of the GPU as a streaming processor and evaluate the performance of a collection of benchmark applications in comparison to their CPU implementations. For a variety of the applications explored in this dissertation, we demonstrate that our Brook implementations perform up to seven times faster than their CPU counterparts. We also discuss some of the algorithmic decisions that are critical for efficient execution when using the stream programming model for the GPU.
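To make the stream processor abstraction concrete, the following is a minimal Python sketch of the general idea: a kernel is applied independently to corresponding elements of its input streams, and a separate reduction combines a stream into a single value. This is an illustration under stated assumptions, not Brook syntax and not the dissertation's implementation; the names `run_kernel`, `run_reduction`, and `saxpy` are hypothetical, and the sketch runs sequentially on the CPU rather than on graphics hardware.

```python
from functools import reduce


def run_kernel(kernel, *streams):
    """Apply `kernel` to corresponding elements of equal-length streams.

    Hypothetical helper: each output element depends only on the input
    elements at the same index, which is the property that would let a
    stream processor evaluate all elements in parallel.
    """
    lengths = {len(s) for s in streams}
    if len(lengths) != 1:
        raise ValueError("all input streams must have the same length")
    return [kernel(*elems) for elems in zip(*streams)]


def run_reduction(op, stream, initial):
    """Combine a stream into one value with a binary operator.

    Hypothetical helper: `op` is assumed to be associative, so a parallel
    implementation would be free to regroup the partial results.
    """
    return reduce(op, stream, initial)


def saxpy(a):
    """Build a per-element kernel computing a * x + y (illustrative only)."""
    def kernel(x, y):
        return a * x + y
    return kernel


# Two input streams of equal length.
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]

# Element-wise kernel invocation, then a sum reduction over the result.
result = run_kernel(saxpy(2.0), x, y)
total = run_reduction(lambda acc, v: acc + v, result, 0.0)

print(result)
print(total)
```

The sketch keeps the kernel free of any reference to neighboring elements or shared state. That restriction is what makes the computation data-parallel, and it is the kind of algorithmic constraint that shapes how an application is expressed in a stream programming model.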