Specialized 3-D graphics processors (GPUs) for the commodity market first appeared in the mid-1990s. By then, commodity CPU development had a nearly 20-year head start on specialized graphics hardware. GPU architects exploited the rapid advances in semiconductor technology driven by the mature CPU manufacturers, filling the available silicon real estate with logic. In the intervening decade, the complexity of GPUs has advanced considerably, with most of the additional complexity arising from increased parallelization through the introduction of more vertex and fragment processing pipelines. GPUs parallelized to the limits of the lithographic process, placing as many transistors as possible in each new generation of product.
By 2004, this increased complexity was leading to thermal management issues, which strongly influence both lifetime and reliability. At that time, reliability was not high on the list of priorities for GPU vendors, though the cost of the cooling solution was. Meanwhile, beginning in 2003, vendors had started adding programmability to some functional units of the graphics pipeline in commodity GPUs. As this programmability has continued to evolve, researchers have put graphics processors to use as highly parallel, floating-point co-processors for scientific calculation (General-Purpose computation on Graphics Processing Units, or GPGPU). With the advent of GPGPU came a push for reliability in the results of the computation. Indeed, GPU-based supercomputers have already been built, and high error rates are regularly observed.
This dissertation presents Qsilver, a simulation framework for graphics architectures; uses Qsilver to analyze the application of CPU static and dynamic thermal management techniques to the graphics domain; characterizes the effects of transient errors on traditional graphics workloads, including an assessment of the most vulnerable architectural state for traditional graphics; and finally provides a detailed survey and analysis of proposed transient fault detection and recovery mechanisms for GPGPU on modern graphics processors.
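The specific detection and recovery mechanisms surveyed in the dissertation are not reproduced here, but the simplest software-level approach to transient fault detection, redundant execution with result comparison (dual-modular redundancy), can be sketched as follows. The `saxpy` kernel and `run_with_dmr` helper are illustrative assumptions, not taken from the source; on a real GPU the two executions would be separate kernel launches.

```python
def saxpy(a, x, y):
    """Stand-in for a GPGPU kernel: elementwise y <- a*x + y."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def run_with_dmr(kernel, *args):
    """Execute `kernel` twice and compare the results.

    A mismatch signals a transient fault; recovery here is a third
    re-execution used as a tiebreaker (majority vote).
    """
    first = kernel(*args)
    second = kernel(*args)
    if first == second:
        return first              # results agree: accept
    third = kernel(*args)         # disagree: re-execute and vote
    return first if first == third else second

result = run_with_dmr(saxpy, 2.0, [1.0, 2.0], [3.0, 4.0])
```

The cost of this scheme is at least a doubling of execution time, which is one reason the dissertation's survey also considers lighter-weight detection mechanisms.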
Index Terms
- Physical challenges in reliable graphics hardware design