Physical challenges in reliable graphics hardware design
Publisher:
  • University of Virginia
  • Charlottesville, VA
  • United States
ISBN:978-0-549-25829-2
Order Number:AAI3283290
Pages:
135
Abstract

Specialized 3-D graphics processors (GPUs) for the commodity market first appeared in the mid-1990s. At that time, commodity CPU development already had a nearly 20-year head start on the development of specialized graphics hardware. GPU architects exploited the incredible inroads in semiconductor technology that were driven by the already mature CPU manufacturers, filling the available silicon real estate with logic. In the intervening decade, the complexity of GPUs has advanced considerably, but most of the additional complexity comes as a result of increased parallelization with the introduction of more vertex and fragment processing pipelines. GPUs were parallelized to the limits of the lithographic process, placing as many transistors as possible in each new product generation.

By 2004, this increased complexity was leading to thermal management issues, which strongly influence both lifetime and reliability. At that time, reliability was not high on the list of priorities for GPU vendors, though the cost of the cooling solution was. However, in 2003, vendors started adding programmability to some functional units of the graphics pipeline in commodity GPUs. Since then, as this programmability has continued to evolve, researchers have put graphics processors to use as highly parallel, floating-point co-processors for scientific calculation (General Purpose computation on Graphics Processing Units, or GPGPU). With the advent of GPGPU came a push for reliability of the results of the computation. Indeed, GPU-based supercomputers have already been built, and high error rates are regularly observed.

This dissertation discusses Qsilver, a simulation framework for graphics architectures; it uses Qsilver to analyze the application of some CPU static and dynamic thermal management techniques to the graphics domain; it presents a characterization of the effects of transient errors on traditional graphics workloads, including an assessment of the most vulnerable set of state for traditional graphics; and finally, it provides a detailed survey and analysis of proposed transient fault detection and recovery mechanisms for GPGPU on modern graphics processors.

Contributors
  • University of Virginia
  • University of Virginia