Efficient Performance Evaluation for Highly Multi-threaded Graphics Processors
Sadeghi Baghsorkhi, Sara
Permalink
https://hdl.handle.net/2142/26373
Description
- Title
- Efficient Performance Evaluation for Highly Multi-threaded Graphics Processors
- Author(s)
- Sadeghi Baghsorkhi, Sara
- Issue Date
- 2011-08-26T15:33:26Z
- Director of Research (if dissertation) or Advisor (if thesis)
- Hwu, Wen-Mei W.
- Doctoral Committee Chair(s)
- Hwu, Wen-Mei W.
- Committee Member(s)
- Gropp, William D.
- Navarro, Nacho
- Padua, David A.
- Patel, Sanjay J.
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- GPU computing
- performance evaluation
- memory hierarchy
- Graphics Processing Unit (GPU)
- Abstract
- With the emergence of highly multithreaded architectures, an effective performance monitoring system must reflect the interaction between a large number of concurrent events, and associate the overall effect of individual events and inefficiencies with the operations in the application source code. The state-of-the-art performance counters in highly multithreaded graphics processors currently do not provide this level of precision. Although fine-grained sampling of performance counters after each source-level operation could potentially achieve the desired precision, the high frequency of sampling required will likely cause too much distortion to the actual application behavior and make the sampled counter values inaccurate. In this thesis, I present a novel software-based approach for monitoring the memory hierarchy performance in highly multithreaded general-purpose graphics processors. The proposed analysis is based on memory traces collected for small snapshots of application execution. A trace-based memory hierarchy model with a Monte Carlo experimental methodology generates statistical bounds of performance measures in the presence of nonuniform thread interleaving and data sharing in a highly multithreaded execution environment. The statistical approach overcomes the classical problem of disturbed execution timing due to instrumentation. The approach scales well as I deploy a minimal sampling technique to reduce the trace generation overhead and model simulation time. The proposed scheme also keeps track of individual memory operations in the source code and can quantify the amount of their contribution to detrimental effects on memory system performance. A cross-validation of the model results shows close agreement with the values read from the hardware performance counters on an NVIDIA Tesla C2050.
I later use the predicted memory hierarchy performance statistics in an analytical model to identify performance characteristics of a kernel and its expected execution time. To account for the systematic error present in the predictions, I approximate the error function and express a range of potential true execution times for each predicted value.
- Graduation Semester
- 2011-08
- Permalink
- http://hdl.handle.net/2142/26373
- Copyright and License Information
- Copyright 2011 Sara Sadeghi Baghsorkhi
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois
Dissertations and Theses - Computer Science
Dissertations and Theses from the Dept. of Computer Science