Principal kernel analysis: A tractable methodology to simulate scaled GPU workloads

C Avalos Baddouh, M Khairy, RN Green… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Simulating all threads in a scaled GPU workload results in prohibitive simulation cost. Cycle-level
simulation is orders of magnitude slower than native silicon; the only solution is to …

Forecasting GPU Performance for Deep Learning Training and Inference

S Lee, A Phanishayee, D Mahajan - Proceedings of the 30th ACM …, 2025 - dl.acm.org
Deep learning kernels exhibit a high level of predictable memory accesses and compute
patterns, making GPU's architecture well-suited for their execution. Moreover, software and …

Treelet prefetching for ray tracing

YH Chou, T Nowicki, TM Aamodt - Proceedings of the 56th Annual IEEE …, 2023 - dl.acm.org
Ray tracing is traditionally only used in offline rendering to produce images of high fidelity
because it is computationally expensive. Recent Graphics Processing Units (GPUs) have …

CRISP: Concurrent Rendering and Compute Simulation Platform for GPUs

J Pan, TG Rogers - 2024 IEEE International Symposium on …, 2024 - ieeexplore.ieee.org
… We would like to thank Cesar Avalos for his help in the project. We would also like to
thank Shichen Qiao and Matthew D. Sinclair for their work on per-stream stat in GPGPU-Sim. …

Principal Kernel Analysis: A Tractable Methodology to Simulate Scaled GPU Workloads

M Payer, TG Rogers - 2021 - academia.edu
Simulating all threads in a scaled GPU workload results in prohibitive simulation cost. Cycle-level
simulation is orders of magnitude slower than native silicon; the only solution is to …

Accelerating the Evaluation of Large Workloads on Post-Dennard Systems using Sampling

A Sabu - alenks.github.io
With the end of Moore’s law, computer architects have turned to alternative approaches to
enhance computational capabilities. One prominent strategy involves a shift towards …

Data-driven Forecasting of Deep Learning Performance on GPUs

S Lee, A Phanishayee, D Mahajan - arXiv preprint arXiv:2407.13853, 2024 - arxiv.org
Deep learning kernels exhibit predictable memory accesses and compute patterns, making
GPUs' parallel architecture well-suited for their execution. Software and runtime systems for …

Photon: A fine-grained sampled simulation methodology for GPU workloads

C Liu, Y Sun, TE Carlson - Proceedings of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org
GPUs, due to their massively-parallel computing architectures, provide high performance for
data-parallel applications. However, existing GPU simulators are too slow to enable …

Path Forward Beyond Simulators: Fast and Accurate GPU Execution Time Prediction for DNN Workloads

Y Li, Y Sun, A Jog - Proceedings of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org
Today, DNNs’ high computational complexity and sub-optimal device utilization present a
major roadblock to democratizing DNNs. To reduce the execution time and improve device …

Development Of A Heterogeneous Architecture Simulation Framework

S Mohapatra - 2022 - etda.libraries.psu.edu
Heterogeneous systems consisting of processors of varying nature which complement each
other’s deficiencies are rapidly eclipsing the homogeneous systems of the past. The consumer …