PORPLE: An extensible optimizer for portable data placement on GPU

G Chen, B Wu, D Li, X Shen - 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014 - ieeexplore.ieee.org
GPUs are often equipped with complex memory systems, including global memory, texture memory, shared memory, constant memory, and various levels of cache. Where to place the data is important for the performance of a GPU program. However, the decision is difficult for a programmer to make because of architecture complexity and the sensitivity of suitable data placements to input and architecture changes. This paper presents PORPLE, a portable data placement engine that enables a new way to solve the data placement problem. PORPLE consists of a mini specification language, a source-to-source compiler, and a runtime data placer. The language allows an easy description of a memory system; the compiler transforms a GPU program into a form amenable to runtime profiling and data placement; the placer, based on the memory description and data access patterns, identifies on the fly appropriate placement schemes for data and places them accordingly. PORPLE is distinctive in being adaptive to program inputs and architecture changes, being transparent to programmers (in most cases), and being extensible to new memory architectures. Our experiments on three types of GPU systems show that PORPLE is able to consistently find optimal or near-optimal placement despite the large differences among GPU architectures and program inputs, yielding up to 2.08X (1.59X on average) speedups on a set of regular and irregular GPU benchmarks.
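To make the placement problem concrete, the CUDA sketch below (not PORPLE's own code; kernel names, table size, and access pattern are hypothetical) shows the same read-only lookup table placed in two different memories that PORPLE's abstract mentions: constant memory versus global memory read through the read-only cache.

// Hypothetical illustration of GPU data placement, not taken from the paper.
// The same read-only table is placed in constant memory (placement A) or in
// global memory read via the read-only data cache (placement B, sm_35+).
#include <vector>
#include <cuda_runtime.h>

#define TABLE_SIZE 256

// Placement A: constant memory, served by the constant cache; works best when
// all threads of a warp read the same entry (broadcast access).
__constant__ float table_const[TABLE_SIZE];

__global__ void lookup_constant(const int* idx, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = table_const[idx[i]];
}

// Placement B: global memory, read through the read-only cache via __ldg;
// usually better when threads of a warp read scattered (irregular) entries.
__global__ void lookup_global(const float* __restrict__ table_glob,
                              const int* idx, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = __ldg(&table_glob[idx[i]]);
}

int main() {
    const int n = 1 << 20;
    std::vector<float> h_table(TABLE_SIZE, 1.0f);
    std::vector<int>   h_idx(n);
    for (int i = 0; i < n; ++i) h_idx[i] = i % TABLE_SIZE;  // made-up access pattern

    float *d_table, *d_out;
    int   *d_idx;
    cudaMalloc(&d_table, TABLE_SIZE * sizeof(float));
    cudaMalloc(&d_out,   n * sizeof(float));
    cudaMalloc(&d_idx,   n * sizeof(int));
    cudaMemcpy(d_table, h_table.data(), TABLE_SIZE * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx,   h_idx.data(),   n * sizeof(int),            cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(table_const, h_table.data(), TABLE_SIZE * sizeof(float));

    int threads = 256, blocks = (n + threads - 1) / threads;
    lookup_constant<<<blocks, threads>>>(d_idx, d_out, n);          // placement A
    lookup_global<<<blocks, threads>>>(d_table, d_idx, d_out, n);   // placement B
    cudaDeviceSynchronize();

    cudaFree(d_table); cudaFree(d_out); cudaFree(d_idx);
    return 0;
}

Which placement wins depends on the access pattern, the input, and the size of the relevant caches on the target GPU; this input- and architecture-sensitivity is exactly what PORPLE's runtime placer is designed to handle automatically.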