research-article

A programming system for future proofing performance critical libraries

Published: 27 February 2016

Abstract

We present Tangram, a programming system for writing performance-portable programs. The language enables programmers to write computation and composition codelets, supported by tuning knobs and primitives for expressing data parallelism and work decomposition. The compiler and runtime use a set of techniques such as hierarchical composition, coarsening, data placement, tuning, and runtime selection based on input characteristics and micro-profiling. The resulting performance is competitive with optimized vendor libraries.
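The runtime selection step mentioned in the abstract can be pictured with a small sketch. This is not Tangram syntax or its actual runtime: the function names (`sum_sequential`, `sum_tiled`, `select_variant`), the `tile` knob, and the sample size are all hypothetical, chosen only to illustrate the general idea of keeping interchangeable implementations of one computation and choosing among them by timing each on a small slice of the real input.

```python
import time
from typing import Callable, List, Sequence, Tuple

# Hypothetical "codelets": interchangeable implementations of one computation.
def sum_sequential(data: Sequence[float], tile: int = 0) -> float:
    """Single-pass accumulation; the tile knob is accepted but unused."""
    total = 0.0
    for x in data:
        total += x
    return total

def sum_tiled(data: Sequence[float], tile: int = 256) -> float:
    """Split the input into tiles, reduce each tile, then combine the partial sums."""
    partials = [sum(data[i:i + tile]) for i in range(0, len(data), tile)]
    return sum(partials)

# A variant pairs a codelet with one setting of its tuning knob.
Variant = Tuple[str, Callable[[Sequence[float], int], float], int]

def select_variant(variants: List[Variant],
                   data: Sequence[float],
                   sample_size: int = 1024) -> Variant:
    """Micro-profile every variant on a prefix of the input and return the fastest."""
    sample = data[:sample_size]
    best = variants[0]
    best_time = float("inf")
    for variant in variants:
        _name, codelet, tile = variant
        start = time.perf_counter()
        codelet(sample, tile)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = variant, elapsed
    return best

# Assumed candidate set: one sequential variant and two tile sizes.
VARIANTS: List[Variant] = [
    ("sequential", sum_sequential, 0),
    ("tiled-64", sum_tiled, 64),
    ("tiled-1024", sum_tiled, 1024),
]

def run(data: Sequence[float]) -> float:
    """Select a variant by micro-profiling, then run it on the full input."""
    _name, codelet, tile = select_variant(VARIANTS, data)
    return codelet(data, tile)
```

Calling `run([float(i) for i in range(10000)])` returns `49995000.0` whichever variant wins the timing, since every variant computes the same sum. Which variant is chosen depends on the machine and the input, which is the point of selecting at run time rather than fixing the choice ahead of time.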

Information & Contributors

Information

Published In

ACM SIGPLAN Notices, Volume 51, Issue 8
PPoPP '16
August 2016
405 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3016078
  • PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
    February 2016
    420 pages
    ISBN:9781450340922
    DOI:10.1145/2851141
© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 February 2016
Published in SIGPLAN Volume 51, Issue 8

Qualifiers

  • Research-article
