research-article

Synthesising graphics card programs from DSLs

Authors:

Oege de MoorAuthors Info & Claims

PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 121 - 132

https://doi.org/10.1145/2254064.2254080

Published: 11 June 2012 Publication History

Abstract

Over the last five years, graphics cards have become a tempting target for scientific computing, thanks to unrivaled peak performance, often producing a runtime speed-up of x10 to x25 over comparable CPU solutions.

However, this increase can be difficult to achieve, and doing so often requires a fundamental rethink. This is especially problematic in scientific computing, where experts do not want to learn yet another architecture.

In this paper we develop a method for automatically parallelising recursive functions of the sort found in scientific papers. Using a static analysis of the function dependencies we identify sets - partitions - of independent elements, which we use to synthesise an efficient GPU implementation using polyhedral code generation techniques. We then augment our language with DSL extensions to support a wider variety of applications, and demonstrate the effectiveness of this with three case studies, showing significant performance improvement over equivalent CPU methods, and similar efficiency to hand-tuned GPU implementations.

References

[1]

C. Bastoul. Code generation in the polyhedral model is easier than you think. In PACT'13 IEEE International Conference on Parallel Architecture and Compilation Techniques, pages 7--16, Juan-les-Pins, France, September 2004.

Digital Library

[2]

H. Chafi, Z. DeVito, A. Moors, T. Rompf, A. K. Sujeeth, P. Hanrahan, M. Odersky, and K. Olukotun. Language virtualization for heterogeneous parallel computing. In OOPSLA '10: Proceedings of the ACM international conference on Object oriented programming systems languages and applications, pages 835--847, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0203-6.

Digital Library

[3]

R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, July 1999. ISBN 0521629713.

[4]

S. Eddy. HMMer Website, including User Manual. http://hmmer.wustl.edu.

[5]

C. Elliott. Programming graphics processors functionally. In Proceedings of the 2004 ACM SIGPLAN workshop on Haskell, Haskell '04, pages 45--56, New York, NY, USA, 2004. ACM. ISBN 1-58113-850-4. http://doi.acm.org/10.1145/1017472.1017482.

Digital Library

[6]

R. Giegerich and C. Meyer. Algebraic dynamic programming. In AMAST '02: Proceedings of the 9th International Conference on Algebraic Methodology and Software Technology, pages 349--364, London, UK, 2002. Springer-Verlag. ISBN 3-540-44144-1.

Digital Library

[7]

J. A. Gunnels, F. G. Gustavson, G. M. Henry, and R. A. van de Geijn. FLAME: Formal Linear Algebra Methods Environment. ACM Transactions on Mathematical Software, 27 (4): 422--455, Dec. 2001. ISSN 0098-3500. URL http://doi.acm.org/10.1145/504210.504213.

Digital Library

[8]

S. L. P. Jones and S. Singh. A tutorial on parallel and concurrent programming in haskell. In P. W. M. Koopman, R. Plasmeijer, and S. D. Swierstra, editors, Advanced Functional Programming, volume 5832 of Lecture Notes in Computer Science, pages 267--305. Springer, 2008. ISBN 978-3-642-04651-3.

Digital Library

[9]

R. M. Karp and M. Held. Finite-state processes and dynamic programming. SIAM Journal on Applied Mathematics, 15 (3): 693--718, 1967. 10.1137/0115060.

[10]

A. Krogh, I. S. Mian, and D. Haussler. A hidden markov model that finds genes in e.coli dna. Nucleic Acids Research, 22 (22): 4768--4778, 1994. 10.1093/nar/22.22.4768.

[11]

L. Lamport. The parallel execution of do loops. Communications of The ACM, 17: 83--93, February 1974. 10.1145/360827.360844.

Digital Library

[12]

C. Lengauer. Loop parallelization in the polytope model. In CONCUR '93, Lecture Notes in Computer Science 715, pages 398--416. Springer-Verlag, 1993.

Digital Library

[13]

Y. Liu, B. Schmidt, and D. Maskell. Cudasw 2.0: enhanced smith-waterman protein database search on cuda-enabled gpus based on simt and virtualized simd abstractions. BMC Research Notes, 3 (1): 93, 2010. ISSN 1756-0500. 10.1186/1756-0500-3-93.

[14]

G. Lunter. HMMoC a compiler for hidden Markov models. Bioinformatics, 23 (18): 2485--2487, September 2007. 10.1093/bioinformatics/btm350.

Digital Library

[15]

G. Mainland and G. Morrisett. Nikola: embedding compiled gpu functions in haskell. In Haskell '10: Proceedings of the third ACM Haskell symposium on Haskell, pages 67--78, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0252-4.

Digital Library

[16]

W. R. Pearson and D. J. Lipman. Improved tools for biological sequence comparison. Proceedings of The National Academy of Sciences, 85: 2444--2448, 1988.

[17]

P. Steffen, R. Giegerich, and M. Giraud. Gpu parallelization of algebraic dynamic programming. 2009. URL HAL:http://hal.archives-ouvertes.fr/inria-00438219/en/.

Digital Library

[18]

J. Svensson, K. Claessen, and M. Sheeran. Gpgpu kernel implementation and refinement using obsidian. Procedia Computer Science, 1 (1): 2065--2074, 2010. ISSN 1877-0509. ICCS 2010.

[19]

A. van Deursen, P. Klint, and J. Visser. Domain-specific languages: an annotated bibliography. SIGPLAN Not., 35: 26--36, June 2000.

Digital Library

[20]

H. Verge, C. Mauras, and P. Quinton. The alpha language and its use for the design of systolic arrays. The Journal of VLSI Signal Processing, 3: 173--182, 1991. ISSN 0922-5773. URL http://dx.doi.org/10.1007/BF00925828. 10.1007/BF00925828.

Digital Library

[21]

J. P. Walters, V. Balu, S. Kompalli, and V. Chaudhary. Evaluating the use of gpus in liver image segmentation and hmmer database searches. In IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing, pages 1--12, Washington, DC, USA, 2009. IEEE Computer Society. ISBN 978-1-4244-3751-1.

Digital Library

Cited By

Hummelgren LPalmkvist VStjerna LXu XJalden JBroman DLaemmel RPereira JMosses P(2024)Trellis: A Domain-Specific Language for Hidden Markov Models with Sparse TransitionsProceedings of the 17th ACM SIGPLAN International Conference on Software Language Engineering10.1145/3687997.3695641(196-209)Online publication date: 17-Oct-2024
https://dl.acm.org/doi/10.1145/3687997.3695641
Hijma PNieuwpoort RJacobs CBal H(2015)Stepwise-refinement for performanceConcurrency and Computation: Practice & Experience10.1002/cpe.341627:17(4515-4554)Online publication date: 10-Dec-2015
https://dl.acm.org/doi/10.1002/cpe.3416
Jonnalagedda MCoppey TStucki SRompf TOdersky M(2014)Staged parser combinators for efficient data processingACM SIGPLAN Notices10.1145/2714064.266024149:10(637-653)Online publication date: 15-Oct-2014
https://dl.acm.org/doi/10.1145/2714064.2660241
Show More Cited By

Index Terms

Synthesising graphics card programs from DSLs
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages

Recommendations

Synthesising graphics card programs from DSLs
PLDI '12

Over the last five years, graphics cards have become a tempting target for scientific computing, thanks to unrivaled peak performance, often producing a runtime speed-up of x10 to x25 over comparable CPU solutions.

However, this increase can be ...
Developmental directions in parallel accelerators
AusPDC '14: Proceedings of the Twelfth Australasian Symposium on Parallel and Distributed Computing - Volume 152

Parallel accelerators such as massively-cored graphical processing units or many-cored co-processors such as the Xeon Phi are becoming widespread and affordable on many systems including blade servers and even desktops. The use of a single such ...
A performance study of general-purpose applications on graphics processors using CUDA

Graphics processors (GPUs) provide a vast number of simple, data-parallel, deeply multithreaded cores and high memory bandwidths. GPU architectures are becoming increasingly programmable, offering the potential for dramatic speedups for a variety of ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation

June 2012

572 pages

ISBN:9781450312059

DOI:10.1145/2254064

General Chairs:
Jan Vitek
Purdue University
,
Haibo Lin
Microsoft China
,
Program Chair:
Frank Tip
IBM T.J. Watson Research Center

ACM SIGPLAN Notices Volume 47, Issue 6
PLDI '12
June 2012
534 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2345156
Issue’s Table of Contents

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2012

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

PLDI '12

Sponsor:

SIGPLAN

PLDI '12: ACM SIGPLAN Conference on Programming Language Design and Implementation

June 11 - 16, 2012

Beijing, China

Acceptance Rates

PLDI '12 Paper Acceptance Rate 48 of 255 submissions, 19%;

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

4
Total Citations
View Citations
409
Total Downloads

Downloads (Last 12 months)10
Downloads (Last 6 weeks)2

Reflects downloads up to 18 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Hummelgren LPalmkvist VStjerna LXu XJalden JBroman DLaemmel RPereira JMosses P(2024)Trellis: A Domain-Specific Language for Hidden Markov Models with Sparse TransitionsProceedings of the 17th ACM SIGPLAN International Conference on Software Language Engineering10.1145/3687997.3695641(196-209)Online publication date: 17-Oct-2024
https://dl.acm.org/doi/10.1145/3687997.3695641
Hijma PNieuwpoort RJacobs CBal H(2015)Stepwise-refinement for performanceConcurrency and Computation: Practice & Experience10.1002/cpe.341627:17(4515-4554)Online publication date: 10-Dec-2015
https://dl.acm.org/doi/10.1002/cpe.3416
Jonnalagedda MCoppey TStucki SRompf TOdersky M(2014)Staged parser combinators for efficient data processingACM SIGPLAN Notices10.1145/2714064.266024149:10(637-653)Online publication date: 15-Oct-2014
https://dl.acm.org/doi/10.1145/2714064.2660241
Jonnalagedda MCoppey TStucki SRompf TOdersky MBlack AMillstein T(2014)Staged parser combinators for efficient data processingProceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications10.1145/2660193.2660241(637-653)Online publication date: 15-Oct-2014
https://dl.acm.org/doi/10.1145/2660193.2660241
JACKSON E(2014)A Module System for Domain-Specific LanguagesTheory and Practice of Logic Programming10.1017/S147106841400033714:4-5(771-785)Online publication date: 21-Jul-2014
https://doi.org/10.1017/S1471068414000337

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents