research-article

Programmable data dependencies and placements

Authors:

Magne HaveraaenAuthors Info & Claims

DAMP '12: Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming

Pages 31 - 40

https://doi.org/10.1145/2103736.2103741

Published: 28 January 2012 Publication History

Abstract

One of the major issues in parallelizing applications is to deal with the inherent dependency structure of the program. Dependence analysis provides execution-order constraints between program statements, and can establish legitimate ways to carry out program code transformations. The concept of data dependency constitutes one class of dependencies obtained through dependence analysis, a form related to data parallelism. Since automatic dependence analysis has proved to be too complex for the general case, parallelizing compilers cannot help parallelizing every dependency pattern.

In many cases, the data dependency pattern of a computation is independent from the actual data values, i.e., it is static, though the pattern may scale with the size of the data set.

In this paper, we explore how a static, scalable data dependency can be presented to the compiler in a meaningful way. We describe the major components of a proposed framework in which static and possibly scalable data dependencies are turned into programmable entities. The framework provides a high, and easy to manipulate, level to deal with data distribution and placement of computations onto any parallel system which has a well defined space-time communication structure. The data dependency information together with the placement information can be utilised by a compiler to generate parallel code. This presentation explores the idea of programmable data placements in more detail through concrete examples for the CUDA API of Nvidia GPUs.

References

[1]

The OpenCL Specification, 2010. URL http://www.khronos.org/registry/cl/.

[2]

E. Axelsson, K. Claessen, G. Dévai, Z. Horváth, K. Keijzer, B. Lyckegård, A. Persson, M. Sheeran, J. Svenningsson, and A. Vajda. Feldspar: A domain specific language for digital signal processing algorithms. In Formal Methods and Models for Codesign (MEMOCODE) 2010, 8th IEEE/ACM International Conference on, pages 169--178, 2010. http://dx.doi.org/10.1109/MEMCOD.2010.5558637.

[3]

D. F. Bacon, S. L. Graham, and O. J. Sharp. Compiler transformations for high-performance computing. ACM Comput. Surv., 26: 345--420, December 1994. ISSN 0360-0300. http://doi.acm.org/10.1145/197405.197406.

Digital Library

[4]

U. Banerjee, R. Eigenmann, A. Nicolau, and D. Padua. Automatic program parallelization. Proceedings of the IEEE, 81 (2): 211 --243, Feb. 1993. ISSN 0018--9219. http://dx.doi.org/10.1109/5.214548.

[5]

K. E. Batcher. Sorting networks and their applications. In AFIPS Spring Joint Computing Conference, pages 307--314, 1968.

Digital Library

[6]

G. E. Blelloch. Prefix sums and their applications. Technical Report Carnegie Mellon University-CS-90--190, School of Computer Science, Carnegie Mellon University, Nov. 1990.

[7]

E. Burrows and M. Haveraaen. A hardware independent parallel programming model. Journal of Logic and Algebraic Programming, 78: 519--538, 2009. http://dx.doi.org/10.1016/j.jlap.2009.06.002.

[8]

A. Buss, Harshvardhan, I. Papadopoulos, O. Pearce, T. Smith, G. Tanase, N. Thomas, X. Xu, M. Bianco, N. M. Amato, and L. Rauchwerger. STAPL: standard template adaptive parallel library. In Proceedings of the 3rd Annual Haifa Experimental Systems Conference, SYSTOR'10, pages 14:1--14:10, New York, NY, USA, 2010. ACM. ISBN 978--1--60558--908--4. http://doi.acm.org/10.1145/1815695.1815713.

Digital Library

[9]

M. Chen, Y.-i. Choo, and J. Li. Crystal: theory and pragmatics of generating efficient parallel code. In B. K. Szymanski, editor, Parallel functional languages and compilers, pages 255--308. ACM, New York, NY, USA, 1991. ISBN 0--201--52243--8. http://doi.acm.org/10.1145/107214.129259.

Digital Library

[10]

M. Cole. Algorithmic skeletons: Structured management of parallel computation. MIT press, Mass., 1989.

Digital Library

[11]

R. L. Collins, B. Vellore, and L. P. Carloni. Recursion-driven parallel code generation for multi-core platforms. In Proceedings of the Conference on Design, Automation and Test in Europe, DATE'10, pages 190--195, 3001 Leuven, Belgium, Belgium, 2010. European Design and Automation Association. ISBN 978--3--9810801--6--2. URL http://portal.acm.org/citation.cfm?id=1870926.1870972.

Digital Library

[12]

K. Fatahalian, D. R. Horn, T. J. Knight, L. Leem, M. Houston, J. Y. Park, M. Erez, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan. Sequoia: programming the memory hierarchy. In Proceedings of the 2006 ACM/IEEE conference on Supercomputing, SC'06, New York, NY, USA, 2006. ACM. ISBN 0--7695--2700-0. http://doi.acm.org/10.1145/1188455.1188543.

Digital Library

[13]

C. Gregg and K. Hazelwood. Where is the data? Why you cannot debate CPU vs. GPU performance without the answer. In Performance Analysis of Systems and Software (ISPASS), 2011 IEEE International Symposium on, pages 134 --144, april 2011. http://dx.doi.org/10.1109/ISPASS.2011.5762730.

Digital Library

[14]

M. Gupta, S. Mukhopadhyay, and N. Sinha. Automatic parallelization of recursive procedures. Int. J. Parallel Program., 28: 537--562, December 2000. ISSN 0885--7458. http://dx.doi.org/10.1023/A:1007560600904.

[15]

M. Haveraaen. Distributing programs on different parallel architectures. In D. A. Padua, editor, Proceedings of the 1990 International Conference on Parallel Processing (ICPP), volume II Software, pages 288--289. The Pennsylvania State University Press, University Park and London, 1990. ISBN 0--271-00728--1.

[16]

M. Haveraaen. Efficient parallelisation of recursive problems using constructive recursion. In Euro-Par 2000: Proceedings from the 6th International Euro-Par Conference on Parallel Processing, number 1900 in Lecture Notes in Computer Science, pages 758--761, London, UK, 2000. Springer-Verlag. ISBN 3--540--67956--1. http://dx.doi.org/10.1007/3--540--44520-X_104.

Digital Library

[17]

M. Haveraaen. Data dependencies and space time algebras in parallel programming. Technical Report 45, Department of Informatics, University of Bergen, Norway, June 1990.

[18]

HPC Wire. A call to arms for parallel programming standards. an interview with intel's tim mattson, november 2010. URL http://www.hpcwire.com.

[19]

Intel. Intel Parallel Studio, 2009. URL http://www.intel.com/go/parallel.

[20]

Intel. Intel's array building blocks, 2010. URL http://software.intel.com/en-us/data-parallel/.

[21]

W. M. Johnston, J. R. P. Hanna, and R. J. Millar. Advances in dataflow programming languages. ACM Comput. Surv., 36: 1--34, March 2004. ISSN 0360-0300. http://doi.acm.org/10.1145/1013208.1013209.

Digital Library

[22]

K. Kennedy and J. R. Allen. Optimizing compilers for modern architectures: a dependence-based approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002. ISBN 1--55860--286-0.

Digital Library

[23]

M. Kulkarni, M. Burtscher, R. Inkulu, K. Pingali, and C. Caşcaval. How much parallelism is there in irregular applications? SIGPLAN Not., 44: 3--14, February 2009. ISSN 0362--1340. http://doi.acm.org/10.1145/1594835.1504181.

Digital Library

[24]

I. Kuon, R. Tessier, and J. Rose. FPGA architecture: Survey and challenges. Found. Trends Electron. Des. Autom., 2: 135--253, February 2008. ISSN 1551--3076. http://dx.doi.org/10.1561/1000000005.

Digital Library

[25]

J. McCanny, J. McWhirter, and S.-Y. Kung. The use of data dependence graphs in the design of bit-level systolic arrays. phAcoustics, Speech and Signal Processing, IEEE Transactions on, 38 (5): 787 --793, May 1990. ISSN 0096--3518. http://dx.doi.org/10.1109/29.56023.

[26]

W. L. Miranker and A. Winkler. Spacetime representations of computational structures. Computing, 32 (2): 93--114, 1984. ISSN 0010--485X. http://dx.doi.org/10.1007/BF02253685.

Digital Library

[27]

NVIDIA. CUDA Programming Guide. Manual, Nvidia, 2010. URL http://www.nvidia.com.

[28]

P. Petersen and D. Padua. Static and dynamic evaluation of data dependence analysis techniques. Parallel and Distributed Systems, IEEE Transactions on, 7 (11): 1121--1132, Nov. 1996. ISSN 1045--9219. http://dx.doi.org/10.1109/71.544354.

Digital Library

[29]

L.-N. Pouchet, C. Bastoul, A. Cohen, and J. Cavazos. Iterative optimization in the polyhedral model: part ii, multidimensional time. In phProceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 90--100, New York, NY, USA, 2008. ACM.

Digital Library

[30]

K. Psarris and K. Kyriakopoulos. An experimental evaluation of data dependence analysis techniques. Parallel and Distributed Systems, IEEE Transactions on, 15 (3): 196--213, 2004. ISSN 1045--9219. http://dx.doi.org/10.1109/TPDS.2004.1264806.

Digital Library

[31]

W. Pugh and D. Wonnacott. Static analysis of upper and lower bounds on dependences and parallelism. ACM Trans. Program. Lang. Syst., 16: 1248--1278, July 1994. ISSN 0164-0925. http://doi.acm.org/10.1145/183432.183525.

Digital Library

[32]

F. A. Rabhi and S. Gorlatch, editors. Patterns and skeletons for parallel and distributed computing. Springer-Verlag, London, UK, 2003. ISBN 1--85233--506--8.

Digital Library

[33]

J. Reinders. Intel Threading Building Blocks: Outfitting C

Digital Library

[34]

for Multi-core Processor Parallelism. O'Reilly, 2007.

[35]

R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP'99, pages 72--83, New York, NY, USA, 1999. ACM. ISBN 1--58113--100--3. http://doi.acm.org/10.1145/301104.301111.

Digital Library

[36]

M. Sheeran. Functional and dynamic programming in the design of parallel prefix networks. Journal of Functional Programming, FirstView: 1--56, 2010. http://dx.doi.org/10.1017/S0956796810000304.

Digital Library

[37]

S. Singh. Death of the rloc? Field-Programmable Custom Computing Machines, 2000 IEEE Symposium on, page 145, 2000. ISSN 1082--3409. http://dx.doi.org/10.1109/FPGA.2000.903401.

Digital Library

[38]

S. Singh. The rloc is dead -- long live the rloc. In Field Programmable Gate Arrays (FPGA), 2011 ACM/SIGA International Symposium on. ACM, 2011.

Digital Library

[39]

J. Sklansky. Conditional-sum addition logic. Electronic Computers, IRE Transactions on, EC-9 (2): 226--231, June 1960. ISSN 0367--9950. http://dx.doi.org/10.1109/TEC.1960.5219822.

[40]

M. Wolfe and U. Banerjee. Data dependence and its application to parallel processing. Int. J. Parallel Program., 16: 137--178, April 1987. ISSN 0885--7458. http://dx.doi.org/10.1007/BF01379099.

Digital Library

Cited By

Stokke KBarash MJärvi J(2022)A domain-specific language for structure manipulation in constraint system-based GUIsJournal of Computer Languages10.1016/j.cola.2022.101175(101175)Online publication date: Nov-2022
https://doi.org/10.1016/j.cola.2022.101175

Index Terms

Programmable data dependencies and placements

Recommendations

A performance study of general-purpose applications on graphics processors using CUDA

Graphics processors (GPUs) provide a vast number of simple, data-parallel, deeply multithreaded cores and high memory bandwidths. GPU architectures are becoming increasingly programmable, offering the potential for dramatic speedups for a variety of ...
Control and data dependencies in business processes based on semantic business activities
iiWAS '08: Proceedings of the 10th International Conference on Information Integration and Web-based Applications & Services

Control and data dependencies are important information in business processes that supports process modeling, analysis, and execution. However, sequencing constraints, which are prescribed by control structures, obfuscate the true sources of ...
A cluster for CS education in the manycore era
SIGCSE '11: Proceedings of the 42nd ACM technical symposium on Computer science education

Traditional Beowulf clusters have been homogeneous platforms for distributed-memory MIMD parallelism. However, the shift to multicore architectures has made shared-memory MIMD parallelism increasingly important, and inexpensive manycore GPGPUs have ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

DAMP '12: Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming

January 2012

62 pages

ISBN:9781450311175

DOI:10.1145/2103736

General Chair:
Umut Acar
Max Planck Institute for Software Systems, Germany
,
Program Chair:
Vítor Santos Costa
Universidade do Porto, Portugal

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

In-Cooperation

SIGACT: ACM Special Interest Group on Algorithms and Computation Theory

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 January 2012

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

POPL '12

Sponsor:

SIGPLAN

POPL '12: The 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages

January 28, 2012

Pennsylvania, Philadelphia, USA

Upcoming Conference

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
205
Total Downloads

Downloads (Last 12 months)15
Downloads (Last 6 weeks)1

Reflects downloads up to 16 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Stokke KBarash MJärvi J(2022)A domain-specific language for structure manipulation in constraint system-based GUIsJournal of Computer Languages10.1016/j.cola.2022.101175(101175)Online publication date: Nov-2022
https://doi.org/10.1016/j.cola.2022.101175

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten