Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2103736.2103741acmconferencesArticle/Chapter ViewAbstractPublication PagespoplConference Proceedingsconference-collections
research-article

Programmable data dependencies and placements

Published: 28 January 2012 Publication History

Abstract

One of the major issues in parallelizing applications is to deal with the inherent dependency structure of the program. Dependence analysis provides execution-order constraints between program statements, and can establish legitimate ways to carry out program code transformations. The concept of data dependency constitutes one class of dependencies obtained through dependence analysis, a form related to data parallelism. Since automatic dependence analysis has proved to be too complex for the general case, parallelizing compilers cannot help parallelizing every dependency pattern.
In many cases, the data dependency pattern of a computation is independent from the actual data values, i.e., it is static, though the pattern may scale with the size of the data set.
In this paper, we explore how a static, scalable data dependency can be presented to the compiler in a meaningful way. We describe the major components of a proposed framework in which static and possibly scalable data dependencies are turned into programmable entities. The framework provides a high, and easy to manipulate, level to deal with data distribution and placement of computations onto any parallel system which has a well defined space-time communication structure. The data dependency information together with the placement information can be utilised by a compiler to generate parallel code. This presentation explores the idea of programmable data placements in more detail through concrete examples for the CUDA API of Nvidia GPUs.

References

[1]
The OpenCL Specification, 2010. URL http://www.khronos.org/registry/cl/.
[2]
E. Axelsson, K. Claessen, G. Dévai, Z. Horváth, K. Keijzer, B. Lyckegård, A. Persson, M. Sheeran, J. Svenningsson, and A. Vajda. Feldspar: A domain specific language for digital signal processing algorithms. In Formal Methods and Models for Codesign (MEMOCODE) 2010, 8th IEEE/ACM International Conference on, pages 169--178, 2010. http://dx.doi.org/10.1109/MEMCOD.2010.5558637.
[3]
D. F. Bacon, S. L. Graham, and O. J. Sharp. Compiler transformations for high-performance computing. ACM Comput. Surv., 26: 345--420, December 1994. ISSN 0360-0300. http://doi.acm.org/10.1145/197405.197406.
[4]
U. Banerjee, R. Eigenmann, A. Nicolau, and D. Padua. Automatic program parallelization. Proceedings of the IEEE, 81 (2): 211 --243, Feb. 1993. ISSN 0018--9219. http://dx.doi.org/10.1109/5.214548.
[5]
K. E. Batcher. Sorting networks and their applications. In AFIPS Spring Joint Computing Conference, pages 307--314, 1968.
[6]
G. E. Blelloch. Prefix sums and their applications. Technical Report Carnegie Mellon University-CS-90--190, School of Computer Science, Carnegie Mellon University, Nov. 1990.
[7]
E. Burrows and M. Haveraaen. A hardware independent parallel programming model. Journal of Logic and Algebraic Programming, 78: 519--538, 2009. http://dx.doi.org/10.1016/j.jlap.2009.06.002.
[8]
A. Buss, Harshvardhan, I. Papadopoulos, O. Pearce, T. Smith, G. Tanase, N. Thomas, X. Xu, M. Bianco, N. M. Amato, and L. Rauchwerger. STAPL: standard template adaptive parallel library. In Proceedings of the 3rd Annual Haifa Experimental Systems Conference, SYSTOR'10, pages 14:1--14:10, New York, NY, USA, 2010. ACM. ISBN 978--1--60558--908--4. http://doi.acm.org/10.1145/1815695.1815713.
[9]
M. Chen, Y.-i. Choo, and J. Li. Crystal: theory and pragmatics of generating efficient parallel code. In B. K. Szymanski, editor, Parallel functional languages and compilers, pages 255--308. ACM, New York, NY, USA, 1991. ISBN 0--201--52243--8. http://doi.acm.org/10.1145/107214.129259.
[10]
M. Cole. Algorithmic skeletons: Structured management of parallel computation. MIT press, Mass., 1989.
[11]
R. L. Collins, B. Vellore, and L. P. Carloni. Recursion-driven parallel code generation for multi-core platforms. In Proceedings of the Conference on Design, Automation and Test in Europe, DATE'10, pages 190--195, 3001 Leuven, Belgium, Belgium, 2010. European Design and Automation Association. ISBN 978--3--9810801--6--2. URL http://portal.acm.org/citation.cfm?id=1870926.1870972.
[12]
K. Fatahalian, D. R. Horn, T. J. Knight, L. Leem, M. Houston, J. Y. Park, M. Erez, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan. Sequoia: programming the memory hierarchy. In Proceedings of the 2006 ACM/IEEE conference on Supercomputing, SC'06, New York, NY, USA, 2006. ACM. ISBN 0--7695--2700-0. http://doi.acm.org/10.1145/1188455.1188543.
[13]
C. Gregg and K. Hazelwood. Where is the data? Why you cannot debate CPU vs. GPU performance without the answer. In Performance Analysis of Systems and Software (ISPASS), 2011 IEEE International Symposium on, pages 134 --144, april 2011. http://dx.doi.org/10.1109/ISPASS.2011.5762730.
[14]
M. Gupta, S. Mukhopadhyay, and N. Sinha. Automatic parallelization of recursive procedures. Int. J. Parallel Program., 28: 537--562, December 2000. ISSN 0885--7458. http://dx.doi.org/10.1023/A:1007560600904.
[15]
M. Haveraaen. Distributing programs on different parallel architectures. In D. A. Padua, editor, Proceedings of the 1990 International Conference on Parallel Processing (ICPP), volume II Software, pages 288--289. The Pennsylvania State University Press, University Park and London, 1990. ISBN 0--271-00728--1.
[16]
M. Haveraaen. Efficient parallelisation of recursive problems using constructive recursion. In Euro-Par 2000: Proceedings from the 6th International Euro-Par Conference on Parallel Processing, number 1900 in Lecture Notes in Computer Science, pages 758--761, London, UK, 2000. Springer-Verlag. ISBN 3--540--67956--1. http://dx.doi.org/10.1007/3--540--44520-X_104.
[17]
M. Haveraaen. Data dependencies and space time algebras in parallel programming. Technical Report 45, Department of Informatics, University of Bergen, Norway, June 1990.
[18]
HPC Wire. A call to arms for parallel programming standards. an interview with intel's tim mattson, november 2010. URL http://www.hpcwire.com.
[19]
Intel. Intel Parallel Studio, 2009. URL http://www.intel.com/go/parallel.
[20]
Intel. Intel's array building blocks, 2010. URL http://software.intel.com/en-us/data-parallel/.
[21]
W. M. Johnston, J. R. P. Hanna, and R. J. Millar. Advances in dataflow programming languages. ACM Comput. Surv., 36: 1--34, March 2004. ISSN 0360-0300. http://doi.acm.org/10.1145/1013208.1013209.
[22]
K. Kennedy and J. R. Allen. Optimizing compilers for modern architectures: a dependence-based approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002. ISBN 1--55860--286-0.
[23]
M. Kulkarni, M. Burtscher, R. Inkulu, K. Pingali, and C. Caşcaval. How much parallelism is there in irregular applications? SIGPLAN Not., 44: 3--14, February 2009. ISSN 0362--1340. http://doi.acm.org/10.1145/1594835.1504181.
[24]
I. Kuon, R. Tessier, and J. Rose. FPGA architecture: Survey and challenges. Found. Trends Electron. Des. Autom., 2: 135--253, February 2008. ISSN 1551--3076. http://dx.doi.org/10.1561/1000000005.
[25]
J. McCanny, J. McWhirter, and S.-Y. Kung. The use of data dependence graphs in the design of bit-level systolic arrays. phAcoustics, Speech and Signal Processing, IEEE Transactions on, 38 (5): 787 --793, May 1990. ISSN 0096--3518. http://dx.doi.org/10.1109/29.56023.
[26]
W. L. Miranker and A. Winkler. Spacetime representations of computational structures. Computing, 32 (2): 93--114, 1984. ISSN 0010--485X. http://dx.doi.org/10.1007/BF02253685.
[27]
NVIDIA. CUDA Programming Guide. Manual, Nvidia, 2010. URL http://www.nvidia.com.
[28]
P. Petersen and D. Padua. Static and dynamic evaluation of data dependence analysis techniques. Parallel and Distributed Systems, IEEE Transactions on, 7 (11): 1121--1132, Nov. 1996. ISSN 1045--9219. http://dx.doi.org/10.1109/71.544354.
[29]
L.-N. Pouchet, C. Bastoul, A. Cohen, and J. Cavazos. Iterative optimization in the polyhedral model: part ii, multidimensional time. In phProceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 90--100, New York, NY, USA, 2008. ACM.
[30]
K. Psarris and K. Kyriakopoulos. An experimental evaluation of data dependence analysis techniques. Parallel and Distributed Systems, IEEE Transactions on, 15 (3): 196--213, 2004. ISSN 1045--9219. http://dx.doi.org/10.1109/TPDS.2004.1264806.
[31]
W. Pugh and D. Wonnacott. Static analysis of upper and lower bounds on dependences and parallelism. ACM Trans. Program. Lang. Syst., 16: 1248--1278, July 1994. ISSN 0164-0925. http://doi.acm.org/10.1145/183432.183525.
[32]
F. A. Rabhi and S. Gorlatch, editors. Patterns and skeletons for parallel and distributed computing. Springer-Verlag, London, UK, 2003. ISBN 1--85233--506--8.
[33]
J. Reinders. Intel Threading Building Blocks: Outfitting C
[34]
for Multi-core Processor Parallelism. O'Reilly, 2007.
[35]
R. Rugina and M. Rinard. Automatic parallelization of divide and conquer algorithms. In Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP'99, pages 72--83, New York, NY, USA, 1999. ACM. ISBN 1--58113--100--3. http://doi.acm.org/10.1145/301104.301111.
[36]
M. Sheeran. Functional and dynamic programming in the design of parallel prefix networks. Journal of Functional Programming, FirstView: 1--56, 2010. http://dx.doi.org/10.1017/S0956796810000304.
[37]
S. Singh. Death of the rloc? Field-Programmable Custom Computing Machines, 2000 IEEE Symposium on, page 145, 2000. ISSN 1082--3409. http://dx.doi.org/10.1109/FPGA.2000.903401.
[38]
S. Singh. The rloc is dead -- long live the rloc. In Field Programmable Gate Arrays (FPGA), 2011 ACM/SIGA International Symposium on. ACM, 2011.
[39]
J. Sklansky. Conditional-sum addition logic. Electronic Computers, IRE Transactions on, EC-9 (2): 226--231, June 1960. ISSN 0367--9950. http://dx.doi.org/10.1109/TEC.1960.5219822.
[40]
M. Wolfe and U. Banerjee. Data dependence and its application to parallel processing. Int. J. Parallel Program., 16: 137--178, April 1987. ISSN 0885--7458. http://dx.doi.org/10.1007/BF01379099.

Cited By

View all
  • (2022)A domain-specific language for structure manipulation in constraint system-based GUIsJournal of Computer Languages10.1016/j.cola.2022.101175(101175)Online publication date: Nov-2022

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
DAMP '12: Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming
January 2012
62 pages
ISBN:9781450311175
DOI:10.1145/2103736
  • General Chair:
  • Umut Acar,
  • Program Chair:
  • Vítor Santos Costa
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 January 2012

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. FPGA
  2. GPGPU
  3. data dependency
  4. hardware abstraction
  5. manycore
  6. platform-independence

Qualifiers

  • Research-article

Conference

POPL '12
Sponsor:

Upcoming Conference

POPL '26

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)15
  • Downloads (Last 6 weeks)1
Reflects downloads up to 16 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2022)A domain-specific language for structure manipulation in constraint system-based GUIsJournal of Computer Languages10.1016/j.cola.2022.101175(101175)Online publication date: Nov-2022

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media