DOI: 10.1145/967900.968184

Automatic parallel code generation for tiled nested loops

Published: 14 March 2004

Abstract

This paper presents an overview of our work on a complete end-to-end framework for automatically generating message-passing parallel code for tiled nested for-loops. The framework handles general parallelepiped tiling transformations and general convex iteration spaces. We address all problems regarding both the generation of sequential tiled code and its parallelization. We have implemented our techniques in a tool that automatically generates MPI parallel code, and we conducted several series of experiments on the compilation time of our tool, the efficiency of the generated code, and the speedup attained on a cluster of PCs. Apart from confirming the value of our techniques, our experimental results show the merit of general parallelepiped tiling transformations and verify previous theoretical work on scheduling-optimal tile shapes.
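
As a rough illustration of what a parallelepiped tiling transformation does (this sketch is not taken from the paper's tool), the snippet below assigns each iteration point j of a small convex iteration space to the tile floor(H·j), the usual supernode mapping. The matrices `H_rect` and `H_skew`, the triangular iteration space, and the helper names `tile_of` and `group_by_tile` are assumptions chosen purely for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from math import floor


def tile_of(H, j):
    """Return the tile coordinates floor(H * j) of iteration point j.

    H is the tiling matrix (the inverse of the matrix whose columns are
    the tile side vectors); exact fractions avoid rounding problems.
    """
    return tuple(floor(sum(h * x for h, x in zip(row, j))) for row in H)


def group_by_tile(H, points):
    """Group iteration points by the tile that contains them."""
    tiles = defaultdict(list)
    for j in points:
        tiles[tile_of(H, j)].append(j)
    return dict(tiles)


# Hypothetical convex (triangular) iteration space: 0 <= j2 <= j1 < 6.
points = [(j1, j2) for j1 in range(6) for j2 in range(j1 + 1)]

# Rectangular 2x2 tiles: side vectors (2, 0) and (0, 2).
H_rect = [[Fraction(1, 2), 0],
          [0, Fraction(1, 2)]]

# Skewed (parallelepiped) tiles: side vectors (2, 0) and (2, 2).
H_skew = [[Fraction(1, 2), Fraction(-1, 2)],
          [0, Fraction(1, 2)]]

rect_tiles = group_by_tile(H_rect, points)
skew_tiles = group_by_tile(H_skew, points)
```

Both tilings partition the same set of iteration points; only the shape of the tiles differs. In this particular example the skewed tiles follow the diagonal boundary of the triangular space, whereas the rectangular tiles are cut by it.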



Published In

SAC '04: Proceedings of the 2004 ACM symposium on Applied computing
March 2004
1733 pages
ISBN:1581138121
DOI:10.1145/967900
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MPI
  2. automatic SPMD code generation
  3. nested loops
  4. parallelizing compilers
  5. supernodes
  6. tiling

Qualifiers

  • Article

Conference

SAC04: The 2004 ACM Symposium on Applied Computing
March 14 - 17, 2004
Nicosia, Cyprus

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Cited By

  • (2011) Data locality and parallelism optimization using a constraint-based approach. Journal of Parallel and Distributed Computing 71:2 (280-287). DOI: 10.1016/j.jpdc.2010.08.005. Online publication date: 1-Feb-2011
  • (2010) Synthesizing and Verifying Multicore Parallelism in Categories of Nested Code Graphs. Process Algebra for Parallel and Distributed Processing. DOI: 10.1201/9781420064872.pt1. Online publication date: 31-Jan-2010
  • (2010) Parameterized tiling revisited. Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization (200-209). DOI: 10.1145/1772954.1772983. Online publication date: 24-Apr-2010
  • (2009) Slicing based code parallelization for minimizing inter-processor communication. Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems (87-96). DOI: 10.1145/1629395.1629409. Online publication date: 11-Oct-2009
  • (2006) Optimizing code parallelization through a constraint network based approach. Proceedings of the 43rd annual Design Automation Conference (863-688). DOI: 10.1145/1146909.1147083. Online publication date: 24-Jul-2006
  • (2006) Automatic performance optimization of the discrete fourier transform on distributed memory computers. Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications (818-832). DOI: 10.1007/11946441_74. Online publication date: 4-Dec-2006
  • (2005) The MHETA Execution Model for Heterogeneous Clusters. Proceedings of the 2005 ACM/IEEE conference on Supercomputing. DOI: 10.1109/SC.2005.73. Online publication date: 12-Nov-2005
