An affine partitioning algorithm to maximize parallelism and minimize communication

Authors: Amy W. Lim, Gerald I. Cheong, Monica S. Lam

ICS '99: Proceedings of the 13th international conference on Supercomputing

Pages 228 - 237

https://doi.org/10.1145/305138.305197

Published: 01 May 1999

Published In

ICS '99: Proceedings of the 13th international conference on Supercomputing

June 1999

509 pages

ISBN:158113164X

DOI:10.1145/305138

Chairmen:
Theodore Papatheodorou, Univ. of Patras, Patras, Greece
Mateo Valero, Univ. Politècnica de Catalunya, Barcelona, Spain
Constantine D. Polychronopoulos, Univ. of Illinois
Yoichi Muraoka, Waseda Univ.
Jesus Labarta, Univ. Politècnica de Catalunya, Barcelona, Spain

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ICS99

Sponsor:

SIGARCH

ICS99: The 13th ACM International Conference on Supercomputing

June 20 - 25, 1999

Rhodes, Greece

Acceptance Rates

ICS '99 Paper Acceptance Rate 57 of 180 submissions, 32%;

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Article Metrics

Total Citations: 131

Total Downloads: 897

Downloads (Last 12 months): 68

Downloads (Last 6 weeks): 17

Reflects downloads up to 12 Nov 2024
