Article

Free access

The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization

Authors:

Lawrence Rauchwerger,

David PaduaAuthors Info & Claims

PLDI '95: Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation

Pages 218 - 232

https://doi.org/10.1145/207110.207148

Published: 01 June 1995 Publication History

Abstract

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel framework for their identification: speculatively execute the loop as a doall, and apply a fully parallel data dependence test to determine if it had any cross-iteration dependences; if the test fails, then the loop is re-executed serially. Since, from our experience, a significant amount of the available parallelism in Fortran programs can be exploited by loops transformed through privatization and reduction parallelization, our methods can speculatively apply these transformations and then check their validity at run-time. Another important contribution of this paper is a novel method for reduction recognition which goes beyond syntactic pattern matching; it detects at run-time if the values stored in an array participate in a reduction operation, even if they are transferred through private variables and/or are affected by statically unpredictable control flow. We present experimental results on loops from the PERFECT Benchmarks which substantiate our claim that these techniques can yield significant speedups which are often superior to those obtainable by inspector/executor methods.

References

[1]

S. Abraham. Private communication, 1994.

[2]

J.R. Allen, K. Kennedy, C. Porterfield, and J. Warren. Conversion of control dependence to data dependence. In Proceedings of the i Oth ACM Symposium on Principles of Programming Languages, pages 177-189,January 1983.

Digital Library

[3]

Allxant Computer 5gystems Corporation. ~"w~/Serles Architecture Manual, 1986.

[4]

Alhant Computers Systems Corporation. Alliant FX/2800 Series System Descrtption, 1991.

[5]

R. Ballance, A. Maccabe, and K. Ottenstem. The Program Dependence Web: a Representation Supporting Control- Data- and Demand- Driven Interpretation of Imperative Languages. tn Proceedings of the SIGPLAN'90 Conference on Programming Language Design and Implementation, pages 257-271, June 1990.

Digital Library

[6]

U. Banerjee. Dependence Analysis for Supercomputing. Kluwer. Boston, MA., 1988.

Digital Library

[7]

M. Berry, D. Chen, P. Koss, D. Kuck, S. Lo, Y. Pang, R. Roloff, A. Sameh, E. Clemenu, S. Chin, D. Schneider, G. Fox, P. Messina, D. WaLker, C. Hsiung, J. Schwarzmemr, K. Lue, S. Orzag, E Seidl, O. Johnson, G. Swanson, R. Goodmm, and J. Martin. The PER- FECT club benchmarks: Effective performance evaluation of supercomputers. Technical Report CSRD-827, Center for Supercomputmg Research and Development, University of Illinois, Urbana, IL, May 1989.

[8]

H. Berryman and J. Saltz. A manual for PARTI runtime primluves. Interim Report 90-13, ICASE, 1990.

[9]

W. Blume and R. Elgenmann. Performance Analysis of Paralletizing Compilers on the Perfect BenchmarksTM Programs. IEEE Transactions on Parallel and Distributed Systems, 3(6):643-656, November 1992.

Digital Library

[10]

M. Burke, R. Cytron, J. Ferrante, and W. Hsieh. Automatic generation of nested, fork-join parallehsm. Journal of Supercomputing, pages 71-88, 1989.

[11]

W. J. Camp, S. J. Plimpton, B. A. Hendrickson, and R. W. Leland. Massively parallel methods for engineering and science problems. Comm ACM, 37(4):31-41, April 1994.

Digital Library

[12]

D. K. Chen, P. C. Yew, and J. Torrellas. An efficient algorithm for the mn4ime parallelization of doacross loops. In Proceedings of Supercomputing 1994, pages 518-527, Nov. 1994.

Digital Library

[13]

A. Dinning and E. Schonberg. An empirical comparison of monitoring algorithms for access anomaly detection. In Proc. of2-nd ACM SIG. PLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), pages 1-10, 1990.

Digital Library

[14]

R. Elgenmann, J. Hoeflinger, Z. Li, and D. Padua. Experience m the Automatic Paratlelizafion of Four Perfect-Benchmark Programs. Lecture Notes in Computer Science 589. Proceedings of the Fourth Workshop on Languages and Compilers for Parallel Computing, Santa Clara, CA, pages 65-83, August 1991.

Digital Library

[15]

V. Krothapalli and P. Sadayappan. An approach to synchromzation of parallel computing. In Proceedings of the 1988 International Conference on Supercomputmg, pages 573-581, June 1988.

Digital Library

[16]

C. Kruskal. Efficient parallel algorithms for graph problems. In Proceedings of the 1985 Internattonal Conference on Parallel Processing, August 1985.

[17]

C. Kruskal. Efficient parallel algorithms for graph problems. In Proceedings of the 1986 International Conference on Parallel Processing, pages 869-876, August 1986.

[18]

D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Woffe. Dependence graphs and compiler opumlzations. In Proceedings of the 8 th ACM Symposium on Principles of Programming Languages, pages 207-218, January 1981.

Digital Library

[19]

E Thomson Lmghton. Introduction to Parallel Atsorithrt~ and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann, 1992.

Digital Library

[20]

S. Leung and J. Zahorjan. Improving the performance of mnurne parallelization. In 4th PPOPP, pages 83-91, May 1993.

Digital Library

[21]

Zhlyuan LL Array privatization for parallel execution of loops. In Proceedings of the } 9th International Symposium on Computer Architecture, pages 313-322,1992.

Digital Library

[22]

D. E. Maydan, S. P. Amarasmghe, and M. S. Lain. Data dependence and data-flow analysis of arrays. In Proceedings 5th Workshop on Programming Languages and Compilers for Parallel Computing, August 1992.

Digital Library

[23]

S. Midkiff and D. Padua. Compiler algorithms for synchronization. IEEE Trans. Comput., C-36(12): 1485-1495, 1987.

Digital Library

[24]

I. Nudler and L. Rudolph. Tools for the efficient developementof efficient parallel programs. In Proc. 1st Israeli Conference on Computer System Engineering, 1988.

[25]

D.A. Padua and M. J. Wolfe. Advanced compiler opumizations for supercomputers. Communications of the ACM, 29:1184-1201, December 1986.

Digital Library

[26]

L. Rauchwerger and D. Padua. The privatizing doall test: A runtime technique for doall loop identification and array privatization. In Proceedingsof the 1994 International Conference on Supercomputing, pages 33-43, July t994.

Digital Library

[27]

Lawrence Rauchwerger and David A. Padua. Parallelizmg WHILE Loops for Multiproces sorSystems. In Proceedings of9th lnternattonat Parallel Processing Symposium, April 1995.

Digital Library

[28]

J. Sakz and R. Mirchandaney. The preprocessed doacross loop. In Dr. H.D. Schwetman, editor, Proceedings of the 1991 International Conference on Parallel Processmg, pages 174-178. CRC Press, Inc., 1991. Vol. II - Software.

[29]

J. Saltz, R. Mlrchandaney, and K. Crowley. The doconsider loop. In Proceedings of the 1989 International Conference on Supercomputing, pages 29-40, June 1989.

Digital Library

[30]

J. Saltz, R. Mirchandaney, and K. Crowley. Run-t/me parallelizafion and scheduling of loops, tEEE Trans. Comput, 40(5), May 199 i.

Digital Library

[31]

E. Schonberg. On-the-fly detection of access anomalies. In Proceedings of the SIGPLAN 1989 Conference on Programming Language Design and Implementation, pages 285-297, Portland, Oregon, 1989.

Digital Library

[32]

P. Tu and D. Padua. Array privatizafion for shared and distributed memory machines. In Proceedings 2nd Workshop on Languages, Compilers, and Run-Time Envtronments for Distributed Memory Machines, September 1992.

[33]

P. Tu and D. Padua. Automatic array privafization. In Proceedings 6th Annual Workshop on Languages and Compilers for Parallel Computing, Portland, OR, August 1993.

Digital Library

[34]

Penn Tu and David Padua. GSA based demand-driven symbolic analysis. Technical Report t339, University of Illinois at Urbana- Champaign, Cntr for Supercomputing Res & Dev, February 1994.

[35]

A. Vtadimlrescu. LSI Circuit Stmulation on Vector Computers. PhD thesis, Electronics Research Laboratory, University of California, Berkeley, October t982. Technical Rept. No. UCB/ERL M82/75.

Digital Library

[36]

M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.

Digital Library

[37]

J. Wu, J. Saltz, S. Hlranandam, and H. Berryman. Runtime compilauon methods for multicomputers. In Dr. H.D. Schwetman, editor, Proceedings of the 1991 International Conference on Parallel Processing, pages 26-30. CRC Press, Inc., 1991. Vol. II - Software.

[38]

C. Zhu and P. C. Yew. A scheme to enforce data dependence on large multiproces sor systems. IEEE Trans. Sofiw. Eng., 13 (6):726-739, June 1987.

Digital Library

[39]

H. Zima. SupercompiIers for Parallel and Vector Computers. ACM Press, New York, New York, 1991.

Cited By

Abdi JPosluns GZhang GWang BJeffrey MAgrawal KPetrank E(2024)When Is Parallelism Fearless and Zero-Cost with Rust?Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659966(27-40)Online publication date: 17-Jun-2024
https://dl.acm.org/doi/10.1145/3626183.3659966
Vasiladiotis CLozano RCole MFranke B(2021)Loop Parallelization using Dynamic Commutativity Analysis2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO51591.2021.9370319(150-161)Online publication date: 27-Feb-2021
https://doi.org/10.1109/CGO51591.2021.9370319
Ying VJeffrey MSanchez DMartínez JDuato JEeckhout L(2020)T4Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture10.1109/ISCA45697.2020.00024(159-172)Online publication date: 30-May-2020
https://dl.acm.org/doi/10.1109/ISCA45697.2020.00024
Show More Cited By

Index Terms

The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization

Recommendations

The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel ...
The R-LRPD Test: Speculative Parallelization of Partially Parallel Loops
IPDPS '02: Proceedings of the 16th International Parallel and Distributed Processing Symposium

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. In our previously proposed framework we have speculatively executed a loop as ...
The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

PLDI '95: Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation

June 1995

335 pages

ISBN:0897916972

DOI:10.1145/207110

Chairman:
David W. Wall
Digital—Western Research Laboratory

ACM SIGPLAN Notices Volume 30, Issue 6
June 1995
327 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/223428
Editors:
Brent T. Hailpern
IBM T. J. Watson Research Center, Yorktown Heoghts, NY
,
Richard L. Wexelblat
IRS IS:AO, Washington, DC
Issue’s Table of Contents

Copyright © 1995 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1995

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Article

Conference

PLDI95

Sponsor:

SIGPLAN

PLDI95: ACM SIGPLAN Conference on Programming Language Design and Implementation

June 18 - 21, 1995

California, La Jolla, USA

Acceptance Rates

PLDI '95 Paper Acceptance Rate 28 of 105 submissions, 27%;

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

201
Total Citations
View Citations
1,004
Total Downloads

Downloads (Last 12 months)95
Downloads (Last 6 weeks)24

Reflects downloads up to 02 Oct 2024

Other Metrics

View Author Metrics

Citations

Cited By

Abdi JPosluns GZhang GWang BJeffrey MAgrawal KPetrank E(2024)When Is Parallelism Fearless and Zero-Cost with Rust?Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659966(27-40)Online publication date: 17-Jun-2024
https://dl.acm.org/doi/10.1145/3626183.3659966
Vasiladiotis CLozano RCole MFranke B(2021)Loop Parallelization using Dynamic Commutativity Analysis2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO51591.2021.9370319(150-161)Online publication date: 27-Feb-2021
https://doi.org/10.1109/CGO51591.2021.9370319
Ying VJeffrey MSanchez DMartínez JDuato JEeckhout L(2020)T4Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture10.1109/ISCA45697.2020.00024(159-172)Online publication date: 30-May-2020
https://dl.acm.org/doi/10.1109/ISCA45697.2020.00024
Faes MGross T(2018)Concurrency-aware object-oriented programming with rolesProceedings of the ACM on Programming Languages10.1145/32765002:OOPSLA(1-30)Online publication date: 24-Oct-2018
https://dl.acm.org/doi/10.1145/3276500
Strout MHall MOlschanowsky C(2018)The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor CodeProceedings of the IEEE10.1109/JPROC.2018.2857721106:11(1921-1934)Online publication date: Nov-2018
https://doi.org/10.1109/JPROC.2018.2857721
Qiu JZhao ZWu BVishnu ASong SGropp WBeckman PLi ZCazorla F(2017)Enabling scalability-sensitive speculative parallelization for FSM computationsProceedings of the International Conference on Supercomputing10.1145/3079079.3079082(1-10)Online publication date: 14-Jun-2017
https://dl.acm.org/doi/10.1145/3079079.3079082
Parihar RHuang M(2017)DRUT: An Efficient Turbo Boost Solution via Load Balancing in Decoupled Look-Ahead Architecture2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)10.1109/PACT.2017.35(91-104)Online publication date: Sep-2017
https://doi.org/10.1109/PACT.2017.35
Estebanez ALlanos DGonzalez-Escribano A(2017)Using the Xeon Phi Platform to Run Speculatively-Parallelized CodesInternational Journal of Parallel Programming10.1007/s10766-016-0421-x45:2(225-241)Online publication date: 1-Apr-2017
https://dl.acm.org/doi/10.1007/s10766-016-0421-x
Ahmad KVenkat AHall M(2017)Optimizing LOBPCG: Sparse Matrix Loop and Data Transformations in ActionLanguages and Compilers for Parallel Computing10.1007/978-3-319-52709-3_17(218-232)Online publication date: 24-Jan-2017
https://doi.org/10.1007/978-3-319-52709-3_17
Zhang HVenkat AHall M(2016)Compiler transformation to generate hybrid sparse computationsProceedings of the Sixth Workshop on Irregular Applications: Architectures and Algorithms10.5555/3018843.3018849(34-41)Online publication date: 13-Nov-2016
https://dl.acm.org/doi/10.5555/3018843.3018849
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents