Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/207110.207148acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
Article
Free access

The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization

Published: 01 June 1995 Publication History

Abstract

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel framework for their identification: speculatively execute the loop as a doall, and apply a fully parallel data dependence test to determine if it had any cross-iteration dependences; if the test fails, then the loop is re-executed serially. Since, from our experience, a significant amount of the available parallelism in Fortran programs can be exploited by loops transformed through privatization and reduction parallelization, our methods can speculatively apply these transformations and then check their validity at run-time. Another important contribution of this paper is a novel method for reduction recognition which goes beyond syntactic pattern matching; it detects at run-time if the values stored in an array participate in a reduction operation, even if they are transferred through private variables and/or are affected by statically unpredictable control flow. We present experimental results on loops from the PERFECT Benchmarks which substantiate our claim that these techniques can yield significant speedups which are often superior to those obtainable by inspector/executor methods.

References

[1]
S. Abraham. Private communication, 1994.
[2]
J.R. Allen, K. Kennedy, C. Porterfield, and J. Warren. Conversion of control dependence to data dependence. In Proceedings of the i Oth ACM Symposium on Principles of Programming Languages, pages 177-189,January 1983.
[3]
Allxant Computer 5gystems Corporation. ~"w~/Serles Architecture Manual, 1986.
[4]
Alhant Computers Systems Corporation. Alliant FX/2800 Series System Descrtption, 1991.
[5]
R. Ballance, A. Maccabe, and K. Ottenstem. The Program Dependence Web: a Representation Supporting Control- Data- and Demand- Driven Interpretation of Imperative Languages. tn Proceedings of the SIGPLAN'90 Conference on Programming Language Design and Implementation, pages 257-271, June 1990.
[6]
U. Banerjee. Dependence Analysis for Supercomputing. Kluwer. Boston, MA., 1988.
[7]
M. Berry, D. Chen, P. Koss, D. Kuck, S. Lo, Y. Pang, R. Roloff, A. Sameh, E. Clemenu, S. Chin, D. Schneider, G. Fox, P. Messina, D. WaLker, C. Hsiung, J. Schwarzmemr, K. Lue, S. Orzag, E Seidl, O. Johnson, G. Swanson, R. Goodmm, and J. Martin. The PER- FECT club benchmarks: Effective performance evaluation of supercomputers. Technical Report CSRD-827, Center for Supercomputmg Research and Development, University of Illinois, Urbana, IL, May 1989.
[8]
H. Berryman and J. Saltz. A manual for PARTI runtime primluves. Interim Report 90-13, ICASE, 1990.
[9]
W. Blume and R. Elgenmann. Performance Analysis of Paralletizing Compilers on the Perfect BenchmarksTM Programs. IEEE Transactions on Parallel and Distributed Systems, 3(6):643-656, November 1992.
[10]
M. Burke, R. Cytron, J. Ferrante, and W. Hsieh. Automatic generation of nested, fork-join parallehsm. Journal of Supercomputing, pages 71-88, 1989.
[11]
W. J. Camp, S. J. Plimpton, B. A. Hendrickson, and R. W. Leland. Massively parallel methods for engineering and science problems. Comm ACM, 37(4):31-41, April 1994.
[12]
D. K. Chen, P. C. Yew, and J. Torrellas. An efficient algorithm for the mn4ime parallelization of doacross loops. In Proceedings of Supercomputing 1994, pages 518-527, Nov. 1994.
[13]
A. Dinning and E. Schonberg. An empirical comparison of monitoring algorithms for access anomaly detection. In Proc. of2-nd ACM SIG. PLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), pages 1-10, 1990.
[14]
R. Elgenmann, J. Hoeflinger, Z. Li, and D. Padua. Experience m the Automatic Paratlelizafion of Four Perfect-Benchmark Programs. Lecture Notes in Computer Science 589. Proceedings of the Fourth Workshop on Languages and Compilers for Parallel Computing, Santa Clara, CA, pages 65-83, August 1991.
[15]
V. Krothapalli and P. Sadayappan. An approach to synchromzation of parallel computing. In Proceedings of the 1988 International Conference on Supercomputmg, pages 573-581, June 1988.
[16]
C. Kruskal. Efficient parallel algorithms for graph problems. In Proceedings of the 1985 Internattonal Conference on Parallel Processing, August 1985.
[17]
C. Kruskal. Efficient parallel algorithms for graph problems. In Proceedings of the 1986 International Conference on Parallel Processing, pages 869-876, August 1986.
[18]
D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Woffe. Dependence graphs and compiler opumlzations. In Proceedings of the 8 th ACM Symposium on Principles of Programming Languages, pages 207-218, January 1981.
[19]
E Thomson Lmghton. Introduction to Parallel Atsorithrt~ and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann, 1992.
[20]
S. Leung and J. Zahorjan. Improving the performance of mnurne parallelization. In 4th PPOPP, pages 83-91, May 1993.
[21]
Zhlyuan LL Array privatization for parallel execution of loops. In Proceedings of the } 9th International Symposium on Computer Architecture, pages 313-322,1992.
[22]
D. E. Maydan, S. P. Amarasmghe, and M. S. Lain. Data dependence and data-flow analysis of arrays. In Proceedings 5th Workshop on Programming Languages and Compilers for Parallel Computing, August 1992.
[23]
S. Midkiff and D. Padua. Compiler algorithms for synchronization. IEEE Trans. Comput., C-36(12): 1485-1495, 1987.
[24]
I. Nudler and L. Rudolph. Tools for the efficient developementof efficient parallel programs. In Proc. 1st Israeli Conference on Computer System Engineering, 1988.
[25]
D.A. Padua and M. J. Wolfe. Advanced compiler opumizations for supercomputers. Communications of the ACM, 29:1184-1201, December 1986.
[26]
L. Rauchwerger and D. Padua. The privatizing doall test: A runtime technique for doall loop identification and array privatization. In Proceedingsof the 1994 International Conference on Supercomputing, pages 33-43, July t994.
[27]
Lawrence Rauchwerger and David A. Padua. Parallelizmg WHILE Loops for Multiproces sorSystems. In Proceedings of9th lnternattonat Parallel Processing Symposium, April 1995.
[28]
J. Sakz and R. Mirchandaney. The preprocessed doacross loop. In Dr. H.D. Schwetman, editor, Proceedings of the 1991 International Conference on Parallel Processmg, pages 174-178. CRC Press, Inc., 1991. Vol. II - Software.
[29]
J. Saltz, R. Mlrchandaney, and K. Crowley. The doconsider loop. In Proceedings of the 1989 International Conference on Supercomputing, pages 29-40, June 1989.
[30]
J. Saltz, R. Mirchandaney, and K. Crowley. Run-t/me parallelizafion and scheduling of loops, tEEE Trans. Comput, 40(5), May 199 i.
[31]
E. Schonberg. On-the-fly detection of access anomalies. In Proceedings of the SIGPLAN 1989 Conference on Programming Language Design and Implementation, pages 285-297, Portland, Oregon, 1989.
[32]
P. Tu and D. Padua. Array privatizafion for shared and distributed memory machines. In Proceedings 2nd Workshop on Languages, Compilers, and Run-Time Envtronments for Distributed Memory Machines, September 1992.
[33]
P. Tu and D. Padua. Automatic array privafization. In Proceedings 6th Annual Workshop on Languages and Compilers for Parallel Computing, Portland, OR, August 1993.
[34]
Penn Tu and David Padua. GSA based demand-driven symbolic analysis. Technical Report t339, University of Illinois at Urbana- Champaign, Cntr for Supercomputing Res & Dev, February 1994.
[35]
A. Vtadimlrescu. LSI Circuit Stmulation on Vector Computers. PhD thesis, Electronics Research Laboratory, University of California, Berkeley, October t982. Technical Rept. No. UCB/ERL M82/75.
[36]
M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.
[37]
J. Wu, J. Saltz, S. Hlranandam, and H. Berryman. Runtime compilauon methods for multicomputers. In Dr. H.D. Schwetman, editor, Proceedings of the 1991 International Conference on Parallel Processing, pages 26-30. CRC Press, Inc., 1991. Vol. II - Software.
[38]
C. Zhu and P. C. Yew. A scheme to enforce data dependence on large multiproces sor systems. IEEE Trans. Sofiw. Eng., 13 (6):726-739, June 1987.
[39]
H. Zima. SupercompiIers for Parallel and Vector Computers. ACM Press, New York, New York, 1991.

Cited By

View all
  • (2024)When Is Parallelism Fearless and Zero-Cost with Rust?Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659966(27-40)Online publication date: 17-Jun-2024
  • (2021)Loop Parallelization using Dynamic Commutativity Analysis2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO51591.2021.9370319(150-161)Online publication date: 27-Feb-2021
  • (2020)T4Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture10.1109/ISCA45697.2020.00024(159-172)Online publication date: 30-May-2020
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PLDI '95: Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
June 1995
335 pages
ISBN:0897916972
DOI:10.1145/207110
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1995

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

PLDI95
Sponsor:

Acceptance Rates

PLDI '95 Paper Acceptance Rate 28 of 105 submissions, 27%;
Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)95
  • Downloads (Last 6 weeks)24
Reflects downloads up to 02 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)When Is Parallelism Fearless and Zero-Cost with Rust?Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659966(27-40)Online publication date: 17-Jun-2024
  • (2021)Loop Parallelization using Dynamic Commutativity Analysis2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO51591.2021.9370319(150-161)Online publication date: 27-Feb-2021
  • (2020)T4Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture10.1109/ISCA45697.2020.00024(159-172)Online publication date: 30-May-2020
  • (2018)Concurrency-aware object-oriented programming with rolesProceedings of the ACM on Programming Languages10.1145/32765002:OOPSLA(1-30)Online publication date: 24-Oct-2018
  • (2018)The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor CodeProceedings of the IEEE10.1109/JPROC.2018.2857721106:11(1921-1934)Online publication date: Nov-2018
  • (2017)Enabling scalability-sensitive speculative parallelization for FSM computationsProceedings of the International Conference on Supercomputing10.1145/3079079.3079082(1-10)Online publication date: 14-Jun-2017
  • (2017)DRUT: An Efficient Turbo Boost Solution via Load Balancing in Decoupled Look-Ahead Architecture2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)10.1109/PACT.2017.35(91-104)Online publication date: Sep-2017
  • (2017)Using the Xeon Phi Platform to Run Speculatively-Parallelized CodesInternational Journal of Parallel Programming10.1007/s10766-016-0421-x45:2(225-241)Online publication date: 1-Apr-2017
  • (2017)Optimizing LOBPCG: Sparse Matrix Loop and Data Transformations in ActionLanguages and Compilers for Parallel Computing10.1007/978-3-319-52709-3_17(218-232)Online publication date: 24-Jan-2017
  • (2016)Compiler transformation to generate hybrid sparse computationsProceedings of the Sixth Workshop on Irregular Applications: Architectures and Algorithms10.5555/3018843.3018849(34-41)Online publication date: 13-Nov-2016
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media