Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Transformations techniques for extracting parallelism in non-uniform nested loops

Published: 01 September 2008 Publication History

Abstract

Executing a program in parallel machines needs not only to find sufficient parallelism in a program, but it is also important that we minimize the synchronization and communication overheads in the parallelized program. This yields to improve the performance by reducing the time needed for executing the program. Parallelizing and partitioning of nested loops requires efficient iteration dependence analysis. Although many loop transformations techniques exist for nested loop partitioning, most of these transformation techniques perform poorly when parallelizing nested loops with non-uniform (irregular) dependences. In this paper the affine and unimodular transformations are applied to solve the problem of parallelism in nested loops with non-uniform dependence vectors. To solve these problem few researchers converted the non-uniform nested loops to uniform nested loops and then find the parallelism. We propose applying directly the two approaches affine and unimodular transformations to extract and improve the parallelism in nested loops with non-uniform dependences. The study shows that unimodular transformation is better than affine transformation when the dependences in nested loops exist only in one statement. While affine transformation is more effective when the nested loops have a sequence of statements and the dependence exists between these different statements.

References

[1]
A. Schrijver, Theory of Linear and Integer Programming, Wiley, Chichester, 1986.
[2]
A. W. Lim, G. I. Cheong, and M. S. Lam, An Affine Partitioning Algorithm to Maximize Parallelism and Minimize Communication, In Proceeding of the 13th ACM SIGARCH international Conference on Supercomputing, Rhodes, Greece, June 1999.
[3]
A. W. Lim and M. S. Lam, Cache Optimizations with Affine Partitioning, In Proceedings of the Tenth SIAM Conference on Parallel Processing for Scientific Computing, March 2001.
[4]
A. W. Lim and M. S. Lam, Maximizing Parallelism and Minimizing Synchronization with affine partitions, Parallel Computing vol. 24, no. 3-4, pp. 445-475, May 1998.
[5]
A. W. Lim and M. S. Lam, Maximizing Parallelism and Minimizing Synchronization with Affine Transforms, In Conference Record of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Paris, France, January 1997.
[6]
C. Bastoul, Improving Data Locality in Static Control Programs, PhD thesis, Univ. Paris 6, Pierre et Marie Curie, December 2004.
[7]
C. Gong, R. Melhem and R. Gupta, Loop Transformations for Fault Detection in Regular Loops on Massively Parallel Systems, IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 12, pp. 1238-1249, December 1996.
[8]
C. T. Yang, S. S. Tseng, S. H. Kao, M. H. Hsieh and M. F. Jiang, Run-time Parallelization for Partially Parallel Loops, Proceedings of the 1997 International Conference on Parallel and Distributed Systems, pp. 308-313, 1997.
[9]
C. Xu, Effects of Parallelism Degree on Run-Time Parallelization of Loops, Proceedings of the 31st Hawaii International Conference on System Sciences (HICSS'98).
[10]
D. K. Chen and P. C. Yew, On Effective Execution of Non-Uniform DOACROSS Loops, IEEE Transactions on Parallel and Distributed System, vol. 7, pp. 463-476, May 1996.
[11]
D. L. Pean and C. Chen, An Optimized Three Region Partitioning Technique to Maximize Parallelism of Nested Loops with Non-uniform Dependences, Journal of Information Science and Engineering, vol. 17, pp. 463-489, 2001.
[12]
D. R. Chesney and B. H. Cheng, Generalizing the Unimodular Approach, Proceedings of the 1994 International Conference on Parallel and Distributed Systems, pp. 398-403, 1994.
[13]
E. D'Hollander, Partitioning and Labeling of Loops by Unimodular Transformations, IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 4, pp. 465-476, July 1992.
[14]
G. Goumas, M. Athanasaki and N. Koziris, An Efficient Code Generation Technique for Tiled Iteration Spaces, IEEE Tran. On Parallel and Distrib. Syst., vol. 14, no. 10, pp. 1021-1034, 2003.
[15]
J. Xue, Unimodular Transformations of Non-Perfectly Nested Loops, Parallel Computing, vol. 22, no. 12, pp. 1621-1645, 1997.
[16]
K. Psarris, X. Kong and D. Klappholz, The Direction Vector I Test, IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 11, pp. 1280-1290, November 1993.
[17]
L. N. Pouchet, When Iterative Optimization Meets the Polyhedral Model: One-dimensional Date, Master thesis (Univ. of Paris-Sud XI), Orsay, France, October 2006.
[18]
M. Rim and R. Jain, Valid Transformations: A New Class of Loop Transformations for High-Level Synthesis and Pipelined Scheduling Applications, IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 4, April 1996.
[19]
M. wolf and M. Lam, A Loop Transformation Theory and an Algorithm to Maximize Parallelism, IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 4, pp. 452-471, Oct. 1991.
[20]
M. Wolfe, High Performance Compilers for Parallel Computing, California, Addison-Wesley publishing Company, Inc, 1996.
[21]
M. wolfe and C. Tseng, The Power Test for Data Dependence, IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 5, pp. 591-601, September 1992.
[22]
N. Likhoded, S. Bakhanovich and A. Zherelo, Obtaining Affine Transformations to Improve Locality of Loop Nests, Programming and Computer Software, vol. 31, no. 5, pp. 270-281, 2005.
[23]
N. Manjikian and T. S. Abdelrahman, Fusion of Loops for Parallelism and Locality, IEEE Transactions on Parallel and Distributed Systems, vol. 8, no. 2, pp. 193-209, February 1997.
[24]
P. M. Petersen and D. A. Padua, Static and Dynamic Evaluation of Data Dependence Analysis Techniques, IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 11, pp. 1121-1132, November 1996.
[25]
Q. Yi and K. Kennedy, Transforming Complex Loop Nests for Locality, Technical Report TR02-386, Computer Science Dep., Rice Univ., Feb. 2002.
[26]
S. Punyamurtula, V. Chaudhary, J. Ju and S. Roy, Compile Time Partitioning of Nested Loop Iteration Spaces with Non-uniform Dependences, Journal of Parallel Algorithms and Applications, vol. 12, no. 1-3, pp. 113-141, 1997.
[27]
T. Tzen and Lionel M. Ni, Dependence Uniformization: A Loop Parallelization Technique, IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 5, pp. 547-558, May 1993.
[28]
U. Banerjee, An Introduction to a Formal Theory of Dependence Analysis, Journal of Supercomputing, 2, pp. 133-149, 1988.
[29]
U. Banerjee, Loop Transformations for Restructuring Compilers: The Foundations, Boston: Kluwer Academic Publishers, 1993.
[30]
U. Banerjee, Loop Parallelization, Boston: Kluwer Academic Publishers, 1994.
[31]
U. Banerjee, Dependence Analysis, Boston: Kluwer Academic Publishers, 1997.
[32]
U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev and P. Sadayappan, Affine Transformations for Communication Minimal Parallelization and Locality Optimization of Arbitrarily Nested Loop Sequences, Technical Report OSU-CISRC-5/07-TR43, Computer Science and Engineering Dep., Ohio State Univ., May 2007.
[33]
V. Maslov, Lazy array data flow dependence analysis, Conference record of the annual ACM symposium on principles of programming languages, pp. 311-325 1994.
[34]
W. Pugh, A Practical Algorithm for Exact Array Dependence Analysis, Communications of The ACM, vol. 35, no. 8, pp. 102-114, August 1992.
[35]
X. Kong, D. Klappholz and K. Psarris, The I Test: An Improved Dependence Test for Automatic Parallelization and Vectorization, IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 3, pp. 342-349, July 1991.
[36]
Z. Li, P. Yew and C. Zhu, An Efficient Data Dependence Analysis for Parallelizing Compilers, IEEE Transactions on Parallel and Distributed Systems, vol. 1, no. 1, pp. 26-34, January 1990.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image WSEAS Transactions on Computers
WSEAS Transactions on Computers  Volume 7, Issue 9
September 2008
139 pages

Publisher

World Scientific and Engineering Academy and Society (WSEAS)

Stevens Point, Wisconsin, United States

Publication History

Published: 01 September 2008
Revised: 10 August 2008
Received: 05 April 2008

Author Tags

  1. affine transformation
  2. distance matrix
  3. distance vector
  4. nonuniform dependence
  5. parallelism
  6. uniform dependence
  7. unimodular transformation

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 0
    Total Downloads
  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 18 Nov 2024

Other Metrics

Citations

View Options

View options

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media