Nothing Special   »   [go: up one dir, main page]

Skip to main content

Abstract

In this paper we present a new loop transformation technique called Computation Decomposition and Alignment (CDA). Computation Decomposition first decomposes the iteration space into finer computation spaces. Computation Alignment subsequently, linearly transforms each computation space independently. CDA is a general framework in that linear transformations and its recent extensions are just special cases of CDA. CDA’s fine grained loop restructuring can incur considerable computational effort, but can exploit optimization opportunities that earlier frameworks cannot. We present four optimization contexts in which CDA can be useful. Our initial experiments demonstrate that CDA adds a new dimension to performance optimization.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 169.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Abraham, S.G., and Hudak, D.E. Compile-time partitioning of iterative parallel loops to reduce cache coherency traffic, IEEE Transactions on Parallel and Distributed Systems, 2(3):318–328, July 91.

    Article  Google Scholar 

  2. Allen, R., Callahan, D., and Kennedy, K. Automatic decomposition of scientific programs for parallel execution, In Conference Record of the 14th Annual ACM Symposium on Principles of Programming Languages, pages 63–76, Munich, West Germany, January 1987.

    Google Scholar 

  3. Ancourt, C. and Irigoin, F. Scanning polyhedra with DO loops, In Proceedings of the 3rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, volume 26, pages 39–50, Williamsburg, VA, April 1991.

    Chapter  Google Scholar 

  4. Anderson, J. and Lam, M. Global optimizations for parallelism and locality on scalable parallel machines, In Proceedings of the ACM SIGPLAN ‘83 Conference on Programming Language Design and Implementation, volume 28, June 1993.

    Google Scholar 

  5. Banerjee, U. Unimodular transformations of double loops, In Proceedings of Third Workshop on Programming Languages and Compilers for Parallel Computing, Irvine, CA, August 1990.

    Google Scholar 

  6. Feautrier, P. Dataflow analysis of array and scalar references. International Journal of Parallel Programming, 20, 1991.

    Google Scholar 

  7. Gilbert, J. and Schreiber, R. Optimal expression evaluation for data parallel architectures, Journal of Parallel and Distributed Computing, 13:58–64, 1991.

    Article  Google Scholar 

  8. Irigoin, F. and Triolet, R. Supernode partitioning, In Conference Record of the 15th Annual ACM Symposium on Principles of Programming Languages, pages 319–329, San Diego, CA, 1988.

    Google Scholar 

  9. Kelly, W. and Pugh, W. A framework for unifying reordering transformations, Technical Report UMIACS-TR-92–126, University of Maryland, 1992.

    Google Scholar 

  10. Kelly, W., Pugh, W., and Rosser, E. Code generation for multiple mappings, Technical Report UMIACS-TR-94–87, University of Maryland, 1994.

    Google Scholar 

  11. Kulkarni, D. and Stumm, M. Computational alignment: A new, unified pro-gram transformation for local and global optimization, Technical Report CSRI-292, Computer Systems Research Institute, University of Toronto, January 1994. http://www.eecg.toronto.edu/EECG/RESEARCH/ParallelSys

  12. Kulkarni, D., Stumm, M., Unrau, R., and Li, W. A generalized the-ory of linear loop transformations, Technical Report CSRI-317, Com-puter Systems Research Institute, University of Toronto, December 1994. http://www.eecg.toronto.edu/EECG/RESEARCH/ParallelSys

  13. Kumar, K.G., Kulkarni, D., and Basu, A. Deriving good transformations for mapping nested loops on hierarchical parallel machines in polynomial time, In Proceedings of the 1992 ACM International Conference on Supercomputing, Washington, July 1992.

    Google Scholar 

  14. Li, C.H. Program wanall. ftp://ftp.cs.rice.edu, Rice University, 1992.

  15. Li, W. and Pingali, K. A singular loop transformation framework based on non-singular matrices, In Proceedings of the Fifth Workshop on Programming Languages and Compilers for Parallel Computing, August 1992.

    Google Scholar 

  16. Mosher, C. Arco Seismic Benchmarks, ARCO E&PT.

    Google Scholar 

  17. NASA, Ames Research Center. NAS Parallel Benchmarks

    Google Scholar 

  18. Padua, D. Multiprocessors: Discussion of some theoretical and practical problems, Phd thesis, University of Illinois, Urbana-Champaign, 1979.

    Google Scholar 

  19. Padua, D. and Wolfe, M. Advanced compiler optimizations for supercomputers, Communications of the ACM, 29(12):1184–1201, December 1986.

    Article  Google Scholar 

  20. Pugh, W. and Wonnacott, D. An exact method for analysis of value-based array data dependences, Technical Report CS-TR-3196, University of Maryland, 1993.

    Google Scholar 

  21. Torres, J., Ayguade, E., Labarta, J., and Valero, M. Align and distribute-based linear loop transformations, In Proceedings of Sixth Workshop on Programming Languages and Compilers for Parallel Computing, 1993.

    Google Scholar 

  22. Wolf, M. and Lam, M. An algorithmic approach to compound loop transformation, In Proceedings of Third Workshop on Programming Languages and Compilers for Parallel Computing, Irvine, CA, August 1990.

    Google Scholar 

  23. Wolfe, M. Optimizing supercompilers for supercomputers. The MIT Press, 1990.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 1996 Springer Science+Business Media New York

About this chapter

Cite this chapter

Kulkarni, D., Stumm, M. (1996). CDA Loop Transformations. In: Szymanski, B.K., Sinharoy, B. (eds) Languages, Compilers and Run-Time Systems for Scalable Computers. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-2315-4_3

Download citation

  • DOI: https://doi.org/10.1007/978-1-4615-2315-4_3

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5979-1

  • Online ISBN: 978-1-4615-2315-4

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics