Abstract
In this paper we present a new loop transformation technique called Computation Decomposition and Alignment (CDA). Computation Decomposition first decomposes the iteration space into finer computation spaces. Computation Alignment subsequently, linearly transforms each computation space independently. CDA is a general framework in that linear transformations and its recent extensions are just special cases of CDA. CDA’s fine grained loop restructuring can incur considerable computational effort, but can exploit optimization opportunities that earlier frameworks cannot. We present four optimization contexts in which CDA can be useful. Our initial experiments demonstrate that CDA adds a new dimension to performance optimization.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Abraham, S.G., and Hudak, D.E. Compile-time partitioning of iterative parallel loops to reduce cache coherency traffic, IEEE Transactions on Parallel and Distributed Systems, 2(3):318–328, July 91.
Allen, R., Callahan, D., and Kennedy, K. Automatic decomposition of scientific programs for parallel execution, In Conference Record of the 14th Annual ACM Symposium on Principles of Programming Languages, pages 63–76, Munich, West Germany, January 1987.
Ancourt, C. and Irigoin, F. Scanning polyhedra with DO loops, In Proceedings of the 3rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, volume 26, pages 39–50, Williamsburg, VA, April 1991.
Anderson, J. and Lam, M. Global optimizations for parallelism and locality on scalable parallel machines, In Proceedings of the ACM SIGPLAN ‘83 Conference on Programming Language Design and Implementation, volume 28, June 1993.
Banerjee, U. Unimodular transformations of double loops, In Proceedings of Third Workshop on Programming Languages and Compilers for Parallel Computing, Irvine, CA, August 1990.
Feautrier, P. Dataflow analysis of array and scalar references. International Journal of Parallel Programming, 20, 1991.
Gilbert, J. and Schreiber, R. Optimal expression evaluation for data parallel architectures, Journal of Parallel and Distributed Computing, 13:58–64, 1991.
Irigoin, F. and Triolet, R. Supernode partitioning, In Conference Record of the 15th Annual ACM Symposium on Principles of Programming Languages, pages 319–329, San Diego, CA, 1988.
Kelly, W. and Pugh, W. A framework for unifying reordering transformations, Technical Report UMIACS-TR-92–126, University of Maryland, 1992.
Kelly, W., Pugh, W., and Rosser, E. Code generation for multiple mappings, Technical Report UMIACS-TR-94–87, University of Maryland, 1994.
Kulkarni, D. and Stumm, M. Computational alignment: A new, unified pro-gram transformation for local and global optimization, Technical Report CSRI-292, Computer Systems Research Institute, University of Toronto, January 1994. http://www.eecg.toronto.edu/EECG/RESEARCH/ParallelSys
Kulkarni, D., Stumm, M., Unrau, R., and Li, W. A generalized the-ory of linear loop transformations, Technical Report CSRI-317, Com-puter Systems Research Institute, University of Toronto, December 1994. http://www.eecg.toronto.edu/EECG/RESEARCH/ParallelSys
Kumar, K.G., Kulkarni, D., and Basu, A. Deriving good transformations for mapping nested loops on hierarchical parallel machines in polynomial time, In Proceedings of the 1992 ACM International Conference on Supercomputing, Washington, July 1992.
Li, C.H. Program wanall. ftp://ftp.cs.rice.edu, Rice University, 1992.
Li, W. and Pingali, K. A singular loop transformation framework based on non-singular matrices, In Proceedings of the Fifth Workshop on Programming Languages and Compilers for Parallel Computing, August 1992.
Mosher, C. Arco Seismic Benchmarks, ARCO E&PT.
NASA, Ames Research Center. NAS Parallel Benchmarks
Padua, D. Multiprocessors: Discussion of some theoretical and practical problems, Phd thesis, University of Illinois, Urbana-Champaign, 1979.
Padua, D. and Wolfe, M. Advanced compiler optimizations for supercomputers, Communications of the ACM, 29(12):1184–1201, December 1986.
Pugh, W. and Wonnacott, D. An exact method for analysis of value-based array data dependences, Technical Report CS-TR-3196, University of Maryland, 1993.
Torres, J., Ayguade, E., Labarta, J., and Valero, M. Align and distribute-based linear loop transformations, In Proceedings of Sixth Workshop on Programming Languages and Compilers for Parallel Computing, 1993.
Wolf, M. and Lam, M. An algorithmic approach to compound loop transformation, In Proceedings of Third Workshop on Programming Languages and Compilers for Parallel Computing, Irvine, CA, August 1990.
Wolfe, M. Optimizing supercompilers for supercomputers. The MIT Press, 1990.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 1996 Springer Science+Business Media New York
About this chapter
Cite this chapter
Kulkarni, D., Stumm, M. (1996). CDA Loop Transformations. In: Szymanski, B.K., Sinharoy, B. (eds) Languages, Compilers and Run-Time Systems for Scalable Computers. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-2315-4_3
Download citation
DOI: https://doi.org/10.1007/978-1-4615-2315-4_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5979-1
Online ISBN: 978-1-4615-2315-4
eBook Packages: Springer Book Archive