Abstract
The Uniform Memory Hierarchy (UMH) model introduced in this paper captures performance-relevant aspects of the hierarchical nature of computer memory. It is used to quantify architectural requirements of several algorithms and to ratify the faster speeds achieved by tuned implementations that use improved data-movement strategies.
A sequential computer's memory is modeled as a sequence ⟨M₀, M₁, …⟩ of increasingly large memory modules. Computation takes place in M₀. Thus, M₀ might model a computer's central processor, while M₁ might be cache memory, M₂ main memory, and so on. For each module Mᵤ, a bus Bᵤ connects it with the next larger module Mᵤ₊₁. All buses may be active simultaneously. Data is transferred along a bus in fixed-size blocks. The size of these blocks, the time required to transfer a block, and the number of blocks that fit in a module are larger for modules farther from the processor. The UMH model is parametrized by the rate at which the blocksizes increase and by the ratio of the blockcount to the blocksize. A third parameter, the transfer-cost (inverse bandwidth) function, determines the time to transfer blocks at the different levels of the hierarchy.
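The three parameters above can be sketched in code. This is a minimal illustration, not the paper's formal definition: the names alpha (blockcount-to-blocksize ratio), rho (blocksize growth rate), and f (transfer-cost function) follow the abstract's informal description, and the assumption that blocksizes grow geometrically as rho**u is ours.

```python
# Hedged sketch of the UMH parameters: alpha, rho, and f are named
# after the abstract's informal description, not the paper's notation.

def blocksize(u, rho):
    """Blocks on bus B_u hold rho**u items (blocksizes grow by rate rho)."""
    return rho ** u

def capacity(u, alpha, rho):
    """Module M_u holds alpha * rho**u blocks, i.e. alpha * rho**(2*u) items."""
    return alpha * blocksize(u, rho) * blocksize(u, rho)

def block_transfer_time(u, rho, f):
    """Time to move one block along bus B_u at f(u) time units per item."""
    return f(u) * blocksize(u, rho)

# Example: alpha = 2, rho = 4, unit transfer cost f(u) = 1.
alpha, rho, f = 2, 4, lambda u: 1
assert capacity(0, alpha, rho) == 2           # M_0 holds 2 one-item blocks
assert blocksize(2, rho) == 16                # blocks on bus B_2 hold 16 items
assert block_transfer_time(2, rho, f) == 16   # one B_2 block takes 16 time units
```

Because each module is quadratically larger than its blocksize, the hierarchy grows rapidly: under these toy parameters M₀ holds 2 items, M₁ holds 32, and M₂ holds 512.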
UMH analysis refines traditional methods of algorithm analysis by including the cost of data movement throughout the memory hierarchy. The communication efficiency of a program is a ratio measuring the portion of UMH running time during which M₀ is active. An algorithm that can be implemented by a program whose communication efficiency is nonzero in the limit is said to be communication-efficient. The communication efficiency of a program depends on the parameters of the UMH model, most importantly on the transfer-cost function. A threshold function separates those transfer-cost functions for which an algorithm is communication-efficient from those that are too costly. Threshold functions for matrix transpose, standard matrix multiplication, and Fast Fourier Transform algorithms are established by exhibiting communication-efficient programs at the threshold and showing that more expensive transfer-cost functions are too costly.
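The efficiency ratio can be illustrated with a toy calculation. This is a hedged sketch of the idea only: it assumes total running time splits into time when M₀ computes and time when M₀ stalls waiting on data movement, which is our simplification of the definition above.

```python
# Toy illustration of communication efficiency: the fraction of total
# UMH running time during which M_0 is active. The compute/stall split
# is our simplifying assumption, not the paper's formal definition.

def communication_efficiency(compute_time, stall_time):
    """Portion of running time during which M_0 is busy computing."""
    return compute_time / (compute_time + stall_time)

# A program whose data movement is fully overlapped with computation
# (all buses run while M_0 computes) never stalls and has efficiency 1;
# one that stalls M_0 as long as it computes has efficiency 0.5.
assert communication_efficiency(100, 0) == 1.0
assert communication_efficiency(100, 100) == 0.5
```

An algorithm is communication-efficient when this ratio stays bounded away from zero as the problem size grows, rather than tending to zero because data movement dominates.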
A parallel computer can be modeled as a tree of memory modules with computation occurring at the leaves. Threshold functions are established for multiplication of N×N matrices using up to N² processors in a tree with constant branching factor.
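The kind of data-movement strategy behind a communication-efficient matrix-multiplication program can be sketched as a blocked (tiled) multiply. This is an illustrative example of the general technique, not the paper's program: tiles of size b×b are brought close to the processor once and then reused for many operations, amortizing the transfer cost.

```python
# Illustrative blocked matrix multiply (not the paper's program):
# each b-by-b tile of A and B is moved down the hierarchy once and
# reused across 2*b**3 arithmetic operations in the inner kernel.

def blocked_matmul(A, B, n, b):
    """Multiply n-by-n matrices (lists of lists) using b-by-b tiles."""
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, b):
        for j0 in range(0, n, b):
            for k0 in range(0, n, b):
                # While this kernel runs, only one tile of A, one of B,
                # and one of C need to be resident near the processor.
                for i in range(i0, min(i0 + b, n)):
                    for k in range(k0, min(k0 + b, n)):
                        a = A[i][k]
                        for j in range(j0, min(j0 + b, n)):
                            C[i][j] += a * B[k][j]
    return C
```

Choosing b so that three tiles fit in a given module is the classical way to keep the processor busy relative to the bus, which is the flavor of tuning whose limits the threshold functions characterize.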
Communicated by Jeffrey Scott Vitter.
Alpern, B., Carter, L., Feig, E. et al. The uniform memory hierarchy model of computation. Algorithmica 12, 72–109 (1994). https://doi.org/10.1007/BF01185206