Abstract
The Uniform Memory Hierarchy (UMH) model introduced in this paper captures performance-relevant aspects of the hierarchical nature of computer memory. It is used to quantify architectural requirements of several algorithms and to ratify the faster speeds achieved by tuned implementations that use improved data-movement strategies.
A sequential computer's memory is modeled as a sequence ⟨M₀, M₁, …⟩ of increasingly large memory modules. Computation takes place in M₀. Thus, M₀ might model a computer's central processor, while M₁ might be cache memory, M₂ main memory, and so on. For each module Mᵤ, a bus Bᵤ connects it with the next larger module Mᵤ₊₁. All buses may be active simultaneously. Data is transferred along a bus in fixed-size blocks. The size of these blocks, the time required to transfer a block, and the number of blocks that fit in a module are larger for modules farther from the processor. The UMH model is parametrized by the rate at which the blocksizes increase and by the ratio of the blockcount to the blocksize. A third parameter, the transfer-cost (inverse bandwidth) function, determines the time to transfer blocks at the different levels of the hierarchy.
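The three parameters above can be sketched in code. This is a minimal illustration, not the paper's formal definition: the names alpha (blockcount-to-blocksize ratio), rho (blocksize growth rate), and f (transfer-cost function) follow the abstract's informal description, and the assumption that blocksizes grow geometrically as rho**u is ours.

```python
# Hedged sketch of the UMH parameters: alpha, rho, and f are named
# after the abstract's informal description, not the paper's notation.

def blocksize(u, rho):
    """Blocks on bus B_u hold rho**u items (blocksizes grow by rate rho)."""
    return rho ** u

def capacity(u, alpha, rho):
    """Module M_u holds alpha * rho**u blocks, i.e. alpha * rho**(2*u) items."""
    return alpha * blocksize(u, rho) * blocksize(u, rho)

def block_transfer_time(u, rho, f):
    """Time to move one block along bus B_u at f(u) time units per item."""
    return f(u) * blocksize(u, rho)

# Example: alpha = 2, rho = 4, unit transfer cost f(u) = 1.
alpha, rho, f = 2, 4, lambda u: 1
assert capacity(0, alpha, rho) == 2           # M_0 holds 2 one-item blocks
assert blocksize(2, rho) == 16                # blocks on bus B_2 hold 16 items
assert block_transfer_time(2, rho, f) == 16   # one B_2 block takes 16 time units
```

Because each module is quadratically larger than its blocksize, the hierarchy grows rapidly: under these toy parameters M₀ holds 2 items, M₁ holds 32, and M₂ holds 512.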
UMH analysis refines traditional methods of algorithm analysis by including the cost of data movement throughout the memory hierarchy. The communication efficiency of a program is a ratio measuring the portion of UMH running time during which M₀ is active. An algorithm that can be implemented by a program whose communication efficiency is nonzero in the limit is said to be communication-efficient. The communication efficiency of a program depends on the parameters of the UMH model, most importantly on the transfer-cost function. A threshold function separates those transfer-cost functions for which an algorithm is communication-efficient from those that are too costly. Threshold functions for matrix transpose, standard matrix multiplication, and Fast Fourier Transform algorithms are established by exhibiting communication-efficient programs at the threshold and showing that more expensive transfer-cost functions are too costly.
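The efficiency ratio can be illustrated with a toy calculation. This is a hedged sketch of the idea only: it assumes total running time splits into time when M₀ computes and time when M₀ stalls waiting on data movement, which is our simplification of the definition above.

```python
# Toy illustration of communication efficiency: the fraction of total
# UMH running time during which M_0 is active. The compute/stall split
# is our simplifying assumption, not the paper's formal definition.

def communication_efficiency(compute_time, stall_time):
    """Portion of running time during which M_0 is busy computing."""
    return compute_time / (compute_time + stall_time)

# A program whose data movement is fully overlapped with computation
# (all buses run while M_0 computes) never stalls and has efficiency 1;
# one that stalls M_0 as long as it computes has efficiency 0.5.
assert communication_efficiency(100, 0) == 1.0
assert communication_efficiency(100, 100) == 0.5
```

An algorithm is communication-efficient when this ratio stays bounded away from zero as the problem size grows, rather than tending to zero because data movement dominates.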
A parallel computer can be modeled as a tree of memory modules with computation occurring at the leaves. Threshold functions are established for multiplication of N×N matrices using up to N² processors in a tree with constant branching factor.
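The kind of data-movement strategy behind a communication-efficient matrix-multiplication program can be sketched as a blocked (tiled) multiply. This is an illustrative example of the general technique, not the paper's program: tiles of size b×b are brought close to the processor once and then reused for many operations, amortizing the transfer cost.

```python
# Illustrative blocked matrix multiply (not the paper's program):
# each b-by-b tile of A and B is moved down the hierarchy once and
# reused across 2*b**3 arithmetic operations in the inner kernel.

def blocked_matmul(A, B, n, b):
    """Multiply n-by-n matrices (lists of lists) using b-by-b tiles."""
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, b):
        for j0 in range(0, n, b):
            for k0 in range(0, n, b):
                # While this kernel runs, only one tile of A, one of B,
                # and one of C need to be resident near the processor.
                for i in range(i0, min(i0 + b, n)):
                    for k in range(k0, min(k0 + b, n)):
                        a = A[i][k]
                        for j in range(j0, min(j0 + b, n)):
                            C[i][j] += a * B[k][j]
    return C
```

Choosing b so that three tiles fit in a given module is the classical way to keep the processor busy relative to the bus, which is the flavor of tuning whose limits the threshold functions characterize.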
Communicated by Jeffrey Scott Vitter.
Alpern, B., Carter, L., Feig, E. et al. The uniform memory hierarchy model of computation. Algorithmica 12, 72–109 (1994). https://doi.org/10.1007/BF01185206