
The uniform memory hierarchy model of computation


Abstract

The Uniform Memory Hierarchy (UMH) model introduced in this paper captures performance-relevant aspects of the hierarchical nature of computer memory. It is used to quantify architectural requirements of several algorithms and to ratify the faster speeds achieved by tuned implementations that use improved data-movement strategies.

A sequential computer's memory is modeled as a sequence ⟨M₀, M₁, ...⟩ of increasingly large memory modules. Computation takes place in M₀. Thus, M₀ might model a computer's central processor, while M₁ might be cache memory, M₂ main memory, and so on. For each module Mᵤ, a bus Bᵤ connects it with the next larger module Mᵤ₊₁. All buses may be active simultaneously. Data is transferred along a bus in fixed-sized blocks. The size of these blocks, the time required to transfer a block, and the number of blocks that fit in a module are larger for modules farther from the processor. The UMH model is parametrized by the rate at which the blocksizes increase and by the ratio of the blockcount to the blocksize. A third parameter, the transfer-cost (inverse bandwidth) function, determines the time to transfer blocks at the different levels of the hierarchy.
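To make the parameterization concrete, the following minimal sketch models a UMH configuration in Python. The specific formulas used here (blocksize ρᵘ at level u, blockcount α·ρᵘ, and a block crossing bus Bᵤ in blocksize-times-f(u) time for a transfer-cost function f) are assumptions consistent with the abstract's description, not the paper's exact definitions.

```python
# A minimal sketch of the UMH parameterization described above.
# Assumption: blocksizes grow geometrically, blocksize(u) = rho**u, and the
# blockcount-to-blocksize ratio is the constant alpha, so module M_u holds
# alpha * rho**u blocks (capacity alpha * rho**(2*u) items). The time for bus
# B_u to move one block is taken to be blocksize(u) * f(u), with f the
# transfer-cost (inverse bandwidth) function. These formulas follow the
# abstract's description; the paper's exact definitions may differ in detail.

def blocksize(u, rho):
    """Size of a block in module M_u."""
    return rho ** u

def blockcount(u, alpha, rho):
    """Number of blocks that fit in module M_u."""
    return alpha * rho ** u

def capacity(u, alpha, rho):
    """Total number of items that fit in module M_u."""
    return blockcount(u, alpha, rho) * blocksize(u, rho)

def block_transfer_time(u, rho, f):
    """Time for bus B_u to move one block between M_u and M_{u+1}."""
    return blocksize(u, rho) * f(u)

# Example: alpha = 4, rho = 2, unit transfer cost f(u) = 1.
if __name__ == "__main__":
    alpha, rho, f = 4, 2, lambda u: 1
    for u in range(4):
        print(u, blocksize(u, rho), blockcount(u, alpha, rho),
              capacity(u, alpha, rho), block_transfer_time(u, rho, f))
```

With α = 4 and ρ = 2, for example, both the blocksize and the blockcount double at each level, so module capacities grow by a factor of four per level, the kind of geometric growth the model is designed to capture.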

UMH analysis refines traditional methods of algorithm analysis by including the cost of data movement throughout the memory hierarchy. The communication efficiency of a program is a ratio measuring the portion of UMH running time during which M₀ is active. An algorithm that can be implemented by a program whose communication efficiency is nonzero in the limit is said to be communication-efficient. The communication efficiency of a program depends on the parameters of the UMH model, most importantly on the transfer-cost function. A threshold function separates those transfer-cost functions for which an algorithm is communication-efficient from those that are too costly. Threshold functions for matrix transpose, standard matrix multiplication, and Fast Fourier Transform algorithms are established by exhibiting communication-efficient programs at the threshold and showing that more expensive transfer-cost functions are too costly.
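As a hedged restatement of the definition, with notation assumed rather than quoted from the paper: if T(n) is a program's UMH running time on an input of size n and T₀(n) is the portion of that time during which M₀ is active, then

```latex
% Notation assumed for illustration; T(n) and T_0(n) are not the paper's symbols.
\[
  \operatorname{eff}(n) = \frac{T_0(n)}{T(n)}, \qquad
  \text{and the program is communication-efficient if }
  \liminf_{n\to\infty} \operatorname{eff}(n) > 0 .
\]
```

In this reading, a threshold function marks the fastest-growing transfer-cost function for which some program implementing the algorithm still satisfies this condition.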

A parallel computer can be modeled as a tree of memory modules with computation occurring at the leaves. Threshold functions are established for multiplication of N×N matrices using up to N² processors in a tree with constant branching factor.
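The sketch below, with names and structure assumed purely for illustration, shows one way such a tree might be represented: internal nodes are memory modules, every internal node has the same branching factor, and the leaves stand for the processors where computation occurs.

```python
# Illustrative sketch (names and structure assumed): a parallel memory
# hierarchy as a uniform tree of modules with constant branching factor.
# Internal nodes are memory modules; the leaves model the processors
# where computation takes place.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Module:
    level: int                       # distance from the leaves
    children: List["Module"] = field(default_factory=list)

    def is_processor(self) -> bool:
        return not self.children     # leaves model processors

def build_tree(height: int, branching: int) -> Module:
    """Build a uniform tree of memory modules of the given height."""
    if height == 0:
        return Module(level=0)
    return Module(level=height,
                  children=[build_tree(height - 1, branching)
                            for _ in range(branching)])

def count_processors(root: Module) -> int:
    """Count the leaf (processor) modules of the tree."""
    if root.is_processor():
        return 1
    return sum(count_processors(c) for c in root.children)

# Example: branching factor 2 and height 4 give 2**4 = 16 leaf processors,
# e.g., enough for N**2 processors when N = 4.
if __name__ == "__main__":
    root = build_tree(height=4, branching=2)
    print(count_processors(root))    # -> 16
```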




Communicated by Jeffrey Scott Vitter.


Cite this article

Alpern, B., Carter, L., Feig, E. et al. The uniform memory hierarchy model of computation. Algorithmica 12, 72–109 (1994). https://doi.org/10.1007/BF01185206

