Nothing Special   »   [go: up one dir, main page]

Skip to main content
Log in

The Cache Complexity of Multithreaded Cache Oblivious Algorithms

  • Published:
Theory of Computing Systems Aims and scope Submit manuscript

Abstract

We present a technique for analyzing the number of cache misses incurred by multithreaded cache oblivious algorithms on an idealized parallel machine in which each processor has a private cache. We specialize this technique to computations executed by the Cilk work-stealing scheduler on a machine with dag-consistent shared memory. We show that a multithreaded cache oblivious matrix multiplication incurs \(O(n^{3}/\sqrt{ Z}+(Pn)^{1/3}n^{2})\) cache misses when executed by the Cilk scheduler on a machine with P processors, each with a cache of size Z, with high probability. This bound is tighter than previously published bounds. We also present a new multithreaded cache oblivious algorithm for 1D stencil computations incurring \(O(n^{2}/Z+n+\sqrt{ Pn^{3+\epsilon}})\) cache misses with high probability, one for Gaussian elimination and back substitution, and one for the length computation part of the longest common subsequence problem incurring \(O(n^{2}/Z+\sqrt{ Pn^{3.58}})\) cache misses with high probability.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Acar, U.A., Blelloch, G.E., Blumofe, R.D.: The data locality of work stealing. Theory Comput. Syst. 35(3), 321–347 (2002)

    Article  MATH  MathSciNet  Google Scholar 

  2. Belady, L.A.: A study of replacement algorithms for virtual storage computers. IBM Syst. J. 5(2), 78–101 (1966)

    Article  Google Scholar 

  3. Bender, M.A., Fineman, J.T., Gilbert, S., Kuszmaul, B.C.: Concurrent cache-oblivious B-trees. In: 17th Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 228–237, Las Vegas, NV, July 2005

  4. Bilardi, G., Preparata, F.P.: Upper bounds to processor-time tradeoffs under bounded-speed message propagation. In: 7th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 185–194, Santa Barbara, CA, July 1995

  5. Blelloch, G.E., Gibbons, P.B.: Effectively sharing a cache among threads. In: 16th Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 235–244, Barcelona, Spain, June 2004

  6. Blelloch, G.E., Gibbons, P.B., Matias, Y.: Provably efficient scheduling for languages with fine-grained parallelism. J. ACM 46(2), 281–321 (1999)

    Article  MATH  MathSciNet  Google Scholar 

  7. Blumofe, R.D., Frigo, M., Joerg, C.F., Leiserson, C.E., Randall, K.H.: An analysis of dag-consistent distributed shared-memory algorithms. In: 8th Annunal ACM Symposium on Parallel Algorithms and Architectures, pp. 297–308, Padua, Italy, June 1996

  8. Blumofe, R.D., Joerg, C.F., Kuszmaul, B.C., Leiserson, C.E., Randall, K.H., Zhou, Y.: Cilk: an efficient multithreaded runtime system. In: 5th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 207–216, Santa Barbara, CA, July 1995

  9. Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. J. ACM 46(5), 720–748 (1999)

    Article  MATH  MathSciNet  Google Scholar 

  10. Brent, R.P.: The parallel evaluation of general arithmetic expressions. J. ACM 21(2), 201–206 (1974)

    Article  MATH  MathSciNet  Google Scholar 

  11. Chowdhury, R.A., Ramachandran, V.: Cache-oblivious dynamic programming. In: 17th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 591–600, Miami, FL, 2006

  12. Chowdhury, R.A., Ramachandran, V.: The cache-oblivious Gaussian elimination paradigm: theoretical framework, parallelization and experimental evaluation. In: Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA ’07), pp. 71–80. Assoc. Comput. Mach., New York (2007)

    Chapter  Google Scholar 

  13. Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms. MIT Press, Cambridge (1990)

    MATH  Google Scholar 

  14. Culler, D.E., Karp, R.M., Patterson, D.A., Sahay, A., Schauser, K.E., Santos, E., Subramonian, R., von Eicken, T.: LogP: towards a realistic model of parallel computation. In: 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 1–12, San Diego, CA, May 1993

  15. Fortune, S.J., Wyllie, J.: Parallelism in random access machines. In: 10th Annual ACM Symposium on Theory of Computing, pp. 114–118, San Diego, CA, May 1978

  16. Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache oblivious algorithms. In: 40th Annual Symposium on Foundations of Computer Science, pp. 285–298, New York, USA, October 1999

  17. Frigo, M., Randall, K.H., Leiserson, C.E.: The implementation of the Cilk-5 multithreaded language. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, Montreal, Canada, June 1998

  18. Frigo, M., Strumpen, V.: Cache oblivious stencil computations. In: International Conference on Supercomputing, pp. 361–366, Boston, MA, June 2005

  19. Gibbons, P.B., Matias, Y., Ramachandran, V.: Can a shared-memory model serve as a bridging model for parallel computation? Theory Comput. Syst. 32(3), 327–359 (1999)

    Article  MATH  MathSciNet  Google Scholar 

  20. Golub, G.H., van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins Press, Baltimore (1996)

    MATH  Google Scholar 

  21. Graham, R.L.: Bounds for certain multiprocessing anomalies. Bell Syst. Tech. J. 45, 1563–1581 (1966)

    Google Scholar 

  22. Graham, R.L., Knuth, D.E., Patashnik, O.: Concrete Mathematics. Addison-Wesley, Reading (1994)

    MATH  Google Scholar 

  23. Hirschberg, D.S.: A linear space algorithm for computing maximal common subsequences. Commun. ACM 18(6), 341–343 (1975)

    Article  MATH  MathSciNet  Google Scholar 

  24. Hong, J.-W., Kung, H.T.: I/O complexity: the red-blue pebbling game. In: 13th Annual ACM Symposium on Theory of Computing, pp. 326–333, Milwaukee, Wisconsin, May 1981

  25. Leiserson, C.E.: Minicourse on multithreaded programming in Cilk, lecture 2: analysis of Cilk algorithms. Summer School on Language-Based Techniques for Concurrent and Distributed Software at the University of Oregon (2006)

  26. Narlikar, G.J., Blelloch, G.E.: Space-efficient scheduling of nested parallelism. ACM Trans. Program. Lang. Syst. 21(1), 138–173 (1999)

    Article  Google Scholar 

  27. Sleator, D.D., Tarjan, R.E.: Amortized efficiency of list update and paging rules. Commun. ACM 28(2), 202–208 (1985)

    Article  MathSciNet  Google Scholar 

  28. Strumpen, V., Frigo, M.: Software engineering aspects of cache oblivious stencil computations. Technical Report RC24035, IBM Research, August 2006

  29. Toledo, S.: Locality of reference in LU decomposition with partial pivoting. SIAM J. Matrix Anal. Appl. 18(4), 1065–1081 (1997)

    Article  MATH  MathSciNet  Google Scholar 

  30. Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Matteo Frigo.

Additional information

This work was supported in part by the Defense Advanced Research Projects Agency (DARPA) under contract No. NBCH30390004.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Frigo, M., Strumpen, V. The Cache Complexity of Multithreaded Cache Oblivious Algorithms. Theory Comput Syst 45, 203–233 (2009). https://doi.org/10.1007/s00224-007-9098-2

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00224-007-9098-2

Keywords

Navigation