Abstract
We present a technique for analyzing the number of cache misses incurred by multithreaded cache oblivious algorithms on an idealized parallel machine in which each processor has a private cache. We specialize this technique to computations executed by the Cilk work-stealing scheduler on a machine with dag-consistent shared memory. We show that a multithreaded cache oblivious matrix multiplication incurs \(O(n^{3}/\sqrt{ Z}+(Pn)^{1/3}n^{2})\) cache misses when executed by the Cilk scheduler on a machine with P processors, each with a cache of size Z, with high probability. This bound is tighter than previously published bounds. We also present a new multithreaded cache oblivious algorithm for 1D stencil computations incurring \(O(n^{2}/Z+n+\sqrt{ Pn^{3+\epsilon}})\) cache misses with high probability, one for Gaussian elimination and back substitution, and one for the length computation part of the longest common subsequence problem incurring \(O(n^{2}/Z+\sqrt{ Pn^{3.58}})\) cache misses with high probability.
Similar content being viewed by others
References
Acar, U.A., Blelloch, G.E., Blumofe, R.D.: The data locality of work stealing. Theory Comput. Syst. 35(3), 321–347 (2002)
Belady, L.A.: A study of replacement algorithms for virtual storage computers. IBM Syst. J. 5(2), 78–101 (1966)
Bender, M.A., Fineman, J.T., Gilbert, S., Kuszmaul, B.C.: Concurrent cache-oblivious B-trees. In: 17th Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 228–237, Las Vegas, NV, July 2005
Bilardi, G., Preparata, F.P.: Upper bounds to processor-time tradeoffs under bounded-speed message propagation. In: 7th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 185–194, Santa Barbara, CA, July 1995
Blelloch, G.E., Gibbons, P.B.: Effectively sharing a cache among threads. In: 16th Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 235–244, Barcelona, Spain, June 2004
Blelloch, G.E., Gibbons, P.B., Matias, Y.: Provably efficient scheduling for languages with fine-grained parallelism. J. ACM 46(2), 281–321 (1999)
Blumofe, R.D., Frigo, M., Joerg, C.F., Leiserson, C.E., Randall, K.H.: An analysis of dag-consistent distributed shared-memory algorithms. In: 8th Annunal ACM Symposium on Parallel Algorithms and Architectures, pp. 297–308, Padua, Italy, June 1996
Blumofe, R.D., Joerg, C.F., Kuszmaul, B.C., Leiserson, C.E., Randall, K.H., Zhou, Y.: Cilk: an efficient multithreaded runtime system. In: 5th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 207–216, Santa Barbara, CA, July 1995
Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. J. ACM 46(5), 720–748 (1999)
Brent, R.P.: The parallel evaluation of general arithmetic expressions. J. ACM 21(2), 201–206 (1974)
Chowdhury, R.A., Ramachandran, V.: Cache-oblivious dynamic programming. In: 17th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 591–600, Miami, FL, 2006
Chowdhury, R.A., Ramachandran, V.: The cache-oblivious Gaussian elimination paradigm: theoretical framework, parallelization and experimental evaluation. In: Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA ’07), pp. 71–80. Assoc. Comput. Mach., New York (2007)
Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms. MIT Press, Cambridge (1990)
Culler, D.E., Karp, R.M., Patterson, D.A., Sahay, A., Schauser, K.E., Santos, E., Subramonian, R., von Eicken, T.: LogP: towards a realistic model of parallel computation. In: 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 1–12, San Diego, CA, May 1993
Fortune, S.J., Wyllie, J.: Parallelism in random access machines. In: 10th Annual ACM Symposium on Theory of Computing, pp. 114–118, San Diego, CA, May 1978
Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache oblivious algorithms. In: 40th Annual Symposium on Foundations of Computer Science, pp. 285–298, New York, USA, October 1999
Frigo, M., Randall, K.H., Leiserson, C.E.: The implementation of the Cilk-5 multithreaded language. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, Montreal, Canada, June 1998
Frigo, M., Strumpen, V.: Cache oblivious stencil computations. In: International Conference on Supercomputing, pp. 361–366, Boston, MA, June 2005
Gibbons, P.B., Matias, Y., Ramachandran, V.: Can a shared-memory model serve as a bridging model for parallel computation? Theory Comput. Syst. 32(3), 327–359 (1999)
Golub, G.H., van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins Press, Baltimore (1996)
Graham, R.L.: Bounds for certain multiprocessing anomalies. Bell Syst. Tech. J. 45, 1563–1581 (1966)
Graham, R.L., Knuth, D.E., Patashnik, O.: Concrete Mathematics. Addison-Wesley, Reading (1994)
Hirschberg, D.S.: A linear space algorithm for computing maximal common subsequences. Commun. ACM 18(6), 341–343 (1975)
Hong, J.-W., Kung, H.T.: I/O complexity: the red-blue pebbling game. In: 13th Annual ACM Symposium on Theory of Computing, pp. 326–333, Milwaukee, Wisconsin, May 1981
Leiserson, C.E.: Minicourse on multithreaded programming in Cilk, lecture 2: analysis of Cilk algorithms. Summer School on Language-Based Techniques for Concurrent and Distributed Software at the University of Oregon (2006)
Narlikar, G.J., Blelloch, G.E.: Space-efficient scheduling of nested parallelism. ACM Trans. Program. Lang. Syst. 21(1), 138–173 (1999)
Sleator, D.D., Tarjan, R.E.: Amortized efficiency of list update and paging rules. Commun. ACM 28(2), 202–208 (1985)
Strumpen, V., Frigo, M.: Software engineering aspects of cache oblivious stencil computations. Technical Report RC24035, IBM Research, August 2006
Toledo, S.: Locality of reference in LU decomposition with partial pivoting. SIAM J. Matrix Anal. Appl. 18(4), 1065–1081 (1997)
Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)
Author information
Authors and Affiliations
Corresponding author
Additional information
This work was supported in part by the Defense Advanced Research Projects Agency (DARPA) under contract No. NBCH30390004.
Rights and permissions
About this article
Cite this article
Frigo, M., Strumpen, V. The Cache Complexity of Multithreaded Cache Oblivious Algorithms. Theory Comput Syst 45, 203–233 (2009). https://doi.org/10.1007/s00224-007-9098-2
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00224-007-9098-2