Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/1347082.1347137acmconferencesArticle/Chapter ViewAbstractPublication PagessodaConference Proceedingsconference-collections
research-article

Provably good multicore cache performance for divide-and-conquer algorithms

Published: 20 January 2008 Publication History

Abstract

This paper presents a multicore-cache model that reflects the reality that multicore processors have both per-processor private (L1) caches and a large shared (L2) cache on chip. We consider a broad class of parallel divide-and-conquer algorithms and present a new on-line scheduler, CONTROLLED-PDF, that is competitive with the standard sequential scheduler in the following sense. Given any dynamically unfolding computation DAG from this class of algorithms, the cache complexity on the multicore-cache model under our new scheduler is within a constant factor of the sequential cache complexity for both L1 and L2, while the time complexity is within a constant factor of the sequential time complexity divided by the number of processors p. These are the first such asymptotically-optimal results for any multicore model. Finally, we show that a separator-based algorithm for sparse-matrix-dense-vector-multiply achieves provably good cache performance in the multicore-cache model, as well as in the well-studied sequential cache-oblivious model.

References

[1]
www.sun.com/processors/UltraSPARC-T1/, 2007.
[2]
www.tilera.com, 2007.
[3]
Intel shows off 80-core processor. www.news.com/2100-1006_3-6158181.html, 2007.
[4]
U. A. Acar, G. E. Blelloch, and R. D. Blumofe. The data locality of work stealing. Theory of Computing Systems, 35(3), 2002. Springer.
[5]
A. Aggarwal, B. Alpern, A. Chandra, and M. Snir. A model for hierarchical memory. In ACM STOC, 1987.
[6]
A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9), 1988.
[7]
S. Akl and N. Santoro. Optimal parallel merging and sorting without memory conflicts. IEEE Transactions on Computers, 36(11), 1987.
[8]
B. Alpern, L. Carter, E. Feig, and T. Selker. The uniform memory hierachy model of computation. Algorth-mica, 12(2/3), 1994. Springer.
[9]
L. A. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. Piranha: A scalable architecture based on single-chip multiprocessing. In ACM ISCA, 2000.
[10]
M. A. Bender, G. S. Brodal, R. Fagerberg, R. Jacob, and E. Vicari. Optimal sparse matrix dense vector multiplication in the I/O-model. In ACM SPAA, 2007.
[11]
M. A. Bender, J. T. Fineman, S. Gilbert, and B. C. Kuszmaul. Concurrent cache-oblivious B-trees. In ACM SPAA, 2005.
[12]
G. E. Blelloch and P. B. Gibbons. Effectively sharing a cache among threads. In ACM SPAA, 2004.
[13]
G. E. Blelloch, P. B. Gibbons, and Y. Matias. Provably efficient scheduling for languages with fine-grained parallelism. Journal of the ACM, 46(2), 1999.
[14]
G. E. Blelloch, P. B. Gibbons, Y. Matias, and G. J. Narlikar. Space-efficient scheduling of parallelism with synchronization variables. In ACM SPAA, 1997.
[15]
R. D. Blumofe, M. Frigo, C. F. Joerg, C. E. Leiserson, and K. H. Randall. An analysis of dag-consistent distributed shared-memory algorithms. In ACM SPAA, 1996.
[16]
R. Brent. The parallel evaluation of general arithmetic expressions. Journal of the ACM, 21:201--206, 1974.
[17]
E. Chan, E. S. Quintana-Orti, G. Quintana-Orti, and R. van de Geijn. Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures. In ACM SPAA, 2007.
[18]
S. Chen, P. B. Gibbons, M. Kozuch, V. Liaskovitis, A. Ailamaki, G. E. Blelloch, B. Falsafi, L. Fix, N. Har-davellas, T. C. Mowry, and C. Wilkerson. Scheduling threads for constructive cache sharing on CMPs. In ACM SPAA, 2007.
[19]
R. Chowdhury and V. Ramachandran. The cache-oblivious gaussian elimination paradigm: Theoretical framework, parallelization and experimental evaluation. In ACM SPAA, 2007.
[20]
A. Fedorova, M. Seltzer, C. Small, and D. Nussbaum. Performance of multithreaded chip multiprocessors and implications for operating system design. In USENIX Ann. Tech. Conf., 2005.
[21]
M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In IEEE FOCS, 1999.
[22]
M. Frigo and V. Strumpen. The cache complexity of multithreaded cache oblivious algorithms. In ACM SPAA, 2006.
[23]
M. T. Goodrich, M. Nelson, and N. Sitchinava. Sorting in parallel external-memory multicores. Technical report, U.C. Irvine, 2007.
[24]
L. Hammond, B. A. Hubbert, M. Siu, M. K. Prabhu, M. Chen, and K. Olukotun. The Stanford Hydra CMP. IEEE Micro, 20(2), 2000.
[25]
L. Hammond, B. Nayfeh, and K. Olukotun. A single-chip multiprocessor. IEEE Computer, 30(9), 1997.
[26]
R. J. Lipton and R. E. Tarjan. A separator theorem for planar graphs. SIAM Journal on Applied Mathematics, 36(2), 1979.
[27]
G. J. Narlikar. Scheduling threads for low space requirement and good locality. Theory of Computing Systems, 35(2), 2002. Springer.
[28]
B. A. Nayfeh, L. Hammond, and K. Olukotun. Evaluation of design alternatives for a multiprocessor microprocessor. In ACM ISCA, 1996.
[29]
V. Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13(4), 1969. Springer.
[30]
D. Tam, R. Azimi, and M. Stumm. Thread clustering: Sharing-aware scheduling on SMP-CMP-SMT multiprocessors. In ACM EuroSys, 2007.

Cited By

View all
  • (2022)Automatic HBM ManagementProceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3490148.3538570(147-159)Online publication date: 11-Jul-2022
  • (2021)Data Oblivious Algorithms for MulticoresProceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3409964.3461783(373-384)Online publication date: 6-Jul-2021
  • (2020)How to Manage High-Bandwidth Memory AutomaticallyProceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3350755.3400233(187-199)Online publication date: 6-Jul-2020
  • Show More Cited By

Index Terms

  1. Provably good multicore cache performance for divide-and-conquer algorithms

              Recommendations

              Comments

              Please enable JavaScript to view thecomments powered by Disqus.

              Information & Contributors

              Information

              Published In

              cover image ACM Conferences
              SODA '08: Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
              January 2008
              1289 pages
              • Program Chair:
              • Shang-Hua Teng

              Sponsors

              Publisher

              Society for Industrial and Applied Mathematics

              United States

              Publication History

              Published: 20 January 2008

              Check for updates

              Qualifiers

              • Research-article

              Conference

              SODA08
              Sponsor:
              SODA08: 19th ACM-SIAM Symposium on Discrete Algorithms
              January 20 - 22, 2008
              California, San Francisco

              Acceptance Rates

              Overall Acceptance Rate 411 of 1,322 submissions, 31%

              Contributors

              Other Metrics

              Bibliometrics & Citations

              Bibliometrics

              Article Metrics

              • Downloads (Last 12 months)3
              • Downloads (Last 6 weeks)0
              Reflects downloads up to 16 Nov 2024

              Other Metrics

              Citations

              Cited By

              View all
              • (2022)Automatic HBM ManagementProceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3490148.3538570(147-159)Online publication date: 11-Jul-2022
              • (2021)Data Oblivious Algorithms for MulticoresProceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3409964.3461783(373-384)Online publication date: 6-Jul-2021
              • (2020)How to Manage High-Bandwidth Memory AutomaticallyProceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3350755.3400233(187-199)Online publication date: 6-Jul-2020
              • (2018)An Efficient Programming Skeleton for Clusters of Multi-Core ProcessorsInternational Journal of Parallel Programming10.1007/s10766-017-0517-y46:6(1094-1109)Online publication date: 1-Dec-2018
              • (2017)Resource Oblivious Sorting on MulticoresACM Transactions on Parallel Computing10.1145/30402213:4(1-31)Online publication date: 23-Mar-2017
              • (2016)A divide-and-conquer parallel pattern implementation for multicoresProceedings of the 3rd International Workshop on Software Engineering for Parallel Systems10.1145/3002125.3002128(10-19)Online publication date: 21-Oct-2016
              • (2016)Experimental Analysis of Space-Bounded SchedulersACM Transactions on Parallel Computing10.1145/29383893:1(1-27)Online publication date: 28-Jun-2016
              • (2015)Locality-Aware Work Stealing Based on Online Profiling and Auto-Tuning for Multisocket Multicore ArchitecturesACM Transactions on Architecture and Code Optimization10.1145/276645012:2(1-24)Online publication date: 8-Jul-2015
              • (2014)A memory access model for highly-threaded many-core architecturesFuture Generation Computer Systems10.5555/2747903.274817530:C(202-215)Online publication date: 1-Jan-2014
              • (2014)Cache-adaptive algorithmsProceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms10.5555/2634074.2634145(958-971)Online publication date: 5-Jan-2014
              • Show More Cited By

              View Options

              Login options

              View options

              PDF

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader

              Media

              Figures

              Other

              Tables

              Share

              Share

              Share this Publication link

              Share on social media