research-article

The locality-aware adaptive cache coherence protocol

Published: 23 June 2013

Abstract

Next-generation multicore applications will process massive amounts of data with significant sharing. Data movement and management impact memory access latency and consume power. Therefore, harnessing data locality is of fundamental importance in future processors. We propose a scalable, efficient shared memory cache coherence protocol that enables seamless adaptation between private and logically shared caching of on-chip data at the fine granularity of cache lines. Our data-centric approach relies on in-hardware yet low-overhead runtime profiling of the locality of each cache line and only allows private caching for data blocks with high spatio-temporal locality. This allows us to better exploit the private caches and enable low-latency, low-energy memory access, while retaining the convenience of shared memory. On a set of parallel benchmarks, our low-overhead locality-aware mechanisms reduce overall energy by 25% and completion time by 15% in an NoC-based multicore with the Reactive-NUCA on-chip cache organization and the ACKwise limited directory-based coherence protocol.
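The idea of profiling each cache line and granting private caching only to lines with high locality can be illustrated with a small software model. This is a minimal sketch, not the hardware mechanism from the paper: the class names, the reuse counter, the default threshold of 4, and the "remote" / "promote" / "private" outcomes are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LineState:
    """Per-(core, cache line) profiling state. Hypothetical layout."""
    reuse: int = 0          # accesses counted while the line is not yet private
    private: bool = False   # True once the line may be cached privately


class LocalityClassifier:
    """Toy model: a core may cache a line privately only after it has
    shown enough reuse. The threshold value is an assumption."""

    def __init__(self, threshold=4):
        self.threshold = threshold
        self.state = {}  # maps (core, line address) -> LineState

    def access(self, core, line):
        """Classify one access by `core` to cache line `line`."""
        s = self.state.setdefault((core, line), LineState())
        if s.private:
            return "private"   # served from the core's private cache
        s.reuse += 1
        if s.reuse >= self.threshold:
            s.private = True
            return "promote"   # locality is high enough: allow a private copy
        return "remote"        # low locality so far: access the shared copy

    def evict(self, core, line):
        """Forget the profile, so the line must earn private status again."""
        self.state.pop((core, line), None)


if __name__ == "__main__":
    c = LocalityClassifier(threshold=3)
    print([c.access(0, 0x40) for _ in range(4)])
```

In this model each core is profiled separately, so a line that one core reuses heavily can be private for that core while other cores keep accessing the shared copy.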



Published In

ACM SIGARCH Computer Architecture News, Volume 41, Issue 3
ISCA '13
June 2013
666 pages
ISSN: 0163-5964
DOI: 10.1145/2508148
  • ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
    June 2013
    686 pages
    ISBN: 9781450320795
    DOI: 10.1145/2485922
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 June 2013
Published in SIGARCH Volume 41, Issue 3

Author Tags

  1. cache coherence
  2. multicore

Qualifiers

  • Research-article

Cited By

  • (2020) DELTA: Distributed Locality-Aware Cache Partitioning for Tile-based Chip Multiprocessors. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 578-589. DOI: 10.1109/IPDPS47924.2020.00066. Online publication date: May-2020
  • (2019) LPM: A Systematic Methodology for Concurrent Data Access Pattern Optimization from a Matching Perspective. IEEE Transactions on Parallel and Distributed Systems, 30:11, pp. 2478-2493. DOI: 10.1109/TPDS.2019.2912573. Online publication date: 9-Oct-2019
  • (2019) Multi-spectral Reuse Distance: Divining Spatial Information from Temporal Data. 2019 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-8. DOI: 10.1109/HPEC.2019.8916398. Online publication date: Sep-2019
  • (2016) Data placement across the cache hierarchy: Minimizing data movement with reuse-aware placement. 2016 IEEE 34th International Conference on Computer Design (ICCD), pp. 117-124. DOI: 10.1109/ICCD.2016.7753269. Online publication date: Oct-2016
  • (2015) LPM. Proceedings of the 2015 44th International Conference on Parallel Processing (ICPP), pp. 879-888. DOI: 10.1109/ICPP.2015.97. Online publication date: 1-Sep-2015
  • (2015) Comparison of significant issues in multicore cache coherence. Proceedings of the 2015 International Conference on Green Computing and Internet of Things (ICGCIoT), pp. 108-112. DOI: 10.1109/ICGCIoT.2015.7380439. Online publication date: 8-Oct-2015
  • (2024) PipeGen: Automated Transformation of a Single-Core Pipeline into a Multicore Pipeline for a Given Memory Consistency Model. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, pp. 1-13. DOI: 10.1145/3656019.3676889. Online publication date: 14-Oct-2024
  • (2022) A Case for Fine-grain Coherence Specialization in Heterogeneous Systems. ACM Transactions on Architecture and Code Optimization, 19:3, pp. 1-26. DOI: 10.1145/3530819. Online publication date: 22-Aug-2022
  • (2022) Accelerating Cache Coherence in Manycore Processor through Silicon Photonic Chiplet. Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, pp. 1-9. DOI: 10.1145/3508352.3549338. Online publication date: 30-Oct-2022
  • (2020) Analyzing and Leveraging Shared L1 Caches in GPUs. Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, pp. 161-173. DOI: 10.1145/3410463.3414623. Online publication date: 30-Sep-2020
