research-article

Accelerating Cache Coherence in Manycore Processor through Silicon Photonic Chiplet

Authors:

Jiang Xu

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design

Article No.: 43, Pages 1 - 9

https://doi.org/10.1145/3508352.3549338

Published: 22 December 2022

Abstract

Cache coherence overhead in manycore systems grows more prominent as system scale increases. Traditional electrical networks, with their limited bandwidth and long latency, restrict the efficiency of cache coherence transactions. Optical networks promise high bandwidth and low latency and support both efficient unicast and multicast transmission, which can potentially accelerate cache coherence in manycore systems. This work proposes PCCN, a novel photonic cache coherence network with a physically centralized, logically distributed directory, for chiplet-based manycore systems. PCCN adopts a channel-sharing method with a contention-resolution mechanism for efficient long-distance transmission of coherence-related packets. Experimental results show that, compared with state-of-the-art proposals, PCCN speeds up application execution by 1.32x, reduces memory access latency by 26%, and improves energy efficiency by 1.26x on average in a 128-core system.
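The abstract reports its results in two different forms: improvement factors (1.32x, 1.26x) and a percentage reduction (26%). The short sketch below converts between the two forms so the figures can be read on a common scale. The function names are hypothetical, and the conversion assumes each reported average behaves like a single ratio, which the abstract does not state; treat the outputs as rough readings rather than figures from the paper.

```python
"""Convert between 'N-x improvement' and 'percent reduction' forms.

Assumption (not from the paper): each reported average is treated as a
single ratio, so these conversions are only approximate for averages
taken over several benchmarks.
"""


def remaining_fraction(improvement: float) -> float:
    """Fraction of the baseline cost left after an `improvement`-x gain.

    Example: a 2x speedup leaves 0.5 of the baseline execution time.
    """
    if improvement <= 0:
        raise ValueError("improvement factor must be positive")
    return 1.0 / improvement


def improvement_factor(reduction: float) -> float:
    """Improvement factor equivalent to cutting a cost by `reduction`.

    `reduction` is a fraction in [0, 1), e.g. 0.26 for a 26% reduction.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    # Figures quoted in the abstract.
    print(f"execution time remaining: {remaining_fraction(1.32):.3f}")
    print(f"latency improvement factor: {improvement_factor(0.26):.3f}")
    print(f"energy per unit of work remaining: {remaining_fraction(1.26):.3f}")
```

Under that assumption, a 1.32x speedup corresponds to roughly 76% of the baseline execution time, a 26% latency reduction to about a 1.35x latency improvement, and a 1.26x energy-efficiency gain to roughly 79% of the baseline energy per unit of work.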

Cited By

Nisa U, Bashir J (2024). Towards Efficient On-Chip Communication: A Survey on Silicon Nanophotonics and Optical Networks-on-Chip. Journal of Systems Architecture 152 (103171). Online publication date: Jul-2024.
https://doi.org/10.1016/j.sysarc.2024.103171

Index Terms

Accelerating Cache Coherence in Manycore Processor through Silicon Photonic Chiplet
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Index terms have been assigned to the content through auto-classification.

Published In

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design

October 2022

1467 pages

ISBN:9781450392174

DOI:10.1145/3508352

Conference Chair: Tulika Mitra (National University of Singapore)

Program Chairs: Evangeline Young (The Chinese University of Hong Kong), Jinjun Xiong (University at Buffalo (UB))

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

In-Cooperation

IEEE-EDS: Electronic Devices Society
IEEE CAS
IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design

October 30 - November 3, 2022

San Diego, California

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%

Bibliometrics

Total Citations: 1
Total Downloads: 262
Downloads (Last 12 months): 134
Downloads (Last 6 weeks): 18

Reflects downloads up to 22 Sep 2024
