research-article

No cache-coherence: a single-cycle ring interconnection for multi-core L1-NUCA sharing on 3D chips

Authors:

Jinn-Shyan Wang

DAC '09: Proceedings of the 46th Annual Design Automation Conference

Pages 587 - 592

https://doi.org/10.1145/1629911.1630062

Published: 26 July 2009

Abstract

Consistent with the trend toward many cores in SoC and 3D chip technologies, this paper proposes a "single-cycle ring" interconnection (SC_Ring) with ultra-low latency and minimal complexity. The proposed SC_Ring allows multiple single-cycle transactions to proceed in parallel. The main features of the circuit-switched design are a set of 3-ported circuit-switched routers (4~16) and an arbiter that is effective in both performance and timing. The arbiter, called "BTPC", performs arbitration and routing control in a single cycle by means of novel binary-tree path convergence and path-prediction mechanisms, which greatly reduce its time complexity. Combined with 3D chip integration, the proposed ring-based interconnection offers cost, latency, and power advantages for hierarchical clustering in future many-core systems. Moreover, as a case study, this work builds on the SC_Ring to realize a "level-1 non-uniform cache architecture" (L1-NUCA) that provides fast data communication without cache coherence, in support of multithreading and multi-core operation. Finally, experimental results show that the approach yields promising performance.
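
The idea of several transactions sharing a circuit-switched ring in the same cycle can be illustrated with a small sketch. This is not the BTPC arbiter from the paper: it is a simple greedy, fixed-priority arbiter on an assumed unidirectional (clockwise) ring, and the names `ring_links` and `arbitrate` are hypothetical. It only shows why requests whose paths use disjoint ring links can be granted together.

```python
def ring_links(src, dst, n):
    """Return the set of links used going clockwise from src to dst.

    Assumption: link i connects node i to node (i + 1) % n.
    """
    links = set()
    node = src
    while node != dst:
        links.add(node)
        node = (node + 1) % n
    return links


def arbitrate(requests, n):
    """Grant, in list order, every request whose links are all still free.

    Greedy fixed-priority scheme used for illustration only; earlier
    requests in the list win conflicts.
    """
    busy = set()
    granted = []
    for src, dst in requests:
        path = ring_links(src, dst, n)
        # An empty path (src == dst) is not a ring transaction.
        if path and not (path & busy):
            busy |= path
            granted.append((src, dst))
    return granted


# Four requests on an 8-node ring; (1, 3) overlaps (0, 2) on link 1
# and (2, 5) on link 2, so it has to wait for a later cycle.
requests = [(0, 2), (2, 5), (1, 3), (6, 7)]
print(arbitrate(requests, 8))  # [(0, 2), (2, 5), (6, 7)]
```

In this toy model three of the four transfers proceed in the same cycle because their paths occupy disjoint segments of the ring. A real design would also need a fairness policy so that a denied request is not starved.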

Index Terms

No cache-coherence: a single-cycle ring interconnection for multi-core L1-NUCA sharing on 3D chips
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data
2. Hardware
  1. Communication hardware, interfaces and storage
    1. Buses and high-speed links

Information & Contributors

Information

Published In

DAC '09: Proceedings of the 46th Annual Design Automation Conference

July 2009

994 pages

ISBN:9781605584973

DOI:10.1145/1629911

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

DAC '09

Sponsor:

EDAC
SIGDA
IEEE-CAS

DAC '09: The 46th Annual Design Automation Conference 2009

July 26 - 31, 2009

San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,264 of 4,035 submissions, 31%

Bibliometrics

Total Citations: 6
Total Downloads: 299
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 0

Reflects downloads up to 16 Nov 2024

Cited By

Kang K, Park S, Lee J, Benini L, De Micheli G, Fanucci L, Teich J (2016). A power-efficient 3-D on-chip interconnect for multi-core accelerators with stacked L2 cache. Proceedings of the 2016 Conference on Design, Automation & Test in Europe, pp. 1465-1468. DOI: 10.5555/2971808.2972149. Online publication date: 14-Mar-2016.
https://dl.acm.org/doi/10.5555/2971808.2972149

Kang K, Benini L, De Micheli G (2015). Cost-Effective Design of Mesh-of-Tree Interconnect for Multicore Clusters With 3-D Stacked L2 Scratchpad Memory. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 23:9, pp. 1828-1841. DOI: 10.1109/TVLSI.2014.2346032. Online publication date: Sep-2015.
https://doi.org/10.1109/TVLSI.2014.2346032

Bathen L, Dutt N (2014). SPMCloud. ACM Transactions on Design Automation of Electronic Systems 19:3, pp. 1-45. DOI: 10.1145/2611755. Online publication date: 23-Jun-2014.
https://dl.acm.org/doi/10.1145/2611755

Azarkhish E, Loi I, Benini L (2013). A high-performance multiported L2 memory IP for scalable three-dimensional integration. 2013 IEEE International 3D Systems Integration Conference (3DIC), pp. 1-8. DOI: 10.1109/3DIC.2013.6702347. Online publication date: Oct-2013.
https://doi.org/10.1109/3DIC.2013.6702347

Bathen L, Dutt N (2012). Software Controlled Memories for Scalable Many-Core Architectures. Proceedings of the 2012 IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, pp. 1-10. DOI: 10.1109/RTCSA.2012.60. Online publication date: 19-Aug-2012.
https://dl.acm.org/doi/10.1109/RTCSA.2012.60

Chou S, Chen C, Wen C, Chen T, Lin T (2011). Hierarchical circuit-switched NoC for multicore video processing. Microprocessors & Microsystems 35:2, pp. 182-199. DOI: 10.1016/j.micpro.2010.09.009. Online publication date: 1-Mar-2011.
https://dl.acm.org/doi/10.1016/j.micpro.2010.09.009
