DOI: 10.1145/1629911.1630062

No cache-coherence: a single-cycle ring interconnection for multi-core L1-NUCA sharing on 3D chips

Published: 26 July 2009

Abstract

In line with the trend toward many-core SOC designs and 3D chip techniques, this paper proposes a "single-cycle ring" interconnection (SC_Ring) with ultra-low latency and minimal complexity. The SC_Ring allows multiple single-cycle transactions to proceed in parallel. The circuit-switched design consists mainly of a set of 4 to 16 three-port circuit-switched routers and an arbiter that is efficient in both performance and timing. The arbiter, called "BTPC", performs arbitration and routing control in a single cycle by means of novel binary-tree paths convergence and path-prediction mechanisms, which greatly reduce its time complexity. Combined with 3D chip integration, the proposed ring-based interconnection offers cost, latency, and power advantages for hierarchical clustering in future many-core systems. As a case study, this work builds on the SC_Ring to realize a "level-1 non-uniform cache architecture" (L1-NUCA) that provides fast data communication without cache coherence to facilitate multithreading and multi-core execution. Experimental results show that the approach yields promising performance.
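
To make the idea of several transactions sharing a circuit-switched ring in the same cycle concrete, here is a minimal sketch. It is not the paper's BTPC arbiter: it assumes a unidirectional ring and a simple fixed-priority greedy policy, and the names `ring_links` and `arbitrate` are hypothetical. A request is granted only if none of the ring links on its path is already claimed in the current cycle.

```python
def ring_links(src, dst, n):
    """Return the set of links used going clockwise from src to dst.

    Link i is the hop from node i to node (i + 1) % n.
    Assumption: the ring is unidirectional (not stated in the abstract).
    """
    links = set()
    node = src
    while node != dst:
        links.add(node)
        node = (node + 1) % n
    return links


def arbitrate(requests, n):
    """Grant, in list order, every request whose path is still free.

    This is a plain greedy fixed-priority policy used for illustration,
    not the binary-tree paths convergence scheme named in the abstract.
    """
    busy = set()      # links already claimed in this cycle
    granted = []
    for src, dst in requests:
        if src == dst:
            continue  # nothing to route
        path = ring_links(src, dst, n)
        if busy.isdisjoint(path):
            busy |= path
            granted.append((src, dst))
    return granted


if __name__ == "__main__":
    # 8-node ring: (0,2) and (4,6) use disjoint links and run in parallel;
    # (1,3) and (6,1) each collide with an earlier grant and must wait.
    print(arbitrate([(0, 2), (1, 3), (4, 6), (6, 1)], n=8))
```

Requests that are not granted would simply be retried in a later cycle under this sketch; how the actual design handles contention is not described in the abstract.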

Published In

DAC '09: Proceedings of the 46th Annual Design Automation Conference
July 2009
994 pages
ISBN: 9781605584973
DOI: 10.1145/1629911
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. NOC
  2. SOC
  3. arbitration
  4. level-1 non-uniform cache architecture
  5. memory structure
  6. multi-core
  7. ring interconnection
  8. single-cycle transactions

Qualifiers

  • Research-article

Conference

DAC '09: The 46th Annual Design Automation Conference 2009
July 26 - 31, 2009
San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,264 of 4,035 submissions, 31%

Article Metrics

  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 16 Nov 2024

Cited By

  • (2016) A power-efficient 3-D on-chip interconnect for multi-core accelerators with stacked L2 cache. Proceedings of the 2016 Conference on Design, Automation & Test in Europe, pp. 1465-1468. DOI: 10.5555/2971808.2972149. Online publication date: 14-Mar-2016
  • (2015) Cost-Effective Design of Mesh-of-Tree Interconnect for Multicore Clusters With 3-D Stacked L2 Scratchpad Memory. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 23:9, pp. 1828-1841. DOI: 10.1109/TVLSI.2014.2346032. Online publication date: Sep-2015
  • (2014) SPMCloud. ACM Transactions on Design Automation of Electronic Systems 19:3, pp. 1-45. DOI: 10.1145/2611755. Online publication date: 23-Jun-2014
  • (2013) A high-performance multiported L2 memory IP for scalable three-dimensional integration. 2013 IEEE International 3D Systems Integration Conference (3DIC), pp. 1-8. DOI: 10.1109/3DIC.2013.6702347. Online publication date: Oct-2013
  • (2012) Software Controlled Memories for Scalable Many-Core Architectures. Proceedings of the 2012 IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, pp. 1-10. DOI: 10.1109/RTCSA.2012.60. Online publication date: 19-Aug-2012
  • (2011) Hierarchical circuit-switched NoC for multicore video processing. Microprocessors & Microsystems 35:2, pp. 182-199. DOI: 10.1016/j.micpro.2010.09.009. Online publication date: 1-Mar-2011
