LOFT: A High Performance Network-on-Chip Providing Quality-of-Service Support

Authors:

Yuan Xie

MICRO '43: Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture

Pages 409 - 420

https://doi.org/10.1109/MICRO.2010.21

Published: 04 December 2010

Abstract

Providing quality-of-service (QoS) for concurrent tasks in many-core architectures is becoming important, especially for real-time applications. QoS support for on-chip shared resources (such as shared caches, buses, and memory controllers) in chip-multiprocessors has been investigated in recent years. Unlike other shared resources, a network-on-chip (NoC) does not typically have central arbitration of accesses to the shared resource. Instead, each router shares the responsibility for resource allocation. While this distributed nature benefits the scalable performance of NoCs, it also dramatically complicates the problem of providing QoS support for individual flows. Existing approaches to this problem suffer from various shortcomings, such as low network utilization and weak QoS guarantees. In this work, we propose the LOFT NoC architecture, which features both high network utilization and strong QoS guarantees. LOFT is based on the combination of two mechanisms: a) locally-synchronized frames (LSF), a distributed frame-based scheduling mechanism that provides flexible QoS guarantees to different flows, and b) flit-reservation (FRS), a flow-control mechanism integrated in LSF that improves network utilization. The experimental results show that LOFT delivers flexible and reliable QoS guarantees while utilizing the available network capacity well enough to achieve high overall throughput.
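
As a rough illustration of what frame-based scheduling means in general, the sketch below models a single shared link whose time is divided into frames of a fixed number of flit slots, with each flow reserving some slots per frame. This is not LOFT's LSF mechanism, which the abstract does not describe in detail; the function name `simulate_frames`, its parameters, and the single-link model are all assumptions made for illustration.

```python
def simulate_frames(quotas, demand, frame_size, num_frames):
    """Toy model of frame-based bandwidth reservation on one link.

    quotas:     flow -> flit slots reserved per frame (hypothetical parameter)
    demand:     flow -> total flits the flow wants to send
    frame_size: flit slots available in each frame
    Returns (delivered flits per flow, total idle slots).
    """
    if sum(quotas.values()) > frame_size:
        raise ValueError("reservations exceed frame capacity")

    remaining = dict(demand)
    delivered = {flow: 0 for flow in quotas}
    idle_slots = 0

    for _ in range(num_frames):
        used = 0
        for flow, quota in quotas.items():
            # A flow may send at most its reserved quota in each frame.
            sent = min(quota, remaining.get(flow, 0))
            remaining[flow] = remaining.get(flow, 0) - sent
            delivered[flow] += sent
            used += sent
        # Slots that were reserved but not used stay idle in this toy model.
        idle_slots += frame_size - used

    return delivered, idle_slots


if __name__ == "__main__":
    print(simulate_frames({"a": 6, "b": 2}, {"a": 100, "b": 100}, 8, 5))
    print(simulate_frames({"a": 6, "b": 2}, {"a": 0, "b": 100}, 8, 5))
```

In this toy model a flow with quota q is guaranteed q / frame_size of the link bandwidth whenever it has traffic. When a flow is idle, its reserved slots go unused, which is one way a pure reservation scheme can trade utilization for guarantees.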

Index Terms

Hardware

Published In

MICRO '43: Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture

December 2010

542 pages

ISBN: 9780769542997

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

IEEE Computer Society

United States

Qualifiers

Article

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Cited By

Song Y, Lin B (2020). Improving Memory Efficiency in Heterogeneous MPSoCs through Row-Buffer Locality-aware Forwarding. ACM Transactions on Architecture and Code Optimization, 17:1, pp. 1-26. Online publication date: 4-Mar-2020.
https://dl.acm.org/doi/10.1145/3377149
Song Y, Alavoine O, Lin B (2019). A Self-aware Resource Management Framework for Heterogeneous Multicore SoCs with Diverse QoS Targets. ACM Transactions on Architecture and Code Optimization, 16:2, pp. 1-23. Online publication date: 9-Apr-2019.
https://dl.acm.org/doi/10.1145/3319804
Chung J, Ro Y, Kim J, Ahn J, Kim J, Kim J, Lee J, Ahn J (2019). Enforcing Last-level Cache Partitioning through Memory Virtual Channels. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, pp. 97-109. Online publication date: 23-Sep-2019.
https://dl.acm.org/doi/10.1109/PACT.2019.00016
Song W, Kim G, Jung H, Chung J, Ahn J, Lee J, Kim J (2017). History-Based Arbitration for Fairness in Processor-Interconnect of NUMA Servers. ACM SIGARCH Computer Architecture News, 45:1, pp. 765-777. Online publication date: 4-Apr-2017.
https://dl.acm.org/doi/10.1145/3093337.3037753
Song W, Kim G, Jung H, Chung J, Ahn J, Lee J, Kim J (2017). History-Based Arbitration for Fairness in Processor-Interconnect of NUMA Servers. ACM SIGPLAN Notices, 52:4, pp. 765-777. Online publication date: 4-Apr-2017.
https://dl.acm.org/doi/10.1145/3093336.3037753
Song W, Kim G, Jung H, Chung J, Ahn J, Lee J, Kim J, Chen Y, Temam O, Carter J (2017). History-Based Arbitration for Fairness in Processor-Interconnect of NUMA Servers. Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 765-777. Online publication date: 4-Apr-2017.
https://dl.acm.org/doi/10.1145/3037697.3037753
Zhan J, Kayiran O, Loh G, Das C, Xie Y, Hsu W, Yang C, Lipasti M, Lee H (2016). OSCAR. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1-13. Online publication date: 15-Oct-2016.
https://dl.acm.org/doi/10.5555/3195638.3195672
Lu Z, Yao Y (2016). Aggregate Flow-Based Performance Fairness in CMPs. ACM Transactions on Architecture and Code Optimization, 13:4, pp. 1-27. Online publication date: 28-Dec-2016.
https://dl.acm.org/doi/10.1145/3014429
Jie J, Mingche L, Liquan X, Ebrahimi M, Locatelli R (2015). A Low-Latency and High-Throughput Multiple-Level Arbitration Scheme Supporting Quality-of-Service in Optical On-chip Network. Proceedings of the 8th International Workshop on Network on Chip Architectures, pp. 9-14. Online publication date: 5-Dec-2015.
https://dl.acm.org/doi/10.1145/2835512.2835519
Sarkar A, Mueller F, Ramaprasad H (2015). Static Task Partitioning for Locked Caches in Multicore Real-Time Systems. ACM Transactions on Embedded Computing Systems, 14:1, pp. 1-30. Online publication date: 21-Jan-2015.
https://dl.acm.org/doi/10.1145/2638557