Nothing Special   »   [go: up one dir, main page]

US9007920B2 - QoS in heterogeneous NoC by assigning weights to NoC node channels and using weighted arbitration at NoC nodes - Google Patents

QoS in heterogeneous NoC by assigning weights to NoC node channels and using weighted arbitration at NoC nodes Download PDF

Info

Publication number
US9007920B2
US9007920B2 US13/745,696 US201313745696A US9007920B2 US 9007920 B2 US9007920 B2 US 9007920B2 US 201313745696 A US201313745696 A US 201313745696A US 9007920 B2 US9007920 B2 US 9007920B2
Authority
US
United States
Prior art keywords
noc
channels
weights
traffic flows
computed weights
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US13/745,696
Other versions
US20140204764A1 (en
Inventor
Sailesh Kumar
Eric Norige
Joji Philip
Mahmud Hassan
Sundari Mitra
Joseph Rowlands
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
NetSpeed Systems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NetSpeed Systems Inc filed Critical NetSpeed Systems Inc
Priority to US13/745,696 priority Critical patent/US9007920B2/en
Assigned to Netspeed Systems reassignment Netspeed Systems ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HASSAN, MAHMUD, KUMAR, SAILESH, MITRA, SUNDARI, NORIGE, Eric, PHILIP, JOJI, ROWLANDS, JOSEPH
Publication of US20140204764A1 publication Critical patent/US20140204764A1/en
Application granted granted Critical
Publication of US9007920B2 publication Critical patent/US9007920B2/en
Assigned to INTEL CORPORATION reassignment INTEL CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: Netspeed Systems, Inc.
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00Arrangements affording multiple use of the transmission path
    • H04L5/003Arrangements for allocating sub-channels of the transmission path
    • H04L5/0032Distributed allocation, i.e. involving a plurality of allocating devices, each making partial allocation
    • H04L5/0035Resource allocation in a cooperative multipoint environment
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/302Route determination based on requested QoS

Definitions

  • Methods and example implementations described herein are generally directed to interconnect architecture, and more specifically, to weight assignment and weighted arbitration of node channels in a Network on Chip (NoC) system interconnect architecture.
  • NoC Network on Chip
  • SoCs System-on-Chips
  • CMPs Chip Multi-Processors
  • the on-chip interconnect plays a role in providing high-performance communication between the various components. Due to scalability limitations of traditional buses and crossbar based interconnects, Network-on-Chip (NoC) has emerged as a paradigm to interconnect a large number of components on the chip.
  • NoC is a global shared communication infrastructure made up of several routing nodes interconnected with each other using point-to-point physical links.
  • Messages are injected by the source and are routed from the source node to the destination over multiple intermediate nodes and physical links.
  • the destination node then ejects the message and provides the message to the destination.
  • the terms ‘components’, ‘blocks’, ‘hosts’ or ‘cores’ will be used interchangeably to refer to the various system components which are interconnected using a NoC. Terms ‘routers’ and ‘nodes’ will also be used interchangeably. Without loss of generalization, the system with multiple interconnected components will itself be referred to as a ‘multi-core system’.
  • FIG. 1( a ) shows a 3D mesh NoC, where there are three layers of 3 ⁇ 3 2D mesh NoC shown over each other.
  • the NoC routers have up to two additional ports, one connecting to the router in the higher layer, and another connecting to the router in the lower layer.
  • Router 111 in the middle layer of the example has both ports used one connecting to the router at the top layer and another connecting to the router at the bottom layer.
  • Routers 110 and 112 are at the bottom and top mesh layers respectively, therefore they have only the upper facing and lower facing ports of the two additional ports used.
  • the inter-layer ports or channels between these three routers are 113 and 114 .
  • Packets are message transport units for intercommunication between various components. Routing involves identifying a path composed of a set of routers and physical links of the network over which packets are sent from a source to a destination. Components are connected to one or multiple ports of one or multiple routers, with each such port having a unique ID. Packets carry the destination's router and port ID for use by the intermediate routers to route the packet to the destination component.
  • routing techniques include deterministic routing, which involves choosing the same path from A to B for every packet. This form of routing is independent from the state of the network and does not load balance across path diversities, which might exist in the underlying network. However, such deterministic routing may be implemented in hardware, maintains packet ordering and may be rendered free of network level deadlocks. Shortest path routing may minimize the latency as such routing reduces the number of hops from the source to the destination. For this reason, the shortest path may also be the lowest power path for communication between the two components. Dimension-order routing is a form of deterministic shortest path routing in 2-D, 2.5-D, and 3-D mesh networks. In this routing scheme, messages are routed along each coordinates in a particular sequence until the message reaches the final destination.
  • FIG. 2 pictorially illustrates an example of XY routing in a two dimensional mesh. More specifically, FIG. 2 illustrates XY routing from node ‘34’ to node ‘00’.
  • each component is connected to only one port of one router.
  • a packet is first routed over the x-axis till the packet reaches node ‘04’ where the x-coordinate of the node is the same as the x-coordinate of the destination node.
  • the packet is next routed over the y-axis until the packet reaches the destination node.
  • dimension order routing may not be feasible between certain source and destination nodes, and alternative paths may have to be taken.
  • the alternative paths may not be shortest or minimum turn.
  • Source routing and routing using tables are other routing options used in NoC.
  • Adaptive routing can dynamically change the path taken between two points on the network based on the state of the network. This form of routing may be complex to analyze and implement.
  • a NoC interconnect may contain multiple physical networks. Over each physical network, there may exist multiple virtual networks, wherein different message types are transmitted over different virtual networks. In this case, at each physical link or channel, there are multiple virtual channels; each virtual channel may have dedicated buffers at both end points. In any given clock cycle, only one virtual channel can transmit data on the physical channel.
  • NoC interconnects may employ wormhole routing, wherein, a large message or packet is broken into small pieces known as flits (also referred to as flow control digits).
  • the first flit is the header flit, which holds information about this packet's route and key message level info along with payload data and sets up the routing behavior for all subsequent flits associated with the message.
  • one or more body flits follows the head flit, containing the remaining payload of data.
  • the final flit is the tail flit, which in addition to containing the last payload also performs some bookkeeping to close the connection for the message.
  • virtual channels are often implemented.
  • the physical channels are time sliced into a number of independent logical channels called virtual channels (VCs).
  • VCs provide multiple independent paths to route packets, however they are time-multiplexed on the physical channels.
  • a virtual channel holds the state needed to coordinate the handling of the flits of a packet over a channel. At a minimum, this state identifies the output channel of the current node for the next hop of the route and the state of the virtual channel (idle, waiting for resources, or active).
  • the virtual channel may also include pointers to the flits of the packet that are buffered on the current node and the number of flit buffers available on the next node.
  • wormhole plays on the way messages are transmitted over the channels: the output port at the next router can be so short that received data can be translated in the head flit before the full message arrives. This allows the router to quickly set up the route upon arrival of the head flit and then opt out from the rest of the conversation. Since a message is transmitted flit by flit, the message may occupy several flit buffers along its path at different routers, creating a worm-like image.
  • each of the five components are connected to a local router node, and the router nodes are connected with each other using point to point channels as shown in FIG. 3 .
  • each of the channels have a receive bandwidth of the destination component equal to a transmit bandwidth of the source component.
  • FIG. 4 the routers and components are separated for clarity the channels that connect components with their local routers are illustrated.
  • router 41 in FIG. 4 messages arriving at the left input port (e.g., from router 42 ) and the bottom input port (e.g., from source 4) will contend for the right output port (e.g., to router 40 ). If routers implement uniformly fair arbitration policy to arbitrate between incoming messages at different input ports contending for an output port, then the output port's bandwidth will be equally split between the two input ports as shown. Each input port will receive 50% of the destination bandwidth—source 4 therefore will receive half of the destination bandwidth.
  • router 42 in FIG. 4 messages arriving at the left input port (e.g. from router 43 ) and bottom input port (e.g., from source 3) will contend for the right output port. If routers implement uniformly fair arbitration policy to arbitrate between incoming messages at different input ports contending for an output port, then the 50% output port's bandwidth (computed in the above step) will be equally split between the two input ports as shown. Each input port will receive 25% of the destination bandwidth—source 3 therefore will receive a quarter of the destination bandwidth.
  • FIG. 4 illustrates that even though each router employs a uniformly fair arbitration policy wherein the router gives fair share of output port bandwidth among all input port contenders, the four sources receive vastly different shares of the destination bandwidth.
  • the bandwidth allocated to various source components when they content for various destinations may vary substantially. This may be undesirable in many applications, wherein fair or equal allocation of various resources among all contenders may be important to achieve a high system performance.
  • weighted allocation is desired, so that the various resource bandwidths are allocated among various contenders in a pre-specified ratio.
  • Rate limiting the sources Each source contending for a resource destination is allowed to send data at a pre-specified rate based on its fair share. This technique is independent of the state of other sources, whether the other sources are contending for the resource or not. Therefore, based upon the pre-specified rates of sources, rate limiting of the sources can either lead to under-utilization of resource bandwidth, or unfair allocation.
  • Age based arbitration Every message injected by various components carries timestamp information, which describes the age of the message. Within the NoC interconnect, routers give higher preference to older messages over newer messages, whenever multiple messages content for an output port. This technique can provide end-to-end uniform fairness, however it is unable to provide weighted fairness. Furthermore, age based arbitration comes at a high implementation cost of additional bits needed to carry the age information and complex circuitry at every router to determine the oldest message.
  • aspects of the present application include a method, which may involve computing weights for various channels in a network on chip (NoC) based on the bandwidth requirements of flows at the channels; using the weights to perform weighted arbitration between channels in the NoC to provide QoS; dynamically adjusting the weights by monitoring the activity of flows at the channels to avoid unfair allocations, and performing weighted arbitration using the newly computed channel weights.
  • NoC network on chip
  • aspects of the present application include a computer readable storage medium storing instructions for executing a process.
  • the process may involve computing weights for various channels in a network on chip (NoC) based on the bandwidth requirements of flows at the channels; using the weights to perform weighted arbitration between channels in the NoC to provide QoS; dynamically adjusting the weights by monitoring the activity of flows at the channels to avoid unfair allocations, and performing weighted arbitration using the newly computed channel weights.
  • NoC network on chip
  • aspects of the present application include a system, which may involve a weight allocation module for computing weights for various channels in a network on chip (NoC) based on the bandwidth requirements of flows at the channels; a weighted arbitration module for using the weights to perform weighted arbitration between channels in the NoC to provide QoS; a weight adjustment module dynamically adjusting the weights by monitoring the activity of flows at the channels to avoid unfair allocations, and having the weighted arbitration by the weighted arbitration module by using the newly computed channel weights.
  • NoC network on chip
  • FIGS. 1( a ), 1 ( b ) 1 ( c ) and 1 ( d ) illustrate examples of Bidirectional ring, 2D Mesh, 2D Taurus, and 3D Mesh NoC Topologies.
  • FIG. 2 illustrates an example of XY routing in a related art two dimensional mesh.
  • FIG. 3 illustrates an example of a NoC interconnect.
  • FIG. 4 illustrates a NoC interconnect with routers and interconnects separated for clarity and bandwidth received by various channels.
  • FIG. 5 illustrates an example assignment of weights to various NoC channels, in accordance with an example implementation.
  • FIG. 6 illustrates the flow chart to compute the weight of various channels using their bandwidth requirement and then performing weight round robin arbitrations at each router, in accordance with an example implementation.
  • FIG. 7 illustrates a router's arbiter in accordance with an example implementation.
  • FIG. 8 illustrates a computer/server block diagram upon which the example implementations described herein may be implemented.
  • Example implementations of the present application involve arbitration by configuring router port weights and using the weights for local arbitration at the routers.
  • Such weighted arbitration schemes have been employed in Internet traffic management, but not in a mesh or Taurus 2-D, 2.5-D or 3-D NoC interconnects.
  • Assigning weights presents some difficulty in implementing such a solution, as well other problems. For example, if all of the sources are participating, then a configuration of router port weights may provide fair allocation. However when several sources are not contending for the resources, then unfair allocation may occur, and it may be difficult to adjust weights in such scenarios.
  • When multiple flows between different sets of source and destination pairs share a channel there is some difficulty in ensuring that the two flows get their fair share of bandwidth and are not affected by each other. If multiple virtual or physical channels are used to isolate the two flows then there is additional difficulty with respect to how the weights are assigned to the channels.
  • the example implementations described herein are directed to resolving such difficulties.
  • Example implementations described herein are directed to solutions for 2-D, 2.5-D and 3-D NoC interconnects that provide end-to-end uniform-fair and weighted-fair allocation of resource bandwidths among various contending components.
  • the example implementations are fully distributed and involve assigning weights to various channels of the NoC interconnect based on the system traffic Quality of Service (QoS) specification, and performing weighted arbitrations between various channels at the NoC nodes based on the weight of the channels.
  • QoS system traffic Quality of Service
  • a sample design for implementation of weighted arbitration between multiple channels at a NoC node is also described.
  • Example implementations further involve computing the weight of various channels based on the bandwidth specification of traffic flows that go over the channels.
  • the flows that go over a channel depend upon the routes that are being used in the NoC, and the routes may be adjusted to balance the weights of various channels.
  • Example implementations also involve optionally updating the channel weights dynamically based on the traffic conditions in the network.
  • Each node may monitor the traffic flows that are currently active and contending for the resources and those that are currently inactive, and accordingly update the weights of the channels to avoid any unfair bandwidth allocations. Further, some or all weights may be dynamically updated, and some weights may be set as permanently static weights with no adjustments used, depending on the desired implementation.
  • Example implementations of computing the weights to be assigned to various channels and performing weighted arbitration at various nodes to achieve end-to-end QoS and fair allocation of resources are described below. All traffic flows between all pairs of source and destination nodes are first routed using the default routes that avoids deadlock and meets the QoS isolation properties. Subsequently, at every virtual and physical channel in the NoC, the bandwidth requirements of the flows over the channel are added together, to obtain the total bandwidth requirement at the channel. Peak or average bandwidth numbers may be used based on the system performance specification. The resulting bandwidth requirement of NoC channels can be used directly as their weights; however a large number of bits may be required to represent the weights if there is a high degree of bandwidth disparity between various NoC channels. This may also complicate the weighted arbitration logic. In an implementation of the system, the weight of a channel may be restricted to use a certain maximum number of bits, say n-bits.
  • the bandwidth of a channel at a node may be normalized with respect to the maximum bandwidth of all channels at the node. Such normalization can be utilized because any channel at a node only competes with other channels at this node during arbitration. Additional implementations may further normalize with respect to the maximum bandwidth of only those channels at this node with which this channel contends with.
  • the normalized bandwidth numbers are then multiplied by 2 n , and the integer part of the result is represented with n-bits, which is used as the weight of the channels.
  • a channel's bandwidth is normalized over only the channels present at the same node or the ones that contents with this channel as opposed to all NoC channels, the normalization may result in the best possible approximation of channel bandwidth to n-bit weights with a minimum loss of information.
  • Such normalization can be useful as the bandwidth at various channels may differ widely in a heterogeneous mesh (see, for example, U.S. patent application Ser. No. 13/647,557, herein incorporated by reference in its entirety for all purposes), or Taurus NoC, especially in a 2.5-D or 3-D organization.
  • FIG. 5 an example assignment of weights to various channels of the NoC illustrated in FIG. 4 is shown in FIG. 5 .
  • the bandwidth at the incoming channel and the outgoing channel is same, therefore after normalization their weights are 1.
  • the bandwidth requirement of the incoming channel from left is three times the bandwidth requirement of the incoming channel at the bottom, as there are three flows being carried on the former channel versus only one flow at the latter. Therefore the weights are 3 and 1 respectively.
  • weights of the left and bottom incoming channels at node 42 are 2 and 1 respectively and at node 43 are 1 and 1 respectively.
  • the example weight computation and weighted arbitration first lists all flows between source and destination pairs that traverse various channels of the NoC (step 600 in FIG. 6 ). Then for all channels, based on its flows, its maximum bandwidth requirement is computed (step 601 in FIG. 6 ). Once the bandwidth requirements are known, at each router, the bandwidth requirement of each of its channels is normalized with respect to the router channel that has the highest bandwidth requirement and then represented using a n-bit weight value (step 602 in FIG. 6 ). Finally, once all weight values of all channels are known, the weight round robin arbitration policy is implemented at all routers (step 603 in FIG. 6 ).
  • weighted arbitration at the NoC nodes is described below.
  • a NoC node there may exist multiple incoming and outgoing physical channels, and within each physical channel, there may be multiple virtual channels.
  • multiple input VCs may contend for each output virtual channel.
  • the VC is assigned to the input for an entire packet which may be composed of multiple flits.
  • the input VCs that got an output VC assigned arbitrate for the output physical channel since in one cycle only one flit can be transmitted on a physical channel.
  • multiple VCs may be interleaved on a physical channel; in a fair arbitration, each VC contending for the output physical channel sends one flit per cycle in a round robin order.
  • deficit counter based weighted round robin system may be used. In this design when a VC's chance to transmit a flit on the output channel comes in the round-robin order, the VC is allowed to send up to w flits over the next several cycles, where w is the weight of the VC.
  • the round robin order proceeds to serve the next input VC.
  • the output VC assigned to the input VC remains assigned to it even if an End of Packet (EOP) flit appears, as long as there are more packets at the input VC to be sent to the output channel.
  • EOP End of Packet
  • a counter may be needed at every input VC, with the counter being at least as wide as the VC's weight. Whenever the VCs chance arrives in the round robin order to transmit a flit, the counter is reset to the VC's weight and is decremented by one every time the VC is able to send a flit. VC continues to win the arbitration at the output physical channel until counter becomes 0, or VC becomes empty, or VC does not have any more packets to send at the current output.
  • FIG. 7 illustrates an example of such weighted arbitration at a NoC node between three input VCs.
  • the three input VCs have packets for the same physical channel X 700 and has been assigned an output VC. Let each input VC has weight 2 , and the round robin pointer is at the first VC.
  • First VC has 5 single flit packets; the first three packets 701 , 702 , 703 are to be sent to channel X, while the next packet 704 is to be sent over another physical channel Y. If the deficit counter of the VC is 2, the VC will win arbitration for the physical channel during next two cycles, and will be able to send packets 701 and 702 .
  • the deficit counter of the VC will become zero, and the round robin pointer will move to the second input VC; the output VC assigned to the first input VC stays assigned.
  • the second input VC has a single packet 705 to be sent to channel X; after it is transmitted, the input channel becomes empty.
  • the output VC assigned to this input VC is freed up and the round robin pointer moves to the third input VC.
  • the third VC has two packets 706 and 707 , which are transmitted over next two cycles, and then the VC becomes empty as well as its deficit counter becomes zero.
  • the output VC assigned to the input VC is freed up and the round robin pointer advances, comes back to the first input VC. Packet 703 is transmitted in the next cycle.
  • the next packet 704 does not go on the output channel X, therefore the output VC assigned to the first input VC is freed up; this input VC will now contend for a VC at the output channel Y. Arbitration for the output channel X stops at this point, unless there are new packets that arrived at some input VC and has an output VC at channel X assigned to them.
  • the example implementation is fully distributed as weighted arbitrations are performed locally at every node between the channels present at the node.
  • the example implementation can scale well with increasing number of nodes in the NoC and can provide uniform and weighted allocation when all flows are active and contending for the resources.
  • FIG. 5 If in this system source 1 and source 2 are not contending for the destination, then only flows from source 3 and source 4 will arbitrate at node 41 . Based on the configured channel weights, source 3 will receive three times more bandwidth than source 4. The unfair allocation occurs because the weights are statically configured.
  • various channels at various NoC nodes may monitor the state of certain flows. When a flow becomes inactive at a channel, the channel's weight is reduced by the equivalent of flow's bandwidth. When a flow becomes active again, the channel's weight is incremented.
  • a flow may be defined to be active at a channel, if at least one packet has been received at the channel over certain fixed time period. If no packets have been received during this time period then the flow is assumed inactive. The time period may be defined based on the bandwidth of the slowest flow on the channel, or on the node or in the entire NoC. For every channel a timer is kept which expires after this time period.
  • An example implementation to efficiently track per-channel's flow activity uses a bit-vector per channel with one bit for every flow on the channel. Whenever a packet of a flow arrives, the corresponding bit in the bit-vector is set. Every time the timer expires, all bits are examined; if a bit is not set then the corresponding flow is assumed inactive. After all bits are examined, the bit-vector is cleared. Based on the status of the flows on the channel, the channel's weight is recomputed.
  • Alternative implementations to further reduce the number of bits needed per channel may be employed such as compressed bit vector, where a single bit tracks the activity of a group of flows, instead of a single flow.
  • a Bloom filter may also be used to represent the set of active flows on a channel.
  • FIG. 8 illustrates an example computer system 800 on which example implementations may be implemented.
  • the computer system 800 includes a server 805 which may involve an I/O unit 835 , storage 860 , and a processor 810 operable to execute one or more units as known to one of skill in the art.
  • the term “computer-readable medium” as used herein refers to any medium that participates in providing instructions to processor 810 for execution, which may come in the form of computer-readable storage mediums, such as, but not limited to optical disks, magnetic disks, read-only memories, random access memories, solid state devices and drives, or any other types of tangible media suitable for storing electronic information, or computer-readable signal mediums, which can include transitory media such as carrier waves.
  • the I/O unit processes input from user interfaces 840 and operator interfaces 845 which may utilize input devices such as a keyboard, mouse, touch device, or verbal command.
  • the server 805 may also be connected to an external storage 850 , which can contain removable storage such as a portable hard drive, optical media (CD or DVD), disk media or any other medium from which a computer can read executable code.
  • the server may also be connected an output device 855 , such as a display to output data and other information to a user, as well as request additional information from a user.
  • the connections from the server 805 to the user interface 840 , the operator interface 845 , the external storage 850 , and the output device 855 may via wireless protocols, such as the 802.11 standards, Bluetooth® or cellular protocols, or via physical transmission media, such as cables or fiber optics.
  • the output device 855 may therefore further act as an input device for interacting with a user.
  • the processor 810 may execute one or more modules.
  • the weight assignment module 811 may be configured to compute the weights to be assigned to various channels at various nodes in the NoC based on the traffic profile and bandwidth information. Weights may be used for the weighted arbitration at various NoC nodes.
  • the weighted arbitration module 812 present at every NoC node implements the weighted arbitration; a deficit counter based implementation of weighted arbitration policy or an alternative implementations may be used.
  • the weight adjustment module 813 may be configured to dynamically adjust the weight of various channels at various NoC nodes, by monitoring the activity of the flows over the channel.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Systems and methods described herein are directed to solutions for NoC interconnects that provide end-to-end uniform- and weighted-fair allocation of resource bandwidths among various contenders. The example implementations are fully distributed and involve computing weights for various channels in a network on chip (NoC) based on the bandwidth requirements of flows at the channels. Example implementations may involve using the weights to perform weighted arbitration between channels in the NoC to provide quality of service (QoS). The weights may be adjusted dynamically by monitoring the activity of flows at the channels. The newly adjusted weights can be used to perform the weighted arbitrations to avoid unfair bandwidth allocations.

Description

BACKGROUND
1. Technical Field
Methods and example implementations described herein are generally directed to interconnect architecture, and more specifically, to weight assignment and weighted arbitration of node channels in a Network on Chip (NoC) system interconnect architecture.
2. Related Art
The number of components on a chip is rapidly growing due to increasing levels of integration, system complexity and shrinking transistor geometry. Complex System-on-Chips (SoCs) may involve a variety of components e.g., processor cores, DSPs, hardware accelerators, memory and I/O, while Chip Multi-Processors (CMPs) may involve a large number of homogenous processor cores, memory and I/O subsystems. In both systems the on-chip interconnect plays a role in providing high-performance communication between the various components. Due to scalability limitations of traditional buses and crossbar based interconnects, Network-on-Chip (NoC) has emerged as a paradigm to interconnect a large number of components on the chip. NoC is a global shared communication infrastructure made up of several routing nodes interconnected with each other using point-to-point physical links.
Messages are injected by the source and are routed from the source node to the destination over multiple intermediate nodes and physical links. The destination node then ejects the message and provides the message to the destination. For the remainder of this application, the terms ‘components’, ‘blocks’, ‘hosts’ or ‘cores’ will be used interchangeably to refer to the various system components which are interconnected using a NoC. Terms ‘routers’ and ‘nodes’ will also be used interchangeably. Without loss of generalization, the system with multiple interconnected components will itself be referred to as a ‘multi-core system’.
There are several topologies in which the routers can connect to one another to create the system network. Bi-directional rings (as shown in FIG. 1( a)), 2-D (two dimensional) mesh (as shown in FIG. 1( b)) and 2-D Taurus (as shown in FIG. 1( c)) are examples of topologies in the related art. Mesh and Taurus can also be extended to 2.5-D (two and half dimensional) or 3-D (three dimensional) organizations. FIG. 1( d) shows a 3D mesh NoC, where there are three layers of 3×3 2D mesh NoC shown over each other. The NoC routers have up to two additional ports, one connecting to the router in the higher layer, and another connecting to the router in the lower layer. Router 111 in the middle layer of the example has both ports used one connecting to the router at the top layer and another connecting to the router at the bottom layer. Routers 110 and 112 are at the bottom and top mesh layers respectively, therefore they have only the upper facing and lower facing ports of the two additional ports used. The inter-layer ports or channels between these three routers are 113 and 114.
Packets are message transport units for intercommunication between various components. Routing involves identifying a path composed of a set of routers and physical links of the network over which packets are sent from a source to a destination. Components are connected to one or multiple ports of one or multiple routers, with each such port having a unique ID. Packets carry the destination's router and port ID for use by the intermediate routers to route the packet to the destination component.
Examples of routing techniques include deterministic routing, which involves choosing the same path from A to B for every packet. This form of routing is independent from the state of the network and does not load balance across path diversities, which might exist in the underlying network. However, such deterministic routing may be implemented in hardware, maintains packet ordering and may be rendered free of network level deadlocks. Shortest path routing may minimize the latency as such routing reduces the number of hops from the source to the destination. For this reason, the shortest path may also be the lowest power path for communication between the two components. Dimension-order routing is a form of deterministic shortest path routing in 2-D, 2.5-D, and 3-D mesh networks. In this routing scheme, messages are routed along each coordinates in a particular sequence until the message reaches the final destination. For example in a 3-D mesh network, one may first route along the X dimension until it reaches a router whose X-coordinate is equal to the X-coordinate of the destination router. Next, the message takes a turn and is routed in along Y dimension and finally takes another turn and moves along the Z dimension until the message reaches the final destination router. Dimension ordered routing is often minimal turn and shortest path routing.
FIG. 2 pictorially illustrates an example of XY routing in a two dimensional mesh. More specifically, FIG. 2 illustrates XY routing from node ‘34’ to node ‘00’. In the example of FIG. 2, each component is connected to only one port of one router. A packet is first routed over the x-axis till the packet reaches node ‘04’ where the x-coordinate of the node is the same as the x-coordinate of the destination node. The packet is next routed over the y-axis until the packet reaches the destination node.
In heterogeneous mesh topology in which one or more routers or one or more links are absent, dimension order routing may not be feasible between certain source and destination nodes, and alternative paths may have to be taken. The alternative paths may not be shortest or minimum turn.
Source routing and routing using tables are other routing options used in NoC. Adaptive routing can dynamically change the path taken between two points on the network based on the state of the network. This form of routing may be complex to analyze and implement.
A NoC interconnect may contain multiple physical networks. Over each physical network, there may exist multiple virtual networks, wherein different message types are transmitted over different virtual networks. In this case, at each physical link or channel, there are multiple virtual channels; each virtual channel may have dedicated buffers at both end points. In any given clock cycle, only one virtual channel can transmit data on the physical channel.
NoC interconnects may employ wormhole routing, wherein, a large message or packet is broken into small pieces known as flits (also referred to as flow control digits). The first flit is the header flit, which holds information about this packet's route and key message level info along with payload data and sets up the routing behavior for all subsequent flits associated with the message. Optionally, one or more body flits follows the head flit, containing the remaining payload of data. The final flit is the tail flit, which in addition to containing the last payload also performs some bookkeeping to close the connection for the message. In wormhole flow control, virtual channels are often implemented.
The physical channels are time sliced into a number of independent logical channels called virtual channels (VCs). VCs provide multiple independent paths to route packets, however they are time-multiplexed on the physical channels. A virtual channel holds the state needed to coordinate the handling of the flits of a packet over a channel. At a minimum, this state identifies the output channel of the current node for the next hop of the route and the state of the virtual channel (idle, waiting for resources, or active). The virtual channel may also include pointers to the flits of the packet that are buffered on the current node and the number of flit buffers available on the next node.
The term “wormhole” plays on the way messages are transmitted over the channels: the output port at the next router can be so short that received data can be translated in the head flit before the full message arrives. This allows the router to quickly set up the route upon arrival of the head flit and then opt out from the rest of the conversation. Since a message is transmitted flit by flit, the message may occupy several flit buffers along its path at different routers, creating a worm-like image.
Based upon the traffic between various end points, and the routes and physical networks that are used for various messages, different physical channels of the NoC interconnect may experience different levels of load and congestion. During congestion, when multiple sources transmit messages to the same destination, their messages may contend with each other and with the cross-traffic for the bandwidth. Therefore, the effective destination bandwidth received by each source will depend on their positions in the network, how their routes overlap with each other, cross-traffic along their routes to the destination, and the arbitration policies deployed at various routers where arbitration is needed. In spite of uniformly fair arbitration policies at all routers, depending on location of various sources there may be a substantial difference in the destination bandwidth received.
Consider a section of a NoC interconnect shown in FIG. 3, wherein four components (source 1, source 2, source 3, and source 4) transmit messages to one component (destination). In this example, the maximum data transmit bandwidth of the four source components is equal to the maximum data receive bandwidth of the destination component. Each of the five components are connected to a local router node, and the router nodes are connected with each other using point to point channels as shown in FIG. 3. In the example of FIG. 3, each of the channels have a receive bandwidth of the destination component equal to a transmit bandwidth of the source component.
In the system shown in FIG. 3, if all four source components attempt to transmit data at their peak transmit rate and if the destination component is ready to accept data at its peak receive rate, then messages from the four source components will contend with each other within the NoC interconnect.
In FIG. 4, the routers and components are separated for clarity the channels that connect components with their local routers are illustrated.
At router 41 in FIG. 4, messages arriving at the left input port (e.g., from router 42) and the bottom input port (e.g., from source 4) will contend for the right output port (e.g., to router 40). If routers implement uniformly fair arbitration policy to arbitrate between incoming messages at different input ports contending for an output port, then the output port's bandwidth will be equally split between the two input ports as shown. Each input port will receive 50% of the destination bandwidth—source 4 therefore will receive half of the destination bandwidth.
At router 42 in FIG. 4, messages arriving at the left input port (e.g. from router 43) and bottom input port (e.g., from source 3) will contend for the right output port. If routers implement uniformly fair arbitration policy to arbitrate between incoming messages at different input ports contending for an output port, then the 50% output port's bandwidth (computed in the above step) will be equally split between the two input ports as shown. Each input port will receive 25% of the destination bandwidth—source 3 therefore will receive a quarter of the destination bandwidth.
At router 43 in FIG. 4, messages arriving at the left input port (e.g., from router 44) and bottom input port (e.g., from source 2) will contend for the right output port (e.g., to router 42). If routers implement uniformly fair arbitration policy to arbitrate between incoming messages at different input ports contending for an output port, then the 25% output port's bandwidth (computed in the above step) will be equally split between the two input ports as shown. Each input port will receive 12.5% of the destination bandwidth—source 2 therefore will receive 12.5% of the destination bandwidth. The remaining 12.5% bandwidth will be received by source 1.
The example of FIG. 4 illustrates that even though each router employs a uniformly fair arbitration policy wherein the router gives fair share of output port bandwidth among all input port contenders, the four sources receive vastly different shares of the destination bandwidth. In a complex network with additional cross-traffic, the bandwidth allocated to various source components when they content for various destinations may vary substantially. This may be undesirable in many applications, wherein fair or equal allocation of various resources among all contenders may be important to achieve a high system performance. In many systems, weighted allocation is desired, so that the various resource bandwidths are allocated among various contenders in a pre-specified ratio.
There are several techniques in the related art to provide uniform or weighted fair arbitration within a single router, wherein the output port bandwidth is allocated to contending input ports based on the weight specification. Weighted round-robin, deficit round-robin, weighted fair queuing, etc. are a few techniques that are used in the related art. Guaranteeing weighted- or uniform-allocation of various resources among contenders in a distributed NoC interconnect with resources and contenders connected at arbitrary positions in the NoC interconnect is challenging. A few techniques that are used in the related art are described below.
Rate limiting the sources: Each source contending for a resource destination is allowed to send data at a pre-specified rate based on its fair share. This technique is independent of the state of other sources, whether the other sources are contending for the resource or not. Therefore, based upon the pre-specified rates of sources, rate limiting of the sources can either lead to under-utilization of resource bandwidth, or unfair allocation.
Age based arbitration: Every message injected by various components carries timestamp information, which describes the age of the message. Within the NoC interconnect, routers give higher preference to older messages over newer messages, whenever multiple messages content for an output port. This technique can provide end-to-end uniform fairness, however it is unable to provide weighted fairness. Furthermore, age based arbitration comes at a high implementation cost of additional bits needed to carry the age information and complex circuitry at every router to determine the oldest message.
SUMMARY
Aspects of the present application include a method, which may involve computing weights for various channels in a network on chip (NoC) based on the bandwidth requirements of flows at the channels; using the weights to perform weighted arbitration between channels in the NoC to provide QoS; dynamically adjusting the weights by monitoring the activity of flows at the channels to avoid unfair allocations, and performing weighted arbitration using the newly computed channel weights.
Aspects of the present application include a computer readable storage medium storing instructions for executing a process. The process may involve computing weights for various channels in a network on chip (NoC) based on the bandwidth requirements of flows at the channels; using the weights to perform weighted arbitration between channels in the NoC to provide QoS; dynamically adjusting the weights by monitoring the activity of flows at the channels to avoid unfair allocations, and performing weighted arbitration using the newly computed channel weights.
Aspects of the present application include a system, which may involve a weight allocation module for computing weights for various channels in a network on chip (NoC) based on the bandwidth requirements of flows at the channels; a weighted arbitration module for using the weights to perform weighted arbitration between channels in the NoC to provide QoS; a weight adjustment module dynamically adjusting the weights by monitoring the activity of flows at the channels to avoid unfair allocations, and having the weighted arbitration by the weighted arbitration module by using the newly computed channel weights.
BRIEF DESCRIPTION OF THE DRAWINGS
FIGS. 1( a), 1(b) 1(c) and 1(d) illustrate examples of Bidirectional ring, 2D Mesh, 2D Taurus, and 3D Mesh NoC Topologies.
FIG. 2 illustrates an example of XY routing in a related art two dimensional mesh.
FIG. 3 illustrates an example of a NoC interconnect.
FIG. 4 illustrates a NoC interconnect with routers and interconnects separated for clarity and bandwidth received by various channels.
FIG. 5 illustrates an example assignment of weights to various NoC channels, in accordance with an example implementation.
FIG. 6 illustrates the flow chart to compute the weight of various channels using their bandwidth requirement and then performing weight round robin arbitrations at each router, in accordance with an example implementation.
FIG. 7 illustrates a router's arbiter in accordance with an example implementation.
FIG. 8 illustrates a computer/server block diagram upon which the example implementations described herein may be implemented.
DETAILED DESCRIPTION
Example implementations of the present application involve arbitration by configuring router port weights and using the weights for local arbitration at the routers. Such weighted arbitration schemes have been employed in Internet traffic management, but not in a mesh or Taurus 2-D, 2.5-D or 3-D NoC interconnects. Assigning weights presents some difficulty in implementing such a solution, as well other problems. For example, if all of the sources are participating, then a configuration of router port weights may provide fair allocation. However when several sources are not contending for the resources, then unfair allocation may occur, and it may be difficult to adjust weights in such scenarios. When multiple flows between different sets of source and destination pairs share a channel, there is some difficulty in ensuring that the two flows get their fair share of bandwidth and are not affected by each other. If multiple virtual or physical channels are used to isolate the two flows then there is additional difficulty with respect to how the weights are assigned to the channels. The example implementations described herein are directed to resolving such difficulties.
In a distributed NoC interconnect connecting various resources and components with each other using multiple routers and point to point links between the routers, unfair allocation of resource bandwidth may occur to various contending components depending upon their locations in the NoC topology, despite every router performing a fair arbitration locally among its input channels contending for an output channel.
Example implementations described herein are directed to solutions for 2-D, 2.5-D and 3-D NoC interconnects that provide end-to-end uniform-fair and weighted-fair allocation of resource bandwidths among various contending components. The example implementations are fully distributed and involve assigning weights to various channels of the NoC interconnect based on the system traffic Quality of Service (QoS) specification, and performing weighted arbitrations between various channels at the NoC nodes based on the weight of the channels. A sample design for implementation of weighted arbitration between multiple channels at a NoC node is also described.
Example implementations further involve computing the weight of various channels based on the bandwidth specification of traffic flows that go over the channels. The flows that go over a channel depend upon the routes that are being used in the NoC, and the routes may be adjusted to balance the weights of various channels.
Example implementations also involve optionally updating the channel weights dynamically based on the traffic conditions in the network. Each node may monitor the traffic flows that are currently active and contending for the resources and those that are currently inactive, and accordingly update the weights of the channels to avoid any unfair bandwidth allocations. Further, some or all weights may be dynamically updated, and some weights may be set as permanently static weights with no adjustments used, depending on the desired implementation.
Example implementations of computing the weights to be assigned to various channels and performing weighted arbitration at various nodes to achieve end-to-end QoS and fair allocation of resources are described below. All traffic flows between all pairs of source and destination nodes are first routed using the default routes that avoids deadlock and meets the QoS isolation properties. Subsequently, at every virtual and physical channel in the NoC, the bandwidth requirements of the flows over the channel are added together, to obtain the total bandwidth requirement at the channel. Peak or average bandwidth numbers may be used based on the system performance specification. The resulting bandwidth requirement of NoC channels can be used directly as their weights; however a large number of bits may be required to represent the weights if there is a high degree of bandwidth disparity between various NoC channels. This may also complicate the weighted arbitration logic. In an implementation of the system, the weight of a channel may be restricted to use a certain maximum number of bits, say n-bits.
To represent channel's bandwidth using n-bit weights, they may be normalized first. Instead of normalizing with respect to the maximum bandwidth value across the entire NoC, the bandwidth of a channel at a node may be normalized with respect to the maximum bandwidth of all channels at the node. Such normalization can be utilized because any channel at a node only competes with other channels at this node during arbitration. Additional implementations may further normalize with respect to the maximum bandwidth of only those channels at this node with which this channel contends with. The normalized bandwidth numbers are then multiplied by 2n, and the integer part of the result is represented with n-bits, which is used as the weight of the channels. Since a channel's bandwidth is normalized over only the channels present at the same node or the ones that contents with this channel as opposed to all NoC channels, the normalization may result in the best possible approximation of channel bandwidth to n-bit weights with a minimum loss of information. Such normalization can be useful as the bandwidth at various channels may differ widely in a heterogeneous mesh (see, for example, U.S. patent application Ser. No. 13/647,557, herein incorporated by reference in its entirety for all purposes), or Taurus NoC, especially in a 2.5-D or 3-D organization.
Using this scheme, an example assignment of weights to various channels of the NoC illustrated in FIG. 4 is shown in FIG. 5. There are four flows from four sources which are contending for the destination's bandwidth. Assume that each flow needs to share the bandwidth equally. At node 40, the bandwidth at the incoming channel and the outgoing channel is same, therefore after normalization their weights are 1. At node 41, the bandwidth requirement of the incoming channel from left is three times the bandwidth requirement of the incoming channel at the bottom, as there are three flows being carried on the former channel versus only one flow at the latter. Therefore the weights are 3 and 1 respectively. Similarly, weights of the left and bottom incoming channels at node 42 are 2 and 1 respectively and at node 43 are 1 and 1 respectively. With this weight assignment, if weighted arbitration is performed at all nodes then fair allocation of destination bandwidth may be provided to all sources.
The example weight computation and weighted arbitration first lists all flows between source and destination pairs that traverse various channels of the NoC (step 600 in FIG. 6). Then for all channels, based on its flows, its maximum bandwidth requirement is computed (step 601 in FIG. 6). Once the bandwidth requirements are known, at each router, the bandwidth requirement of each of its channels is normalized with respect to the router channel that has the highest bandwidth requirement and then represented using a n-bit weight value (step 602 in FIG. 6). Finally, once all weight values of all channels are known, the weight round robin arbitration policy is implemented at all routers (step 603 in FIG. 6).
An example implementation of weighted arbitration at the NoC nodes is described below. At a NoC node there may exist multiple incoming and outgoing physical channels, and within each physical channel, there may be multiple virtual channels. In a standard node design, during every cycle, multiple input VCs may contend for each output virtual channel. Once an input VC wins the arbitration for an output VC, the VC is assigned to the input for an entire packet which may be composed of multiple flits. After VC arbitration, the input VCs that got an output VC assigned, arbitrate for the output physical channel since in one cycle only one flit can be transmitted on a physical channel. During this arbitration, multiple VCs may be interleaved on a physical channel; in a fair arbitration, each VC contending for the output physical channel sends one flit per cycle in a round robin order. To implement a weighted arbitration, deficit counter based weighted round robin system may be used. In this design when a VC's chance to transmit a flit on the output channel comes in the round-robin order, the VC is allowed to send up to w flits over the next several cycles, where w is the weight of the VC. After w flits are transmitted (or if VC becomes empty or the VC does not have any more flits to send to this output channel), the round robin order proceeds to serve the next input VC. During the entire period over which up to w flits are transmitted, the output VC assigned to the input VC remains assigned to it even if an End of Packet (EOP) flit appears, as long as there are more packets at the input VC to be sent to the output channel. Output VCs continues to be freed up at the packet completion, i.e. upon an EOP flit.
To track whether w flits are transmitted or not, a counter may be needed at every input VC, with the counter being at least as wide as the VC's weight. Whenever the VCs chance arrives in the round robin order to transmit a flit, the counter is reset to the VC's weight and is decremented by one every time the VC is able to send a flit. VC continues to win the arbitration at the output physical channel until counter becomes 0, or VC becomes empty, or VC does not have any more packets to send at the current output.
FIG. 7 illustrates an example of such weighted arbitration at a NoC node between three input VCs. The three input VCs have packets for the same physical channel X 700 and has been assigned an output VC. Let each input VC has weight 2, and the round robin pointer is at the first VC. First VC has 5 single flit packets; the first three packets 701, 702, 703 are to be sent to channel X, while the next packet 704 is to be sent over another physical channel Y. If the deficit counter of the VC is 2, the VC will win arbitration for the physical channel during next two cycles, and will be able to send packets 701 and 702. At this time, the deficit counter of the VC will become zero, and the round robin pointer will move to the second input VC; the output VC assigned to the first input VC stays assigned. The second input VC has a single packet 705 to be sent to channel X; after it is transmitted, the input channel becomes empty. The output VC assigned to this input VC is freed up and the round robin pointer moves to the third input VC. The third VC has two packets 706 and 707, which are transmitted over next two cycles, and then the VC becomes empty as well as its deficit counter becomes zero. The output VC assigned to the input VC is freed up and the round robin pointer advances, comes back to the first input VC. Packet 703 is transmitted in the next cycle. The next packet 704 does not go on the output channel X, therefore the output VC assigned to the first input VC is freed up; this input VC will now contend for a VC at the output channel Y. Arbitration for the output channel X stops at this point, unless there are new packets that arrived at some input VC and has an output VC at channel X assigned to them.
The example implementation is fully distributed as weighted arbitrations are performed locally at every node between the channels present at the node. The example implementation can scale well with increasing number of nodes in the NoC and can provide uniform and weighted allocation when all flows are active and contending for the resources. However, there may be unfair allocation when certain flows are not active in the system. Consider the example shown in FIG. 5. If in this system source 1 and source 2 are not contending for the destination, then only flows from source 3 and source 4 will arbitrate at node 41. Based on the configured channel weights, source 3 will receive three times more bandwidth than source 4. The unfair allocation occurs because the weights are statically configured. When some flows are not active in a group of flows at a channel, then the active flows may receive the unused bandwidth provisioned for the inactive flows, thereby receiving unfair bandwidth. To address this, dynamic adjustments to the channel's weights based on the activity of various flows is proposed. In such a design, various channels at various NoC nodes may monitor the state of certain flows. When a flow becomes inactive at a channel, the channel's weight is reduced by the equivalent of flow's bandwidth. When a flow becomes active again, the channel's weight is incremented. A flow may be defined to be active at a channel, if at least one packet has been received at the channel over certain fixed time period. If no packets have been received during this time period then the flow is assumed inactive. The time period may be defined based on the bandwidth of the slowest flow on the channel, or on the node or in the entire NoC. For every channel a timer is kept which expires after this time period.
An example implementation to efficiently track per-channel's flow activity uses a bit-vector per channel with one bit for every flow on the channel. Whenever a packet of a flow arrives, the corresponding bit in the bit-vector is set. Every time the timer expires, all bits are examined; if a bit is not set then the corresponding flow is assumed inactive. After all bits are examined, the bit-vector is cleared. Based on the status of the flows on the channel, the channel's weight is recomputed. Alternative implementations to further reduce the number of bits needed per channel may be employed such as compressed bit vector, where a single bit tracks the activity of a group of flows, instead of a single flow. A Bloom filter may also be used to represent the set of active flows on a channel.
FIG. 8 illustrates an example computer system 800 on which example implementations may be implemented. The computer system 800 includes a server 805 which may involve an I/O unit 835, storage 860, and a processor 810 operable to execute one or more units as known to one of skill in the art. The term “computer-readable medium” as used herein refers to any medium that participates in providing instructions to processor 810 for execution, which may come in the form of computer-readable storage mediums, such as, but not limited to optical disks, magnetic disks, read-only memories, random access memories, solid state devices and drives, or any other types of tangible media suitable for storing electronic information, or computer-readable signal mediums, which can include transitory media such as carrier waves. The I/O unit processes input from user interfaces 840 and operator interfaces 845 which may utilize input devices such as a keyboard, mouse, touch device, or verbal command.
The server 805 may also be connected to an external storage 850, which can contain removable storage such as a portable hard drive, optical media (CD or DVD), disk media or any other medium from which a computer can read executable code. The server may also be connected an output device 855, such as a display to output data and other information to a user, as well as request additional information from a user. The connections from the server 805 to the user interface 840, the operator interface 845, the external storage 850, and the output device 855 may via wireless protocols, such as the 802.11 standards, Bluetooth® or cellular protocols, or via physical transmission media, such as cables or fiber optics. The output device 855 may therefore further act as an input device for interacting with a user.
The processor 810 may execute one or more modules. The weight assignment module 811 may be configured to compute the weights to be assigned to various channels at various nodes in the NoC based on the traffic profile and bandwidth information. Weights may be used for the weighted arbitration at various NoC nodes. The weighted arbitration module 812 present at every NoC node implements the weighted arbitration; a deficit counter based implementation of weighted arbitration policy or an alternative implementations may be used. The weight adjustment module 813 may be configured to dynamically adjust the weight of various channels at various NoC nodes, by monitoring the activity of the flows over the channel.
Furthermore, some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations within a computer. These algorithmic descriptions and symbolic representations are the means used by those skilled in the data processing arts to most effectively convey the essence of their innovations to others skilled in the art. An algorithm is a series of defined steps leading to a desired end state or result. In the example implementations, the steps carried out require physical manipulations of tangible quantities for achieving a tangible result.
Moreover, other implementations of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the example implementations disclosed herein. Various aspects and/or components of the described example implementations may be used singly or in any combination. It is intended that the specification and examples be considered as examples, with a true scope and spirit of the application being indicated by the following claims.

Claims (14)

What is claimed is:
1. A method, comprising:
computing weights for one or more channels of a Network on Chip (NoC) based on a bandwidth requirement of traffic flows for each of the one or more channels;
assigning the computed weights to the one or more channels of the NoC;
conducting weighted arbitration at nodes of the NoC; and
dynamically adjusting at least one of the computed weights assigned to the one or more channels of the NoC, based on activity of the traffic flows of the one or more channels corresponding to the at least one of the computed weights;
wherein the traffic flows that go over the each of the one or more channels are based on routes used in the NoC, wherein the traffic flows are representative of source/destination pairs across the routes used in the NoC.
2. The method of claim 1, wherein the dynamically adjusting the computed weights comprises,
monitoring the traffic flows of the one or more channels and determining an activity level of each of the traffic flows over a number of cycles.
3. The method of claim 1, wherein the conducting weighted arbitration at the nodes of the NoC is based on the dynamically adjusted computed weights.
4. The method of claim 1, further comprising:
determining at least one of the computed weights to be static and not dynamically adjustable.
5. The method of claim 1, wherein the NoC comprises one of a heterogeneous 2-D mesh, heterogeneous 2.5-D mesh, heterogeneous 3-D mesh, Taurus NoC interconnect and ring NoC interconnect.
6. A non-transitory computer readable storage medium storing instructions for executing a process, the instructions comprising:
computing weights for one or more channels of a Network on Chip (NoC) based on a bandwidth requirement of traffic flows for each of the one or more channels;
assigning the computed weights to the one or more channels of the NoC;
conducting weighted arbitration at nodes of the NoC; and
dynamically adjusting at least one of the computed weights assigned to the one or more channels of the NoC, based on activity of the traffic flows of the one or more channels corresponding to the at least one of the computed weights;
wherein the traffic flows that go over the each of the one or more channels are based on routes used in the NoC, wherein the traffic flows are representative of source/destination pairs across the routes used in the NoC.
7. The non-transitory computer readable storage medium of claim 6, wherein the dynamically adjusting the computed weights comprises monitoring the traffic flows of the one or more channels and determining an activity level of each of the traffic flows over a number of cycles.
8. The non-transitory computer readable storage medium of claim 6, wherein the conducting weighted arbitration at the nodes of the NoC is based on the dynamically adjusted computed weights.
9. The non-transitory computer readable storage medium of claim 6, wherein the instructions further comprise determining at least one of the computed weights to be static and not dynamically adjustable.
10. A system, comprising:
a processor configured to execute one or more modules;
a weight assignment module configured to compute weights for one or more channels of a Network on Chip (NoC) based on a bandwidth requirement of traffic flows for each of the one or more channels and to assign the computed weights to the one or more channels of the NoC;
a weighted arbitration module configured to conduct weighted arbitration at nodes of the NoC; and
a weight adjustment module configured to dynamically adjust at least one of the computed weights assigned to the one or more channels of the NoC, based on activity of the traffic flows of the one or more channels corresponding to the at least one of the computed weights;
wherein the traffic flows that go over the each of the one or more channels are based on routes used in the NoC, wherein the traffic flows are representative of source/destination pairs across the routes used in the NoC.
11. The system of claim 10, wherein the weight adjustment module is configured to dynamically adjust the computed weights by monitoring the traffic flows of the one or more channels and by determining an activity level of each of the traffic flows over a number of cycles.
12. The system of claim 10, wherein the weighted arbitration module is configured to conducting weighted arbitration at the nodes of the NoC based on the dynamically adjusted computed weights.
13. The system of claim 10, wherein the weight adjustment module is configured to determine at least one of the computed weights to be static and not dynamically adjustable.
14. The system of claim 10, wherein the NoC comprises one of a heterogeneous 2-D mesh, heterogeneous 2.5-D mesh, heterogeneous 3-D mesh, Taurus NoC interconnect and ring NoC interconnect.
US13/745,696 2013-01-18 2013-01-18 QoS in heterogeneous NoC by assigning weights to NoC node channels and using weighted arbitration at NoC nodes Active 2033-06-28 US9007920B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/745,696 US9007920B2 (en) 2013-01-18 2013-01-18 QoS in heterogeneous NoC by assigning weights to NoC node channels and using weighted arbitration at NoC nodes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/745,696 US9007920B2 (en) 2013-01-18 2013-01-18 QoS in heterogeneous NoC by assigning weights to NoC node channels and using weighted arbitration at NoC nodes

Publications (2)

Publication Number Publication Date
US20140204764A1 US20140204764A1 (en) 2014-07-24
US9007920B2 true US9007920B2 (en) 2015-04-14

Family

ID=51207583

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/745,696 Active 2033-06-28 US9007920B2 (en) 2013-01-18 2013-01-18 QoS in heterogeneous NoC by assigning weights to NoC node channels and using weighted arbitration at NoC nodes

Country Status (1)

Country Link
US (1) US9007920B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190258573A1 (en) * 2018-02-22 2019-08-22 Netspeed Systems, Inc. Bandwidth weighting mechanism based network-on-chip (noc) configuration
US10673745B2 (en) 2018-02-01 2020-06-02 Xilinx, Inc. End-to-end quality-of-service in a network-on-chip

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9906467B2 (en) * 2014-06-13 2018-02-27 D.E. Shaw Research, Llc Inverse weighted arbitration
US9940236B2 (en) 2014-12-17 2018-04-10 Intel Corporation Pointer chasing across distributed memory
US9558528B2 (en) * 2015-03-25 2017-01-31 Xilinx, Inc. Adaptive video direct memory access module
US9806997B2 (en) 2015-06-16 2017-10-31 At&T Intellectual Property I, L.P. Service specific route selection in communication networks
US10073939B2 (en) 2015-11-04 2018-09-11 Chronos Tech Llc System and method for application specific integrated circuit design
US9977853B2 (en) 2015-11-04 2018-05-22 Chronos Tech Llc Application specific integrated circuit link
US10181939B2 (en) * 2016-07-08 2019-01-15 Chronos Tech Llc Systems and methods for the design and implementation of an input and output ports for circuit design
US10637592B2 (en) 2017-08-04 2020-04-28 Chronos Tech Llc System and methods for measuring performance of an application specific integrated circuit interconnect
US11087057B1 (en) 2019-03-22 2021-08-10 Chronos Tech Llc System and method for application specific integrated circuit design related application information including a double nature arc abstraction
US11909822B2 (en) * 2021-06-30 2024-02-20 Gm Cruise Holdings Llc Streaming algorithm for deficit round robin arbitration
CN116055386B (en) * 2023-03-07 2023-06-02 燧原智能科技(成都)有限公司 Port weight updating method, device, chip and storage medium

Citations (70)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5432785A (en) 1992-10-21 1995-07-11 Bell Communications Research, Inc. Broadband private virtual network service and system
US5764740A (en) 1995-07-14 1998-06-09 Telefonaktiebolaget Lm Ericsson System and method for optimal logical network capacity dimensioning with broadband traffic
US5991308A (en) 1995-08-25 1999-11-23 Terayon Communication Systems, Inc. Lower overhead method for data transmission using ATM and SCDMA over hybrid fiber coax cable plant
US6003029A (en) 1997-08-22 1999-12-14 International Business Machines Corporation Automatic subspace clustering of high dimensional data for data mining applications
US6249902B1 (en) 1998-01-09 2001-06-19 Silicon Perspective Corporation Design hierarchy-based placement
US20020071392A1 (en) 2000-10-25 2002-06-13 Telecommunications Research Laboratories, An Alberta Corporation Design of a meta-mesh of chain sub-networks
US20020073380A1 (en) 1998-09-30 2002-06-13 Cadence Design Systems, Inc. Block based design methodology with programmable components
US6415282B1 (en) 1998-04-22 2002-07-02 Nec Usa, Inc. Method and apparatus for query refinement
US20020095430A1 (en) 1999-12-30 2002-07-18 Decode Genetics Ehf SQL query generator utilizing matrix structures
US20040216072A1 (en) 2003-04-17 2004-10-28 International Business Machines Corporation Porosity aware buffered steiner tree construction
US20050147081A1 (en) 2003-12-26 2005-07-07 Swarup Acharya Route determination method and apparatus for virtually-concatenated data traffic
US6925627B1 (en) 2002-12-20 2005-08-02 Conexant Systems, Inc. Method and apparatus for power routing in an integrated circuit
US20060161875A1 (en) 2005-01-06 2006-07-20 Chae-Eun Rhee Method of creating core-tile-switch mapping architecture in on-chip bus and computer-readable medium for recording the method
US20060209846A1 (en) * 2005-03-08 2006-09-21 Commissariat A L'energie Atomique Globally asynchronous communication architecture for system on chip
US20070118320A1 (en) 2005-11-04 2007-05-24 Synopsys, Inc. Simulating topography of a conductive material in a semiconductor wafer
US20070244676A1 (en) 2006-03-03 2007-10-18 Li Shang Adaptive analysis methods
US20070256044A1 (en) 2006-04-26 2007-11-01 Gary Coryer System and method to power route hierarchical designs that employ macro reuse
US20070267680A1 (en) 2006-05-17 2007-11-22 Kabushiki Kaisha Toshiba Semiconductor integrated circuit device
US7318214B1 (en) 2003-06-19 2008-01-08 Invarium, Inc. System and method for reducing patterning variability in integrated circuit manufacturing through mask layout corrections
US20080072182A1 (en) 2006-09-19 2008-03-20 The Regents Of The University Of California Structured and parameterized model order reduction
US20080120129A1 (en) 2006-05-13 2008-05-22 Michael Seubert Consistent set of interfaces derived from a business object model
US20080232387A1 (en) * 2005-07-19 2008-09-25 Koninklijke Philips Electronics N.V. Electronic Device and Method of Communication Resource Allocation
US20090070726A1 (en) 2005-06-09 2009-03-12 Pyxis Technology, Inc. Enhanced Routing Grid System and Method
US7564865B2 (en) * 2004-04-05 2009-07-21 Koninklijke Philips Electronics N.V. Weight factor based allocation of time slot to use link in connection path in network on chip IC
US7590959B2 (en) 2005-10-31 2009-09-15 Seiko Epson Corporation Layout system, layout program, and layout method for text or other layout elements along a grid
US20090268677A1 (en) * 2008-04-24 2009-10-29 National Taiwan University network resource allocation system and method of the same
US20090313592A1 (en) 2006-10-10 2009-12-17 Ecole Polytechnique Federale De Lausanne (Epfl) Method to design network-on-chip (noc) - based communication systems
US20100040162A1 (en) 2007-04-10 2010-02-18 Naoki Suehiro Transmission method, transmission device, receiving method, and receiving device
US7724735B2 (en) * 2006-05-29 2010-05-25 Stmicroelectronics Sa On-chip bandwidth allocator
US7725859B1 (en) 2003-08-01 2010-05-25 Cadence Design Systems, Inc. Methods and mechanisms for inserting metal fill data
US7808968B1 (en) 1998-07-06 2010-10-05 At&T Intellectual Property Ii, L.P. Method for determining non-broadcast multiple access (NBMA) connectivity for routers having multiple local NBMA interfaces
US20110035523A1 (en) 2009-08-07 2011-02-10 Brett Stanley Feero Communication infrastructure for a data processing apparatus and a method of operation of such a communication infrastructure
US20110060831A1 (en) 2008-06-12 2011-03-10 Tomoki Ishii Network monitoring device, bus system monitoring device, method and program
US20110072407A1 (en) 2009-09-18 2011-03-24 International Business Machines Corporation Automatic Positioning of Gate Array Circuits in an Integrated Circuit Design
US7917885B2 (en) 2005-06-27 2011-03-29 Tela Innovations, Inc. Methods for creating primitive constructed standard cells
US20110154282A1 (en) 2009-12-17 2011-06-23 Springsoft, Inc. Systems and methods for designing and making integrated circuits with consideration of wiring demand ratio
US8050256B1 (en) 2008-07-08 2011-11-01 Tilera Corporation Configuring routing in mesh networks
US20110276937A1 (en) 2005-06-24 2011-11-10 Pulsic Limited Integrated Circuit Routing with Compaction
US8059551B2 (en) 2005-02-15 2011-11-15 Raytheon Bbn Technologies Corp. Method for source-spoofed IP packet traceback
US8099757B2 (en) 2007-10-15 2012-01-17 Time Warner Cable Inc. Methods and apparatus for revenue-optimized delivery of content in a network
US20120023473A1 (en) 2010-07-21 2012-01-26 Brown Jeffrey S Granular channel width for power optimization
US20120022841A1 (en) 2010-07-22 2012-01-26 Polyhedron Software Ltd. Method and apparatus for estimating the state of a system
US20120026917A1 (en) 2009-01-09 2012-02-02 Microsoft Corporation Server-centric high performance network architecture for modular data centers
US8136071B2 (en) 2007-09-12 2012-03-13 Neal Solomon Three dimensional integrated circuits and methods of fabrication
US20120110541A1 (en) 2010-10-29 2012-05-03 International Business Machines Corporation Constraint optimization of sub-net level routing in asic design
US8203938B2 (en) * 2008-05-22 2012-06-19 Level 3 Communications, Llc Multi-router IGP fate sharing
US20120155250A1 (en) 2010-12-21 2012-06-21 Verizon Patent And Licensing Inc. Method and system of providing micro-facilities for network recovery
US8281297B2 (en) 2003-02-05 2012-10-02 Arizona Board Of Regents Reconfigurable processing
US8312402B1 (en) 2008-12-08 2012-11-13 Cadence Design Systems, Inc. Method and apparatus for broadband electromagnetic modeling of three-dimensional interconnects embedded in multilayered substrates
US20130051397A1 (en) * 2011-08-26 2013-02-28 Sonics, Inc. Credit flow control scheme in a router with flexible link widths utilizing minimal storage
US20130080073A1 (en) 2010-06-11 2013-03-28 Waters Technologies Corporation Techniques for mass spectrometry peak list computation using parallel processing
US8412795B2 (en) * 2009-04-29 2013-04-02 Stmicroelectronics S.R.L. Control device for a system-on-chip and corresponding method
US20130103369A1 (en) 2011-10-25 2013-04-25 Massachusetts Institute Of Technology Methods and apparatus for constructing and analyzing component-based models of engineering systems
US8448102B2 (en) 2006-03-09 2013-05-21 Tela Innovations, Inc. Optimizing layout of irregular structures in regular layout context
US20130151215A1 (en) 2011-12-12 2013-06-13 Schlumberger Technology Corporation Relaxed constraint delaunay method for discretizing fractured media
US20130159944A1 (en) 2011-12-15 2013-06-20 Taiga Uno Flare map calculating method and recording medium
US20130174113A1 (en) 2011-12-30 2013-07-04 Arteris SAS Floorplan estimation
US8492886B2 (en) 2010-02-16 2013-07-23 Monolithic 3D Inc 3D integrated circuit with logic
US20130207801A1 (en) 2012-02-14 2013-08-15 James Barnes Approach for prioritizing network alerts
US20130219148A1 (en) 2012-02-17 2013-08-22 National Taiwan University Network on chip processor with multiple cores and routing method thereof
US8541819B1 (en) 2010-12-09 2013-09-24 Monolithic 3D Inc. Semiconductor device and structure
US20130263068A1 (en) 2012-03-27 2013-10-03 International Business Machines Corporation Relative ordering circuit synthesis
US8601423B1 (en) 2012-10-23 2013-12-03 Netspeed Systems Asymmetric mesh NoC topologies
US20130326458A1 (en) 2012-06-01 2013-12-05 International Business Machines Corporation Timing refinement re-routing
US8667439B1 (en) 2013-02-27 2014-03-04 Netspeed Systems Automatically connecting SoCs IP cores to interconnect nodes to minimize global latency and reduce interconnect cost
US20140068132A1 (en) 2012-08-30 2014-03-06 Netspeed Systems Automatic construction of deadlock free interconnects
US20140092740A1 (en) 2012-09-29 2014-04-03 Ren Wang Adaptive packet deflection to achieve fair, low-cost, and/or energy-efficient quality of service in network on chip devices
US20140098683A1 (en) 2012-10-09 2014-04-10 Netspeed Systems Heterogeneous channel capacities in an interconnect
US8705368B1 (en) * 2010-12-03 2014-04-22 Google Inc. Probabilistic distance-based arbitration
US8717875B2 (en) 2011-04-15 2014-05-06 Alcatel Lucent Condensed core-energy-efficient architecture for WAN IP backbones

Patent Citations (78)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5432785A (en) 1992-10-21 1995-07-11 Bell Communications Research, Inc. Broadband private virtual network service and system
US5764740A (en) 1995-07-14 1998-06-09 Telefonaktiebolaget Lm Ericsson System and method for optimal logical network capacity dimensioning with broadband traffic
US5991308A (en) 1995-08-25 1999-11-23 Terayon Communication Systems, Inc. Lower overhead method for data transmission using ATM and SCDMA over hybrid fiber coax cable plant
US6003029A (en) 1997-08-22 1999-12-14 International Business Machines Corporation Automatic subspace clustering of high dimensional data for data mining applications
US6249902B1 (en) 1998-01-09 2001-06-19 Silicon Perspective Corporation Design hierarchy-based placement
US6415282B1 (en) 1998-04-22 2002-07-02 Nec Usa, Inc. Method and apparatus for query refinement
US7808968B1 (en) 1998-07-06 2010-10-05 At&T Intellectual Property Ii, L.P. Method for determining non-broadcast multiple access (NBMA) connectivity for routers having multiple local NBMA interfaces
US20020073380A1 (en) 1998-09-30 2002-06-13 Cadence Design Systems, Inc. Block based design methodology with programmable components
US20020095430A1 (en) 1999-12-30 2002-07-18 Decode Genetics Ehf SQL query generator utilizing matrix structures
US20020071392A1 (en) 2000-10-25 2002-06-13 Telecommunications Research Laboratories, An Alberta Corporation Design of a meta-mesh of chain sub-networks
US6925627B1 (en) 2002-12-20 2005-08-02 Conexant Systems, Inc. Method and apparatus for power routing in an integrated circuit
US8281297B2 (en) 2003-02-05 2012-10-02 Arizona Board Of Regents Reconfigurable processing
US20040216072A1 (en) 2003-04-17 2004-10-28 International Business Machines Corporation Porosity aware buffered steiner tree construction
US7065730B2 (en) 2003-04-17 2006-06-20 International Business Machines Corporation Porosity aware buffered steiner tree construction
US7318214B1 (en) 2003-06-19 2008-01-08 Invarium, Inc. System and method for reducing patterning variability in integrated circuit manufacturing through mask layout corrections
US7725859B1 (en) 2003-08-01 2010-05-25 Cadence Design Systems, Inc. Methods and mechanisms for inserting metal fill data
US20050147081A1 (en) 2003-12-26 2005-07-07 Swarup Acharya Route determination method and apparatus for virtually-concatenated data traffic
US7564865B2 (en) * 2004-04-05 2009-07-21 Koninklijke Philips Electronics N.V. Weight factor based allocation of time slot to use link in connection path in network on chip IC
US20060161875A1 (en) 2005-01-06 2006-07-20 Chae-Eun Rhee Method of creating core-tile-switch mapping architecture in on-chip bus and computer-readable medium for recording the method
US8059551B2 (en) 2005-02-15 2011-11-15 Raytheon Bbn Technologies Corp. Method for source-spoofed IP packet traceback
US7957381B2 (en) * 2005-03-08 2011-06-07 Commissariat A L'energie Atomique Globally asynchronous communication architecture for system on chip
US20060209846A1 (en) * 2005-03-08 2006-09-21 Commissariat A L'energie Atomique Globally asynchronous communication architecture for system on chip
US20090070726A1 (en) 2005-06-09 2009-03-12 Pyxis Technology, Inc. Enhanced Routing Grid System and Method
US20110276937A1 (en) 2005-06-24 2011-11-10 Pulsic Limited Integrated Circuit Routing with Compaction
US7917885B2 (en) 2005-06-27 2011-03-29 Tela Innovations, Inc. Methods for creating primitive constructed standard cells
US20080232387A1 (en) * 2005-07-19 2008-09-25 Koninklijke Philips Electronics N.V. Electronic Device and Method of Communication Resource Allocation
US7590959B2 (en) 2005-10-31 2009-09-15 Seiko Epson Corporation Layout system, layout program, and layout method for text or other layout elements along a grid
US20070118320A1 (en) 2005-11-04 2007-05-24 Synopsys, Inc. Simulating topography of a conductive material in a semiconductor wafer
US20070244676A1 (en) 2006-03-03 2007-10-18 Li Shang Adaptive analysis methods
US8448102B2 (en) 2006-03-09 2013-05-21 Tela Innovations, Inc. Optimizing layout of irregular structures in regular layout context
US20070256044A1 (en) 2006-04-26 2007-11-01 Gary Coryer System and method to power route hierarchical designs that employ macro reuse
US20080120129A1 (en) 2006-05-13 2008-05-22 Michael Seubert Consistent set of interfaces derived from a business object model
US20070267680A1 (en) 2006-05-17 2007-11-22 Kabushiki Kaisha Toshiba Semiconductor integrated circuit device
US7724735B2 (en) * 2006-05-29 2010-05-25 Stmicroelectronics Sa On-chip bandwidth allocator
US20080072182A1 (en) 2006-09-19 2008-03-20 The Regents Of The University Of California Structured and parameterized model order reduction
US20090313592A1 (en) 2006-10-10 2009-12-17 Ecole Polytechnique Federale De Lausanne (Epfl) Method to design network-on-chip (noc) - based communication systems
US20100040162A1 (en) 2007-04-10 2010-02-18 Naoki Suehiro Transmission method, transmission device, receiving method, and receiving device
US8136071B2 (en) 2007-09-12 2012-03-13 Neal Solomon Three dimensional integrated circuits and methods of fabrication
US8099757B2 (en) 2007-10-15 2012-01-17 Time Warner Cable Inc. Methods and apparatus for revenue-optimized delivery of content in a network
US20090268677A1 (en) * 2008-04-24 2009-10-29 National Taiwan University network resource allocation system and method of the same
US8203938B2 (en) * 2008-05-22 2012-06-19 Level 3 Communications, Llc Multi-router IGP fate sharing
US20110060831A1 (en) 2008-06-12 2011-03-10 Tomoki Ishii Network monitoring device, bus system monitoring device, method and program
US8050256B1 (en) 2008-07-08 2011-11-01 Tilera Corporation Configuring routing in mesh networks
US8312402B1 (en) 2008-12-08 2012-11-13 Cadence Design Systems, Inc. Method and apparatus for broadband electromagnetic modeling of three-dimensional interconnects embedded in multilayered substrates
US20120026917A1 (en) 2009-01-09 2012-02-02 Microsoft Corporation Server-centric high performance network architecture for modular data centers
US8412795B2 (en) * 2009-04-29 2013-04-02 Stmicroelectronics S.R.L. Control device for a system-on-chip and corresponding method
US20110035523A1 (en) 2009-08-07 2011-02-10 Brett Stanley Feero Communication infrastructure for a data processing apparatus and a method of operation of such a communication infrastructure
US20110072407A1 (en) 2009-09-18 2011-03-24 International Business Machines Corporation Automatic Positioning of Gate Array Circuits in an Integrated Circuit Design
US20110154282A1 (en) 2009-12-17 2011-06-23 Springsoft, Inc. Systems and methods for designing and making integrated circuits with consideration of wiring demand ratio
US8492886B2 (en) 2010-02-16 2013-07-23 Monolithic 3D Inc 3D integrated circuit with logic
US20130080073A1 (en) 2010-06-11 2013-03-28 Waters Technologies Corporation Techniques for mass spectrometry peak list computation using parallel processing
US20120023473A1 (en) 2010-07-21 2012-01-26 Brown Jeffrey S Granular channel width for power optimization
US20120022841A1 (en) 2010-07-22 2012-01-26 Polyhedron Software Ltd. Method and apparatus for estimating the state of a system
US20120110541A1 (en) 2010-10-29 2012-05-03 International Business Machines Corporation Constraint optimization of sub-net level routing in asic design
US8543964B2 (en) 2010-10-29 2013-09-24 International Business Machines Corporation Constraint optimization of sub-net level routing in asic design
US8705368B1 (en) * 2010-12-03 2014-04-22 Google Inc. Probabilistic distance-based arbitration
US8541819B1 (en) 2010-12-09 2013-09-24 Monolithic 3D Inc. Semiconductor device and structure
US20120155250A1 (en) 2010-12-21 2012-06-21 Verizon Patent And Licensing Inc. Method and system of providing micro-facilities for network recovery
US8717875B2 (en) 2011-04-15 2014-05-06 Alcatel Lucent Condensed core-energy-efficient architecture for WAN IP backbones
US20130051397A1 (en) * 2011-08-26 2013-02-28 Sonics, Inc. Credit flow control scheme in a router with flexible link widths utilizing minimal storage
US20130103369A1 (en) 2011-10-25 2013-04-25 Massachusetts Institute Of Technology Methods and apparatus for constructing and analyzing component-based models of engineering systems
US20130151215A1 (en) 2011-12-12 2013-06-13 Schlumberger Technology Corporation Relaxed constraint delaunay method for discretizing fractured media
US20130159944A1 (en) 2011-12-15 2013-06-20 Taiga Uno Flare map calculating method and recording medium
US20130174113A1 (en) 2011-12-30 2013-07-04 Arteris SAS Floorplan estimation
US20130207801A1 (en) 2012-02-14 2013-08-15 James Barnes Approach for prioritizing network alerts
US20130219148A1 (en) 2012-02-17 2013-08-22 National Taiwan University Network on chip processor with multiple cores and routing method thereof
US20130263068A1 (en) 2012-03-27 2013-10-03 International Business Machines Corporation Relative ordering circuit synthesis
US20130326458A1 (en) 2012-06-01 2013-12-05 International Business Machines Corporation Timing refinement re-routing
US8635577B2 (en) 2012-06-01 2014-01-21 International Business Machines Corporation Timing refinement re-routing
US20140068132A1 (en) 2012-08-30 2014-03-06 Netspeed Systems Automatic construction of deadlock free interconnects
CN103684961A (en) 2012-08-30 2014-03-26 网速系统公司 Automatic construction of deadlock free interconnects
US20140092740A1 (en) 2012-09-29 2014-04-03 Ren Wang Adaptive packet deflection to achieve fair, low-cost, and/or energy-efficient quality of service in network on chip devices
US20140098683A1 (en) 2012-10-09 2014-04-10 Netspeed Systems Heterogeneous channel capacities in an interconnect
WO2014059024A1 (en) 2012-10-09 2014-04-17 Netspeed Systems Heterogeneous channel capacities in an interconnect
US20140115218A1 (en) 2012-10-23 2014-04-24 Netspeed Systems ASYMMETRIC MESH NoC TOPOLOGIES
US20140115298A1 (en) 2012-10-23 2014-04-24 Netspeed Systems ASYMMETRIC MESH NoC TOPOLOGIES
US8601423B1 (en) 2012-10-23 2013-12-03 Netspeed Systems Asymmetric mesh NoC topologies
US8667439B1 (en) 2013-02-27 2014-03-04 Netspeed Systems Automatically connecting SoCs IP cores to interconnect nodes to minimize global latency and reduce interconnect cost

Non-Patent Citations (19)

* Cited by examiner, † Cited by third party
Title
Ababei, C., et al., Achieving Network on Chip Fault Tolerance by Adaptive Remapping, Parallel & Distributed Processing, 2009, IEEE International Symposium, 4 pgs.
Abts, D., et al., Age-Based Packet Arbitration in Large-Radix k-ary n-cubes, Supercomputing 2007 (SC07), Nov. 10-16, 2007, 11 pgs.
Beretta, I, et al., A Mapping Flow for Dynamically Reconfigurable Multi-Core System-on-Chip Design, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Aug. 2011, 30(8), pp. 1211-1224.
Das, R., et al., Aergia: Exploiting Packet Latency Slack in On-Chip Networks, 37th International Symposium on Computer Architecture (ISCA '10), Jun. 19-23, 2010, 11 pgs.
Ebrahimi, E., et al., Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multi-Core Memory Systems, ASPLOS '10, Mar. 13-17, 2010, 12 pgs.
Falko et al., Fair rate packet arbitration in Network-on-Chip, Sep. 26-28, 2011, IEEE. *
Gindin, R., et al., NoC-Based FPGA: Architecture and Routing, Proceedings of the First International Symposium on Networks-on-Chip (NOCS'07), May 2007, pp. 253-262.
Grot, B., Kilo-NOC: A Heterogeneous Network-on-Chip Architecture for Scalability and Service Guarantees, ISCA '11, Jun. 4-8, 2011, 12 pgs.
Grot, B., Preemptive Virtual Clock: A Flexible, Efficient, and Cost-Effective QOS Scheme for Networks-on-Chip, Micro '09, Dec. 12-16, 2009, 12 pgs.
Grot, B., Topology-Aware Quality-of-Service Support in Highly Integrated Chip Multiprocessors, 6th Annual Workshop on the Interaction between Operating Systems and Computer Architecture, Jun. 2006, 11 pgs.
International Search Report and Written Opinion for PCT/US2013/064140, Jan. 22, 2014, 9 pgs.
International Search Report and Written Opinion for PCT/US2014/012003, Mar. 26, 2014, 9 pgs.
International Search Report and Written Opinion for PCT/US2014/012012, May 14, 2014, 9 pgs.
Jiang, N., et al., Performance Implications of Age-Based Allocations in On-Chip Networks, CVA MEMO 129, May 24, 2011, 21 pgs.
Lee et al., Probabilistic Distance-based Arbitration:Providing Equality of Service for Many-core CMPs, Dec. 4-8, 2010, IEEE. *
Lee, J. W., et al., Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks, 35th IEEE/ ACM International Symposium on Computer Architecture (ISCA), Jun. 2008, 12 pgs.
Lee, M. M., et al., Approximating Age-Based Arbitration in On-Chip Networks, PACT '10, Sep. 11-15, 2010, 2 pgs.
Li, B., et al., CoQoS: Coordinating QoS-Aware Shared Resources in NoC-based SoCs, J. Parallel Distrib. Comput., 71 (5), May 2011, 14 pgs.
Yang, J., et al., Homogeneous NoC-based FPGA: The Foundation for Virtual FPGA, 10th IEEE International Conference on Computer and Information Technology (CIT 2010), Jun. 2010, pp. 62-67.

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10673745B2 (en) 2018-02-01 2020-06-02 Xilinx, Inc. End-to-end quality-of-service in a network-on-chip
US20190258573A1 (en) * 2018-02-22 2019-08-22 Netspeed Systems, Inc. Bandwidth weighting mechanism based network-on-chip (noc) configuration
US10983910B2 (en) * 2018-02-22 2021-04-20 Netspeed Systems, Inc. Bandwidth weighting mechanism based network-on-chip (NoC) configuration

Also Published As

Publication number Publication date
US20140204764A1 (en) 2014-07-24

Similar Documents

Publication Publication Date Title
US9007920B2 (en) QoS in heterogeneous NoC by assigning weights to NoC node channels and using weighted arbitration at NoC nodes
US9571402B2 (en) Congestion control and QoS in NoC by regulating the injection traffic
US9130856B2 (en) Creating multiple NoC layers for isolation or avoiding NoC traffic congestion
US10110499B2 (en) QoS in a system with end-to-end flow control and QoS aware buffer allocation
US9742630B2 (en) Configurable router for a network on chip (NoC)
JP6093867B2 (en) Non-uniform channel capacity in the interconnect
US9699079B2 (en) Streaming bridge design with host interfaces and network on chip (NoC) layers
US9160627B2 (en) Multiple heterogeneous NoC layers
US9294354B2 (en) Using multiple traffic profiles to design a network on chip
US9825809B2 (en) Dynamically configuring store-and-forward channels and cut-through channels in a network-on-chip
US9185023B2 (en) Heterogeneous SoC IP core placement in an interconnect to optimize latency and interconnect performance
US9185026B2 (en) Tagging and synchronization for fairness in NOC interconnects
US10983910B2 (en) Bandwidth weighting mechanism based network-on-chip (NoC) configuration
JP5834178B2 (en) Semiconductor circuit bus system
US10298485B2 (en) Systems and methods for NoC construction
JP2017502418A (en) Cache coherent NOC (network on chip) with variable number of cores, input / output (I / O) devices, directory structure, and coherency points
US9762474B2 (en) Systems and methods for selecting a router to connect a bridge in the network on chip (NoC)
US20180198682A1 (en) Strategies for NoC Construction Using Machine Learning
US9590924B1 (en) Network device scheduler and methods thereof
US11144457B2 (en) Enhanced page locality in network-on-chip (NoC) architectures
CN117135107B (en) Network communication topology system, routing method, device and medium

Legal Events

Date Code Title Description
AS Assignment

Owner name: NETSPEED SYSTEMS, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KUMAR, SAILESH;NORIGE, ERIC;PHILIP, JOJI;AND OTHERS;REEL/FRAME:029797/0470

Effective date: 20130117

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: INTEL CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:NETSPEED SYSTEMS, INC.;REEL/FRAME:060753/0662

Effective date: 20220708

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8