DyXYZ: Fully Adaptive Routing Algorithm for 3D NoCs
Masoumeh Ebrahimi1, Xin Chang2, Masoud Daneshtalab1, Juha Plosila1, Pasi Liljeberg1, Hannu Tenhunen1
1
Department of Information Technology, University of Turku, Finland
2
Department of Electronic and Microelectronics, University of Mons, Belgium
{masebr, masdan, juplos, pakrli, hanten}@utu.fi; xin.chang@umons.ac.be
Abstract—Traditional methods in 3D NoCs simply use a
deterministic routing algorithm to deliver packets from a source to
a destination node. However, deterministic methods are unable to
distribute the traffic load over the network, which results in
degrading the performance. In this paper, we present a fully
adaptive routing algorithm for 3D NoCs, named DyXYZ. In
DyXYZ, the congestion information at the input buffer of the
neighboring routers is used as congestion metric to select among
the output channels. This algorithm is proven to be deadlock free by
using 4, 4, and 2 virtual channels along the X, Y, and Z dimensions,
respectively.
Keywords-component; 3D network, fully adaptive routing
algorithms; alternative routes
I.
INTRODUCTION
The performance of future System-on-Chip (SoC) designs
is limited by the chip area and interconnect problem. According
to the Technology Roadmap for Semiconductors (ITRS),
There-Dimensional (3D) integration is counted as a promising
technology to sustain the performance improvement beyond
65nm [1]-[6]. In 3D integration technology, multiple active
silicon layers are stacked together using Through-Silicon-Vias
(TSVs). The major advantage of 3D ICs is the considerable
reduction in the footprint and global interconnect length,
resulting in increased performance [1]. 3D Networks-on-Chip
(3D NoCs), on the other hand, are emerging as a solution for
the interconnect complexity in 3D SoCs [7][8].
Routing algorithms are used to route a packet from a source
to a destination. Routing algorithms can be classified according
to the place where routing decisions are taken [9]. In a source
routing [10], the path can be established at the source node
prior to the packet injection while in a distributed routing, the
path is determined in a distributed manner while the packet
travels across the network. Unlike distributed routing, in the
source routing approach, the path information is carried by
packets, thus routers does not require any extra computation for
making routing decisions. This results in a simpler routing unit
and a faster communication. However, as packets are required
to carry the path information (i.e. that is usually large), the
bandwidth requirement and scalability become major
challenges. When distributed routing is used, the path is
computed at each intermediate node. Distributed routing can be
classified into deterministic and adaptive [11]-[16]. A
deterministic routing algorithm uses a fixed path for each pair
of source and destination nodes. Implementations of
deterministic routing algorithms are simple but they are unable
to balance the load across the links. The simplest deterministic
routing method is dimension order routing. The dimension
order routing algorithm routes packets by crossing dimensions
in strictly increasing order, reducing the offset in one direction
to zero before routing in the next one. Deterministic routing
algorithms perform well with uniform traffic pattern while they
are very inefficient under non-uniform traffic. In contrast, in
adaptive routing algorithms, a packet is not restricted to a
single path when traveling from a source node to its
destination. Thereby, adaptive routing algorithms can decrease
the probability of routing packets through congested regions.
Several partially and fully adaptive routing algorithms are
introduced in 2D networks such as DyXY [17] and OddEven [18]. The routing selection of these algorithms is usually
performed using the congestion status of the network. Many of
them consider local traffic condition in the routing decision in
which each router analyses the congestion conditions of its own
and adjacent routers to choose an output channel. This group of
algorithms could improve the performance significantly as
compared to dimension order routing due to the distribution of
packets over the network. However, routing decisions based on
local congestion information may lead to an unbalanced
distribution of traffic load. Some other algorithms such as
CATRA [19] and HARAQ [20] take more global information
into account, reducing the probability of making a wrong
decision. However, regardless of the imposed complexity and
area overhead, providing global information is complex. In
sum, adaptive routing algorithms based on global congestion
information improve the performance over the methods based
on local congestion information. This performance gain is at
the cost of a large area overhead, a more complex routing unit,
and the need for congestion detection and propagation
mechanism. There are few partially adaptive methods
presented in 3D NoCs such as MAR [8].
In this paper, we present a fully adaptive routing algorithm,
named DyXYZ, to route packets adaptively in the network. To
the best of our knowledge this is the first work presenting a
fully adaptive routing algorithm in 3D mesh NoCs while
guaranteeing the deadlock freeness. This method can efficiently
alleviate congestion in the network by choosing among
multiple paths based on the traffic condition.
The paper is organized as follows. The proposed fully
adaptive routing algorithm (DyXYZ) in 3D NoCs is given in
Section II. Results are reported in Section III while the
conclusion is given in the last section.
II.
DYNAMIC XYZ FOR 3D NOCS
A. Dimension Order Routing in 3D NoCs
In a dimension order routing, packets are crossing
dimensions in a strictly increasing (or descending) order,
reducing the offset along one dimension to zero before routing
in the next dimension. For example in Fig. 1, three packets (i.e.
P1, P2, and P3) are sent from the source node 0 to destination
nodes 8, 16, and 22. As shown in this figure, at the source
node, all the three packets should be sent through the east
direction. As they cannot be sent through the east link
simultaneously, a queue of packets is created behind this link
(acting as a performance bottleneck) while the links in the
north and up directions are free. Similarly, at node 1, packets
P2 and P3 are in competition with each other to receive the
north output channel while the link in the up direction is free.
These examples show the inefficiency of the XYZ routing
algorithm in distributing packets.
negative directions of them. The similar perspective is applied
to the Z dimension. As shown in Table 1, virtual networks
cover separate sets of virtual channels and thus no deadlock has
occurred in the network.
Table 1. Assigning virtual channels to sub-networks
Sub-network
ENU
END
ESU
ESD
WNU
WND
WSU
WSD
Virtual Channels
X0,Y0,Z0
X1,Y1,Z0
X2,Y0,Z1
X3,Y1,Z1
X0,Y2,Z2
X1,Y3,Z2
X2,Y2,Z3
X3,Y3,Z3
In Fig. 2, different colors are used to distinguish between
separate set of virtual channels for each sub-network. Fig. 3
shows the paths diversity of sending a packet from the source
node 13 to destinations 2 and 24, located in sub-networks ESD
and WNU, respectively. Packets can be delivered through the
bolded lines to the destination nodes.
Fig. 1. XYZ routing algorithm.
B. Fully Adaptive Routing Algorithm in 3D Mesh NoCs
In this subsection, we propose a fully adaptive routing
algorithm for 3D NoCs. Using this algorithm, at each
intermediate node, packets can be delivered through three
directions, as long as the distance along a direction has not
reached zero. In this method, the network is divided into eight
sub-networks as ENU, END, ESU, ESD, WNU, WND, WSU,
and WSD where N, S, E, W, U, and D stand for North, South,
East, West, Up, and Down directions. Since packets can be
delivered on X, Y, or Z direction without any restriction the
deadlock freeness should be guaranteed. If we could prove that
the sub-networks are disjoint and each uses a separate set of
virtual channels, then the whole network is deadlock free.
According to Table 1, there are four sub-networks covering
East direction (i.e. ENU, END, ESU, and ESD). For each subnetwork, a separate virtual channel number is allocated,
keeping the sub-networks disjoint in the X dimension. For
instance, vc0, vc1, vc2, and vc3 along the X dimension are
allocated to ENU, END, ESU, and ESD sub-networks,
respectively. WNU, WND, WSU, and WSD sub-networks use
the same virtual channels in the X dimension, but in the
opposite directions. In other words, each of east and west
direction involves half of virtual channels in the X dimension.
Similarly, four virtual channels are enough to cover the Y+ and
Y- directions in all regions. ENU, END, WNU, and WND
utilize the positive directions of the virtual channels in the Y
dimension while ESU, ESD, WSU, and WSD contain the
Fig. 2. Different virtual networks cover different sets of virtual
channels.
Fig. 3. Path diversity from source node 13 to destinations 2 and 24
C. Reducing the Number of Virtual Channels to Two
along the Z dimension
The presented method requires four virtual channels long
each dimension. However, the number of virtual channels can
be reduced to two virtual channels along one of dimensions, we
called this method Dynamic XYZ (DyXYZ). In this way, the
whole network can be split into two main sub-networks, each
having four sub-networks. Dividing the network into two parts
can be done along one of dimensions, reducing the number of
virtual channels from four to two for that dimension. Since two
main sub-networks are separated, the network remains
deadlock free. On the other hand, the sub-networks within each
main sub-network use different sets of virtual channels on the
two remaining dimensions. Therefore, the eight sub-networks
are disjoint and the network is deadlock free. By using
DyXYZ, packets can be routed along X, Y or Z dimension at
each intermediate node without creating any cycles. All the
routing options for DyXYZ routing algorithm using two virtual
channels in the Z dimension and four virtual channels in the X
and Y dimensions is shown in Table 2. Note that, the positions
are determined regarding the source and destination positions
but not the current node.
Table 2. Routing options in DyXYZ routing algorithm.
Direction
Virtual Channels
ENU
END
ESU
ESD
WNU
WND
WSU
WSD
EN
ES
EU
ED
WN
WS
WU
WD
NU
ND
SU
SD
E
W
N
S
U
D
X0,Y0,Z0
X1,Y1,Z1
X2,Y0,Z0
X3,Y1,Z1
X0,Y2,Z0
X1,Y3,Z1
X2,Y2,Z0
X3,Y3,Z1
(X0,X1),(Y0,Y1)
(X2,X3),(Y0,Y1)
(X0,X2),(Z0)
(X0,X3),(Z1)
(X0,X1),(Y2,Y3)
(X2,X3),(Y2,Y3)
(X0,X2),(Z0)
(X1,X3),(Z1)
(Y0,Y2),(Z0)
(Y1,Y3),(Z1)
(Y0,Y2),(Z0)
(Y1,Y1),(Z1)
(X1,X2,X3,X4)
(X1,X2,X3,X4)
(Y0,Y1,Y2,Y3)
(Y0,Y1,Y2,Y3)
(Z0,Z1)
(Z0,Z1)
As shown in Table 2, packets have some degree of adaptiveness
on selecting among virtual channels when the distance along
one or two dimensions are non-zero (i.e. the destination is to
the EN, ES, WN, ES, EU, ED, WU, WD, NU, ND, SU, SD, E,
W, N, S, U, and D position of the source node). For example,
when the destination is to the EN of the source node, the packet
can be delivered through each of the vc0 or vc1 of the X or Y
dimension. When destination is to the E, W, N or S directions,
each of four available virtual channels can be selected by
packets. As the selection function and for a better efficiency,
the routing unit chooses a dimension which is less congested.
The congestion is determined based on the number of free
buffer slots at the corresponding input buffer of the
neighboring router. When the number of free buffer slots is
greater than the threshold value, the input buffer is counted as a
congested one. In this work, the threshold value is considered
to be 70% of the total input buffer size.
Dir:Direction
N:North; S:South; E:East; W:West; U:Up; D:Down
--------------------------------Regions regarding source and dest position-If (Dir=ENU or Dir=EN or Dir=U or Dir=N)
Selects X0 or Y0 or Z0;
If (Dir=END or Dir=ED or Dir=E)
Selects X1 or Y1 or Z1;
If (Dir=ESU or Dir=ES or Dir=EU)
Selects X2 or Y0 or Z0;
If (Dir=ESD or Dir=SD or Dir=S)
Selects X3 or Y1 or Z1;
If (Dir=WNU or Dir=WU or Dir=NU)
Selects X0 or Y2 or Z0;
If (Dir=WND or Dir=ND or Dir=WN)
Selects X1 or Y3 or Z1;
If (Dir=WSU or Dir=WS or Dir=SU)
Selects X2 or Y2 or Z0;
If (Dir=WSD or Dir=WD or Dir=W or Dir=D)
Selects X3 or Y3 or Z1;
End If;
Fig. 4. Assigning virtual channels to directions in the
DyXYZ routing algorithm
III.
RESULTS AND DISCUSSION
To assess the efficiency of the proposed DyXYZ, we have
developed a cycle-accurate NoC simulator based on wormhole
switching in 3D-mesh configuration. The simulator calculates
the average delay and the power consumption for the message
transmission. To estimate the power consumption of networks,
we have used Orion [29] as well as the power and delay values
of vertical links in [28]. XYZ and the proposed DyXYZ routing
algorithms use four virtual channels in the X and Y dimensions
and two virtual channels in the Z dimension. The on-chip
network, considered for experiment is formed by a typical
wormhole router structure including input buffers, a routing
unit, a switch allocator and a crossbar. Each router has 7
input/output ports, a natural extension from a 5-port 2D router
by adding two ports to make connections to the upper and
lower layers [25][26]. There are some other types of 3D routers
such as the hybrid router [26] and MIRA [27]. However, since
router efficiency is out of the concept of this paper, we have
chosen simple 7-port router in our simulation. The arbitration
scheme of the switch allocator in the typical router structure is
round-robin. The data width and the frequency were set to 32
bits and 1 GHz, respectively, and each input channel has a
buffer size of 7 flits. The packet size was assumed to be 5 flits.
We also assume that the 3D mesh topology is regular and the
delays on wires will not exceed the clock period. For the
performance metric, we use the latency defined as the number
of cycles between the initiation of a packet and the time when
the tail of the packet reaches the destination.
A. Performance Analysis
control mechanism is required. In the XYZ routing algorithm,
all packets follow the same path and do not require a reordering
mechanism. However, by using virtual channels in XYZ, it
cannot support in-order delivery since packets might reach
destination in an out-of-order manner due to facing different
congestion in different virtual channels. To deal with this
problem, an approach similar to [22] is needed which assigns a
virtual channel to a flow, requiring look up table at each
intermediate node. In both Fig. 5 and Fig. 6, the performance of
the XYZ routing algorithm is obtained for the best case where
switching between virtual channels is possible (i.e. in-order
delivery cannot be supported).
1) Uniform Traffic Profile
XYZ
DyXYZ
1000
XYZ
600
400
200
0
0,005
DyXYZ
500
Latency
800
Latency
In the uniform traffic profile, each processing element (PE)
generates data packets and sends them to another PE using a
uniform distribution. In Fig. 5, the average communication
delay as a function of the average packet injection rate is
plotted for all schemes. As observed from the results, XYZ
leads to a lower latency than the DyXYZ method. The reason is
that, under uniform traffic, dimension order routings are the
best matches for evenly distributing traffic over the network.
0,008
0,011
0,014
0,017
0,02
Packet injection rate
400
Fig. 6. Performance under hotspot traffic
300
B. Physical Analysis
To assess the area overhead and power consumption of the
presented communication architectures, the whole platforms
including network interfaces [30] and routers are synthesized
using Synopsys Design Compiler with the 65nm LP CMOS
technology from STMicroelectronics, while the backend is
performed with the Cadence Encounter tool. Depending on the
technology and manufacturing process, the pitches of TSVs can
range from 1μm2 to 10μm2 [26][31][32]. In this work, the pad
size for TSVs is assumed to be 5μm2 with the pitch of around
8m2. Since all algorithms (i.e. the XYZ and DyXYZ routing
algorithms) uses the same number of virtual channels, the total
area cost of a layer for all platforms is almost similar and about
18 mm2. The power dissipation of the XYZ and DyXYZ
methods were calculated and compared under the hotspot
traffic model. The results for the average power under hotspot
traffic are shown in Table 3. The average power values are
computed near the saturation point, 0.014 under the hotspot
traffic profile. As a result, the average power consumption of
the DyXYZ scheme is 11% larger than that of the XYZ
scheme. This indicates that DyXYZ distributes the traffic
better.
200
100
0
0,005
0,01
0,015
0,02
0,025
0,03
0,035
Packet Injection Rate
Fig. 5. Performance under uniform traffic profile.
2) Hotspot Traffic Profile
Under the hotspot traffic pattern, some nodes are chosen as
hotspots receiving an extra portion of the traffic in addition to
the regular uniform traffic. In simulations, given a hotspot
percentage of H, a newly generated message is directed to each
hotspot node with an additional H percent probability. We
simulate the hotspot traffic with four hotspot nodes at positions
(2, 1) and (3, 1) in layer 2 and the same positions in layer 3 in
the 4×4×4 mesh network. The performance of each network
with H=10% is illustrated in Fig. 6. As observed from the
figure, DyXYZ leads to a better performance since it considers
the congestion condition of the network in the routing decision.
In fact, the improvement is achieved by smoothly balancing the
traffic load over the network which reduces the number of the
hotspots and, hence, improving the performance rather than the
XYZ routing algorithm. It is worth mentioning that DyXYZ
can adaptively select between output directions based on the
congestion condition in the input buffer of the neighboring
routers. To support in-order delivery when the underlying
routing algorithm is DyXYZ, either reordering buffer or flow
Table 3. Power dissipation
Network platforms
XYZ
DyXYZ
Power (W)
dynamic & static
19.34
17.21
IV.
CONCLUSION
Traffic can be distributed across the network by using
alternative paths. In this paper, we proposed a fully adaptive
routing algorithm, named DyXYZ. In this method, at each
intermediate node, a packet is sent to a direction which is less
congested. This algorithm is proven to be deadlock free by
using 4, 4, and 2 virtual channels along the X, Y, and Z
dimensions, respectively.
Although this algorithm can alleviate the congestion, it
cannot be used for the applications demanding in-order
delivery. In-order delivery is a basic requirement for many
applications but it has been rarely investigated in literature. In
order to address the in-order delivery issue in 3D network, we
are working to propose a method as a future work.
ACKNOWLEDGMENT
The authors wish to acknowledge Nokia, Elisa, and
KAUTE Foundations for the partial financial support during
the course of this research.
References
[14] M. Ebrahimi et al., “An Efficent Dynamic Multicast Routing Protocol
for Distributing Traffic in NOCs,” in Proceedings of 12th ACM/IEEE
Design, Automation, and Test in Europe Conference (DATE), pp. 1064 1069 , April 2009.
[15] B. Marvasti et al., “PAMPR: Power-Aware and Minimum Path Routing
Algorithm for NoCs,” in Proceedings of 15th IEEE International
Conference on Electronics, Circuits, and Systems (ICECS), pp. 418-421,
Sep 2008, Malta.
[16] M. Daneshtalab et al. “Distributing Congestions in NoCs through a
Dynamic Routing Algorithm based on Input and Output Selections,” in
Proceedings of 20th IEEE International Conference on VLSI Design
(VLSID), pp. 546-550, 2007.
[17] M. Li, Q. Zeng, W. Jone, “DyXY - a proximity congestion-aware
deadlock-free dynamic routing method for network on chip”, in
proceedings of DAC, pp. 849-852, 2006.
[18] G. M. Chiu, “The odd-even turn model for adaptive routing”, IEEE
Transaction on Parallel and Distributed Systems, 11:729 – 738, 2000.
[19] M. Ebrahimi et al., “CATRA-Congestion Aware Trapezoid-based
Routing Algorithm for On-Chip Networks,” in Proceedings of 15th
ACM/IEEE Design, Automation, and Test in Europe (DATE), pp. 320325, 2012.
[20] M. Ebrahimi et al. “HARAQ: Congestion-Aware Learning Model for
Highly Adaptive Routing Algorithm in On-Chip Networks,” in
Proceedings of 6th ACM/IEEE International Symposium on Networkson-Chip (NOCS), pp. 19-26, 2012.
[1]
V. F. Pavlidis, E.G. Rochester Friedman, “3-D Topologies for Networkson-Chip,” Vol.15, No.10, pp. 1081-1090, 2006.
[21] S. Murali, et al., “A Method for Routing Packets Across Multiple Paths
in NoCs with In-Order Delivery and Fault-Tolerance Gaurantees,” VLSI
Design, vol. 2007, Article ID 37627, 11 pages, 2007..
[2]
M. Ebrahimi, et al., “Cluster-based Topologies for 3D Networks-onChip Using Advanced Inter-layer Bus Architecture,” Elsevier Journal of
Computer and System Sciences (JCSS-elsevier), 2012.
[22] M. Lis et al., “Guaranteed in-order packet delivery using Exclusive
Dynamic Virtual Channel Allocation”, technical report CSAIL-TR2009-036, Massachusetts Institute of Technology, 2009.
[3]
L.P. Carloni, P. Pande, X. Yuan, “Networks-on-chip in emerging
interconnect paradigms: Advantages and challenges”, pp. 93-102, 2009.
[23] M. Lis, et al., “Path-Diverse Inorder Routing”, International conference
on Green Circuits and Systems (ICGCS), pp. 311-316, 2010.
[4]
M. Elgamel, M. A. Bayoumi, “Interconnect Noise Optimization in
Nanometer Technologies”, 2010.
T. C. Xu et al., “A Study of 3D Network-on-Chip Design for Data
Parallel H.264 Coding,” Microprocessors and Microsystems, Vol. 35,
No. 7, pp. 603-612, 2011.
T. C. Xu et al., “Optimal Number and Placement of Through Silicon
Vias in 3D Network-on-Chip,” In Proceedings of the 14th IEEE
International Symposium on Design and Diagnostics of Electronic
Circuits & Systems (DDECS), pp.105-110, 2011.
[24] S. Mubeen, Sh. Kumar, “Designing Efficient Source Routing for Mesh
Topology Network on Chip Platforms”, in Proceedings of the 13th
Euromicro Conference on Digital System Design: Architectures,
Methods and Tools, pp.181-188, 2010.
[7]
M. Daneshtalab et al., “HIBS-Novel Inter-layer Bus Structure for
Stacked Architectures,” in Proceedings of IEEE International 3D
Systems Integration Conference (3DIC), pp. 1-7, 2012.
[27] D. Park et. al., “MIRA: A Multi-Layered On-Chip Interconnect Router
Architecture”, 35th International Symp. on Computer Architecture
(ISCA), pp.251-261, 2008.
[8]
M. Ebrahimi et al., “Exploring Partitioning Methods for 3D Networkson-Chip Utilizing Adaptive Routing Model,” in Proceedings of 5th
ACM/IEEE International Symposium on Networks-on-Chip (NOCS),
pp. 73-80, 2011.
[28] V.F. Pavlidis and E.G. Friedman, “3-D topologies for networks-onchip”,
IEEE Transactions on Very Large Scale Integration Systems, v.15, i.10,
pp.1081-1090, 2007.
[9]
J. Duato, C. Yalamanchili, L. Ni, “Interconnection networks: an
engineering approach,” Morgan Kaufmann Publishers, 2003.
[29] A. Kahng, B. Li, L.-S. Peh, and K. Samadi, "Orion 2.0: A fast and
accurate noc power and area model for early-stage design space
exploration," in proc. of DATE, pp. 423-428, 2009.
[5]
[6]
[10] S. Mubeen, S. Kumar, “Designing Efficient Source Routing for Mesh
Topology Network on Chip Platforms”, pp. 181-188, 2010.
[11] M. Daneshtalab et al., “Adaptive Input-output Selection Based On-Chip
Router Architecture,” Journal of Low Power Electronics (JOLPE), Vol.
8, No. 1, pp. 11-29, 2012.
[12] M. Dehyadegari et al. “An Adaptive Fuzzy Logic-based Routing
Algorithm for Networks-on-Chip,” in Proceedings of 13th IEEE/NASAESA International Conference on Adaptive Hardware and Systems
(AHS), pp. 208-214, 2011.
[13] M. Ebrahimi et al. “HAMUM – A Novel Routing Protocol for Unicast
and Multicast Traffic in MPSoCs,” in Proceedings of 18th IEEE
Euromicro Conference on Parallel, Distributed and Network-Based
Computing (PDP), pp. 525-532, 2010.
[25] Z. Liu, J.Duato, “Adaptive Unicast and Multicast in 3D Mesh
Networks”, Proc. of the Twenty-Seventh Hawaii Int. Conf., v.1, pp.173182, 1994.
[26] F. Li, C. Nicopoulos, T. Richardson, et. al, “Design and Management of
3D Chip Multiprocessors Using Network-in-Memory”, ISCA-33, 2006.
[30] M. Ebrahimi, et al., “A High-Performance Network Interface
Architecture for NoCs Using Reorder Buffer Sharing,” in Proceedings of
18th IEEE Euromicro Conference on Parallel, Distributed and NetworkBased Computing (PDP), pp. 547-550, 2010.
[31] M. Daneshtalab et al., “Memory-Efficient Logic Layer Communication
Platform for 3D-Stacked Memory-on-Processor Architectures,” in
Proceedings of IEEE International 3D Systems Integration Conference
(3DIC), pp. 1-8, Jan. 2012.
[32] U. Kang et al., “8Gb 3-D DDR3 DRAM Using Through-Silicon-Via
Technology,” In Proc. International Solid State Circuits Conference
(ISSCC), pp. 130–131, 2009.