Application-Specific Networks-on-Chip Topology
Customization Using Network Partitioning
Ahmed A. Morgan
Haytham Elmiligi
M. Watheq El-Kharashi
Elec. and Comp. Eng. Dept.
University of Victoria
BC, Canada V8W 3P6
Elec. and Comp. Eng. Dept.
University of Victoria
BC, Canada V8W 3P6
Dept. of Comp. & Sys. Eng.
Ain Shams University
Cairo 11517, Egypt
ahmorgan@ece.uvic.ca
haytham@ece.uvic.ca
watheq@ece.uvic.ca
Fayez Gebali
Elec. and Comp. Eng. Dept.
University of Victoria
BC, Canada V8W 3P6
fayez@ece.uvic.ca
ABSTRACT
One of the most challenging problems in Application-Specific
Networks-on-Chip (ASNoC) design is to customize the topological structure of the on-chip network in order to meet the
application requirements with the minimum possible cost.
In this paper, the area cost of ASNoCs is reduced by using network partitioning techniques. The enhancement in
area cost is achieved by reducing both routers area and the
number of global links. Given the application core graph,
Fiduccia-Mattheyses (FM) algorithm is adopted with modification to formulate the partitioning problem as an optimization one. As a proof of concept, our technique is applied
to three different applications with different number of cores.
Results show that the proposed technique is a promising way
to reduce the ASNoC area compared to other topology generation techniques.
Categories and Subject Descriptors
B.4.3 [INPUT/OUTPUT AND DATA COMMUNICATIONS]: Interconnections—Topology
General Terms
Algorithm, Analysis, Optimization, Topology
Keywords
Application-Specific Networks-on-Chip (ASNoC), Core graph,
Network partitioning
1.
INTRODUCTION
The number of cores within modern System-on-Chip (SoC)
applications grows rapidly. Although this large number of
Permission to make digital or hard copies of part or all of this
work or personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or
commercial advantage and that copies bear this notice and the
full citation on the first page. To copy otherwise, to republish,
to post on servers, or to redistribute to lists, requires prior
specific permission and/or a fee.
,)07
1RYHPEHU&DLUR(J\SW
.
© ACM 2008 ISBN: 978-1-60558-407-2 ...$5.00
computational resources allows more functionality to be integrated on a single chip, the communication among these
cores constitutes a major performance bottleneck. ApplicationSpecific Networks-on-Chip (ASNoC) is an emerging paradigm
proposed to handle the communication problem between
these computational resources [2].
ASNoC aims at optimizing the design of the on-chip network to comply with the application requirements. High
throughput, low latency, low energy consumption, and low
area are the main desirable characteristics of ASNoC-based
systems [9]. The choice of a network topology for ASNoCbased applications significantly impacts their performance.
in the literature, both standard and custom topologies are
used for ASNoCs [1, 22, 23]. Although standard topologies
could be easily implemented and ensure the regularity of the
on-chip network, they might not be the best choice for ASNoCs due to the mismatch between the topology and the
application characteristics [2, 22]. Therefore, it is required
to carefully customize the topological structure of ASNoCs
to meet the design requirements, while minimizing the cost.
In [10], a topology generation methodology to minimize the
power consumption cost of ASNoCs was proposed. In this
paper, network partitioning techniques are employed to optimize ASNoCs topology from the area cost point of view.
To this end, our contribution is threefold.
1. Analyzing the impact of network partitioning on ASNoC area,
2. Formulating a partitioning technique for any application. As the problem of network partitioning is known
to be an NP-hard one [6], it is formulated as an optimization one based on the famous Fiduccia-Mattheyses
(FM) algorithm [11], and
3. Validating our technique through three real applications as case studies. Results show how applying network partitioning reduces the area cost compared to
both standard and custom topologies.
This paper is organized as follows. Related work is discussed in Section 2. The impact of applying network partitioning techniques on ASNoC area is investigated in Section
3. Section 4 gives some background from graph theory on
2.
RELATED WORK
Standard topologies like ring, octagon, spidergon, mesh,
torus, folded torus, SPIN, and BFT were compared and
evaluated with respect to throughput, average latency, energy consumption, and area in [5, 23]. Different algorithms,
such as NMAP [20], PMAP [17], BMAP [24], GMAP [13],
and PBB [13], were proposed to map any application onto
these standard topologies. A tool, SUNMAP, was presented
in [21] to automatically select the best standard topology
for a given application. SUNMAP was only limited to five
topologies (mesh, torus, hypercube, butterfly, and clos).
The advantages of using custom topologies over standard
topologies for ASNoC were explained in [2, 22]. The authors
in [22] proposed a methodology for ASNoC custom topology generation by adding some long-range links to standard
mesh topology. Their methodology tries to connect distant
nodes that communicate with each other frequently with direct long links. The long-range link insertion methodology
proved useful in reducing the average latency and improving
the throughput.
Graph partitioning algorithms1 are widely used in many
fields, such as parallel computing, power system analysis,
and VLSI design to divide a large network into smaller partitions [6, 15, 19]. The most popular partitioning algorithms are Kernighan-Lin (KL) [16] and its derivatives like
Fiduccia-Mattheyses (FM) [11] and Min-Cut [18]. Many
other techniques were also proposed to carry out the partitioning task like linear, scattered, random, and spectral
methods [7, 12].
A tool, OIDIPUS, that uses network partitioning for ASNoC topology generation was presented in [1]. As being the
first work to use network partitioning, the tool objectives
were limited to reduce latency and power consumption. The
number of partitions and the size of each partition should
be given to the tool. Moreover, each partition was only limited to ring topology. Finally, the generated topology from
the tool was not compared to any other standard or custom
topologies. To the best of our knowledge, the work presented
here is the first work that tries to reduce ASNoC area using
network partitioning. The proposed technique is completely
general and puts no restrictions on the number of partitions
or the size and the topology of each partition.
3.
IMPACT OF NETWORK PARTITIONING
ON ASNOC AREA
The ever-increasing on-chip communication requirements
for modern applications force ASNoCs to replace ordinary
shared buses. Nevertheless, ASNoC area is not supposed to
exceed 10% of the whole design area [14]. The area of the
on-chip networks is mainly determined by routers area [24].
Moreover, ports with their buffers constitute the major percentage of any router area [8]. Therefore, to lower ASNoC
1
Network partitioning and graph partitioning are used interchangeably in this paper.
area, the number of ports for the routers used in generating
the topology should be reduced. To explain this point further, the area of input-queue routers with different number
of ports, including the local port, is replicated from [24] in
Figure 1. Each port has an equal buffer size and consists
of two channels for packets reception and transmission, respectively. The figure clarifies how reducing the number of
ports is a key factor in reducing the ASNoC area.
4
11
x 10
10
9
Router area (µm2)
how to represent different applications. Formulation and
assumptions of the network partitioning problem are discussed in Section 5. Section 6 represents some experimental
results and justifies the efficiency of our technique through
case studies. Finally, we draw our conclusions and give ideas
for future work in Section 7.
8
7
6
5
4
3
2
1
0
2
3
4
5
Number of ports
Figure 1: Router area versus number of ports [24].
Network partitioning divides a large network into smaller
partitions. Partitions are then connected together to produce a complete custom topology. It is apparent that smaller
partitions require routers with less number of ports than
those used in a single unpartitioned network. As a result,
partitioning could be used to reduce ASNoC routers area.
Not only does using network partitioning reduce routers
area, but also removes some of the global links. Global links
connect between routers, whereas local links connect cores
to routers. Global links are usually long and require driving circuits and repeaters. Accordingly, it is desirable to
decrease the number of global links in the generated topology. This discussion clarifies that network partitioning could
be used to reduce the two major factors of ASNoC area;
routers area and the number of global links. Consequently,
the area reduction problem converges into a network partitioning problem.
As an example of the area reduction resulted from using
network partitioning, Figure 2 shows a mesh implementation
of 16-node application before and after partitioning. The
two partitions are separated by the dotted line in Figure
2(b). The partitioning strategy itself is discussed in details
in Section 5. The number in each node in the figure represents the number of ports of the router in this node. From
one hand, partitioning results in removing three global links
in this example. On the other hand, Table 1 summarizes
the total number of routers with different number of ports
before and after partitioning and the total ASNoC routers
area. Using network partitioning reduces the ASNoC routers
area by 7.36% in this example.
As the number of nodes becomes larger, the number of
links removed by partitioning increases. By removing a link,
the number of ports of the two routers that were connected
by this link decreases by one. According to Figure 1, the
router area decreases in a nearly linear proportion with the
number of ports removed. Therefore, the more cores exist
in the application, the more global links are removed by
partitioning, and the more routers area is saved. As a result,
3
4
4
3
3
4
4
3
C1
70
C2
362
C3
362
362
C7
4
5
5
3
4
4
5
3
353
357
C6
300
C8
27
C5
C4
C9
49
0.5
910
C16
C10
C13
C11
32
16
C9
190
C12
16
C4
173
C1
0.5
670
16
313
C5
600
C5
64
C2
60
C7
C1
128
64
C6
C3
64
C3
C2
64
64
313
4
3
4
5
5
5
4
3
500
94
C10
3
4
3
3
4
(a) before partitioning
4
4
3
(b) after partitioning
Circles represent routers and rectangles represent local processing cores.
Number in each circle represents the number of ports of the router in this node.
Figure 2: Mesh implementation of a 16-node application before and after partitioning.
Table 1: Number of routers and total routers area
before and after partitioning of a 16-node application, mapped onto mesh topology, as shown in Figure 2.
Before
partitioning
After
partitioning
5-port
Router
4
4-port
Router
8
3-port
Router
2
Total routers
area (µm2 )
1,332,000
2
6
8
1,234,000
network partitioning is to be more useful in reducing the area
cost as the number of cores in the application increases.
4.
C11
GRAPH THEORY REPRESENTATION OF
APPLICATIONS
Graph theory concepts could be used to represent ASNoC
applications. The application description and traffic characteristics are assumed to be statically analyzed and known
a priori in this paper. Therefore, any application could be
described by a core graph [20, 21]. Graph partitioning algorithms could be then applied to this graph. Core graph
represents the processing elements in the application and the
traffic between them. As shown in Figure 3, a core graph
is a directed graph G(C, W), where each vertex ci ∈ C represents a core in the application, and the directed edge wij
∈ W represents the communication between cores ci and cj .
The weight of the edge expresses the number of packets per
time step transfered from core ci to core cj . Using ASNoC
terminology, a core graph could be represented in the form
of a traffic distribution matrix (λ) [10]. If there are n cores
in the application, the dimension of the matrix is n×n. Any
element λij in the matrix represents the weight of the edge
wij . That is, the number of packets sent from core ci to
core cj per time step. For example, the traffic distribution
matrix for the PIP application shown in Figure 3(c) is represented as:
157
16
16
16
C15
16
C14
500
C12
(a)
250
C8
40
40
C8
64
C7
64
C4
C6
(c)
(b)
Figure 3: Examples of core graphs (a) VOPD [20],
[24] (b) MPEG-4 [10], [21] (c) PIP [4], [20].
5.
PROBLEM FORMULATION
The problem of network partitioning is known to be an
NP-hard one [6]. Therefore, partitioning is usually formulated as an optimization problem. FM algorithm is adopted
with modification to formulate our partitioning technique.
Therefore, it is given the name ASNoC-FM. This section
presents the mathematical formulation of the partitioning
scheme used in this paper. Using ASNoC notations, the objective function to be minimized is first formulated in Subsection 5.1 according to the FM algorithm. Two constraints
are then added to the algorithm in Subsection 5.2 to match
the ASNoC design environment.
FM algorithm is selected because of its superior performance over other partitioning techniques [6]. The algorithm
aims at minimizing the total weights of those edges cut due
to partitioning. Using ASNoC terminology, the FM algorithm results in minimizing the inter-partition traffic. The
advantage of this minimization is fourfold. First, it ensures
that the cores that communicate with each other frequently
are within the same partition. Second, this minimization
avoids creating bottlenecks in the ASNoC through the interpartition links. Third, it reduces the required buffering for
those nodes located at the boundary of partitions. Fourth,
minimizing the inter-partition traffic proves useful in mitigating the effect of partitioning on the average delay, as
explained in Section 6.
5.1
Objective Function Formulation
In order to formulate the partitioning problem mathematically, suppose that there are n cores in the application and
it is required to divide them into m partitions, P1 · · ·Pm . The
number of cores within any partition, Pk , is nk . Accordingly,
the total traffic (λtotal ), in packets/time step, exchanged between all the cores within the application, is expressed as:
λtotal =
n
n X
X
λij
(1)
i=1 j=1
For any partition, Pk , the total traffic within the partition
(λk ), in packets/time step, is expressed as:
λk =
n
n X
X
λij
, ∀ {i, j} where ci , cj ∈ Pk
(2)
i=1 j=1
0
128
0
0
λ=
0
0
0
0
0
0
0
0
0
0
0
0
0
64
0
0
0
0
0
0
0
0
64
0
0
0
0
0
64
0
0
0
0
0
0
0
0
0
0
0
64
0
0
0
0
0
0
64
0
64
0
0
0
0
0
0
0
0
64
0
Similarly, for any partition, Pk , the total traffic sent from
its nodes to all nodes in other partitions (λ̄k ), in packets/time step, is expressed as:
λ̄k =
n X
n
X
i=1 j=1
λij
, ∀ {i, j} where ci ∈ Pk and cj ∈
/ Pk (3)
The total traffic (λtotal ) depends on the application itself
and will not be affected by partitioning. In contrast, the
intra-partition traffic (λk ) and the inter-partition traffic (λ̄k )
change according to the employed partitioning scheme. As
explained in Section 5, FM algorithm aims at minimizing the
total traffic between all partitions. As a result, the overall
inter-partition traffic (λinter ), in packets/time step, to be
minimized as an objective function is expressed as:
λinter =
m
X
λ̄k
(4)
k=1
5.2
Constraints Formulation
The FM optimization algorithm is originally proposed as
a graph partitioning technique. It is going to be modified
it in this subsection by adding some constraints to make it
more suitable for the on-chip environment. The first constraint to be added to the FM algorithm is to divide the
total ASNoC traffic equally between partitions, i.e., λ1 =
· · · = λm = λtotal /m. This traffic-balance constraint ensures equal activity in each partition and avoids creating
hot spots in the design. As the traffic for all partitions cannot exactly be the same, a more realistic unbalance traffic
factors, uλ1 · · · uλm , are to be used. For any partition (Pk ),
its traffic factor (uλk ) is expressed as a percentage of the
total traffic (λtotal ). The choice of these traffic factors gives
the designer the flexibility to allow for some traffic variations
between partitions. Although these factors are applicationdependent and no specific values could be proposed for them,
they should be carefully chosen. Very large factors produce
completely unbalanced partitions, whereas no result might
be obtained with very small factors. Using these unbalance
traffic factors, the first constraint is expressed as:
λtotal
± uλk · λtotal , ∀ 1 ≤ k ≤ m
(5)
m
The second constraint to be added to the FM optimization
algorithm is to divide the cores equally between partitions,
i.e., n1 = · · · = nm = n/m. This core-balance constraint
is to facilitate the floorplanning. Similar to the first constraint, unbalance core factors, un1 · · · unm , are used to allow
for some variation between the number of cores within partitions. For any partition (Pk ), its core factor (unk ) represents
the number of cores above or below the average number of
cores for all partitions. According to the number of cores
within the application, each factor is usually either 0 or 1.
Using these unbalance core factors, the second constraint is
expressed as:
jnk
nk =
± unk , ∀ 1 ≤ k ≤ m
(6)
m
To this end, our ASNoC-FM optimization problem is completely formulated as:
M inimize :
m
X
λinter =
λ̄k
(7)
λk =
k=1
Subject to :
λk − (
1
± uλk ) · λtotal = 0
m
,∀ 1 ≤ k ≤ m
(8)
,∀ 1 ≤ k ≤ m
(9)
and
nk −
jnk
m
= ±unk
where n is the total number of cores in the application, nk
is the number of cores in a specific partition (Pk ), and m is
the number of partitions. λij is the amount of traffic sent
from core ci to core cj . uλk and unk are the traffic and core
unbalance factors, respectively. λk and λ̄k are the intrapartition and the inter-partition traffic for partition (Pk ),
as expressed in (2) and (3), respectively. Finally, λtotal and
λinter are the total and the total inter-partition traffic, as
expressed in (1) and (4), respectively.
The ASNoC-FM algorithm starts with an arbitrary division of cores between partitions. Cores within any partition are ranked according to the amount of communication
with other partitions. The core with the highest rank is
then moved to the partition with which it communicates
the most. The move is only confirmed and carried out if it
reduces the inter-partition traffic according to (7) and meets
the constraints in (8) and (9). The ranking process is then
repeated and the algorithm continues with all other cores to
partition the cores with the minimum inter-partition traffic.
ASNoC-FM finally produces the cores within each partition
corresponding to this minimum traffic.
6.
EXPERIMENTAL RESULTS
As a proof of concept, the ASNoC-FM algorithm is applied
to three applications with different number of cores: Video
Object Plane Decoder (VOPD) with 16 cores, as shown in
Figure 3(a) [20, 24], MPEG-4 decoder with 12 cores, as
shown in Figure 3(b) [21, 24], and a Picture-In-Picture (PIP)
with 8 cores, as shown in Figure 3(c) [4, 20]. As the number
of cores is not large, the number of partitions is chosen to
be two. The software package, Chaco [12], is used to carry
out the partitioning. NMAP is employed for the mapping
because of its superior performance over other mapping techniques [20]. Mesh topology is used for all partitions. The
reason behind mesh selection is threefold. First, mesh is
proved to outperform other ASNoC topologies with respect
to area and power consumption [4, 23]. Second, most of
practical NoC implementations are achieved with grid-based
topologies, like mesh and torus [3]. Third, mesh topology
could be easily implemented inside chips.
Total routers area and the number of global links resulted
from using ASNoC-FM algorithm are compared to those of a
standard mesh topology, mapped also by NMAP algorithm.
Long-range links are then inserted into the mesh according
to the methodology proposed in [22] and discussed in the
related work. The resultant topology is given the name custom long-range topology. Total routers area and the number
of global links of the long-range topology are also included
in the comparison. These results are shown in Figures 4 and
5, respectively. Using ASNoC-FM, routers area is reduced
by 7.46%, 6.86%, and 5.53% with respect to standard topology and by 10.75%, 11.05%, and 11.7% with respect to custom long-range topology for the VOPD, MPEG-4, and PIP
applications, respectively. Similarly, the number of global
links are reduced by 12.5%, 11.76%, and 10% with respect
to standard topology and by 19.05%, 20%, and 22.2% with
respect to custom long-range topology for the same three
applications, respectively. From these results, it is apparent
that network partitioning is a promising way to reduce ASNoC area especially for applications with large number of
cores. Any ASNoC design faces a trade-off between cost
and performance. Although ASNoC-FM algorithm reduces
both routers area and the number of global links, it increases
5
x 10
12
Routers area (µm2)
1.6
ASNoC−FM
Standard mesh
Custom long−range
Average internode disance (hops)
14
10
8
6
4
2
0
VOPD
MPEG−4
1.2
1
0.8
0.6
0.4
0.2
0
PIP
ASNoC−FM
Standard mesh
Custom long−range
1.4
VOPD
Application
MPEG−4
Figure 4: Routers area for different topology generation techniques.
Figure 6: Average internode distance for different
topology generation techniques.
Number of global links
Average internode distance (hops)
3.5
25
ASNoC−FM
Standard mesh
Custom long−range
20
15
10
5
VOPD
MPEG−4
2.5
2
1.5
1
0.5
the average delay of the ASNoC. The zero-load average delay of ASNoC is usually expressed by the average internode
distance [22]. To see the effect of using ASNoC-FM on the
delay, the average internode distances for the three applications are calculated and plotted in Figure 6. The average
internode distances are increased by 1.81%, 1.48%, and 20%
with respect to standard topology and by 9.1%, 11.84%,
and 25% with respect to custom long-range topology for the
VOPD, MPEG-4, and PIP applications, respectively.
The increase in average delay comes from the fact that the
traffic to or from some nodes, other than those on the boundary, in a specific partition needs more hops to reach the destination nodes in other partitions. Consequently, minimizing
the inter-partition traffic by using ASNoC-FM makes the increase in the average delay minimum. To validate this point,
Figure 7 represents the average internode distances resulting
from using other partitioning schemes. It is apparent that
ASNoC-FM completely outperforms random and scattered
partitioning schemes. Moreover, it is better than the spectral scheme by a ratio up to 1.95%. This result emphasizes
that ASNoC-FM has the minimum effect on average delay.
From the results above, it is apparent that application requirements and design constraints are to control the ASNoC
topology generation. From one hand, previous results prove
that network partitioning methodology outperforms other
standard and custom topology generation methods from the
area-cost point of view in the expenses of the delay. On the
VOPD
MPEG−4
PIP
Application
PIP
Applicaion
Figure 5: Number of global links for different topology generation techniques.
ASNoC−FM
Spectral
Scattered
Random
3
0
0
PIP
Application
Figure 7: Comparison between average internode
distances for different partitioning schemes.
other hand, long-range link insertion is a powerful method to
reduce the delay in the expenses of the area-cost. Finally, if
both area and delay are of equal importance, a compromise
is required to select the most suitable topology generation
method according to the application requirements and the
design constraints.
7.
CONCLUSION AND FUTURE WORK
ASNoC is an efficient paradigm to solve the on-chip communication problem in modern SoC applications. ASNoC
area should be kept as small as possible. In this paper, we
analyzed how the area of the on-chip network could be reduced using network partitioning techniques. The smaller
partitions require less number of ports and buffering, which
constitute the main parts of the on-chip area. The formulation of the partitioning problem is presented by adopting
and modifying the FM graph partitioning algorithm. Results prove that network partitioning is an effective way to
reduce the area especially for applications with large number of cores. We plan to extend this work in different ways.
First, we will try to develop some methods to compensate
for the increase in average delay resulting from using network partitioning. Second, we are going to investigate and
model the optimum number of partitions and the optimum
number of nodes per partition. Third, we will target some
of the reliability issues resulting from using network partitioning. Finally, we will try to combine partitioning with
3D NoCs to better enhance the performance.
8.
REFERENCES
[1] T. Ahonen, D. Siguenza-Tortosa, H. Bin, and
J. Nurmi. Topology optimization for
application-specific networks-on-chip. In Proceedings
of the 2004 ACM international workshop on System
level interconnect prediction (SLIP’04), pages 53–60,
Paris, France, Feb. 14–15, 2004.
[2] L. Benini. Application specific NoC design. In
Proceedings of the IEEE Design, Automation and Test
in Europe Conference (DATE’06), volume 1, pages
1–5, Munich, Germany, Mar. 6–10, 2006.
[3] L. Benini and G. D. Micheli. Networks on chips:
technology and tools. Morgan Kaufmann, San
Francisco, CA, USA, July 2006.
[4] D. Bertozzi, A. Jalabert, S. Murali, R. Tamhankar,
S. Stergiou, L. Benini, and G. D. Micheli. NoC
synthesis flow for customized domain specific
multiprocessor systems-on-chip. IEEE Transactions on
Parallel and Distributed Systems, 16(2):113–129, Feb.
2005.
[5] L. Bononi and N. Concer. Simulation and analysis of
network on chip architectures: Ring, spidergon and 2D
mesh. In Proceedings of the IEEE Design, Automation
and Test in Europe Conference (DATE’06), volume 2,
pages 154–159, Munich, Germany, Mar. 6–10, 2006.
[6] B. Chamberlain. Graph partitioning algorithms for
distributing workloads of parallel computations.
Technical Report UW-CSE-98-10-03, University of
Washington, WA, USA, Oct. 1998.
[7] P. Chan, M. Schlag, and J. Zien. Spectral k-way
ratio-cut partitioning and clustering. IEEE
Transactions on Computer-Aided Design of Integrated
Circuits and Systems, 13(9):1088–1095, Sept. 1994.
[8] M. Coenen, S. Murali, A. Radulescu, K. Goossens, and
G. D. Micheli. A buffer-sizing algorithm for networks
on chip using TDMA and credit-based end-to-end flow
control. In Proceedings of the Third IEEE/ACM/IFIP
International Conference on Hardware/Software
Codesign and System Synthesis (CODES+ISSS’06),
pages 130–135, Seoul, Korea, Oct. 22–25, 2006.
[9] H. Elmiligi, A. A. Morgan, M. W. El-Kharashi, and
F. Gebali. A topology-based design methodology for
networks-on-chip applications. In Proceedings of the
second International Design and Test Workshop
(IDT’07), pages 61–65, Cairo, Egypt, Dec. 16–18,
2007.
[10] H. Elmiligi, A. A. Morgan, M. W. El-Kharashi, and
F. Gebali. Power-aware topology optimization for
networks-on-chips. In Proceedings of the IEEE
International Symposium on Circuits and Systems
(ISCAS’08), pages 360–363, Seattle, WA, USA, May
18–21, 2008.
[11] C. Fiduccia and R. Mattheyses. A linear-time
heuristic for improving network partitions. In
Proceedings of the 19th Annual IEEE/ACM Design
Automation Conference (DAC’82), pages 175–181, Las
Vegas, NV, USA, June 14–18, 1982.
[12] B. Hendrickson and R. Leland. The Chaco user’s
guide: Version 2. Technical Report SAND95-2344,
Sandia National Laboratories, Albuquerque, NM,
USA, July 1995.
[13] J. Hu and R. Marculescu. Energy-aware mapping for
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
tile-based NoC architectures under performance
constraints. In Proceedings of the IEEE Asia and
South Pacific Design Automation Conference
(ASP-DAC’03), pages 233–239, Kitakyushu, Japan,
Jan. 21–24, 2003.
A. Jantsch. NoCs: A new contract between hardware
and software. In Proceedings of the Euromicro
Symposium on Digital System Design (DSD’03), pages
10–16, Belek-Antalya, Turkey, Sept. 1–6, 2003.
G. Karypis and V. Kumar. Multilevel algorithms for
multi-constraint graph partitioning. In Proceedings of
the 1998 IEEE/ACM Conference on Supercomputing
(SC’98), pages 1–17, San Jose, CA , USA, Nov. 7–13
1998.
B. Kernighan and S. Lin. An efficient heuristic
procedure for partitioning graphs. Bell System
Technical Journal, 49(2):291–307, Feb. 1970.
N. Koziris, M. Romesis, P. Tsanakas, and
G. Papakonstantinou. An efficient algorithm for the
physical mapping of clustered taskgraphs onto
multiprocessor architectures. In Proceedings of the
Eighth IEEE Euromicro Workshop on Parallel and
Distributed Processing (EURO-PDP’2000), pages
406–413, Rhodos, Greece, Jan. 19–21, 2000.
B. Krishnamurthy. An improved min-cut algorithm for
partitioning VLSI networks. IEEE Transactions on
Computers, 33(5):438–446, May 1984.
F. Li, Y. Xiao, and Q. Zhang. Graph partition based
traffic balance in industrial network. In Proceedings of
the IEEE International Conference on
Communications and Networking in China
(CHINACOM’07), pages 438–442, Shanghai, China,
Aug. 22–24, 2007.
S. Murali and G. D. Micheli. Bandwidth-constrained
mapping of cores onto NoC architectures. In
Proceedings of the IEEE Design, Automation and Test
in Europe Conference and Exhibition (DATE’04),
volume 2, pages 896–901, Paris, France, Feb. 16–20,
2004.
S. Murali and G. D. Micheli. SUNMAP: A tool for
automatic topology selection and generation for NoCs.
In Proceedings of the 41st IEEE/ACM Design
Automation Conference (DAC’04), pages 914–919,
San Diego, CA, USA, June 7–11, 2004.
U. Ogras and R. Marculescu. It’s a small world after
all: NoC performance optimization via long-range link
insertion. IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, 14(7):693–706, July 2006.
P. Pande, C. Grecu, M. Jones, A. Ivanov, and
R. Saleh. Performance evaluation and design trade-offs
for network-on-chip interconnect architectures. IEEE
Transactions on Computers, 54(8):1025–1040, Aug.
2005.
W. Shen, C. Chao, Y. Lien, and A. Wu. A new
binomial mapping and optimization algorithm for
reduced-complexity mesh-based on-chip network. In
Proceedings of the IEEE International Symposium on
Networks-on-Chip (NOCS’07), pages 317–322,
Princeton, NJ, USA, May 7–9, 2007.