Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

SMT-Based Contention-Free Task Mapping and Scheduling on 2D/3D SMART NoC with Mixed Dimension-Order Routing

Published: 06 December 2021 Publication History

Abstract

SMART NoCs achieve ultra-low latency by enabling single-cycle multiple-hop transmission via bypass channels. However, contention along bypass channels can seriously degrade the performance of SMART NoCs by breaking the bypass paths. Therefore, contention-free task mapping and scheduling are essential for optimal system performance. In this article, we propose an SMT (Satisfiability Modulo Theories)-based framework to find optimal contention-free task mappings with minimum application schedule lengths on 2D/3D SMART NoCs with mixed dimension-order routing. On top of SMT’s fast reasoning capability for conditional constraints, we develop efficient search-space reduction techniques to achieve practical scalability. Experiments demonstrate that our SMT framework achieves 10× higher scalability than ILP (Integer Linear Programming) with 931.1× (ranges from 2.2× to 1532.1×) and 1237.1× (ranges from 4× to 4373.8×) faster average runtimes for finding optimum solutions on 2D and 3D SMART NoCs and our 2D and 3D extensions of the SMT framework with mixed dimension-order routing also maintain the improved scalability with the extended and diversified routing paths, resulting in reduced application schedule lengths throughout various application benchmarks.

1 Introduction

Network-on-Chip (NoC) is a widely used interconnection fabric that provides a highly scalable low-latency on-chip communication solution for multiprocessor system-on-chips (MPSoCs) [6]. However, since communications between cores in a regular NoC are achieved by routing messages hop-by-hop from the source to the destination, their effectiveness at non-local communications quickly diminishes due to long on-chip latencies, which degrades performance and limits the flexible usage of on-chip resources. Delays due to router pipelines, queuing, and serialization all contribute towards a much longer on-chip latency than an ideal point-to-point interconnect.
Recently, an NoC design called SMART (Single-cycle Multi-hop Asynchronous Repeated Traversal) NoC [14, 15] has been proposed that enables a flit to traverse many router hops within a single clock cycle, potentially from the source all the way to the destination. Such a multi-hop traversal is made possible by utilizing efficient router-bypass mechanisms and properly engineered wires with asynchronous repeaters. In particular, router data-paths can not only be dynamically but also statically [3] configured to enable multi-hop traversal so that flits can bypass the pipelines of intermediate routers entirely, resulting in ultra-low latency performance. Although SMART offers outstanding advantages, the performance benefits can only be fully realized if there is no contention among flows that share common links along their routing paths. When contention occurs, bypass paths must terminate early, and the corresponding flits must be stopped and buffered at intermediate routers for arbitration, degenerating in the worst-case to hop-by-hop communication. Without proper contention management, the benefits of SMART can easily vanish.
In this article, we address the aforementioned contention problem by developing an SMT (Satisfiability Modulo Theories)-based contention-free task mapping and scheduling framework for the embedded computing application in which an application task graph can be statically compiled to a multi-core or parallel processing platform for non-preemptive execution based on 2D and 3D SMART NoC architectures. In particular, tasks are mapped to processors and scheduled to ensure contention-free routing of all messages over a SMART NoC from their source to their destination in a single cycle. Our SMT formulation can find theoretically optimal solutions that minimize the application schedule length. In contrast to prior ILP-based formulation for 2D SMART NoCs [27], our SMT formulation is substantially more compact, thanks in part to SMT’s expressive power in capturing conditional constraints that constitute a large proportion of the task mapping and scheduling problem. Combined with efficient search-space reduction techniques, our SMT formulation is considerably more scalable, enabling our framework to find optimal solutions for far larger problem instances with dramatically faster runtimes.
In addition, we extend our SMT-based formulation to consider the 3D SMART NoC case. Three-dimensional (3D) IC integration is becoming increasingly important, and correspondingly, 3D NoCs are emerging as promising solutions that can deliver lower latency, higher throughput, and reduced energy consumption in comparison with their 2D counterparts [9]. In particular, recent SMART NoC advances [1, 5, 13] have substantially reduced the wiring and area overhead of SMART NoCs to enable 3D extensions. Moreover, the emergence of monolithic 3D (M3D) integration has opened up new possibilities for designing SMART 3D NoC architectures with monolithic inter-tier vias (MIVs) for vertical interconnections [17], which have much smaller dimensions than the more popular through silicon vias (TSVs) [25]. Further, a 3D SMART NoC can potentially operate at higher clock frequencies due to the utilization of vertical interconnects, which results in a reduction of the effective physical link distances [13]. In the context of our task mapping and scheduling problem, a 3D SMART NoC provides greater path diversity, which makes it easier for our SMT-based formulation to find better optimal solutions with contention-free routing.
The main contributions of our work are as follows:
We propose an SMT-based contention-free task mapping and scheduling framework for 2D/3D SMART NoCs, including support for mixed dimension-order routing.
We devise a concise model under SMT’s support for expressive modeling, resulting in the fast reasoning of conditional constraints.
We develop efficient search-space reduction techniques, e.g., adaptive boundary condition and breaking design symmetry to further improve scalability.
We demonstrate that our framework achieves smaller formulation complexity with 16.6×/ 12.1× and 26.2×/ 18.7× reductions of variables and constraints on average and 10× higher scalability with 931.1× (ranges from 2.2× to 1532.1×) and 1237.1× (ranges from 4× to 4373.8×) faster average runtimes for finding optimum solutions on 2D and 3D SMART NoCs, respectively.
Our experiments further demonstrate that the 3D extension with mixed dimension-order routing not only maintains the improved scalability but also helps to reduce the application schedule length by exploiting the greater path diversity.
The rest of this article is organized as follows: Section 2 provides preliminary information. Section 3 presents our SMT formulation and search-space reduction techniques for improving scalability. Section 4 validates the proposed frameworks with extensive experimental results. Section 5 outlines additional related work. Section 6 concludes the article.

2 Preliminaries

In this section, we first describe the basic concept of SMART and its latency model and define our problem. Then we review the expressiveness of SMT for conditional constraints.

2.1 Communication Latency in SMART NoC-based MPSoC

Fig. 1.
Fig. 1. SMART router microarchitecture.
Fig. 2.
Fig. 2. Multi-hop Traversal over a SMART-hop path.
This section describes the basic concept and communication latency model of SMART NoC. Figure 1 depicts the microarchitecture of a 5-port SMART router for a mesh network. For simplicity, only the (), (), and () ports are shown in detail.1 All other input ports are identical to , and all other output ports are identical to .
In a SMART NoC, asynchronous repeaters, replaced with conventional clocked link drivers at every hop allow a flit to traverse multiple hops in a single clock cycle. An alternative data-path in each router allows a flit to bypass the entire router pipeline and go directly to the next router. The bold line going from to in Figure 1 illustrates this bypass operation. An example of a multi-hop traversal (i.e., SMART-hop) is depicted in Figure 2. A flit travels three hops from router R20 to R23 within a single-cycle via a SMART-hop path created by appropriately controlled , , and at intermediate routers.
To set up SMART-hops, SMART performs a two-stage switch allocation: local switch allocation (SA-L) and global switch allocation (SA-G). In the SA-L stage, buffered flits at a start router arbitrate among themselves to gain access to the output ports. For each winning flit, the start router broadcasts a SMART-hop Setup Request (SSR), which carries the information about the route. Each SSR is sent through dedicated multi-drop wires that are repeated from the start router to all intermediate routers up to hops away. refers to the maximum number of hops that a flit can traverse in a single cycle. Upon receiving the SSRs, the recipient routers set up the control signals (i.e., , , and ) to operate in bypass or stop mode in SA-G stage. When multiple SSRs from multiple start routers arrive at the same time (i.e., contention on the bypass path), only a winning flit determined by priority policies at the recipient router can proceed and bypass the intermediate routers. Thus, the transmission latency of the SMART-hop path can be formulated as inspired by [27]:
(1)
where and respectively denote the number of stages in a start router and the link latency between two routers; refers to the amount of data units (i.e., flits); denotes the number of links having flit contention on the bypass path; refers to the delay due to the contention. In this article, we employ contention-free task mapping and scheduling. Consequently, the communication latency without contention (i.e., = 0) can be expressed as:
(2)

2.2 Problem Definition

Fig. 3.
Fig. 3. Example task graph (TG) and two-dimensional/three-dimensional mesh topology graph.
We represent an application as a task graph (TG) which is a directed acyclic graph (DAG), , where T is a set of all tasks in the application and E is a set of edges , representing precedence relations between tasks and . In our problem, the target application domain is embedded computing in which an application task graph can be statically compiled to a multi-core or parallel processing platform for non-preemptive execution, and tasks correspond to the blocks of significant computations, like signal processing tasks, not at the level of individual instructions. Each task and edge are respectively associated with the task execution time and the amount of data transferred between each pair of tasks. For example, Figure 3(a) shows an example task graph with five tasks and five edges including the task execution time and the amount of data.
Fig. 4.
Fig. 4. Example of task mapping and scheduling.
The target architecture is SMART NoC-based homogeneous MPSoCs [14, 15] with 2D/3D mesh topology graph (MTG), , where each node represents a process element (PE) with a router and L is a set of edges representing bi-directional communication paths between adjacent PEs. Figure 3(b) and Figure 3(c), respectively, show an example 3 × 3 2D mesh topology with 9 PEs and 3 × 3 × 2 3D mesh topology with 18 PEs. Within this topology, each PE can accommodate more than one task. The location of assigned PEs for each task is determined by a pair of coordinates () and () for 2D and 3D. If two consequential tasks are mapped to the same PE, data transmission can be skipped without spending transmission latency.
The scheduling of tasks is performed by determining the release/completion time of task execution and data transmission in such a way to achieve the minimum application schedule length. Note that we assume static task execution time and a fixed amount of data for each transmission. Also, we assume that tasks are non-preemptive and no deadline requirements are enforced in our application. Therefore, for a pair of tasks having precedence relation, the produced data from task u is transmitted to the subsequent task v after the completion of the task u.
We assume XY-routing in 2D topology and XYZ-routing in 3D topology as a default routing path. In this work, to achieve better latency through the extended path diversification, we explore the impact of mixed dimension-order routing on the application schedule length (). For 2D and 3D topology, we employ all possible routing paths (i.e., XY/YX routing paths for 2D, also known as O1TURN [23], and XYZ/YXZ, ZXY/ZYX, and XZY/YZX routing paths for 3D). Then, in our formulation, we decide one of the dimension-order routing paths that can guarantee flit contention-free routing by detecting and avoiding temporal and spatial overlaps of SMART-hop paths. When we inject the traffic to the network in the scheduled time slot, we only activate the SSR signals along the path we already have decided.
Figure 4 illustrates the impact of the flit contention on application schedule lengths. For the same example TG in Figure 3(a), the mapping and scheduling solution in Figure 4(a) provides a larger than that of Figure 4(b) due to the flit contention on its routing paths (i.e., / and /). Based on the above definitions, our task mapping and scheduling problem can be defined as:
Given a TG and MTG, find a contention-free mapping and scheduling function from TG to MTG so that the overall end-to-end latency of the designed NoC application (i.e., application schedule length ) is minimized.

2.3 Expressiveness of SMT for Conditional Constraints

SMT provides the feature of OMT (Optimization Modulo Theories) [2] to obtain an optimal solution, on top of the efficient problem-solving ability of SAT. Besides, SMT provides much more expressive modeling language (e.g., “if-then-else” for the “Either-Or” constraint, built-in Boolean cardinality functions such as “at-most k” and “at-least k”, etc.) than is possible with SAT or ILP formulas. For example, to formulate a simple conditional constraint, “”, SMT only requires a single-statement as:
(3)
while ILP needs six constraints with one auxiliary variable as:
(4)
where M is a large constant chosen as an upper bound on , is a small number stating the tolerance for when a is considered to exceed b, and z is an auxiliary Boolean (0,1) variable which indicates if . Therefore, as the conditional constraints become more complicated, more auxiliary variables and the corresponding constraints are necessary for the ILP formulation. Since our problem contains several complex conditional constraints as well as Boolean decision variables that have a significant impact on exploring the feasible solutions, SMT can solve our problem more efficiently than ILP under (i) the powerful expressiveness (i.e., the lower formulation complexity) and (ii) the faster reasoning ability of SAT solver.

3 SMT Formulation for Joint Task Mapping, Scheduling, and SMART Routing

In this section, we describe basic SMT formulation and scalability improvement constraints for 2D/3D SMART NoC. We formulate task mapping and scheduling of 2D/3D SMART NoC as a constraint satisfaction problem (CSP) with variables and constraints. Thus, the release time of task execution and a data transmission (i.e., scheduling), as well as the task assignment (i.e., task mapping), are both determined by our constraints. The formulations that can be adopted for both 2D and 3D mesh topologies by simple adjustments of the conditions related to the corresponding coordinate variables (i.e., x, y, and z) are expressed based on the 3D mesh topology. For the constraints which have to be carefully revised according to the dimension of topology and routing schemes, we provide separate expressions and algorithms. The notations are shown in Table 1.
Table 1.
TermDescription
Set of Tasks, Edges, and PEs (processing elements)
ttth task
eeth edge
ppth PE
x/y-coordinates of a processor on which a task t is mapped (for 2D)
x/y/z-coordinates of a processor on which a task t is mapped (for 3D)
A directed edge from task u to v,
The execution time of task t
The amount of data transferred on edge e
The release/completion time of task t
The release/completion time of data transmission on
0-1 indicator if edges , have a transmission time overlap
0-1 indicator if edges , have shared routing paths
The type of dimension-order routing on edge e (for mixed routing, 0-1 for 2D/0-5 for 3D)
Table 1. Notations for the Proposed SMT Formulation

3.1 Basic Formulation

3.1.1 Objective.

As described in Section 2.2, our goal is to find a contention-free mapping and scheduling solution so that the overall end-to-end latency of the designed NoC application (i.e., application schedule length ) is minimized. Thus, our objective function minimizes the maximum completion time of tasks that have no out-going edges.
(5)

3.1.2 Boundary Condition for Processor Coordinates.

Given an m × n × l PE tiles, the x/y/z-coordinates of a task t are bounded by m/n/l, respectively. Each task can only be mapped to one PE on (, , ), while each PE can accommodate multiple tasks without limitation on the number of assigned tasks.
(6)

3.1.3 Scheduling Constraints of Tasks.

Constraint (7) represents the quantitative timing relation of tasks. The source task u between two consequential tasks have to be released prior to the destination task v
(7)

3.1.4 Non-Overlap of Tasks.

No pairs of tasks , mapped on the same PE, can overlap:
(8)

3.1.5 Scheduling Constraints of Data Transmissions.

Constraint (9) represents the quantitative timing relation between tasks and data transmissions. We assume that the data transmission can be released and completed in any time slot between the completion of the precedent task and the release of the succeeding task. If the source and destination tasks are mapped on the same PE, the data can be directly transmitted without spending additional latency. Otherwise, the minimum data transmission latency is required as expressed in Constraint (10).
(9)
(10)

3.1.6 Non-Overlap of Data Transmission.

No pairs of data transmissions on edges can overlap in time and space at the same time.
(11)

3.1.7 Overlap of Data Transmission in Time.

Constraint (12) determines the overlap of data transmissions on any pairs of edges in time:
(12)

3.1.8 Overlap of Data Transmission in Space.

The overlap of data transmissions in space is determined by the shared routing paths between any pairs of edges . Since we assume bi-directional links, only the shared links in the same direction are detected as an overlap. Constraint (13) determines the horizontal and vertical sharing of 2D routing paths under XY-routing. Constraint (14) determines the sharing of 3D routing paths under XYZ-routing. For simplification, detailed constraints for detecting overlaps in the x and y directional links that are the same as the Constraint (13) are not expressed in Constraint (14).
(13)
(14)

3.1.9 Non-Overlap of Data Transmission on the Same PE.

No pairs of outgoing data transmissions from the same PE with different source tasks on edges can overlap.
(15)

3.1.10 Maximum Number of Hops (.

The number of hops between any source/destination pairs of tasks have to be less than or equal to . Manhattan distance is used to estimate the number of hops between two consequential tasks as expressed in Constraint (16)
(16)

3.2 Scalability Improvements

3.2.1 Adaptive Boundary Condition.

The adaptive boundary condition reduces the search-space by narrowing the feasible time-ranges of task release-time variables. For each task, we set the low-bound as the maximum sum of task execution times on the paths from any starting tasks (i.e., tasks with no incoming edges) to the target task. For example, in Figure 5, the low-bound of is defined as the sum of and ’s execution times. To set the upper bound, we first respectively define feasible solution boundaries and as the sum of task execution times on the longest path and the sum of and the offset. The offset is empirically determined by examining the maximum gap between and the estimated maximum upper-bound of each test case.2 Note that the offset needs to be increased if there exists a specific test case that has the larger actual upper-bound than +offset (i.e., infeasible condition). Since our goal is minimizing the application schedule length, the offset does not affect the optimality of solutions if there exist any feasible solution with the given offset. In this work, we set the offset to 500 satisfying all the experimental cases. Once the and the offset are determined, the upper bound of each task is defined as a subtraction of the maximum sum of task execution times on the reversed paths from any finishing tasks (i.e., tasks with no outgoing edges) to the target task from . For example, the upper bound of in Figure 5 is the sum of , , and ’s execution times.
Fig. 5.
Fig. 5. Example of adaptive boundary condition.
Fig. 6.
Fig. 6. Example of the symmetric task mappings on 2D m × n topology.
Fig. 7.
Fig. 7. Restriction of PE assignments for excluding symmetric mapping patterns on 2D/3D topologies.

3.2.2 Breaking Design Symmetry.

The breaking design symmetry constraint excludes redundant exploration of the symmetric solutions by restricting PE assignments of tasks to the specific region, resulting in the reduction of search-space. Figure 6 depicts examples of symmetric task mapping patterns in an m × n 2D mesh topology. The example mapping in Figure 6(a) has several symmetric task mappings that are equivalent to their rotated (i.e., Figure 6(b)) and flipped shapes (i.e., Figure 6(c)). In an asymmetric topology (i.e., ), the symmetric cases are limited to the rotation in 180 and flip in horizontal/vertical direction.
To exclude symmetric mapping patterns, we first sort task elements by the descending order of the number of incoming/outgoing edges of each task so that the following symmetry breaking constraint can reduce as many search-spaces of related tasks as possible. Then, we set the PE assignment boundary condition for the first task element to be assigned to the specific region of the topology as illustrated in Figure 7. For the remaining task elements, we recursively set the conditional constraints to keep excluding symmetric cases even if previously constrained tasks are mapped to the PEs on a line or plane that can cut the entire topology in half. Algorithm 1 describes conditional constraints excluding symmetric task mapping cases for 2D/3D mesh topology. Given a sorted queue of tasks in T as descending number of in/out-degree, we first set the boundary condition for the PE assignment of the first task element (i.e., the task with the largest in/out-degree) to the 4th octant (Lines 4–5). Then, we recursively set conditional constraints for the next task elements according to the intermediate PE assignments of previous task elements. If previous task elements are assigned on a diagonal line or plane in the topology with symmetric XY-plane (i.e., ), the location of the next task elements is restricted to the half area divided by the diagonal line or plane (Lines 7–9). When previous task elements are on one of horizontal line or plane (i.e., x, y, and z directions) dividing the entire topology in half, the location of the next task elements is restricted to the half area divided by the horizontal line or plane (Lines 10–18).

3.3 Mixed 2D/3D Dimension-order Routing

3.3.1 Overlap of Data Transmission in Space for 2D Mixed Routing.

For 2D NoC, we allow the mixed-use of XY/YX routing paths. Routing type indicator is determined for each edge so that the application schedule length is minimized. is set to 1 if the determined routing type is XY-routing. Otherwise (i.e., YX-routing), is set to 0. Constraint (17) determines the horizontal and vertical sharing of routing paths under mixed XY/YX-routing.
(17)

3.3.2 Overlap of Data Transmission in Space for 3D Mixed Routing.

For 3D NoC, we allow the mixed-use of XYZ/YXZ, ZXY/ZYX, and XZY/YZX routing paths. The routing type indicator is, respectively, set as 0 to 5 for XYZ, YXZ, ZXY, ZYX, XZY, and YZX routing paths. Constraint (18) determines the sharing of links under mixed routing paths. Detailed constraints in the y and z directional links that can be defined similar to the x-directional constraint are not described in the Constraint (18).
(18)

4 Experimental Results

4.1 Experimental Setup

We have implemented the proposed framework in both ILP/SMT formulas including (i) the basic formulations and (ii) the scalability improvement constraints for 2D topology. Then, we have extended SMT framework to 3D topology and implemented mixed dimension-order routing scheme for both 2D/3D frameworks. Our frameworks are validated on a workstation with Intel Xeon E5-2650L at 1.8 GHz and 128 GB memory. The Gurobi (version 9.0.2) [10] and Z3 (version 4.8.5) [7] solvers are used to produce the optimized solutions for ILP and SMT, respectively.
We employ applications from (i) randomly generated cases by TGFF tool [8] and (ii) real benchmarks, including MWD (multi-window display), H263 encoder/MP3 decoder, H263 decoder/MP3 decoder, MP3 encoder/decoder, MMS(multi-media system) [11], Robot (Newton-Euler dynamic control calculation), Sparse (Random sparse matrix solver), and RS-32 encoder (Reed-Solomon code encoder) [20]. We consider 4 × 4, 6 × 6, 8 × 4, 8 × 8, and 16 × 16 2D Mesh and 4 × 4 × 4 and 8 × 8 × 4 3D Mesh for 2D/3D SMART NoC architecture and assume the homogeneous PEs with the same execution efficiency. The random applications are generated with the maximum in/out-degree of 2/2 or 2/3 (suffixed with “_a”). We use HPC = 8 that is the best achievable for 2D configurations when energy is taken into consideration in [14]. may affect the resource utilization if the number of outgoing edges from a task exceeds the number of available PEs within (i.e., ), resulting in the diminished parallelism and performance. In our experiments, the maximum number of outgoing edges from one task is not restricted by = 8 for all cases.
Table 2.
Main Variables#Variables (ILP/SMT 2D/3D)
Tasks (2D) / (3D)
Data Transmissions
Overlap Flags
ConstraintsILP 2DILP 3DSMT 2D/3D
#Var (Aux)#Constraints#Var (Aux)#Constraints#Var (Aux)#Constraints
Timing (tasks)---
Non-overlap (tasks)
Timing (data trans.)
Non-overlap (data trans.)--
Overlap in time
Overlap in space
Non-overlap on same PE
Breaking symmetry
TotalILP 2DILP 3DSMT
#Variables (Main + Aux)
#Constraints
Table 2. Formulation Complexity of ILP and SMT on 2D/3D SMART NoC
1
= #Tasks, = #Edges, #Var (Aux) = # of auxiliary variables.
Table 3.
TestCaseTasksEdges2D3D
#Variables#Constraints#Variables#Constraints
ILPSMTinc.ILPSMTinc.ILPSMTinc.ILPSMTinc.
TotalAuxTotalAux
tgff11091,6751,54513012.9×2,77625610.8×2,4742,33414017.7×3,96225615.5×
tgff2222410,3459,65768815.0×17,3581,48511.7×15,71415,00471022.1×25,5781,48517.2×
tgff3313420,51219,1981,31415.6×34,4062,91811.8×31,15829,8131,34523.2×50,6722,91817.4×
tgff4414333,82631,7702,05616.5×56,6534,70212.0×51,95249,8552,09724.8×84,6554,70218.0×
tgff5516264,80860,6984,11015.8×109,4019,21011.9×100,07895,9174,16124.1×164,0939,21017.8×
tgff202012451,013,036951,96261,07416.6×1,712,204141,18312.1×1,592,7591,531,48461,27526.0×2,624,757141,18318.6×
tgff353514353,171,8622,980,798191,06416.6×5,363,446441,70012.1×4,995,9154,804,500191,41526.1×8,237,709441,70018.7×
tgff505016066,224,0825,854,236369,84616.8×10,519,070862,29612.2×9,823,5199,453,172370,34726.5×16,202,741862,29618.8×
tgff3_a303622,07720,6251,45215.2×37,2953,18211.7×34,19732,7151,48223.1×56,1693,18017.7×
tgff4_a404738,04935,6332,41615.7×64,2045,39911.9×59,29856,8422,45624.1×97,4585,39918.1×
tgff5_a516264,72860,6184,11015.7×109,5529,22511.9×101,38897,2274,16124.4×167,1749,22518.1×
tgff10_a102118239,515225,06514,45016.6×403,23633,47412.0×369,642355,09014,55225.4×604,75733,47418.1×
MWD12122,6832,47920413.2×4,53441111.0×4,1123,89621619.0×6,76641116.5×
H263encMP3dec12122,6672,46320413.1×4,48341010.9×3,9693,75321618.4×6,45241215.7×
MP3encMP3dec13133,1482,91423413.5×5,32347711.2×4,8534,60624719.6×8,00047916.7×
H263decMP3dec14143,6373,37126613.7×6,14754711.2×5,5535,27328019.8×9,13054916.6×
MMS404838,80936,2972,51215.4×66,0015,60211.8×61,45058,8982,55224.1×102,0305,60018.2×
Robot88131263,483245,83917,64414.9×449,88838,62011.6×421,139403,40717,73223.8×700,96938,62018.2×
Sparse9667104,50799,5674,94021.2×164,11713,90711.8×151,931146,8955,03630.2×229,84813,92716.5×
RS-32_28_8_enc2623481,990,3331,867,833122,50016.2×3,354,633277,79512.1×3,187,6363,064,874122,76226.0×5,254,697277,79518.9×
Average16.6×12.1×26.2×18.7×
Table 3. Formulation Complexity Evaluation on 2D/3D SMART NoC
2
inc. = increment ratio (ref. = SMT).
Table 4.
TestCaseTasksEdgesMin. Simulation Runtime(s)
ILPSMTILPSMTSpd.Up
4 × 46 × 68 × 48 × 8Avg.4 × 46 × 68 × 48 × 8Avg.
tgff11091731730.370.730.160.210.370.060.070.070.070.075.5×
tgff2222424024053.21163.9496.1142.8489.031.461.771.271.661.5457.8×
tgff33134248248746.28524.371,332.022,221.271,205.994.034.624.214.384.31279.7×
tgff4414333433410,400.943,396.4012,181.2310,608.739,146.839.7510.0910.1010.049.99915.3×
tgff55162-365t.o.t.o.t.o.t.o.-18.2918.7920.5318.3518.99-
tgff20201245-7641,519.11857.79754.49842.35993.44-
tgff35351435-854t.o.29,894.46t.o.6,209.8818,052.17-
tgff50501606-903t.o.t.o.42,755.0942,755.09-
tgff3_a30362942949,241.543,119.837,450.776,977.986,697.535.636.216.376.276.121094.1×
tgff4_a404736636611,949.997,692.0210,293.863,818.108,438.499.3110.1511.159.7910.10835.6×
tgff5_a5162449449t.o.t.o.30,221.3837,003.0033,612.1920.5921.4524.3621.3521.941532.1×
tgff10_a102118-383t.o.t.o.-414.41125.6271.44145.88189.34-
MWD12122812819.3411.0919.9316.6914.260.840.770.850.790.8117.6×
H263encMP3dec12122352350.760.940.620.620.740.310.390.300.350.342.2×
MP3encMP3dec13132512510.740.450.490.450.530.210.280.190.240.232.3×
H263decMP3dec14142192194.423.440.983.623.120.530.640.510.570.565.5×
MMS4048325325417.33254.15281.47487.79360.197.447.839.097.487.9645.2×
Robot88131-617t.o.t.o.t.o.t.o.-146.59151.49138.60138.43143.78-
Sparse9667-24010,366.5216,721.7015,150.0238,713.8921,934.04-
RS-32_28_8_enc262348-17042,524.372,608.052,456.072,600.792,547.32-
Average931.1×
Table 4. Minimum Application Schedule Length and Simulation Runtime of ILP and SMT on 2D SMART NoC
Min. = minimum application schedule length. Spd.Up = average runtime speed up ratio (ref. = ILP), t.o. = optimization not completed within 12 hours.
Table 5.
TestCaseTasksEdgesSolutionSimulation Runtime (s)
ILPSMTILPSMTSpd.Up
4 × 4 × 48 × 8 × 4Avg.4 × 4 × 48 × 8 × 4Avg.
tgff11091731730.161.500.830.100.110.117.9×
tgff22224240240150.0376.24113.141.231.851.5473.5×
tgff331342482485,874.832,260.494,067.664.184.784.48908×
tgff4414333433412,673.5811,948.7712,311.1810.4911.2210.861134.1×
tgff55162-365t.o.t.o.-32.8030.4331.62-
tgff20201245-764t.o.t.o.-1,077.301,060.851,069.08-
tgff35351435-854t.o.t.o.-5,229.474,872.555,051.01-
tgff50501606-903t.o.t.o.-36,580.0020,005.6128,292.81-
tgff3_a303629429428,641.7735,916.1532,278.966.618.157.384373.8×
tgff4_a40473663666,493.939,944.428,219.1811.0112.2611.64706.4×
tgff5_a5162-449t.o.t.o.-32.2033.4132.81-
tgff10_a102118-383t.o.t.o.-112.70123.90118.30-
MWD121228128114.8721.7418.310.740.950.8521.7×
H263encMP3dec12122352350.951.261.110.230.290.264.3×
MP3encMP3dec13132512511.490.961.230.220.240.235.3×
H263decMP3dec14142192191.284.422.850.580.840.71
MMS4048325325851.10822.47836.798.129.338.7395.9×
Robot88131-617t.o.t.o.-167.08163.21165.15-
Sparse9667-228t.o.t.o.-167.43120.81144.12-
RS-32_28_8_enc262348-1703t.o.t.o.-2,510.113,396.162,953.14-
Average1237.1×
Table 5. Minimum Application Schedule Length and Simulation Runtime of ILP and SMT on 3D SMART NoC
Min. = minimum application schedule length. Spd.Up = average runtime speed up ratio (ref. = ILP), t.o. = optimization not completed within 12 hours.

4.2 ILP vs. SMT for 2D/3D SMART NoC

Fig. 8.
Fig. 8. Comparison of the formulation complexity and runtime scalability (8 × 8 mesh) between ILP and SMT for 2D SMART NoC.
Fig. 9.
Fig. 9. Comparison of the formulation complexity and runtime scalability (8 × 8 × 4 mesh) between ILP and SMT for 3D SMART NoC.

4.2.1 Formulation Complexity Analysis.

Table 2 presents the formulation complexity of ILP/SMT frameworks for 2D/3D SMART NoCs. The number of variables and constraints is significantly related to the number of tasks and edges . The main variables of our framework consist of -coordinates, release/completion time of each task/data transmission, and overlap indicators in time/space. The coordinate and time variables are proportional to and while the overlap indicators are related to the number of combinations between edges (i.e., ). The conditional constraints which describe the overlap between pairs of tasks and edges are proportional to the number of combinations of tasks and edges. ILP requires auxiliary variables and the corresponding constraints for conditional constraints. In particular, as the dimension of the structure increases from 2D to 3D, the number of auxiliary variables and constraints in ILP also shows an increment because of the more complicated conditional constraints for determining the overlap in tasks, data transmissions, whereas the SMT keeps the same formulation complexity. The number of additional variables and constraints is represented as the multiplication of the number of corresponding SMT constraints. Note that the final estimated complexity of ILP is further reduced by around 50% across the benchmarks because we remove the duplicated literals in conditional constraints.

4.2.2 Evaluation-Formulation Complexity.

Table 3 presents the comparison of formulation complexity between ILP and SMT for 2D and 3D structures. Compared to ILP, SMT has 16.6× and 12.1× smaller number of variables and constraints on average for 2D, respectively. For 3D, SMT respectively shows 26.2× and 18.7× smaller number of variables and constraints. The difference is mainly due to the auxiliary variables which occupy 92% to 95% of total variables and the corresponding constraints of ILP. Figures 8(a) and 8(b) and Figures 9(a) and 9(b) visualize the estimated and measured complexity of ILP and SMT for in 2D and 3D structures, respectively. Note that we assume the same and for the estimation. As and increase, the estimated number of variables and constraints in ILP has respectively saturated to 17.1 ×/ 11.9 × and 25.3 ×/ 16.9 × larger values than those of SMT for 2D and 3D.
Fig. 10.
Fig. 10. Example task mapping and scheduling of tgff2 case in Table 4 for 4 × 4 mesh. (a) SMT, (b) ILP.

4.2.3 Evaluation-Solutions and Runtime.

Tables 4 and 5 present the comparison of the minimum application schedule length and runtime between ILP and SMT with 12 hours of the time-limit for 2D and 3D SMART NoC, respectively. We observe that the application schedule length of both ILP and SMT are the same (i.e., equal application performance) for all cases regardless of mesh sizes. Figure 10 illustrates examples of detailed task mapping and scheduling solutions provided by ILP and SMT. The detailed solution includes information on the PE assignment and a designated release time for each task. Though the detailed composition of task mapping and scheduling of these solutions can be different from each other due to the possible existence of several feasible solutions, both ILP and SMT provided the same minimum application schedule length of 240. The runtime trend tends to increase as the size of the mesh and the number of in/out-degree increases. For the cases that the ILP can provide an optimal solution, SMT solves problems with 931.1× (ranges from 2.2× to 1532.1×) and 1237.1× (ranges from 4× to 4373.8×) faster runtime on average than ILP for 2D and 3D NoCs, respectively. Figure 8(c) and Figure 9(c), which respectively visualize the comparison of the scalability for 8 × 8 and 8 × 8 × 4 meshes, show that SMT achieves 10× higher scalability up to 500 tasks than that of ILP up to 50 tasks within 12 hours. For the largest case in ILP (i.e., tgff5_a with 8 × 8 mesh and tgff4_a, MMS with 8 × 8 × 4 mesh), SMT provides the solution up to 1532.1× faster than ILP.
Table 6.
TestCase#Tasks#EdgesMinimum Application Schedule Length ()Simulation Runtime (s)
2D (XY Only)2D (mixed)3D (XYZ Only)3D (mixed)2D (XY Only)2D (mixed)3D (XYZ Only)3D (mixed)
8 × 816 × 168 × 816 × 164 × 4 × 48 × 8 × 44 × 4 × 48 × 8 × 48 × 816 × 168 × 816 × 164 × 4 × 48 × 8 × 44 × 4 × 48 × 8 × 4
tgff5516236536536536536536536536518.3537.0630.2231.0632.8030.4359.9759.60
tgff10101124664664664664664664664664206.32261.02212.92246.56225.21208.07444.76523.50
tgff20201245764764764764764764764764842.351,302.411,260.401,294.331,077.281,060.851,978.022,140.43
tgff303013697707707707707707707707707,339.497,106.806,910.837,057.296,212.256,819.457,625.037,345.88
tgff4040149581881881881881881881881815,961.3514,131.3530,817.4122,887.2721,563.2416,004.1433,160.4522,708.54
tgff50501606903903-903903903-90342,755.0924,422.54t.o.34,112.0536,579.9820,005.61t.o.23,687.59
tgff5_a516244944944044044944944044021.3543.9528.0923.0232.1933.4149.5261.47
tgff10_a102118383383383383383383383383145.88159.12129.38137.02112.74123.90195.40224.38
tgff20_a2032444924924924924924924924921,025.821,655.521,260.401,319.781,206.361,193.702,301.112,275.93
tgff30_a301359528528-52852852852852835,734.009,799.09t.o.5,982.259,876.865,628.8813,287.177,407.61
MWD12122812812812812812812812810.791.931.252.120.740.953.203.42
H263encMP3dec12122352352352352352352352350.350.550.700.580.230.290.830.96
MP3encMP3dec13132512512512512512512512510.240.580.330.330.220.240.460.90
H263decMP3dec14142192192192192192192192190.570.991.020.790.580.840.681.42
MMS40483253253253253253253253257.4817.3615.5514.618.129.3329.3937.07
Robot88131617617615615617617615615138.43273.21133.66193.48167.08163.21215.62248.07
Sparse9667240---22822822222238,713.89t.o.t.o.t.o.167.43120.811,338.582,047.77
RS-32_28_8_enc262348170417041704170417041704170417042,600.793,734.782,672.554,862.262,510.113,396.162,957.084,464.72
Average556.0574.6573.9573.9555.3555.3533.9554.48,084.033,702.846,332.974,597.934,431.863,044.463,743.964,068.85
Table 6. Minimum Application Schedule Length and Simulation Runtime on 2D/3D SMART NoC
t.o. = optimization not completed within 12 hours.

4.3 2D vs. 3D SMART NoC

In Section 4.2, we have demonstrated the superior scalability of SMT over ILP for 2D and 3D SMART NoC structures. Therefore, we use only the SMT framework for implementing a mixed dimension-order routing in the following experiments because the scalability of ILP is expected to be decreased for the mixed routing which involves more complicated conditional constraints as described in Section 3.3. 8 × 8/16 × 16 and 4 × 4 × 4/8 × 8 × 4 mesh topologies are used for 2D and 3D structures, respectively.
Table 6 presents the comparison of minimum application schedule length and simulation runtime of 2D and 3D SMART NoCs. Unlike the hop-by-hop transmission-based regular NoCs, of SMART NoC is not affected by the reduced average number of hops by the extension to the 3D routing structure. Therefore, most of cases in Table 6 have the same for 2D XY-only and 3D XYZ-only routing NoCs except for the random sparse matrix solver (i.e., “Sparse”) case. Figure 11 depicts a task mapping solution of “Sparse” case in 4 × 4 × 4 3D mesh. The arrow lines in the black and red colors illustrate incoming data transmissions from six PEs (i.e., P3, P17, P20, P22, P28, and P45) to P21 at a certain clock cycle in the detailed scheduling solution. The black and red arrow lines respectively indicate the transmissions on the same and different XY-planes. From this solution, we observe that the extended incoming paths up to six (i.e., four from the same and two from the different XY planes) on each PE by the extension of 2D to 3D routing paths enables further reduction of the minimum from 240 to 228.
The simulation runtime of 3D XYZ-only routing shows a reduced trend compared to 2D XY-only routing as depicted in Figure 12(a) and Figure 13(a). We observe that the “tgff30_a” and “Sparse” cases require extraordinary simulation runtime to find an optimal solution in 8 × 8 2D topology while the runtime of their 3D counterparts follows the normal trend. Note that the TGs of these two cases have many more parallel task routing paths compared to the other randomly generated or real application TGs as shown in Figure 14. This high-level task parallelism can cause a larger fluctuation of runtime due to the existence of multiple combinations of symmetric paths with the same application schedule length as well as the nature of the exact method to solve NP-hard problems [21].
Fig. 11.
Fig. 11. Example of the extension of routing paths from 2D to 3D routing (Sparse case in Table 4, ).
Fig. 12.
Fig. 12. Runtime scalability (2D@8 × 8 mesh, 3D@4 × 4 × 4 mesh). (a) 2D (XY-only) vs. 3D (XYZ-only), (b) 3D (XYZ-only) vs 3D (mixed).
Fig. 13.
Fig. 13. Runtime scalability (2D@16 × 16 mesh, 3D@8 × 8 × 4 mesh). (a) 2D (XY-only) vs. 3D (XYZ-only), (b) 3D (XYZ-only) vs 3D (mixed).
Fig. 14.
Fig. 14. Task graph of (a) tgff30, (b) tgff30_a, (c) Robot, and (d) Sparse cases in Table 4 [8, 24].

4.4 Mixed Dimension-Order Routing

In this section, we explore the impact of the mixed dimension-order routing in both 2D/3D topologies presented in Table 6. Three cases (i.e., “tgff5_a”, “Robot”, and “Sparse”) show the reduction in the minimum application schedule length due to the diversified routing paths of mixed dimension-order routing. Figure 15 illustrates the example of path diversification in the “Sparse” case with 4 × 4 × 4 3D mesh. The arrow lines in the black and red colors illustrate incoming data transmissions from 5 PEs (i.e., P9, P20, P25, P33, and P55) to P21 at a certain clock cycle in the detailed scheduling solution. The red arrow lines indicate two data transmissions with different routing paths between the same source/destination PEs. From this solution, we observe that the diversified paths enable further reduction of the minimum from 228 to 222.
The overall runtime scalabilities of the 2D/3D mixed routing show decreased trends compared to the 2D/3D single routing as depicted in Figures 12(b) and 12(c) and Figures 13(b) and 13(c). The additional variables for the routing path type indicator, the more complicated conditional constraints for detecting the link overlap in space, and the increased search-space due to the diversified routing paths have contributed to the increment in the simulation runtime. However, despite the slightly diminished scalability, the results demonstrate that our framework still successfully finds optimal task mapping and scheduling solutions up to 500 tasks within 12 hours.
Fig. 15.
Fig. 15. Example of path diversification from 3D XYZ-only to mixed routing (Sparse case in Table 4, ).

5 Related Work

The problem of task mapping and scheduling has been extensively studied in the literature [4, 12, 18, 19, 22, 28, 29]. Traditional approaches to task mapping [4, 12, 22, 28, 29] do not consider SMART NoCs where contention-free routing is required to fully realized the benefits of single-cycle multi-hop traversal. An optimal algorithm based on an integer linear programming (ILP) formulation was proposed in [27] for the 2D SMART NoC case. Although their ILP formulation can find optimal solutions, the runtimes are prohibitive for large problem instances. This is in part due to ILP’s inability to express conditional constraints directly. On the other hand, our SMT-based formulation enables us to leverage SMT’s ability to expresss conditional constraints succinctly, which enables us to derive a much more compact formulation and harness the logical reasoning power of SMT solvers. SMT’s ability to capture conditional constraints also enables us to easily consider the 3D SMART NoC case and mixed dimension-order routing in our formulations, which were not considered in [27]. A polynomial-time heuristic algorithm was also proposed in [27] for 2D SMART NoCs. Although their heuristic algorithm often achieves good results, exploration of optimal solutions still plays an essential role in calibrating and evaluating heuristic approaches for more advanced and complicated system configurations. Recently, SMT-based scheduling optimization frameworks for 2D NoCs have been proposed [16, 26] to overcome the limited expressiveness of ILP for the conditional constraints that constitute a large proportion of the task mapping and scheduling problem, but these works did not consider mixed dimension-order routing nor the 3D SMART NoC case.

6 Conclusion

In this article, we develop an SMT-based task mapping and scheduling framework that guarantees contention-free data transmissions to achieve the optimal latency for 2D/3D SMART NoCs. Also, we develop link overlap detection constraints for the mixed dimension-order routing. We have reduced the formulation complexity by utilizing SMT’s efficient modeling capability for the conditional constraints and also improved the scalability by introducing efficient search-space reduction techniques. We demonstrated that our SMT framework achieves 10× higher scalability than ILP, solving the problem within 12 hours up to 500 tasks for 2D and the 3D extension. Also, the 2D and 3D extensions of our SMT framework with the mixed dimension-order routing maintain the improved scalability with the diversified routing paths, resulting in the reduced latency through various application benchmarks. Lastly, we find that there are still rooms to further improve, e.g., the static task execution and data transmission time calls future research topics to accommodate the variability of real systems.

Acknowledgments

The authors thank reviewers for providing valuable comments.

Footnotes

1
A SMART router consists of five input/output ports (i.e., West, East, North, South, and Core) for connections on a mesh topology.
2
In this work, we estimate the maximum upper-bound of each test case as the sum of all task execution time divided by the minimum number of PEs in our experiments (i.e., 16 for 4 × 4 mesh) assuming full PE resource utilization with direct data transmissions.

References

[1]
Yashar Asgarieh and Bill Lin. 2019. Smart-Hop arbitration request propagation: Avoiding quadratic arbitration complexity and false negatives in SMART NoCs. ACM Transactions on Design Automation of Electronic Systems 24, 64 (2019), 64:1–64:25. https://doi.org/10.1145/3356235
[2]
Nikolaj Bjørner, Anh-Dung Phan, and Lars Fleckenstein. 2015. Z-an optimizing SMT solver. In Proceedings of TACAS. 194–199.https://link.springer.com/chapter/10.1007/978-3-662-46681-0_14.
[3]
Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramanian, Anantha P. Chandrakasan, and Li-Shiuan Peh. 2013. SMART: A single-cycle reconfigurable NoC for SoC applications. In Proceedings of DATE. 1–6. https://doi.org/10.7873/DATE.2013.080
[4]
G. Chen, F. Li, S.W. Son, and M. Kandemir. 2008. Application mapping for chip multiprocessors. In Proceedings of DAC. 620–625. https://doi.org/10.1145/1391469.1391628
[5]
Xianmin Chen and Niraj K. Jha.2016. Reducing wire and energy overheads of the SMART NoC using a setup request network. IEEE Transactions on Very Large Scale Integration 24, 10 (2016), 3013–3026. https://doi.org/10.1109/TVLSI.2016.2538284
[6]
William James Dally and Brian Patrick Towles. 2003. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers, Inc.
[7]
Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: An efficient SMT solver. In Proceedings of TACAS. 337–340. https://dl.acm.org/doi/10.5555/1792734.1792766.
[8]
R. P. Dick, D. L. Rhodes, and W. Wolf. 1998. TGFF: Task graphs for free. In Proceedings of CODES. 1–6. https://doi.org/10.1109/HSC.1998.666245
[9]
Brett Stanley Feero and Partha Pratim Pande. 2008. Networks-on-Chip in a three-dimensional environment: A performance evaluation. Transactions on Computers 58, 1 (2008), 32–45. https://doi.org/10.1109/TC.2008.142
[10]
LLC Gurobi Optimization. 2021. Gurobi Optimizer Reference Manual. Retrieved 2021 from http://www.gurobi.com.
[11]
Jingcao Hu and R. Marculescu. 2003. Energy-aware mapping for tile-based NoC architectures under performance constraints. In Proceedings of ASP-DAC. 1–6. https://dl.acm.org/doi/10.1145/1119772.1119818.
[12]
Jingcao Hu and R. Marculescu. 2004. Energy-aware communication and task scheduling for Network-on-Chip architectures under real-time constraints. In Proceedings of DATE. 234–239. https://doi.org/10.1109/DATE.2004.1268854
[13]
Biresh Kumar Joardar, Karthi Duraisamy, and Partha Pratim Pande. 2018. High performance collective communication-aware 3D Network-on-Chip architectures. In Proceedings of DATE. 1351–1356. https://doi.org/10.23919/DATE.2018.8342223
[14]
Tushar Krishna, Chia-Hsin Owen Chen, Woo Cheol Kwon, and Li-Shiuan Peh. 2013. Breaking the on-chip latency barrier using SMART. In Proceedings of HPCA. 378–389. https://doi.org/10.1109/HPCA.2013.6522334
[15]
Tushar Krishna, Chia-Hsin Owen Chen, Woo Cheol Kwon, and Li-Shiuan Peh. 2014. SMART: Single-cycle multihop traversals over a shared Network on Chip. IEEE Micro 34, 3 (2014), 43–56. https://doi.org/10.1109/MM.2014.48
[16]
Daeyeal Lee, Bill Lin, and Chung-Kuan Cheng. 2021. SMT-based contention-free task mapping and scheduling on SMART NoC. Embedded Systems Letters 1, 1 (2021), 1–4. https://doi.org/10.1109/LES.2021.3049774
[17]
Dongjin Lee, Das Sourav, Janardhan Rao Doppa, Partha Pratim Pande, and Krishnendu Chakrabarty. 2018. Performance and thermal tradeoffs for energy-efficient monolithic 3D Network-on-Chip. ACM Transactions on Design Automation of Electronic Systems 23, 5 (2018), 1–25. https://dl.acm.org/doi/10.1145/3223046
[18]
Edward Ashford Lee and David G. Messerschmitt. 1987. Static scheduling of synchronous data flow programs for digital signal processing. IEEE Transactions on Computers 100, 1 (1987), 24–35. https://doi.org/10.1109/TC.1987.5009446
[19]
Edward A. Lee and David G. Messerschmitt. 1987. Synchronous data flow. Proc. IEEE 75, 9 (1987), 1235–1245. https://doi.org/10.1109/PROC.1987.13876
[20]
Weichen Liu, Jiang Xu, Xiaowen Wu, Yaoyao Ye, Xuan Wang, Wei Zhang, Mahdi Nikdast, and Zhehui Wang. 2011. A NoC traffic suite based on real applications. In Proceedings of ISVLSI. 66–71. https://doi.org/10.1109/ISVLSI.2011.49
[21]
Andrea Lodi and Andrea Tramontani. 2014. Performance variability in mixed-integer programming. In TutORials in Operations Research. 1–12. https://doi.org/10.1287/educ.2013.0112
[22]
S. Murali and G. De Micheli. 2004. Bandwidth-constrained mapping of cores onto NoC architectures. In Proceedings of DATE. 896–901. https://doi.org/10.1109/DATE.2004.1269002
[23]
Daeho Seo, Akif Ali, Won-Taek Lim, and Nauman Rafique. 2005. Near-optimal worst-case throughput routing for two-dimensional mesh networks. In Proceedings of ISCA. 1–12. https://doi.org/10.1109/ISCA.2005.37
[24]
Takao Tobita and Hironori Kasahara. 2002. A standard task graph set for fair evaluation of multiprocessor scheduling algorithms. Journal of Scheduling 5, 5 (2002), 379–394. https://doi.org/10.1002/jos.116
[25]
Geert Van der Plas, Paresh Limaye, Igor Loi, Abdelkarim Mercha, Herman Oprins, Cristina Torregiani, Steven Thijs, Dimitri Linten, Michele Stucchi, Guruprasad Katti, Dimitrios Velenis, Vladimir Cherman, Bart Vandevelde, Veerle Simons, Ingrid De Wolf, Riet Labie, Dan Perry, Stephane Bronckers, Nikolaos Minas, Miro Cupac, Wouter Ruythooren, Jan Van Olmen, Alain Phommahaxay, Muriel de Potter de ten Broeck, Ann Opdebeeck, Michal Rakowski, Bart De Wachter, Morin Dehan, Marc Nelis, Rahul Agarwal, Antonio Pullini, Federico Angiolini, Luca Benini, Wim Dehaene, Youssef Travaly, Eric Beyne, and Paul Marchal. 2010. Design issues and considerations for low-cost 3-D TSV IC technology. IEEE Journal of Solid-State Circuits 46, 1 (2010), 293–07. https://doi.org/10.1109/JSSC.2010.2074070
[26]
Rongjie Yan, Anyu Cai, Hongyu Gao, Feifei Ma, and Jun Yan. 2019. SMT-based multi-objective optimization for scheduling of MPSoC applications. In Proceedings of TASE. 160–167. https://doi.org/10.1109/TASE.2019.000-5
[27]
Lei Yang, Weichen Liu, Peng Chen, Nan Guan, and Mengquan Li. 2017. Task mapping on SMART NoC: Contention matters, not the distance. In Proceedings of DAC. 1–6. https://doi.org/10.1145/3061639.3062323
[28]
Lei Yang, Weichen Liu, Nan Guan, and Nikil Dutt. 2019. Optimal application mapping and scheduling for Network-on-Chips with computation in STT-RAM based router. IEEE Trans. Comput. 68, 8 (2019), 1174–1189. https://doi.org/10.1109/TC.2018.2864749
[29]
Heng Yu, Yajun Ha, and Bharadwaj Veeravalli. 2010. Communication-aware application mapping and scheduling for NoC-based MPSoCs. In Proceedings of ISCAS. 3232–3235. https://doi.org/10.1109/ISCAS.2010.5537920

Cited By

View all
  • (2024)A survey on mapping and scheduling techniques for 3D Network-on-chipJournal of Systems Architecture: the EUROMICRO Journal10.1016/j.sysarc.2024.103064147:COnline publication date: 17-Apr-2024
  • (2023)DAG-Order: An Order-Based Dynamic DAG Scheduling for Real-Time Networks-on-ChipACM Transactions on Architecture and Code Optimization10.1145/363152721:1(1-24)Online publication date: 15-Dec-2023

Index Terms

  1. SMT-Based Contention-Free Task Mapping and Scheduling on 2D/3D SMART NoC with Mixed Dimension-Order Routing

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Transactions on Architecture and Code Optimization
      ACM Transactions on Architecture and Code Optimization  Volume 19, Issue 1
      March 2022
      373 pages
      ISSN:1544-3566
      EISSN:1544-3973
      DOI:10.1145/3492449
      Issue’s Table of Contents

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 06 December 2021
      Accepted: 01 September 2021
      Revised: 01 August 2021
      Received: 01 March 2021
      Published in TACO Volume 19, Issue 1

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. SMT
      2. SMART NoC
      3. task mapping
      4. scheduling

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • NSF

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)383
      • Downloads (Last 6 weeks)39
      Reflects downloads up to 16 Nov 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)A survey on mapping and scheduling techniques for 3D Network-on-chipJournal of Systems Architecture: the EUROMICRO Journal10.1016/j.sysarc.2024.103064147:COnline publication date: 17-Apr-2024
      • (2023)DAG-Order: An Order-Based Dynamic DAG Scheduling for Real-Time Networks-on-ChipACM Transactions on Architecture and Code Optimization10.1145/363152721:1(1-24)Online publication date: 15-Dec-2023

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      HTML Format

      View this article in HTML Format.

      HTML Format

      Login options

      Full Access

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media