3 - SoC & NoC
System-on-Chip (SoC)
&
Network-on-Chip (NoC)
Outline
❑ What is SoC?
❑ SoC Challenges & Applications
❑ SoC Design Flow
❑ What is NoC?
❑ NoC Topologies
❑ NoC Design Flow
Modern Car
[Figure: electronic subsystems of a modern car, including safety & chassis]
Introduction
❑ A System-on-Chip (SoC) is basically a circuit embedded on a small, coin-sized chip & integrated with a microcontroller or microprocessor. An SoC design usually includes a CPU, memory, I/O ports, secondary storage, & peripheral interfaces such as I2C, SPI, UART, CAN, timers, etc., depending on the requirements, and may add a digital or analog signal-processing system or a floating-point unit
❑ An SoC connects the CPU, hard disk, RAM, ROM, USB connectivity, & other secondary storage devices within the circuit embedded on the chip, whereas a motherboard does this using expansion cards
Types of SoC
❑ A few types of SoC design:
– SoCs built around microprocessors
– SoCs built around microcontrollers
– SoCs designed for a special dedicated application that cannot be used for any other work
– Programmable SoCs, in which some (but not all) components are programmable
❑ Advantages of SoC:
– Reduction in overall system size compared with motherboard-based designs
– Compact & small chips, even down to the size of a fingertip
– Better efficiency & performance
– Lower power consumption
– Less time to market
Introduction
❑ An SoC is an IC that implements most or all functions of a complete electronic system
❑ An SoC is an IC that integrates software & hardware Intellectual Property (IP) using more than one design methodology for the purpose of defining the functionality & behavior of the proposed system
❑ SoC technology puts the maximum amount of technology into the smallest possible space
❑ SoC bridges the gap b/w software & hardware
❑ Four vital areas of SoC design:
– Higher levels of abstraction
– IP and platform re-use
– IP creation: ASIPs, interconnect and algorithms
– Earlier software development and integration
Three forms of SoC Design
❑ ASIC Vendor Design:
– All components in the chip are designed as well as fabricated by the ASIC vendor
❑ Integrated Design:
– Design by an ASIC vendor in which not all components are designed by that vendor; it implies the use of cores obtained from some other source, such as a core/IP vendor or a foundry
❑ Desktop Design:
– Design by a fabless company that uses cores obtained, for the most part, from other sources such as IP companies, EDA companies, design-services companies, or a foundry
[Figure: an SoC combining ASIC logic, DRAM & I/O drivers on one die; the wide on-chip bus (128/512 bits) replaces the narrow 32-bit off-chip bus]
SoC Design Challenges
❑ Verification: HW/SW co-verification of processor, memory & peripheral IPs (e.g. an ARM9 microcontroller with 32-bit memory, SRAM, DMA, arbiter, bridge & address decoder)
❑ IP integration: communication interfaces (USB, Ethernet, smartcard), application logic with DSP pre-processing, analog sensors, standard peripherals (watchdog, timers, UART, etc.)
❑ Performance: 300 MHz to multi-GHz clock targets
❑ Productivity: gates per designer-day, design reuse
❑ Density: metal-layer tradeoffs, library selection, optimization
❑ Quality / Reliability: DPM, FIT, SER, hot-carrier effects
❑ Complexity: 2M to 5M to 10M gates & beyond
Migration from ASICs to SoCs
❑ Application-Specific Integrated Circuits (ASICs) are logic chips designed by end customers to perform a specific function for a desired application
❑ ASIC vendors supply libraries for each technology they provide. These libraries contain pre-designed & pre-verified logic circuits
❑ ASIC technologies are:
– Full-custom, semi-custom, platform-based, gate array, standard cell
SoC Benefits
❑ Typical approach:
– Define requirements
– Design with off-the-shelf chips
– At the 0.5-year mark: first prototypes
– 1 year: ship with low margins/loss; start ASIC integration
– 2 years: ASIC-based prototypes
– 2.5 years: ship, make profits
❑ With SoC:
– Define requirements
– Design with off-the-shelf cores
– At the 0.5-year mark: first prototypes
– 1 year: ship with high margin and market share
❑ An SoC consists of:
– Digital: control logic, µP, DSP, interfaces, IP, memory (SRAM, DRAM, FLASH)
– Analog, RF, power, MEMS
SoC Architecture
❑ The CPU, Direct Memory Access (DMA) controller & DSP engine all share the same bus
❑ With many control wires b/w blocks & peripheral buses b/w subsystems, there is interdependency b/w blocks & a lot of wires in the chip
❑ Intelligent system integration instead uses an on-chip interconnect that unifies all traffic into a single entity
SoC Architecture
❑ Example: Basic WiseNET sensor-node SoC
❑ The architecture includes: an ultralow-power dual-band radio transceiver (Tx & Rx)
❑ Sensor interface with signal conditioner & two A/D converters (ANA_FE)
❑ Digital control unit based on a CoolRISC microcontroller (µC) with on-chip low-leakage memory, several time bases & digital interfaces
❑ Power management block (POW)
SoC Architecture
❑ Example: a typical gateway VoIP (Voice over Internet Protocol) system-on-chip
❑ A gateway VoIP SoC is a device used for functions such as vocoders, echo cancellation, data/fax modems, and VoIP protocols
SoC Example
❑ BeagleBone Black development board
– This board contains an AM335x-based SoC. The block diagram shows all internal components of the AM335x: it contains all peripherals inside a single chip along with a Cortex-A8 processor
SoC Architecture
❑ Architecture is the operational structure of the user's view of the system; over time it evolves from a functional specification into a hardware implementation
❑ System architecture defines the system-level building blocks, such as processors and memories, and the interconnection between them
❑ Processor architecture determines the processor's instruction set (IS) & associated programming model, but not its detailed implementation, which includes hidden registers, branch-prediction circuits and specification details concerning the ALU
❑ The implementation of a processor is called its microarchitecture
SoC Processor Execution Sequence
❑ SoC processors directly implement the sequential execution model: the next instruction is not processed until all execution for the current instruction is complete and its results have been committed:
❑ 1. Fetch the instruction into the instruction register (IF)
❑ 2. Decode the opcode of the instruction (ID)
❑ 3. Generate the address in memory of any data item residing there (AG)
❑ 4. Fetch the data operand into executable registers (DF)
❑ 5. Execute the specified operation (EX)
❑ 6. Write back the result to the register file (WB)
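The six-stage sequence above can be sketched as a toy simulator. This is a minimal illustration only: the `(op, dst, src1, src2)` tuple format is an assumed encoding, not a real ISA.

```python
# Toy model of strictly sequential execution: the next instruction does
# not start until the current one has completed write-back (WB).
STAGES = ["IF", "ID", "AG", "DF", "EX", "WB"]

def run_sequential(program, regs):
    """Run each instruction through all six stages before starting the next."""
    trace = []
    for instr in program:
        for stage in STAGES:              # strictly sequential: no overlap
            trace.append((instr, stage))
        op, dst, a, b = instr
        if op == "add":
            regs[dst] = regs[a] + regs[b]
        elif op == "sub":
            regs[dst] = regs[a] - regs[b]
    return trace, regs

trace, regs = run_sequential(
    [("add", "r2", "r0", "r1"), ("sub", "r3", "r2", "r0")],
    {"r0": 3, "r1": 4, "r2": 0, "r3": 0})
# 2 instructions x 6 stages = 12 stage events in the trace
```

Note that a pipelined processor would overlap these stages across instructions; the point here is only the ordering contract stated on the slide.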
SoC Superscalar processor execution
❑ Instructions that are independent can execute in parallel
❑ Dynamic pipelined processors remain limited to executing a single operation per cycle by virtue of their scalar nature. This limitation can be avoided with the addition of multiple functional units and a dynamic scheduler to process more than one instruction per cycle
SoC SIMD Processors
❑ The Single Instruction Stream, Multiple Data Stream (SIMD) class of processor includes both array and vector processors
❑ The SIMD processor is a natural response to the use of certain regular data structures such as vectors & matrices
❑ Assembly programming of a SIMD architecture is similar to programming a scalar processor, except that some operations perform computations on aggregate data
❑ An array processor consists of many interconnected processor elements, each having its own local memory space
❑ A vector processor consists of a single processor that references a global memory space and has special function units that operate on vectors
SoC Multiprocessors
❑ Multiprocessors can cooperatively execute to solve a single problem by using some form of interconnection for sharing results
❑ Each processor executes completely independently; however, most applications require some form of synchronization during execution to pass information and data b/w processors
❑ The processors share memory and execute separate program tasks (MIMD: multiple instruction stream, multiple data stream); their proper implementation is significantly more complex than that of an array processor
SoC Memory & Addressing
❑ SoC applications vary in their memory requirements. Memory can be simple, with only ROM & RAM, or it can support an elaborate operating system requiring large off-chip memory with a memory management unit and a cache hierarchy
Memory for SoC Operating System
❑ A critical decision in SoC design is the selection of the operating system & its memory-management functionality, such as virtual memory
❑ If the system is restricted to a few MBs of real memory, it can be implemented as a true SoC (all memory on-die)
❑ Virtual memory is often slower and more expensive, and requires a complex memory management unit
❑ With real memory only, users have limited ways of creating new processes & expanding the application base of the system
Virtual Memory
❑ Virtual memory, or virtual storage, provides an "idealized abstraction of the storage resources that are actually available on a given machine", which "creates the illusion to users of a very large (main) memory"
❑ The operating system, using a combination of hardware & software, maps the memory addresses used by a program, called virtual addresses, into physical addresses in computer memory
❑ The operating system manages virtual address spaces & the assignment of real memory to virtual memory. Address translation hardware in the CPU, often referred to as the memory management unit (MMU), automatically translates virtual addresses to physical addresses
SoC Memory Management Unit
❑ A memory management unit (MMU), or paged memory management unit (PMMU), has all memory references passed through it, primarily performing the translation of virtual memory addresses to physical addresses
❑ The MMU effectively performs virtual memory management, handling at the same time memory protection, cache control, bus arbitration and, in simpler computer architectures (especially 8-bit systems), bank switching
❑ Modern MMUs typically divide the virtual address space (the range of addresses used by the processor) into pages, each having a size which is a power of 2, usually a few KB, but they may be much larger. The bottom bits of the address (the offset within a page) are left unchanged; the upper address bits are the virtual page number
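The page-number/offset split described above can be shown in a few lines. A sketch only, assuming 4 KB pages and a flat dictionary page table (real MMUs use multi-level tables and TLBs):

```python
PAGE_SIZE = 4096          # 4 KB pages, so the low 12 bits are the offset
OFFSET_BITS = 12

def translate(vaddr, page_table):
    """Split a virtual address into page number + offset, look up the frame."""
    vpn = vaddr >> OFFSET_BITS          # upper bits: virtual page number
    offset = vaddr & (PAGE_SIZE - 1)    # bottom bits pass through unchanged
    if vpn not in page_table:
        raise LookupError(f"page fault: vpn {vpn:#x}")
    return (page_table[vpn] << OFFSET_BITS) | offset

# hypothetical page table: virtual page 0x12 maps to physical frame 0x7
assert translate(0x12ABC, {0x12: 0x7}) == 0x7ABC
```

On a miss the hardware would raise a page fault for the OS to service; here that is modeled by the `LookupError`.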
Examples
Core-based Design
❑ Cores are pre-designed, verified but untested functional blocks
❑ Core is the intellectual property of vendor
– Internal details not available to user
❑ Core-vendor supplied tests must be applied to embedded cores
Types of Intellectual Property
❑ Firm IP cores (gate-level netlist):
– Technology-dependent gate-level netlist that meets timing constraints
– Layout is flexible
– Firm cores balance the high performance & optimization of hard IPs with the flexibility of soft IPs
– Performance & area are more predictable
– May be encrypted to protect the IP
– Technology-dependent: delivered as netlists targeted at specific physical libraries after going through synthesis, without performing the physical layout
– The internal implementation of the core cannot be modified
– The user can parameterize the I/O to remove unwanted functionality
Design for Reuse
❑ Reusing macros/cores/IP that have already been designed & verified helps to address all of the problems above
❑ To overcome the design gap, design reuse (the use of pre-designed & pre-verified cores, or the reuse of existing designs) becomes a vital concept in design methodology
❑ An effective block-based design methodology requires an extensive library of reusable blocks, or macros, based on the following principles:
– The macro must be extremely easy to integrate into the overall chip design
– The macro must be so robust that the integrator has to perform essentially no functional verification of the internals of the macro
Design for Reuse
❑ Verified independently of the chip in which it will be used:
– Often, macros are designed & only partially tested before being integrated into a chip for verification. Reusable designs must have full, stand-alone test benches & verification suites that afford high levels of test coverage
❑ Verified to a high level of confidence:
– Very rigorous verification, as well as building a physical prototype that is tested in an actual system running real software
❑ Fully documented in terms of appropriate applications & restrictions:
– In particular, valid configurations & parameter values must be documented. Any restrictions on configurations or parameter values must be clearly stated. Interfacing requirements and restrictions on how the macro can be used must be documented
Virtual Components (VC) for Reuse
What is MPSoC?
❑ A Multiprocessor System-on-Chip (MPSoC) is an SoC that uses multiple instruction-set processors (CPUs), usually targeted at embedded applications
❑ The term is used for platforms that contain multiple, usually heterogeneous, processing elements with specific functionalities reflecting the needs of the expected application domain, a memory hierarchy & I/O components
❑ MPSoCs often require large amounts of memory
Homo/Heterogeneous SoC
[Figure: a homogeneous SoC (identical CPUs, each with local memory) and a heterogeneous SoC (CPU, DSP & memory) sharing an interconnection network (bus)]
Hardware-Software Codesign
❑ The SoC design process is a hardware-software codesign in which design productivity is achieved by design reuse
– Design exploration at the behavioral level (C, C++, etc.) by system architects
– Creation of the architecture specification
– RTL implementation (Verilog/VHDL) by hardware designers
❑ Drawbacks
– Specification errors are susceptible to late detection
– Correlating validations at the behavioral & RTL levels is difficult
– The common interface b/w system & hardware designers is based on natural language
[Figure: software flow from generic C/C++/Java applications through a µC-specific compiler & assembler down to opcodes & I/O]
Design Methodologies
Time Driven Design (TDD) Methodology
❑ Best for moderately sized & complex ASICs that:
– Consist primarily of new logic on DSM processes
– Don't make significant use of hierarchical design
❑ Problems:
– Looping between synthesis & placement
– Long turnaround times for each loop
– Unanticipated chip-size growth late in the design process
– Repeated area, power & timing optimizations
– Late creation of adequate manufacturing test vectors
Platform-based Design (PBD) Methodology
❑ TDD + BDD + extensive & planned design reuse
❑ PBD separates design into two areas of focus:
1. Block authoring: blocks are generated so they interface easily with multiple target designs. Two new design concepts must be established:
– Interface standardization:
• Block authoring may be distributed among different design teams if under the same interface specifications & design-methodology guidelines
– Virtual system design:
• To establish the system constraints necessary for block design. For example: power profile, test options, hard, firm, or soft, …
2. System-chip integration: focuses on designing & verifying the system architecture & the interfaces b/w blocks. It starts with partitioning the system around preexisting block-level functions & identifying the new or differentiating functions needed
• System-level partitioning along with performance analysis, HW/SW design tradeoffs & functional verification
Waterfall Design
❑ The project transitions from phase to phase in a step function, never returning to the activities of a previous phase
❑ Worked well up to 100k gates & down to 0.5 µm
❑ H/W & S/W developments are serialized
❑ High investment at each level
❑ Requirements must be well defined first
❑ The domain & the solution to the problem are extremely stable
❑ The client must wait until the end to see any product
❑ Inherent to data administration, because data persists
❑ Product failure signals process failure, promoting fear
❑ Upstream investments save on downstream costs
❑ Little or no prototyping
Spiral Design
❑ A process for developing a system in steps; throughout the various steps, feedback is obtained & incorporated back into the process
❑ As complexity increases, geometry shrinks & time-to-market pressures continue to escalate, chip designers are moving from the waterfall to the spiral model
❑ The design team works on multiple aspects of the design simultaneously, incrementally improving in each area as the design converges on completion
❑ Parallel, concurrent H/W & S/W development
❑ Parallel verification & synthesis
❑ Develop modules only if not available
❑ Planned iteration throughout
❑ Smaller investment at each level / iteration
❑ Requirements & the problem can shift
❑ Success criteria are determined for each iteration
❑ The client may see the product sooner, though it is likely lower quality initially
❑ Inherent to the world, because the world is influenced by design
❑ Product failure is just part of the design process
❑ Costs are similar across the board
❑ Prototype dependent
[Figure: spiral vs. waterfall development models]
Top-Down Vs Bottom-Up Design
❑ Top-Down Design:
– Choice of algorithm (optimization)
– Choice of architecture (optimization)
– Definition of functional modules
– Definition of the design hierarchy
– Split up into small boxes or units (modules)
– Define the required units (adders, state machines, etc.)
– Map into the chosen technology (synthesis, schematic, layout); change algorithms or architecture if speed or chip-size problems arise
– Behavioral simulation tools
❑ Bottom-Up Design:
– Build gates in the given technology
– Build basic units using gates
– Build generic modules of use
– Put the modules together
– Hope that you arrive at some reasonable architecture
– Gate-level simulation tools
– The old-fashioned design methodology, à la discrete logic
SoC Design Flow
[Figure: SoC design flow. Concept definition → system specifications → system simulation; measurements of completeness, performance & power feed design improvements back into the specification. The digital spec then enters the design cycle: modeling & system simulation (II), HDL design & simulation, and FPGA synthesis leading to an FPGA prototype for implementation, simulation & verification]
SoC Flow
[Figure: SoC flow, part 1. Electronic-system-level design specifies the system from generic HW & SW IP blocks, partitions it into HW/SW on an architecture platform, then integrates application-specific HW IPs and SW IP modules (with the operating system); HW design proceeds through functional simulation & HW/SW co-simulation alongside the low-level SW design flow]
[Figure: SoC flow, part 2. The combined HW & low-level SW are verified on an FPGA platform or development board; physical design & IC fabrication produce a prototype while application SW is developed & tested on the prototype]
Design Methodologies
[Figure: front-end ASIC design flow & back-end (generic physical) design flow]
SoC Design Flow
[Figure: complete SoC design flow]
EDA/CAD Tools
[Figure: EDA/CAD tool chain for SoC design]
[Figure: die-yield vs. defect density, where ρD denotes the defect density]
SoC Processor Area
[Figure: SoC processor area data]
Baseline die floor plan
❑ Summary of the area design rules:
1. Compute the target chip size from the target yield & defect density
2. Compute the die cost
3. Compute the net available area; allow 10-20% for pins, guard ring, power, etc.
4. Determine the rbe size from the minimum feature size
5. Allocate area based on a trial system architecture until the base size is calculated
6. Allocate area for cache & storage
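Step 1 above can be sketched numerically. This assumes the simple Poisson yield model Y = exp(-A·ρD), which is one common textbook choice; real fabs often use clustered-defect (negative binomial) models instead:

```python
import math

def max_die_area(target_yield, defect_density):
    """Invert the Poisson yield model Y = exp(-A * rho_D) to get the
    largest die area A (cm^2) that still meets the yield target.
    defect_density (rho_D) is in defects/cm^2."""
    return -math.log(target_yield) / defect_density

area = max_die_area(0.80, 0.5)   # 80% yield target, 0.5 defects/cm^2
# Step 3 of the rule: keep a 10-20% margin for pins, guard ring, power
net_area = area * 0.85           # here a 15% margin is assumed
```

For these illustrative numbers the usable die comes out a bit under half a square centimeter; with a clustered-defect model the allowed area would be somewhat larger.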
SoC Future
❑ Future SoCs will have 100s of hardware blocks
❑ Have billions of transistors
❑ Have multiple processors
❑ Have large wire-to-gate delay ratios
❑ Handle large amounts of high-speed data
❑ Need to support "plug-and-play" IP blocks
❑ The future NoC needs to be ready for these SoCs…
Networks-on-Chip
SoC Nightmare
[Figure: a bus-based SoC with MPEG, I/O & control blocks hanging off a system bus, a peripheral bus & dedicated control wires: the architecture is tightly coupled]
[Figure: the NoC alternative: computing modules connected by network routers & network links instead of a shared bus]
Bus based On-Chip Communication
❑ Buses are collections of wires to which one or more components are connected
❑ Master (or initiator): an IP component that initiates a read or write data transfer
❑ Slave (or target): an IP component that does not initiate transfers & only responds to incoming transfer requests
❑ Arbiter: controls access to the shared bus, using an arbitration scheme to select the master to grant access to the bus
❑ Decoder: determines which component a transfer is intended for
❑ Bridge: connects two buses & acts as a slave on one side & a master on the other
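The arbiter and decoder roles above can be modeled in a few lines. A behavioral sketch only: the fixed-priority scheme and the memory map are assumptions for illustration, not any real bus standard:

```python
def arbitrate(requests, priority):
    """Fixed-priority arbiter: grant the bus to the highest-priority requester."""
    for master in priority:
        if master in requests:
            return master
    return None                      # no master requesting this cycle

def decode(addr, address_map):
    """Decoder: pick the slave whose address window contains addr."""
    for slave, (base, size) in address_map.items():
        if base <= addr < base + size:
            return slave
    return None                      # unmapped address

# hypothetical memory map: 16 KB SRAM at 0x0000, UART registers at 0x8000
amap = {"SRAM": (0x0000, 0x4000), "UART": (0x8000, 0x100)}
granted = arbitrate({"DMA", "CPU"}, priority=["CPU", "DMA"])
target = decode(0x8004, amap)
```

Real arbiters also use round-robin or TDMA schemes to avoid starving low-priority masters; only the selection step is shown here.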
Types of Buses
❑ A bus typically consists of three types of signal lines:
❑ Address
– Carry the address of the destination for which the transfer is initiated
– Can be shared or separate for read & write
❑ Data
– Carry information b/w source & destination components
– Can be shared or separate for read & write data
– The choice of data width is critical for application performance
❑ Control
– Requests and acknowledgements
– Specify more information about the type of data transfer
• Byte enable, burst size, cacheable/bufferable, write-back/through, …
Bus Standards
❑ Bus standards: microprocessor systems have adopted a number of bus standards, such as the VME bus, Intel Multibus-II, ISA bus, PCI bus, etc., to connect together ICs on PCBs in system-on-board implementations
❑ Arbitration & protocols: the bus master is the unit that initiates communication on a computer bus or I/O path. In an SoC, the bus master is the processor and the slaves are the I/O devices & memory components
❑ The bus master controls the bus paths using the slave address, control signals & the flow of data signals b/w master & slaves; this process is called arbitration
❑ A bus protocol is an agreed set of rules for transmitting information b/w two or more devices over a bus. Protocols determine:
❑ 1. The type & order of data being sent
❑ 2. How the sending device indicates that it has finished sending information
❑ 3. The data compression method used
❑ 4. How the receiving device acknowledges reception of information
❑ 5. How arbitration is performed to resolve contention on the bus, in what priority, & the type of error checking to be used
Bus Standards
❑ Bus bridge: a module that connects together two buses, not necessarily of the same type
❑ Three functions:
– 1. If the two buses use different protocols, the bus bridge provides the necessary format & standard conversion
– 2. A bridge is inserted b/w two buses to segment them & keep traffic contained within the segments. This improves concurrency: both buses can operate at the same time
– 3. A bridge often contains memory buffers & associated control circuits that allow write posting
❑ Physical bus structure: depends on the number of wires, paths, cycle time, etc., & the protocol
❑ Bus varieties: buses may be unified or split. On a unified bus, the address is transmitted first in a bus cycle, followed by one or more data cycles. A split bus has separate buses for each of these functions
SoC AMBA Buses
❑ Two commonly used SoC bus standards are the Advanced Microcontroller Bus Architecture (AMBA) bus developed by ARM & the CoreConnect bus by IBM. AMBA: two main buses are defined in the AMBA specification:
❑ The Advanced High-performance Bus (AHB) is designed to connect embedded processors, peripherals, Direct Memory Access (DMA) controllers, on-chip memory & interfaces. It is a high-speed, high-bandwidth bus architecture with 32-bit data operation extendable to 1024 bits, and concurrent multiple master/slave operation
❑ The Advanced Peripheral Bus (APB) has lower performance than the AHB and serves slower peripheral modules. It is designed for low power & reduced interface complexity
❑ A third bus, the Advanced System Bus (ASB), is designed for lower-performance 16/32-bit µCs
CoreConnect-based SoC
[Figure: an SoC built around IBM's CoreConnect buses]
SoC PLB/Split Transaction Buses
❑ PLB bus: used for high-bandwidth, high-performance & low-latency interconnection b/w processors, memory & DMA controllers
❑ It is a fully synchronous, split-transaction bus with separate address, read & write data buses, allowing two simultaneous transfers per clock cycle
❑ All masters/slaves have their own address, read/write data & control signals, also called transfer-qualifier signals
❑ Split-transaction bus: the PLB address and read/write data buses are decoupled, allowing address cycles to be overlapped with read or write data cycles, and read data cycles to be overlapped with write data cycles
40
Bus Topologies
❑ Reduces impact of capacitance across two segments
❑ Reduces contention & energy
Shared Bus 81
81
82
41
Bus Physical Structure
[Figure: wire-area comparison of point-to-point vs. segmented bus structures as the number of modules n grows]
On-Chip Communication
❑ Bus structure: highly connected multi-core systems consume large power & increase the capacitive load
❑ Point-to-point structure: optimal in bandwidth, availability, latency & power usage, and simple to design, but occupies hardware area & creates routing problems
❑ NoC structure: lower complexity & cost, reusable components, efficient & high-performance interconnect, scalable; may add latency, and software needs explicit synchronization across processors
Bus Vs NoCs
[Figure: feature-by-feature comparison of bus & NoC interconnects]
Typical NoC Router
[Figure: a typical NoC router: input & output buffers on each port, plus routing & arbitration logic]
Layered Approach
[Figure: a layered NoC view borrowing from networking: software / traffic modeling & queuing theory, transport & network architectures, and wiring, giving separation of concerns; a regular NoC tiles processing elements (PEs) connected through routers]
Networks-on-Chip (NoC)
❑ A network-on-chip (NoC) is a communication subsystem on an IC b/w IP cores in an SoC
❑ A NoC can span synchronous & asynchronous clock domains or use unclocked asynchronous logic
❑ NoC technology applies networking theory & methods to on-chip communication & brings notable improvements over conventional bus & crossbar interconnections
❑ A NoC improves the scalability of SoCs & the power efficiency of complex SoCs compared to other designs
❑ Even significant design effort for a bus-style interconnect is not going to be sufficient for large SoCs:
– The physical implementation does not scale: bus fanout, loading & arbitration depth all reduce the operating frequency
– The available bandwidth does not scale: the single bus must be shared by all masters & slaves
NoC Characteristics
❑ Communication latency: the time delay b/w a module requesting data & receiving a response to the request
❑ Master & slave: a unit can initiate or react to communication requests. A master, such as a processor, controls transactions b/w itself & other modules. A slave, such as a memory, responds to requests from the master
❑ Concurrency requirement: the number of independent simultaneous communication channels operating in parallel
❑ Packet or bus transaction: the size and definition of the information transmitted in a single transaction. For a bus: an address with control bits (R/W) and data. A packet consists of a header (address & control) plus data
❑ ICU: manages the interconnect protocol & physical transaction. If an IP core requires protocol translation to access the bus, the unit is called a bus wrapper
❑ Multiple clock domains: different IP modules may operate at different clock and data rates
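The header-plus-data packet structure described above can be sketched concretely. The 8-byte layout here (32-bit destination address, 8-bit control field, 24-bit payload length) is an assumed illustrative format, not any real NoC standard:

```python
import struct

def make_packet(dest, control, payload):
    """Prepend an assumed 8-byte header (address & control) to the data."""
    header = struct.pack(">IB3s", dest, control,
                         len(payload).to_bytes(3, "big"))
    return header + payload

def parse_packet(pkt):
    """Split a packet back into (destination, control, payload)."""
    dest, control, length = struct.unpack(">IB3s", pkt[:8])
    n = int.from_bytes(length, "big")
    return dest, control, pkt[8:8 + n]

pkt = make_packet(0x1A2B3C4D, 0x01, b"data")
assert parse_packet(pkt) == (0x1A2B3C4D, 0x01, b"data")
```

In a real NoC the packet would further be broken into flits sized to the link width; this sketch stops at the header/payload split.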
NoC Properties
❑ A NoC is more than a single shared bus
❑ A NoC provides point-to-point connections b/w any two hosts attached to the network, either by crossbar switches or through node-based switches
❑ A NoC provides high aggregate bandwidth through parallel links
❑ In a NoC, communication is separate from computation
❑ A NoC uses a layered approach to communications, although with few network layers due to complexity and expense
❑ NoCs support pipelining and provide intermediate data buffering b/w sender & receiver
NoC Properties
❑ Regular geometry that is scalable
❑ Flexible QoS guarantees
❑ Higher bandwidth
❑ Reusable components
– Buffers, arbiters, routers, protocol stack
❑ No long global wires (or global clock tree)
[Figure: wire delay from A to B at a 1 ns (1 GHz) vs. a 0.1 ns (10 GHz) clock period]
Routing Algorithms
❑ NoC routing algorithms should be simple
– Complex routing schemes consume more device area (complex routing/arbitration logic)
– Additional latency for channel setup/release
– Deadlocks must be avoided
❑ Deadlock can occur if it is impossible for any message to move (without discarding one)
– Buffer deadlock occurs when all buffers are full in a store-and-forward network. This leads to a circular wait condition, each node waiting for space to receive the next message
– Channel deadlock is similar, but results if all channels around a circular path in a wormhole-based network are busy (recall that each "node" has a single buffer used for both input and output)
❑ Some additional features are highly desirable
– QoS, fault-tolerance
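One standard way to get the simplicity and deadlock freedom asked for above (not named on the slide, but widely used for 2D meshes) is dimension-ordered XY routing; a sketch:

```python
def xy_route(src, dst):
    """Dimension-ordered (XY) routing on a 2D mesh: move fully in X, then in Y.

    Because a packet never turns from the Y direction back into X, no
    cyclic channel dependency can form, which rules out channel deadlock.
    """
    x, y = src
    path = [src]
    while x != dst[0]:                      # first resolve the X dimension
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                      # then resolve the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

# hop-by-hop path from router (0,0) to router (2,1)
assert xy_route((0, 0), (2, 1)) == [(0, 0), (1, 0), (2, 0), (2, 1)]
```

The price of this simplicity is that XY routing is oblivious: it cannot route around congestion or faults, which is where the adaptive schemes with QoS and fault-tolerance come in.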
Optimizing a NoC for a Particular Application
❑ The NoC architecture has to be flexible & parametric
• Parameters allow customization
• Parameters: buffer depth, number of virtual channels, NoC size, etc.
❑ Application-specific optimization of:
– Buffers
– Routing
– Topology
– Mapping to the topology
– Implementation and reuse
[Figure: NoC optimisation flow driven by an architecture library & an architecture/application model: extract inter-module traffic → allocate links → verify QoS and cost]
NoC Structure
❑ A NoC is a packet-switched on-chip communication network designed using a layered methodology
– "Routes packets, not wires"
❑ NoCs use packets to route data from the source to the destination processing element (PE) via a network fabric that consists of
• Switches (routers)
• Interconnection links (wires)
❑ NoCs are an attempt to scale down the concepts of large-scale networks & apply them to the embedded SoC domain
❑ They follow the ISO/OSI network protocol stack model
NoC Topologies
❑ Well-known NoC topologies are:
a) SPIN
b) CLICHÉ
c) Torus
d) Folded torus
e) Octagon
f) BFT
NoC Topologies
❑ Direct topologies:
– Each node has a direct point-to-point link to a subset of other nodes in the system, called neighboring nodes
– Nodes consist of computational blocks and/or memories, as well as a network interface (NI) block that acts as a router
– When the number of nodes in the system increases, the total available communication bandwidth also increases
– The fundamental trade-off is b/w connectivity & cost
– Most direct network topologies have an orthogonal implementation, where nodes can be arranged in an n-dimensional orthogonal space
– Routing for such networks is fairly simple
• e.g. n-dimensional mesh, torus, folded torus, hypercube, & octagon
❑ 2D mesh topology:
– All links have the same length
• Eases physical design
– Area grows linearly with the number of nodes
– Must be designed to avoid traffic accumulating in the center of the mesh
NoC Topologies
❑ Torus / k-ary n-cube:
– An n-dimensional grid with k nodes in each dimension
– A k-ary 1-cube (1-D torus) is essentially a ring network with k nodes
• Limited scalability, as performance decreases when more nodes are added
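The wrap-around links of a torus are what cut the hop count; a short sketch of the standard distance calculation (the formulas follow directly from the k-ary n-cube definition above):

```python
def ring_hops(a, b, k):
    """Shortest hop count between nodes a and b on a k-node ring (k-ary 1-cube):
    go the direct way or around the wrap-around link, whichever is shorter."""
    d = abs(a - b)
    return min(d, k - d)

def torus_hops(a, b, dims):
    """Per-dimension ring distance summed over an n-dimensional torus.
    a, b are coordinate tuples; dims gives k for each dimension."""
    return sum(ring_hops(x, y, k) for x, y, k in zip(a, b, dims))

assert ring_hops(0, 6, 8) == 2            # the wrap-around path is shorter
assert torus_hops((0, 0), (3, 6), (8, 8)) == 5
```

This also shows why the folded torus is attractive: it keeps these short logical distances while avoiding the physically long wrap-around wires.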
NoC Topology
❑ Folded torus topology:
– Overcomes the long-link limitation of a 2-D torus
– Links have the same size
– Meshes and tori can be extended by adding bypass links to increase performance at the cost of higher area
❑ Octagon topology:
– Another example of a direct network
– Messages being sent between any 2 nodes require at most two hops
– More octagons can be tiled together to accommodate larger designs
• by using one of the nodes as a bridge node
NoC Topology
❑ Indirect topologies:
– Each node is connected to an external switch, and switches have point-to-point links to other switches
– Switches do not perform any information processing, and correspondingly nodes do not perform any packet switching
– e.g. SPIN, crossbar topologies
❑ Fat-tree topology:
– Nodes are connected only to the leaves of the tree
– More links near the root, where bandwidth requirements are higher
❑ k-ary n-fly butterfly network:
– Blocking multi-stage network: packets may be temporarily blocked or dropped in the network if contention occurs
– k^n nodes, and n stages of k^(n-1) k x k crossbars
– e.g. 2-ary 3-fly butterfly network
NoC Topology
❑ (m, n, r) symmetric Clos network:
– A three-stage network in which each stage is made up of a number of crossbar switches
– m is the number of middle-stage switches
– n is the number of input/output nodes on each input/output switch
– r is the number of input and output switches
– e.g. a (3, 3, 4) Clos network
– Non-blocking network
– Expensive (several full crossbars)
❑ Beneš network:
– A rearrangeable network in which paths may have to be rearranged to provide a connection, requiring an appropriate controller
– A Clos topology composed of 2 x 2 switches
– e.g. a (2, 2, 4) re-arrangeable Clos network constructed using two (2, 2, 2) Clos networks with 4 x 4 middle switches
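The (m, n, r) parameters make the cost and the non-blocking conditions easy to check. A sketch; the strict-sense bound m >= 2n - 1 is Clos's classic result and the m >= n bound is the Slepian-Duguid rearrangeability condition, neither of which is stated explicitly on the slide:

```python
def clos_switch_count(m, n, r):
    """Total crossbars in a 3-stage (m, n, r) Clos: r ingress + m middle + r egress."""
    return 2 * r + m

def strictly_nonblocking(m, n):
    """Clos's condition: with m >= 2n - 1 middle switches, any new
    connection can be routed without disturbing existing ones."""
    return m >= 2 * n - 1

def rearrangeably_nonblocking(m, n):
    """Slepian-Duguid condition: m >= n suffices if existing paths
    may be rearranged (as in a Benes network)."""
    return m >= n

# the slide's (3, 3, 4) example: 11 crossbars, rearrangeably non-blocking
assert clos_switch_count(3, 3, 4) == 11
assert rearrangeably_nonblocking(3, 3)
```

This also explains the "expensive" bullet: strict-sense non-blocking for the same n would need m = 5 middle switches instead of 3.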
NoC Topology
❑ Irregular or ad hoc network topologies:
– Customized for an application
– Usually a mix of shared-bus, direct & indirect network topologies
– e.g. reduced mesh, cluster-based hybrid topology
NoC Example
[Figure: three processor masters reaching a global memory slave through a chain of routing nodes]
[Figure: a configurable NoC: a grid of routers (R), each with a configurable network interface (CNI), connecting fixed regions (FR) such as PCI, SERDES, DSP, CPU, A/D, D/A, DRAM I/F & Ethernet blocks with configurable regions (CR) of user logic]
Network Processors
[Figure: network-processor architecture]
❑ These are competing requirements: design a network that is the "best" fit
SoC Interconnect Switches
❑ Between units on the chip, link bandwidth is 16-128 wires
❑ Node fanout is the number of channels connecting a node to its neighbors
❑ Nodes can be altered both to establish connectivity & to improve n/w bandwidth