Asic Prototyping Aldec
Asic Prototyping Aldec
Asic Prototyping Aldec
Rev. 1.9
September 14, 2012
Co-authored by:
Slawek Grabowski and Zibi Zalewski, Aldec, Inc.
Kirk Saban, Xilinx, Inc.
Abstract
This paper highlights possibilities of ASIC verification
using FPGA-based prototyping, considering the latest
Virtex®-7 devices and Aldec HES-7 dual Virtex-7 2000T
FPGA ASIC prototyping board. In addition, the most
common partitioning issues and resolutions are
described.
Aldec White Paper HES-7 ASIC Prototyping
Table of Contents
HES-7 ASIC Prototyping ................................................................................................................................. 1
Table of Contents ...................................................................................................................................... 2
New opportunities for ASIC prototyping ................................................................................................... 3
Partitioning ................................................................................................................................................ 3
Balancing FPGA resources while creating design partitions .................................................................. 4
Placing and routing partitions on FPGA board ...................................................................................... 5
Meeting required timing........................................................................................................................ 5
Debugging of prototyped design ............................................................................................................... 6
Xilinx design example ................................................................................................................................ 7
Stacked silicon interconnect .................................................................................................................. 7
Breaking through ................................................................................................................................... 8
Conclusion ................................................................................................................................................. 8
Aldec HES-7 ASIC prototyping board – key features ................................................................................. 9
About Aldec, Inc. ..................................................................................................................................... 10
Rev. 1.8
www.aldec.com Page 2
Aldec White Paper HES-7 ASIC Prototyping
Partitioning
When design partitioning, there are two choices: manual partitioning done by the engineer (on the RTL
sources) and semi- or fully-automatic partitioning performed by EDA tools (on both the RTL source and
the design netlist). See figure 1.
Manual
RTL Partitions
Design RTL Partitioning
(engineer) for Synthesis
Automatic
Design RTL NetlistPartitions
or Netlist Partitioning
(tool) for Implementation
In cases where an ASIC design needs to be partitioned and shared between six or more FPGAs, manual
partitioning requires tremendous effort and frequently results in poor speed (of the design under test)
and functionality problems. In such cases the only solution is to use the EDA partitioning tools available
Rev. 1.8
www.aldec.com Page 3
Aldec White Paper HES-7 ASIC Prototyping
on the market. Of course, even for those tools the problem increases when the design size grows -
which means the number of FPGAs also increases, further complicating the partitioning process.
Clearly, the fewer partitions (i.e. FPGAs) the easier the implementation of the ASIC prototyping will be.
The Virtex-7 2000T FPGA, with its 12M ASIC capacity, should accommodate most designs. A single
Virtex-7 FPGA will be ideal for small- to medium-size designs; and all that’s required for the prototyping
is the Virtex-7 FPGA board and Xilinx Vivado™ Design Suite. For bigger designs, two Virtex-7 devices
should suffice and partitioning is greatly simplified, as there is only a single partition with which to
contend.
Assuming multiple FPGA devices are employed for ASIC prototyping the engineer must:
1. Balance FPGA resources while creating design partitions
2. Place and route the partitions on the FPGA board
3. Meet timing requirements
4. Be able to debug of the prototyped design
These problems grow in severity with the number of FPGAs. We will discuss in the next sections how to
divide the problems into smaller ones that are easier to solve. However, the complexity to create an
optimal prototype using a multi-FPGA platform is very high. Most users settle with a suboptimal solution
for the above problems because finding a better solution requires too much effort. It is essential to use
the largest FPGA devices and avoid the complexity that rapidly grows when using multiple small FPGAs
for prototyping.
Resource balancing is also a part of these partitions during placing and routing. Frequently, while trying
to find the ideal balance between partitions we run into routing problems. Optimally balanced partitions
usually do not have optimal interfaces and require a high number of PCB tracks. Going further, we will
run also into timing issues if routing is not optimal. In conclusion, the partitioning, partition placement
and routing, and timing problems cannot be really solved in separation. Focusing on one of these
problems may lead to trouble with the others.
It is essential to map big memories to on-board DDR2/DDR3 devices. Offloading FPGAs with some
memories always helps with partitioning. However, it does not help too much with placing and routing.
Each FPGA platform has limited DDR2/DDR3 resources and usually the memories are connected only to
selected FPGAs on the board. Mapping a memory to on-board RAM also requires the placement of its
associated logic close to the memory. This leads to additional considerations for placing and routing.
Rev. 1.8
www.aldec.com Page 4
Aldec White Paper HES-7 ASIC Prototyping
Most of the partitioning tools on the market try to find an optimal solution by following design
structure. Partitions are created from instances of different modules. However, separating some
modules may lead to poor logic optimization and speed that is too low to run the prototype efficiently.
Some redundant or unused logic may not be removed and complicate partitioning and resources
balancing. This reinforces the message that the larger the device the better.
It is very difficult not to exceed an FPGA’s pin count while mapping partitions, especially for big ASIC
designs. In most cases pin multiplexing is required. Multiplexers are used to share the same connection
in time between multiple signals. Transferring multiple signals over the same connection usually
requires design clock frequencies to be reduced, typically in accordance with the multiplexing ratio. It is
essential to keep the multiplexing ratio as small as possible and avoid performance problems caused by
reduced clock frequencies.
A partition is usually created from multiple design modules grouped together. It is difficult to find a
single module instance that fits perfectly to given partition resources. Some designs are implemented to
be partition-friendly and engineers intentionally create hierarchy that corresponds with FPGA platform
architecture. However, design hierarchy usually is created according to design functionality and does not
fit well to the architecture of any prototyping board. In such cases, multiple modules must be grouped
to create the partition. Frequently, there is a group of the same signals connected to several modules. It
is essential to find such signals and avoid routing them multiple times between partitions; which may
increase the multiplexing ratio and reduce performance.
As mentioned in the previous section, partitions that contain modules mapped to on-board DDR2/DDR3
RAM must be placed in the FPGAs that have access to the required memory. This is an additional
constraint for partition place and route because such partitions can be placed only in specific FPGA chips
and cannot be moved to other chips where memory is not available.
Separating design modules and juggling with them to create partitions may again lead to optimization
problems. During routing the most important are constant signals or source/load less signals. They
should not be routed between partitions and consume critical routing resources.
Rev. 1.8
www.aldec.com Page 5
Aldec White Paper HES-7 ASIC Prototyping
All clocks must be distributed using dedicated resources on the board. Clock lines are the critical
resource on each FPGA board. Especially when the number of global clock lines is limited. All internal
clocks should be implemented inside the FPGA devices. Gated clocks are not recommended in FPGAs
and should be converted to clock enables.
Routing internal clocks between FPGA devices is usually not possible because of the lack of sufficient
clock lines and clock buffers. This is why logic clocked by an internal clock should be grouped and placed
inside a single FPGA device. If the logic is too large and must be partitioned between multiple devices, it
is better to replicate the clock generators in each FPGA device to avoid routing the clocks over PCB. The
best solution is to distribute global clocks only to each FPGA device using dedicated clock lines and
replicate local clock generators in each FPGA to avoid routing any additional clocks between FPGAs.
Similarly it is not recommended to route any clock domain crossings between FPGAs. All signals crossing
clock domains should be implemented inside FPGA device.
Signals routed between FPGA devices should be registered. Partitioning big combinatorial logic between
FPGAs complicates constraining and may result in poor performance. Each path routed between FPGAs
should be properly constrained in each FPGA to make sure that it will work at its required speed.
Rev. 1.8
www.aldec.com Page 6
Aldec White Paper HES-7 ASIC Prototyping
The single chip Virtex-7 2000T FPGA implementation also resulted in dramatic improvements in
productivity. The design no longer needs to be partitioned across six FPGAs; the entire RTL for the full
design can be implemented in the Virtex-7 2000T FPGA. This eliminates the need for IO multiplexing and
dramatically reduces time to market for end users. The implementation of the Zynq-7000 All
Programmable SoC on the Virtex-7 2000T FPGA utilized the new Vivado Design Suite from Xilinx. With
the new analytical place and route engine in the Vivado Design Suite, no manual placement constraints
were required to achieve 50MHz performance across the device. See figure 2 below:
and IO signals from each FPGA slice though tens of thousands of routes in the interposer.
The 65nm passive silicon interposer contains no active components, which increases the reliability of
both the interposer and the TSVs. Due to the fact that the interposer is also manufactured from silicon,
it provides thermal conductivity and helps to mitigate package stress.
The side-by-side layout of the FPGA die allows for efficient heat dissipation and also reduces warping.
The layout maintains a conventional FPGA topology that minimizes the impact to software place and
route algorithms. All these technologies individually have been proven to be very reliable and their
application in stacked silicon interconnect (SSI) is equally as reliable The resulting Virtex-7 2000T device,
the first FPGA using SSI technology has been shipping since October 2011.
Breaking through
ASIC prototyping with FPGAs enables fast and accurate SoC system modeling and verification as well as
accelerated software and firmware development. Xilinx takes FPGA-based prototyping one step further
with the introduction of the Virtex-7 2000T device which allows end users to break through multi-chip
partitioning barriers with the capacity and performance needed to prototype complex ASICs in a single
FPGA.
Conclusion
Implementing a high-speed prototype using multiple FPGAs is very challenging and requires an engineer
to find optimal solutions to several technical problems. Anyone who had to deal with multi-FPGA
prototyping can confirm that. To help ease this difficulty, it is essential to use a prototyping board with
the largest possible FPGA devices. The number of FPGAs determines the complexity of the prototype
implementation and debug. Previously, using fewer FPGAs was impossible due to FPGA capacity, at the
time, but with the latest Virtex-7 2000T device most ASIC designs can be accommodated with one or
two devices; which will make the design bring-up less painful and much quicker. To accommodate, that
Aldec provides the HES-7 product with the largest Virtex-7 FPGA devices. The next section provides
more information about the main features of HES-7.
Rev. 1.8
www.aldec.com Page 8
Aldec White Paper HES-7 ASIC Prototyping
Rev. 1.8
www.aldec.com Page 9
Aldec White Paper HES-7 ASIC Prototyping
Customizable Clocking:
5 customizable, low skew, global clocks from backplane
o User programmable PLLs (2KHz to 1.4GHz)
o MMCX connectors for external clock inputs (0 - 450MHz)
60 differential local clock inputs from backplane
o 20 multi-region and single region differential pairs to FPGA1
o 40 multi-region and single region differential pairs to FPGA2
Memory:
Up to 16GB of DDR3 memory supported with SO-DIMM sockets
UP to 64GB of DDR3 with 4 HES-7 boards connected through backplane
8GB Flash memory connected to FPGA0
Micro SD card socket available for stand-alone configuration
Protocol Support:
PCI-Express finger connector for inside PC operation
PCI-Express cable connector for outside PC operation
USB 2.0/3.0 connector
JTAG connector
2 SATA connectors (Device/Host)
Diagnostics:
Self -diagnostic button and status LEDs for hardware diagnostic
Daughterboard Support:
Leverages non-proprietary 25 Gbps backplane connectors
Technical specification on the interface available for daughterboard development
Rev. 1.8
www.aldec.com Page 10