VPR User's Manual
3.3 Placement File Format
The first line of the placement file lists the netlist (.net) and architecture (.arch) files used to create
this placement. This information is used to ensure you are warned if you accidentally route this placement
with a different architecture or netlist file later. The second line of the file gives the size of the logic
block array used by this placement.
All the following lines have the format:
block_name x y subblock_number
The block name is the name of this block, as given in the input .net formatted netlist. X and y are the
column and row in which the block is placed, respectively. The subblock number is meaningful only for
I/O pads. Since we can have more than one pad in a row or column when io_rat is set to be greater than 1
in the architecture file, the subblock number specifies which of the several possible pad locations at
(x, y) contains this pad. Note that the first pads occupied at some (x, y) location are always those
with the lowest subblock numbers -- i.e. if only one pad at (x, y) is used, the subblock number of the I/O
placed there will be zero. For clbs, the subblock number is always zero.
The placement files output by VPR also include (as a comment) a fifth field: the block number.
This is the internal index used by VPR to identify a block -- it may be useful to know this index if you are
modifying VPR and trying to debug something.
Figure shows the coordinate system used by VPR via a small 2 x 2 clb FPGA. The number of clbs
in the x and y directions are denoted by nx and ny, respectively. Clbs all go in the area with x between 1
and nx and y between 1 and ny, inclusive. All pads have either x equal to 0 or nx + 1, or y equal to 0 or
ny + 1.
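The coordinate rule above can be sketched as a small Python check. This is only an illustration: the function name location_kind is hypothetical, and treating the four corner locations as holding nothing is an assumption that the text does not state.

```python
def location_kind(x: int, y: int, nx: int, ny: int) -> str:
    """Classify an (x, y) grid location as 'clb', 'pad', or 'invalid'."""
    in_x_core = 1 <= x <= nx
    in_y_core = 1 <= y <= ny
    if in_x_core and in_y_core:
        return "clb"  # clbs occupy 1..nx by 1..ny, inclusive
    on_x_edge = x in (0, nx + 1)
    on_y_edge = y in (0, ny + 1)
    # Assumption: a pad lies on exactly one edge, so corners are rejected.
    if (on_x_edge and in_y_core) or (on_y_edge and in_x_core):
        return "pad"
    return "invalid"
```

For the 2 x 2 array used in this section, location_kind(1, 3, 2, 2) classifies (1, 3) as a pad location on the top edge.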
An example placement file is given below.
Netlist file: xor5.net Architecture file: sample.arch
Array size: 2 x 2 logic blocks
#block name x y subblk block number
#--------------------------------
a 0 1 0 #0 -- NB: block number is a comment.
b 1 0 0 #1
c 0 2 1 #2
d 1 3 0 #3
e 1 3 1 #4
out:xor5 0 2 0 #5
xor5 1 2 0 #6
[1] 1 1 0 #7
The blocks in a placement file can be listed in any order.
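As an illustration of the format described above, here is a minimal Python sketch of a placement file reader. The names Placement and parse_placement are hypothetical and not part of VPR. The sketch assumes that block names contain neither whitespace nor the # character, and that the second line always has the form "Array size: nx x ny logic blocks".

```python
from typing import NamedTuple

class Placement(NamedTuple):
    name: str
    x: int
    y: int
    subblock: int

def parse_placement(text):
    """Return ((nx, ny), [Placement, ...]) for the text of a placement file."""
    lines = text.splitlines()
    # Second line, e.g. "Array size: 2 x 2 logic blocks".
    size_fields = lines[1].split(":", 1)[1].split()
    array_size = (int(size_fields[0]), int(size_fields[2]))
    blocks = []
    for line in lines[2:]:
        # Everything after '#' is a comment, including the block number field.
        body = line.split("#", 1)[0].strip()
        if not body:
            continue
        name, x, y, subblock = body.split()
        blocks.append(Placement(name, int(x), int(y), int(subblock)))
    return array_size, blocks

SAMPLE = """\
Netlist file: xor5.net Architecture file: sample.arch
Array size: 2 x 2 logic blocks
#block name x y subblk block number
#--------------------------------
a 0 1 0 #0 -- NB: block number is a comment.
out:xor5 0 2 0 #5
[1] 1 1 0 #7
"""

array_size, blocks = parse_placement(SAMPLE)
```

Because the block number is only a comment, the reader discards it and does not rely on the order in which blocks are listed.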
3.4 Routing File Format
The first line of the routing file gives the array size, nx x ny. The remainder of the routing file lists
the global or the detailed routing for each net, one by one. Each routing begins with the word net,
followed by the net index used internally by VPR to identify the net and, in brackets, the name of the net
given in the netlist file. The following lines define the routing of the net. Each begins with a keyword that
identifies a type of routing segment. The possible keywords are SOURCE (the source of a certain output
pin class), SINK (the sink of a certain input pin class), OPIN (output pin), IPIN (input pin), CHANX
(horizontal channel), and CHANY (vertical channel). Each routing begins on a SOURCE and ends on a
SINK. In brackets after the keyword is the (x, y) location of this routing resource. Finally, the pad number
(if the SOURCE, SINK, IPIN or OPIN was on an I/O pad), pin number (if the IPIN or OPIN was on a
clb), class number (if the SOURCE or SINK was on a clb) or track number (for CHANX or CHANY) is
listed -- whichever one is appropriate. The meaning of these numbers should be fairly obvious in each
case. If we are attaching to a pad, the pad number given for a resource is the subblock number defining to
which pad at location (x, y) we are attached. See Figure for a diagram of the coordinate system used by
VPR. In a horizontal channel (CHANX) track 0 is the bottommost track, while in a vertical channel
(CHANY) track 0 is the leftmost track. Note that if only global routing was performed the track number
for each of the CHANX and CHANY resources listed in the routing will be 0, as global routing does not
assign tracks to the various nets.
For an N-pin net, we need N-1 distinct wiring paths to connect all the pins. The first wiring path
will always go from a SOURCE to a SINK. For every later path, the routing segment listed immediately
after the previous SINK is the part of the existing routing to which the new path attaches. It is
important to realize that this first segment after a SINK is the connection into the already specified
routing tree, not a new resource; when computing routing statistics, be sure that you do not count the
same segment several times. An example routing for one net is listed below.
Net 5 (xor5)
SOURCE (1,2) Class: 1 # Source for pins of class 1.
OPIN (1,2) Pin: 4
CHANX (1,1) Track: 1
CHANX (2,1) Track: 1
IPIN (2,2) Pin: 0
SINK (2,2) Class: 0 # Sink for pins of class 0 on a clb.
CHANX (1,1) Track: 1 # Note: Connection to existing routing!
CHANY (1,2) Track: 1
CHANX (2,2) Track: 1
CHANX (1,2) Track: 1
IPIN (1,3) Pad: 1
SINK (1,3) Pad: 1 # This sink is an output pad at (1,3), subblock 1.
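The double-counting pitfall can be made concrete with a short Python sketch. It splits one net's routing into its wiring paths and counts channel segments, skipping the attach point that opens every path after the first. The helper names split_paths and count_wire_segments and the regular expression are illustrative assumptions, not VPR code.

```python
import re

# Keyword, (x, y) location, then a labelled number such as "Track: 1".
SEGMENT_RE = re.compile(
    r"(SOURCE|SINK|OPIN|IPIN|CHANX|CHANY)\s*"
    r"\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*(\w+):\s*(\d+)"
)

def split_paths(net_lines):
    """Split one net's routing lines into paths that each end on a SINK."""
    paths, current = [], []
    for line in net_lines:
        match = SEGMENT_RE.match(line.strip())
        if match is None:
            continue  # skip the "Net ..." header and blank lines
        kind, x, y, label, number = match.groups()
        current.append((kind, int(x), int(y), label, int(number)))
        if kind == "SINK":
            paths.append(current)
            current = []
    return paths

def count_wire_segments(paths):
    """Count CHANX/CHANY segments without recounting attach points."""
    total = 0
    for index, path in enumerate(paths):
        # After the first path, element 0 is already part of the routing tree.
        new_part = path if index == 0 else path[1:]
        total += sum(1 for seg in new_part if seg[0] in ("CHANX", "CHANY"))
    return total

EXAMPLE = """\
Net 5 (xor5)
SOURCE (1,2) Class: 1
OPIN (1,2) Pin: 4
CHANX (1,1) Track: 1
CHANX (2,1) Track: 1
IPIN (2,2) Pin: 0
SINK (2,2) Class: 0
CHANX (1,1) Track: 1
CHANY (1,2) Track: 1
CHANX (2,2) Track: 1
CHANX (1,2) Track: 1
IPIN (1,3) Pad: 1
SINK (1,3) Pad: 1
""".splitlines()

paths = split_paths(EXAMPLE)
wire_segments = count_wire_segments(paths)
```

For the three-pin example net this yields two paths and five channel segments; a naive count of every CHANX and CHANY line would report six.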
Nets which are specified to be global in the netlist file (generally clocks) are not routed. Instead, a
list of the blocks (name and internal index) which this net must connect is printed out. The location of
each block and the class of the pin to which the net must connect at each block is also printed. For clbs,
the class is simply whatever class was specified for that pin in the architecture input file. For pads the
pinclass is always -1; since pads do not have logically-equivalent pins, pin classes are not needed. An
example listing for a global net is given below.
Net 146 (pclk): global net connecting:
Block pclk (#146) at (1, 0), pinclass -1.
Block pksi_17_ (#431) at (3, 26), pinclass 2.
Block pksi_185_ (#432) at (5, 48), pinclass 2.
Block n_n2879 (#433) at (49, 23), pinclass 2.
3.5 SDC File Format
Synopsys Design Constraints (SDC) is the industry-standard format for specifying timing
constraints. The following subset of SDC syntax is supported by VPR (italicized portions are optional):
create_clock -period <float> -waveform {rising_edge falling_edge} <netlist clock list or regexes>
create_clock -period <float> -waveform {rising_edge falling_edge} -name <virtual clock name>
Assigns a desired period (in nanoseconds) and offset to one or more clocks in the netlist (if the name
token is omitted) or to a single virtual clock (used to constrain inputs and outputs to a clock external to the
design). Netlist clocks can be referred to using regular expressions, while the virtual clock name is taken
as-is. Omitting the waveform creates a clock with a rising edge at 0 and a falling edge at the half period,
and is equivalent to using -waveform {0 <period/2>}. Non-50% duty cycles are supported but behave no
differently than 50% duty cycles, since falling edges are not used in analysis. If a virtual clock is assigned
using a create_clock command, it must be referenced elsewhere in a set_input_delay or set_output_delay
constraint. If a netlist clock is not specified with a create_clock command, paths to and from that
clock domain will not be analysed.
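The default waveform rule can be written out as a tiny Python helper. The function name clock_edges is hypothetical; it only restates the rule above and is not taken from VPR.

```python
def clock_edges(period, waveform=None):
    """Return (rising_edge, falling_edge) in ns for a create_clock constraint."""
    if waveform is None:
        # Omitted -waveform: rising edge at 0, falling edge at half the period.
        return (0.0, period / 2)
    rising, falling = waveform
    return (float(rising), float(falling))
```

For example, clock_edges(3) gives (0.0, 1.5), the same as passing -waveform {0 1.5} explicitly.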
set_clock_groups -exclusive -group {<clock list or regexes>} -group {<clock list or regexes>}
-group {<clock list or regexes>} ...
Tells the timing analyser to not analyse paths between the specified groups of clock domains, in either
direction. May be used with netlist or virtual clocks in any combination. A set_clock_groups constraint is
equivalent to a set_false_path constraint (see below) between each clock in one group and each clock in
another. For example, the following sets of commands are equivalent:
set_clock_groups -exclusive -group {clk} -group {clk2 clk3}
and
set_false_path -from [get_clocks{clk}] -to [get_clocks{clk2 clk3}]
set_false_path -from [get_clocks{clk2 clk3}] -to [get_clocks{clk}]
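The equivalence generalizes to any number of groups: each ordered pair of distinct groups produces one set_false_path command. A minimal Python sketch of that expansion follows; the function name clock_groups_to_false_paths is an assumption made for illustration.

```python
from itertools import permutations

def clock_groups_to_false_paths(groups):
    """Expand set_clock_groups -exclusive into equivalent set_false_path lines.

    groups is a list of clock-name lists, one per -group argument.
    """
    commands = []
    # Every ordered pair of distinct groups is cut, so both directions appear.
    for source, destination in permutations(groups, 2):
        commands.append(
            "set_false_path -from [get_clocks{%s}] -to [get_clocks{%s}]"
            % (" ".join(source), " ".join(destination))
        )
    return commands

commands = clock_groups_to_false_paths([["clk"], ["clk2", "clk3"]])
```

With three groups the same call would return six commands, one for each ordered pair.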
set_false_path -from [get_clocks <clock list or regexes>] -to [get_clocks <clock list or regexes>]
Cuts paths unidirectionally from each clock after -from to each clock after -to. Otherwise equivalent to
set_clock_groups. Note that false paths are supported between entire clock domains, but not between
individual registers.
set_max_delay <delay> -from [get_clocks <clock list or regexes>] -to [get_clocks <clock list or regexes>]
Overrides the default timing constraint calculated using the information from create_clock with a user-
specified delay. Be aware that this may produce unexpected results.
set_multicycle_path -setup -from [get_clocks <clock list or regexes>] -to [get_clocks <clock list or
regexes>] <num_multicycles>
Creates a multicycle at the clock domain level: adds (num_multicycles - 1) times the period of the
destination clock to the default setup time constraint. Note that multicycles are supported between entire
clock domains, but not between individual registers.
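The multicycle arithmetic can be sketched in Python as follows. The function name is hypothetical, and the default constraint is taken as an input because this section does not say how VPR derives it.

```python
def multicycle_setup_constraint(default_constraint, dest_period, num_multicycles):
    """Setup constraint (ns) after applying set_multicycle_path -setup.

    default_constraint is whatever the timing analyser would use without the
    multicycle; it is passed in rather than computed here.
    """
    return default_constraint + (num_multicycles - 1) * dest_period
```

With a destination clock period of 2 ns and num_multicycles set to 3, the constraint is relaxed by 4 ns; a value of 1 leaves the default unchanged.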
set_input_delay -clock <virtual or netlist clock> -max <max_delay> [get_ports {<I/O list or regexes>}]
set_output_delay -clock <virtual or netlist clock> -max <max_delay> [get_ports {<I/O list or regexes>}]
Use set_input_delay if you want timing paths from input I/Os analyzed, and set_output_delay if you want
timing paths to output I/Os analyzed. If these commands are not specified in your SDC, paths from and to
I/Os will not be timing analyzed.
These commands constrain each I/O pad specified after get_ports to be timing-equivalent to a register
clocked on the clock specified after -clock. This can be either a clock signal in your design or a virtual
clock that does not exist in the design but which is used only to specify the timing of I/Os. The command
also adds the delay max_delay through each pad, thereby tightening the setup time constraint along paths
travelling through the I/O pad; this additional delay can be used to model board level delays. For single-
clock circuits, -clock can be wildcarded using * to refer to the single netlist clock, although this is not
supported in standard SDC. This allows a single SDC command to constrain I/Os in all single-clock
circuits.
Regular expressions may be used to refer to I/Os with this command.
# (comment), \ (line continued) and * (wildcard)
# starts a comment; everything remaining on this line will be ignored. \ at the end of a line indicates that
a command wraps to the next line. * is used in a get_clocks command or at the end of create_clock to
match all netlist clocks, and partial wildcarding (e.g. clk* to match clk and clk2) is also supported. As
mentioned above, * can be used in set_input_delay and set_output_delay to refer to the netlist clock for
single-clock circuits only, although this is not supported in standard SDC.
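The wildcard behaviour can be illustrated with Python's standard fnmatch module. This is only a stand-in chosen for the example: VPR's own matcher is not shown in this manual and may behave differently on names containing other special characters. The clock names are hypothetical.

```python
from fnmatch import fnmatchcase

def match_clocks(pattern, netlist_clocks):
    """Return the netlist clocks whose names match a * wildcard pattern."""
    return [name for name in netlist_clocks if fnmatchcase(name, pattern)]

netlist_clocks = ["clk", "clk2", "pclk"]
all_clocks = match_clocks("*", netlist_clocks)
clk_family = match_clocks("clk*", netlist_clocks)
```

Here the bare * selects every netlist clock, while clk* selects clk and clk2 but not pclk.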
Regular expressions may be used to refer to one or more netlist (but not virtual) clocks in all
commands except set_input_delay and set_output_delay. Regular expressions may be used to refer to
I/Os in set_input_delay and set_output_delay.
By default, VPR looks for a file named circuitname.sdc in the parent directory, where circuitname is
the specified circuit name. An alternate SDC file name can be specified using the --sdc_file command-
line option. If VPR is invoked from run_vtr_task.pl, it looks for a file named circuitname.sdc in the
directory /vtr_flow/sdc. An alternate directory can be specified in an individual task's /config/config.txt
file, using sdc_dir=path/to/folder/inside/vtr_flow (e.g. sdc_dir=sdc will mimic default behaviour).
3.5.1 Default behaviour and sample SDC files
In this section, examples for multiple clocks assume the circuit has two clocks called clk and clk2.
If an SDC file named circuitname.sdc is not found, VPR uses the following defaults:
Combinational circuits
Constrains all I/Os on a virtual clock virtual_io_clock, and optimizes this clock to run as fast as possible.
Equivalent SDC file:
create_clock -period 0 -name virtual_io_clock
set_input_delay -clock virtual_io_clock -max 0 [get_ports{*}]
set_output_delay -clock virtual_io_clock -max 0 [get_ports{*}]
Single-clock circuits
Constrains all I/Os on the netlist clock, and optimizes this clock to run as fast as possible.
Equivalent SDC file:
create_clock -period 0 *
set_input_delay -clock * -max 0 [get_ports{*}]
set_output_delay -clock * -max 0 [get_ports{*}]
Multi-clock circuits
Constrains all I/Os on a virtual clock virtual_io_clock. Does not analyse paths between netlist clock
domains, but analyses all paths from I/Os to any netlist domain. Optimizes all clocks, including I/O
clocks, to run as fast as possible.
Equivalent SDC file:
create_clock -period 0 *
create_clock -period 0 -name virtual_io_clock
set_clock_groups -exclusive -group {clk} -group {clk2}
set_input_delay -clock virtual_io_clock -max 0 [get_ports{*}]
set_output_delay -clock virtual_io_clock -max 0 [get_ports{*}]
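The three default cases differ only in the number of netlist clocks, so they can be summarized in one Python sketch that builds the equivalent SDC lines. The function name default_sdc is hypothetical, and placing each netlist clock in its own -group is an assumption extrapolated from the two-clock example above.

```python
def default_sdc(netlist_clocks):
    """Return SDC lines equivalent to the defaults described in this section."""
    count = len(netlist_clocks)
    if count == 0:
        # Combinational circuit: all I/Os sit on a virtual clock.
        io_clock = "virtual_io_clock"
        lines = ["create_clock -period 0 -name virtual_io_clock"]
    elif count == 1:
        # Single clock: I/Os are constrained on the one netlist clock.
        io_clock = "*"
        lines = ["create_clock -period 0 *"]
    else:
        # Multiple clocks: cut paths between netlist clock domains.
        io_clock = "virtual_io_clock"
        groups = " ".join("-group {%s}" % name for name in netlist_clocks)
        lines = [
            "create_clock -period 0 *",
            "create_clock -period 0 -name virtual_io_clock",
            "set_clock_groups -exclusive " + groups,
        ]
    lines.append("set_input_delay -clock %s -max 0 [get_ports{*}]" % io_clock)
    lines.append("set_output_delay -clock %s -max 0 [get_ports{*}]" % io_clock)
    return lines

two_clock_defaults = default_sdc(["clk", "clk2"])
```

Calling default_sdc(["clk", "clk2"]) reproduces the five-line multi-clock file listed above.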
Here are sample SDC files for common non-default use cases:
A. Cut I/Os and analyse only register-to-register paths, including paths between clock domains;
optimize to run as fast as possible.
create_clock -period 0 *
B. Same as A, but with paths between clock domains cut. Separate target frequencies are specified.
create_clock -period 2 clk
create_clock -period 3 clk2
set_clock_groups -exclusive -group {clk} -group {clk2}
C. Same as B, but with paths to and from I/Os now analyzed. (Same as the multi-clock default, but
with custom period constraints.)
create_clock -period 2 clk
create_clock -period 3 clk2
create_clock -period 3.5 -name virtual_io_clock
set_clock_groups -exclusive -group {clk} -group {clk2}
set_input_delay -clock virtual_io_clock -max 0 [get_ports{*}]
set_output_delay -clock virtual_io_clock -max 0 [get_ports{*}]
D. Changing the phase between clocks, and accounting for delay through I/Os with set_input_delay and
set_output_delay constraints.
create_clock -period 3 -waveform {1.25 2.75} clk #rising edge at 1.25, falling at 2.75
create_clock -period 2 clk2
create_clock -period 2.5 -name virtual_io_clock
set_input_delay -clock virtual_io_clock -max 1 [get_ports{*}]
set_output_delay -clock virtual_io_clock -max 0.5 [get_ports{*}]
E. Sample using all supported SDC commands. Inputs and outputs are constrained on separate
virtual clocks.
create_clock -period 3 -waveform {1.25 2.75} clk
create_clock -period 2 clk2
create_clock -period 1 -name input_clk
create_clock -period 0 -name output_clk
set_clock_groups -exclusive -group input_clk -group clk2
set_false_path -from [get_clocks{clk}] -to [get_clocks{output_clk}]
set_max_delay 17 -from [get_clocks{input_clk}] -to [get_clocks{output_clk}]
set_multicycle_path -setup -from [get_clocks{clk}] -to [get_clocks{clk2}] 3
set_input_delay -clock input_clk -max 0.5 [get_ports{in1 in2 in3}]
set_output_delay -clock output_clk -max 1 [get_ports{out*}]
4. Debugging Aids
After parsing the netlist and architecture files, VPR dumps out an image of its internal data structures
into net.echo and arch.echo. These files can be examined to be sure that VPR is parsing the input files as
you expect. The critical_path.echo file lists details about the critical path of a circuit, and is very useful
for determining why your circuit is so fast or so slow. Various other data structures can be output if you
uncomment the calls to the output routines; search the code for echo to see the various data that can be
dumped.
If the preprocessor flag DEBUG is defined in vpr_types.h, some additional sanity checks are
performed during a run. I normally leave DEBUG on all the time, as it only slows execution by 1 to 2%.
The major sanity checks are always enabled, regardless of the state of DEBUG. Finally, if VERBOSE is
set in vpr_types.h, a great deal of intermediate data will be printed to the screen as VPR runs. If you set
verbose, you may want to redirect screen output to a file.
The initial and final placement costs provide useful numbers for regression testing the netlist parsers
and the placer, respectively. I generate and print out a routing serial number to allow easy regression
testing of the router.
Finally, if you need to route an FPGA whose routing architecture cannot be described in VPR's
architecture description file, don't despair! The router, graphics, sanity checker, and statistics routines all
work only with a graph that defines all the available routing resources in the FPGA and the permissible
connections between them. If you change the routines that build this graph (in rr_graph*.c) so that they
create a graph describing your FPGA, you should be able to route your FPGA. If you want to read a text
file describing the entire routing resource graph, call the dump_rr_graph subroutine.
5. VPR Contributors
Professors:
Jason Anderson, Vaughn Betz, Jonathan Rose
Graduate Students:
Jason Luu, Jeffrey Goeders, Ian Kuon, Alexander Marquardt, Andy Ye, Wei Mark Fang, Tim Liu
Summer Students:
Opal Densmore, Ted Campbell, Cong Wang, Peter Milankov, Scott Whitty, Michael Wainberg,
Suya Liu, Miad Nasr, Nooruddin Ahmed, Thien Yu
Companies:
Altera Corporation, Texas Instruments