7 An Algebraic Approach to Hardware Compilation
Authors
Jonathan P. Bowen¹ and He Jifeng²
¹ London South Bank University
Centre for Applied Formal Methods, Institute for Computing Research
Faculty of Business, Computing and Information Management
Borough Road, London SE1 0AA, UK
Email: jonathan.bowen@lsbu.ac.uk
URL: http://www.jpbowen.com/
² The United Nations University
International Institute for Software Technology
P.O. Box 3058, Macau, China
Email: jifeng@iist.unu.edu
URL: http://www.iist.unu.edu/~jifeng/
Summary
This chapter presents a provably correct compilation scheme that
converts a program into a network of abstract components that
interact with each other by exchanging request and
acknowledgement signals. We provide a systematic and modular
technique for correctly realizing the abstract components in
hardware devices, and use a standard programming language to
describe both algorithms and circuits. The resulting circuitry, which
behaves according to the program, has the same structure as the
program. The circuit logic is asynchronous, with no global clock.
Keywords: compilation; formal methods; hardware design
7.1 Introduction
With chips consisting of millions of transistors, the complexity of VLSI
algorithms – i.e., algorithms implemented as digital VLSI circuits – is
approaching that of software algorithms – i.e., algorithms implemented as code
for a stored-program computer. For many applications, particularly where
speed of execution or security is important, a custom-built circuit is better
than the traditional processor-and-software combination. The speed is
improved by the absence of the machine-language layer and also by introducing
parallelism, whereas security is improved by the impossibility of
reprogramming. Moreover, there are space savings compared to a
combination of software and processor.
Yet design methods for VLSI algorithms lag far behind the potential of the
technology. The design methods for digital circuits that are commonly found in
textbooks resemble low-level machine-language programming methods.
Selecting individual logic gates in a circuit is something like selecting
individual machine instructions in a program. These methods may have been
adequate for small circuit designs when they were introduced, and they may still
be adequate for large circuits that are simply repetitions of a small circuit (such
as a memory), but they are not adequate for circuits that perform complicated
custom algorithms.
Since a VLSI system is a highly concurrent computation, we propose an
approach to VLSI design based on concurrent computing. The circuit to be
designed is first implemented as a concurrent program that fulfils the logical
specification of the circuit. The program is then compiled into a circuit by
applying semantics-preserving transformations. Hence, the circuit obtained is
correct by construction.
Communication in VLSI is becoming increasingly expensive compared to
switching, as the wires determine both the switching costs and the area of a
chip. In order to reflect those cost ratios, a model in which communication is
explicit is more appropriate for controlling the cost of communication. Hence,
we opted for a notation based on the notion of concurrent processes
communicating by explicit message-passing and assignments to variables. We
adopt a high-level programming language like Occam [7] as a behavioral
specification language for hardware devices. Occam is
a language for designing and describing concurrent systems, and hardware
designers exploit concurrency in their pursuit of increased performance. For
example, today’s fastest microprocessors typically contain a number of
cooperating agents: a bus interface that directs traffic between the main memory
and caches; an instruction fetch unit that reads and decodes instructions from
the cache; and several execution units that carry out the decoded instructions.
The components of the microprocessor synchronize when they need to
exchange information, but otherwise proceed at their own pace. Such a system
is naturally described as a set of communicating processes.
This chapter presents a provably correct compilation scheme that converts
a program into a network of abstract components that interact with each other
by exchanging request and acknowledgement signals. We provide a systematic
and modular technique for correctly realizing the abstract components in
hardware devices, and use a standard programming language to describe both
algorithms and circuits. The resulting circuits, which behave according to the
program, have the same structure as the program. The circuit logic is
asynchronous, with no global clock.
Why is it significant that our compilation scheme is verified? Highly
concurrent systems are notoriously difficult to implement correctly; there is little
chance of getting them right unless a disciplined approach is taken early in
their specification and design. Occam’s concise notation makes it easy to see
whether a given description captures the designer’s intent. Furthermore, Occam
has a well-understood mathematical model [4] and a complete set of algebraic
laws [13], allowing potential system misbehavior to be detected by analysis
rather than simulation. However, rigorous reasoning at the source level is for
naught if the compilation scheme itself introduces misbehavior. The
importance of detecting flaws before a product goes to market was underscored
by the Pentium disaster of 1994: Intel was forced to write off $475 million due
to an obscure bug in the Pentium’s floating-point division unit [8].
The VHDL [10] and Verilog [15] languages are presently being used by
industry. They provide a way to express formally and symbolically the
constituent components of a hardware circuit and their interconnections, and
allow circuit designers to describe circuits more conveniently, but they are not
translated automatically to circuits. There are interactive synthesis tools to aid
in the construction of synchronous circuits from a subset of these languages.
The circuits are then verified by simulation.
There are other high-level circuit design methods that have been developed
and reported in the literature. Martin at Caltech developed a method of
compiling a communicating process [6] into a set of transistors via an
intermediate mapping to production rules. In [1, 2], a similar approach (and a
similar circuit design language) was taken, except that specifications are
mapped into connections of small components for which standard transistor
implementations exist. In [16], circuits are modeled as networks of finite state
machines, using their formalism to assist in proving the correctness of their
compiled circuits. Page at Oxford developed a prototype compiler in the
functional language SML, which converted an Occam-like language to a netlist
[1]. After further processing by vendor software the netlist can be loaded into
Xilinx FPGA chips [17]. This work is most similar to ours, but their designs
have a global clock; ours do not. Moreover, the algebraic approach in this
chapter offers the significant advantage of providing a provably correct
compilation method, and it is also expected to support a wide range of design
optimization strategies.
The rest of this chapter is organized as follows. Section 7.2 presents a
simple language of communicating processes as the source language. An
overview of our compilation strategy is given in Section 7.3. Section
7.4 introduces the concepts of handshake protocol and context-dependent
refinement. Section 7.5 is devoted to the implementation of program variables
and Boolean expressions. In Section 7.6, we convert the sequential subset of
the language into a network of abstract components, and validate the
compilation scheme. Section 7.7 shows how to use a set of primitive circuits to
implement the control processes generated by our hardware compilation
scheme.
7.2 A Language of Communicating Processes
7.2.1 Syntax
Our language of communicating processes contains the features that are typical
of Occam and Occam-like languages:
• Boolean state variables can be used in expressions and updated by
assignment.
• Programs can be composed in sequence, in parallel, or made to execute
conditionally or repeatedly.
• Concurrently executing programs can synchronize through shared
objects called channels.
• Programs that offer several channels on which to synchronize may
choose among them.
The syntax for the language is given by the following BNF rules, where x
stands for a program variable of Boolean type, ch for a channel name, b for a
Boolean expression, P for a process, and ⁀ for sequence catenation.
Informally, the process terms stand for the following processes:
1. Execution of skip does nothing, and leaves all variables unchanged.
2. Execution of x := b assigns the value of expression b to variable x.
3. ch?x is input from a channel named ch to variable x.
4. ch!b is output to a channel named ch of the value of expression b.
5. The construct P ⊓ Q executes either P or Q, where the choice between P
and Q is made non-deterministically, without the consent of its
environment.
In general, let P be a finite non-empty set of processes. We use the
notation ⊓P to denote the non-deterministic choice over the members
of P.
We define a refinement relation ⊑ between programs such that Q ⊑ P
holds whenever, for any purpose, the observable behavior of P is as
good as, or better than, that of Q:
Q ⊑ P =df ((P ⊓ Q) = Q)
6. The sequential composition P;Q executes P first, and then executes Q
after P terminates; it terminates when Q terminates.
7. P || Q stands for the parallel composition of P and Q, wherein all
communications between P and Q are concealed from the environment.
Let E be the set of communication events between P and Q. The
synchronization construct P ||E Q behaves like the parallel
composition P || Q except that the communication events of E remain
visible to the external environment:
P || Q = ( P ||E Q ) \ E
where \ denotes the Communicating Sequential Processes (CSP) hiding
operator [6, 14].
8. The Boolean guarded process b → P cannot appear outside of a
conditional or iterative statement; it is triggered only when the initial
value of b is true, and its execution completes when P terminates.
Several guarded processes may be composed into a sequence.
9. The conditional if BG fi examines its guarded processes in order,
selects the first one whose Boolean guard is true, and executes its
associated process. It terminates when that process terminates. If none
of its guards is true, the behavior of the conditional becomes totally
unpredictable, like a chaotic program.
For notational simplicity, we will use P ◁b ▷Q to describe a program
which behaves like P if the initial value of b is true, or like Q if the
initial value of b is false.
10. The iterative construct do BG od is similar to the conditional if BG fi,
except that it repeatedly evaluates its guarded processes until none of
its guards is true, and then it terminates.
11. The guarded alternation alt G tla offers to its environment a choice
over the input guards of its alternatives. If the environment performs a
communication on c, and c?x → P is the unique element of sequence G
with the input on c as the guard, then the construct alt G tla will behave
like process P. If there are a number of guarded processes in G with the
input on c as their guards, then the choice among them will be made
non-deterministically.
We adopt the CSP notation P [] Q to stand for the external choice
between P and Q, which offers the environment the choice of the first
events of P and Q and then behaves accordingly. The alternation
construct can be rewritten as external choice.
Let A = {a1,…,an} be a finite set of communication events. The notation
x : A → P(x)
abbreviates the external choice (a1 → P(a1)) [] … [] (an → P(an)).
12. The Boolean expressions include true (“high voltage” or “power”) and
false (“low voltage” or “ground”). The Boolean unary operator ¬ is
negation. The Boolean binary operators include ∧ (conjunction) and ∨
(disjunction).
A local variable x can be introduced by the declaration command var x • P. We
will also use chaos to represent the chaotic program whose behavior is totally
uncontrollable and unpredictable.
In the rest of this chapter, we take a more general form of recursion and use
the notation µX • P(X) to stand for the weakest fixed point (with respect to the
relation ⊑) of the equation X = P(X).
Example 2.1 (clock)
A clock can be modeled by a recursive process CLOCK which can repeatedly
engage in the event tick:
CLOCK =df µX • (tick → X)
Example 2.2 (a single place buffer)
A single place buffer, which inputs messages on channel left and outputs them
on channel right, can be modeled by the recursive process COPY:
COPY =df µX • (left?x → (right!x → X))
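The bounded behavior of COPY can be sketched in Python. This is only an illustrative approximation: the synchronous, unbuffered Occam-style channels are modelled here by buffered queues, which is an assumption of the sketch.

```python
from queue import Queue

def copy_process(left: Queue, right: Queue, rounds: int) -> None:
    """COPY as a bounded loop: repeatedly left?x then right!x."""
    for _ in range(rounds):
        x = left.get()    # left?x : input a message
        right.put(x)      # right!x : output the same message

left, right = Queue(), Queue()
for v in (0, 1, 1):
    left.put(v)
copy_process(left, right, 3)
received = [right.get() for _ in range(3)]
print(received)  # -> [0, 1, 1]
```

Each message is passed through unchanged and in order, which is the essential property of a one-place buffer.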
Example 2.3 (wire)
The process WIRE(a, c) can emit an output event c in response to an input event
a. It is also receptive, in the sense that it does not refuse input, but may diverge
if given an input that it is not prepared to handle:
WIRE(a, c) =df µX • (a? → (a? → chaos [] c! → X))
Example 2.4 (iteration)
The iteration do(b1 → P1 ,..., bn → Pn )od can be rewritten as the tail
recursion:
µX • ((P1; X) ◁b1▷ (… ((Pn; X) ◁bn▷ skip) …))
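The operational reading of do-od can be sketched in Python; this is an illustrative approximation in which the guard and body callables are hypothetical stand-ins for guarded processes.

```python
def do_od(guarded):
    """do BG od: repeatedly run the first branch whose guard is true;
    terminate (rather than behave chaotically) when no guard holds."""
    while True:
        for guard, body in guarded:
            if guard():
                body()
                break
        else:
            return  # no guard true: the iteration terminates

# usage: count n down to zero
state = {"n": 3, "steps": 0}
do_od([(lambda: state["n"] > 0,
        lambda: state.update(n=state["n"] - 1, steps=state["steps"] + 1))])
print(state["steps"])  # -> 3
```

Note the contrast with if-fi: when no guard is true, the loop terminates instead of becoming chaotic.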
7.2.2 Algebraic laws
The basic laws defining Occam-like programs are given in [13]. This section
gives a number of algebraic laws that are used in the design and verification of
the hardware compilation scheme presented later in this chapter.
The non-deterministic choice operator ⊓ is idempotent, symmetric,
associative and disjunctive. It has chaos as its zero.
Law 1 (non-deterministic choice)
1.1 P ⊓ P = P
1.2 P ⊓ Q = Q ⊓ P
1.3 P ⊓ (Q ⊓ R) = (P ⊓ Q) ⊓ R
1.4 P ⊓ (Q ⊓ R) = (P ⊓ Q) ⊓ (P ⊓ R)
1.5 P ⊓ chaos = chaos
The relation ⊑ induced by the non-deterministic choice operator is a partial
order; i.e., it is reflexive, transitive and anti-symmetric. The chaotic program
chaos is the bottom of the relation ⊑.
Law 2 (refinement)
2.1 chaos ⊑ P
Sequential composition is associative and disjunctive, and has unit skip and left
zero chaos.
Law 3 (sequence)
3.1 (P;Q);R = P;(Q;R)
3.2 (P ⊓ Q);R = (P;R) ⊓ (Q;R)
3.3 P;(Q ⊓ R) = (P;Q) ⊓ (P;R)
3.4 skip;Q = Q = Q;skip
3.5 chaos;Q = chaos
7. Hardware Compilation
8
The conditional is idempotent, skew-symmetric, associative and
disjunctive. Sequential composition distributes leftward over the conditional.
Law 4 (conditional)
4.1 P ◁b ▷P = P
4.2 P ◁true ▷Q = P
4.3 P ◁b ▷Q = Q ◁¬b ▷P
4.4 (P ◁b ▷Q) ◁c ▷R = P ◁b ∧ c ▷ (Q ◁c ▷R)
4.5 P ◁b ▷ (Q ◁c ▷R) = (P ◁b ▷Q) ◁c ▷(P ◁b ▷R)
4.6 P ◁b ▷ (Q ⊓ R) = (P ◁b ▷Q) ⊓ (P ◁b ▷R)
4.7 (P ◁b ▷ Q); R = (P;R) ◁b ▷(Q;R)
The following law connects the “if” construct with conditional.
Law 5 (if)
5.1 if b → P fi = P ◁b ▷chaos
5.2 if <b → P>⁀ BG fi = P ◁b ▷(if BG fi)
The external choice operator [] is idempotent, symmetric, associative and
disjunctive, and has chaos as its zero. It distributes over sequential composition
when all its components are guarded.
Law 6 (external choice)
6.1 P [] P = P
6.2 P [] Q = Q [] P
6.3 P [] (Q [] R) = (P [] Q) [] R
6.4 P [] (Q ⊓ R) = (P [] Q) ⊓ (P [] R)
6.5 (a → P) [] (a → Q) = a → (P ⊓ Q)
6.6 P [] chaos = chaos
6.7 (x:A → P(x));Q = x:A → (P(x);Q)
The parallel operators || and ||E are symmetric, associative and disjunctive, and
have chaos as zero.
Law 7 (parallel)
7.1 P || Q = Q || P
7.2 P || (Q || R) = (P || Q) || R
7.3 P || (Q ⊓ R) = (P || Q) ⊓ (P || R)
7.4 P || chaos = chaos
When P can perform any event in A, being equivalent to a process of the form
x:A → P(x), and Q can perform the events in B, respectively y:B → Q(y), then
P ||E Q can perform any event of
(A \ E) ∪ (B \ E) ∪ (A ∩ B)
The first component of this union contains the events P can perform on its own
(because they are not in E); similarly, the second contains the events Q can do by
itself. The final component contains the events on which they can synchronize (i.e.,
the ones that they both can do). The law expressing this is the following:
Law 8 (expansion law of ||E)
Let P = x:A → P(x) and Q = y:B → Q(y).
8.1 P ||E Q = (x:A\E → (P(x) ||E Q)) [] (y:B\E → (P ||E Q(y))) [] (z:A∩B → (P(z) ||E Q(z)))
The expansion law of || is more complicated because any event in A ∩ B can
occur silently and will be treated as an internal communication, and thus is
concealed from the environment:
Law 9 (expansion law of ||)
9.1 P || Q = ((x:A\E → (P(x) || Q)) [] (y:B\E → (P || Q(y))) [] (⊓ z∈A∩B • (P(z) || Q(z))))
⊓ (⊓ z∈A∩B • (P(z) || Q(z)))
Both expansion laws enable us to transform a parallel program into a sequential
one, and are widely used in the later proofs.
The program µX • P(X) can be implemented as a single non-recursive call
of a parameterless procedure with name X and with body P(X). Occurrences of
X within P(X) are implemented as recursive calls on the same procedure. The
following laws [15] state that µX • P(X) is indeed a fixed point of P, and that it
is the weakest one.
Law 10 (weakest fixed point)
10.1 µX • P(X) = P(µX • P(X))
10.2 if P(Y) ⊑ Y then µX • P(X) ⊑ Y
All the recursions we have seen (such as CLOCK, COPY and WIRE), and
almost all recursions one meets in practice, have a property that makes them
easier to understand and reason about. They are guarded, i.e., each recursive
call comes after a communication that is introduced by the recursive definition.
The point about a guarded recursion is that its first-step behavior does not
depend at all on a recursive call, and when a recursive call is reached, the
first step of its behavior, in turn, can be computed without any deeper call. This
leads to the principle of the unique fixed point for guarded recursions.
Law 11 (unique fixed point)
If µX • P(X) is a guarded recursion and Y satisfies the equation Y = P(Y) then Y
= µX • P (X).
The following laws express the basic properties of assignment: that variables
not mentioned on the left of “:=” remain unchanged, that the order of the listing
is immaterial, and that evaluation of an expression or a condition uses the value
most recently assigned to its variables.
Law 12 (assignment)
12.1 (x, y := e, y) = x := e
12.2 (x, …, y, … := e, …, f,…) = (y, …, x, … := f, …, e,…)
12.3 (x := e; x := f(x)) = x := f(e)
12.4 (x := e); (P ◁b(x) ▷Q) = (x := e; P) ◁b(e) ▷(x := e; Q)
Let Σ be the set of events which the process P can perform. Given L ⊆ Σ and a
divergence-free process P, the process P \\ L represents an abstract view of P in
the set Σ \ L, and is defined in [14] by:
Divergences(P \\ L) =df ⌀
Failures(P \\ L) =df {(s ↓(Σ \ L), X) | (s, X ∩ (Σ \ L)) ∈ Failures(P)}
The process projection construct P \\ L differs from the hiding construct P \ L
in the following way: in the failure-divergence model the latter will introduce a
divergence whenever P can perform an infinite sequence of events of L, whereas
the former deadlocks.
Law 13 (projection)
13.1 If the alphabet of P_i is disjoint from the set L_(1-i) for i = 0, 1, then
(P0 || P1) \\ (L0 ∪ L1) = (P0 \\ L0) || (P1 \\ L1)
13.2 If P is a process not containing the process variable X, then
µX • (a → P [] b → X) \\ {b} = a → (P \\ {b})
As an example to show how to apply algebraic laws, we are going to establish
the following fact: two wires can be connected into a single one.
Lemma 2.5 (connecting wires)
WIRE(a, h) || WIRE(h, c) = WIRE(a, c)
Proof: Let FULL(a,c) =df (a? → chaos [] c! → WIRE(a, c)). We have
WIRE(a, c) = a? → FULL(a,c)
which together with the unique fixed point theorem, Law 11, implies that
LHS = µX • (a? → (a? → chaos [] c! → X)) = RHS
□
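The lemma can also be checked by a bounded simulation (not a proof): model each wire by a "full" flag, let the hidden event h fire silently, and compare the observable responses against a single wire. The encoding below, including the well-behaved environment that alternates a? and c!, is an assumption of this sketch.

```python
def single_wire(inputs):
    """WIRE(a, c): under a well-behaved environment, each a? yields one c!."""
    return ["c" for _ in inputs]

def connected_wires(inputs):
    """WIRE(a, h) || WIRE(h, c), with the internal event h hidden."""
    w1_full = w2_full = False
    observed = []
    for _ in inputs:                 # the environment offers one a? per round
        w1_full = True               # the first wire accepts a
        if w1_full and not w2_full:  # the internal event h fires silently
            w1_full, w2_full = False, True
        if w2_full:                  # the second wire emits c, observably
            observed.append("c")
            w2_full = False
    return observed

print(connected_wires(["a"] * 4) == single_wire(["a"] * 4))  # -> True
```

The composed network answers every a with exactly one c, as a single wire does, in agreement with Lemma 2.5.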
7.3 Compiling Strategy
The main difference between software programming and VLSI programming is
that in VLSI, concurrency is free and sequencing is costly, whereas it is just the
opposite in software. In hardware, concurrency is implemented by mere
juxtaposition of circuits. Sequencing requires synchronization. We therefore
avoid sequencing as much as possible, and implement it as a restricted form of
concurrency. This is the compilation strategy adopted in this chapter.
One small technicality is that, rather than implement P directly, we choose
to implement a reusable version of P, specified as Ф_a^r(P) below:
Ф_a^r(P) =df µX • ((r? → P); (a! → X))
where
• r, the request signal, activates the process P, and
• a, an acknowledgement signal, indicates that P has terminated.
The process Ф_a^r(P) can be activated many times, while P cannot.
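A minimal sketch of Ф_a^r in Python, treating signals as items in a list; this is an illustrative assumption, since the real components exchange asynchronous events rather than list elements.

```python
def reusable(p):
    """Ф_a^r(P): a wrapper that, on each request r, activates P once
    and then signals the acknowledgement a."""
    def phi(signals):
        out = []
        for s in signals:
            if s == "r":         # request received: activate P
                p()
                out.append("a")  # P has terminated: acknowledge
        return out
    return phi

log = []
phi = reusable(lambda: log.append("ran P"))
acks = phi(["r", "r"])           # the wrapped process can be activated many times
print(acks, log)                 # -> ['a', 'a'] ['ran P', 'ran P']
```

The wrapper is what makes a one-shot body P usable as a component that its environment can invoke repeatedly.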
Essentially, a source program P is split into communicating processes M
(P) and D (P), where
1. M (P) models the control flow of P.
2. D (P) implements variables, expressions and communication channels.
The compilation function C_a^r is simply defined by
Our main goal is to prove that the target program produced by the compilation
scheme serves as a refinement of the reusable version of P:
The verification task is broken into three steps:
• First, we prove that the parallel composition of M (P) and D (P) is a
legal replacement of the source program P.
• Then we show:
• Finally, we are going to prove that the composite mapping Ф_a^r ∘ M is
indeed a homomorphism. Taking sequential composition as an example,
that Ф_a^r ∘ M is a homomorphism means that the control process of P0;P1 can
be implemented by the parallel composition of those of its components:
where SEQ is designed to model the sequential composition operator.
In this way, all the programming operators will eventually be replaced
by the parallel operator.
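A sketch of how a SEQ-like component sequences two handshakes; each `request_*` callback stands for a complete r!;a? handshake, and the names are hypothetical, not taken from the chapter.

```python
def seq_component(request_p0, request_p1):
    """SEQ sketch: on activation, handshake with P0's control process,
    then with P1's, and only then acknowledge to the caller."""
    def run():
        request_p0()   # r0! ; a0?  (activate P0, wait for its completion)
        request_p1()   # r1! ; a1?  (only then activate P1)
        return "a"     # acknowledge termination of P0;P1
    return run

trace = []
run = seq_component(lambda: trace.append("P0"), lambda: trace.append("P1"))
print(run(), trace)  # -> a ['P0', 'P1']
```

Sequencing is thus reduced to an ordering of handshakes, while the components themselves sit side by side in parallel.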
The control process M (P) involves only synchronized communications with D
(P), which maintains the state of variables and channels. To avoid deadlock on
the internal links between M (P) and D (P), we shall ensure that the
communications on these channels satisfy the related handshake protocol. This
is the topic of the next section.
7.4 Handshake protocol
In languages such as Occam, communication between processes is synchronous
in the sense that a communication event can take place only when both the
sender and the receiver agree on it. It is the synchronous nature of
communication that gives these languages their expressive power. In our
approach, source programs are compiled into networks of primitive
components, and synchronous communications of source programs are
implemented by a communication protocol involving multiple asynchronous
communication events.
Each primitive component has one or more ports that are used to connect it
to its neighbors. At each port, the component and its neighbors obey a
handshaking protocol: events called requests start the execution of the module
or its neighbors, while events called acknowledgements indicate that the
execution has completed. The compilation scheme ensures that no component
attempts to send either a request or acknowledgement event to its partner unless
the latter is waiting for such an input. With this guarantee, synchronous
communications can be replaced by asynchronous ones.
The translation from synchrony to asynchrony is a necessary step in
compiling our language to hardware, because the basic hardware building
blocks cannot refuse input events from their environment, whereas
synchronous communication implies the ability of a process to refuse to
participate in a communication event. Any unexpected inputs to a circuit may
lead to aberrant behavior; it is the obligation of the circuit’s environment to
provide input only when the circuit is waiting for it.
The simplest signal interface is the two-wire interface, which consists of
one request and one acknowledgement signal. A handshake begins when a user
module sends an event to a server along the wire r (for request); the user then
waits for a response on the wire a (for acknowledgement). When it has received an
acknowledgement from the server, the handshake is complete.
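The two-wire handshake can be sketched with Python threads, using one `threading.Event` per wire. This is an illustrative model only; real asynchronous circuits signal with voltage transitions, not event objects.

```python
import threading

def server(req: threading.Event, ack: threading.Event, work, rounds: int) -> None:
    """Server side of the two-wire interface: wait on r, do the work, signal a."""
    for _ in range(rounds):
        req.wait()      # wait for a request on wire r
        req.clear()     # consume the request
        work()          # perform the requested service
        ack.set()       # signal the acknowledgement on wire a

counter = []
req, ack = threading.Event(), threading.Event()
t = threading.Thread(target=server, args=(req, ack, lambda: counter.append(1), 2))
t.start()
for _ in range(2):      # user side: one complete handshake per round
    req.set()           # send the request
    ack.wait()          # wait for the acknowledgement
    ack.clear()         # consume it before the next handshake
t.join()
print(len(counter))     # -> 2
```

Because the user never issues a new request before receiving the previous acknowledgement, neither side ever sees an unexpected input, which is exactly the discipline the compilation scheme enforces.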
7.4.1 Definition 4.1 (two-wire control interface)
The handshake protocol HP(r, a), used in the control interface of the target
processes, behaves like a one-place buffer, repeatedly engaging in an
acknowledgement event a after receipt of a request event r.
where the alternative skip is present to enable HP(r, a) to respond to its parallel
partners’ termination request.
□
A sequence of HP(r, a) is still a handshake protocol.
Lemma 4.2 (sequence of handshake protocols)
HP(r, a) ; HP(r, a) = HP(r, a)
Proof: Algebraically using Laws 6.4 & 6.5, 11 (twice), 1.1 and 10.1.
Definition 4.3
A process Q satisfies the handshake protocol on (r, a) if
(Q ||{r,a} HP(r,a)) = Q
This fact will be denoted by ⊦HP(r,a) Q.
□
Lemma 4.4
If ⊦HP(r,a) Q, then
(Q ; R) ||{r,a} HP(r,a) = Q ; (R ||{r,a} HP(r,a))
□
Lemma 4.5
(1) ⊦HP(r,a) skip.
(2) ⊦HP(r,a) stop.
(3) If both P and Q satisfy the handshake protocol, so do P ; Q and P ⊓ Q:
⊦HP(r,a) (P ; Q) and ⊦HP(r,a) (P ⊓ Q).
(4) If Q satisfies the handshake protocol, so does the guarded recursion
µX • (Q ; X).
Proofs:
Of (3):
Of (4):
□
In the next section, we will present the handshake protocols to pass data
between the master control process M (P) and the data process D (P). For
example, the handshake protocol for access of a Boolean variable has a single
event req for requesting the current value of that variable, but contains two
acknowledgement events, val.0 and val.1, for returning the value. Thus it is
necessary to generalize HP(r, a).
Let I be a finite set, let A be an I-indexed family of finite sets of
events, and let B = {(r(i), A(i)) | i ∈ I}. The handshake protocol on B can then
be defined as follows:
This definition allows a handshake beginning with event r(i) to be completed
by one of the events of A(i). A process Q is said to obey the handshake protocol
on the set B if
Definition 4.6 (handshake refinement)
The handshake refinement relation ⊑HP(B) between processes is defined by
i.e., the process S is a refinement of R in any environment which obeys the
handshake protocol HP(B).
□
Lemma 4.7
(R ⊑HP(B) S) ∧ (⊦HP(B) Q) ⇒ (R || Q) ⊑ (S || Q)
Proof:
□
7.5 Data processes
In order to simplify the presentation, we will only deal with the sequential
subset of the language in this section, and postpone the treatment of
communication and concurrency to section 8.
7.5.1 Variables
A Boolean program variable x is realized by a communicating process V (x),
which has two handshake ports, one for reading and one for writing. To read
the value of variable x, a reader sends a request on x.req and the variable
responds with either x.val.0 or x.val.1, depending upon the stored value. To
respond to a read request correctly, the process V (x) has to comply with the
following requirement:
(Req1) (x.req! → x.val?v → Q ) || V (x) = (v := x); (Q || V (x))
To write a value, a writer sends a request on either x.write.0 or x.write.1. After
the stored value has been updated, the variable responds on x.ack. Accordingly,
V (x) needs to meet the second requirement:
(Req2) (x.write!v → x.ack? → Q ) || V (x) = (x := v); (Q || V (x))
The construction of V (x) starts with the design of a process CELL(x), acting as the
state holder of variable x:
CELL(x) =df READ(x) [] WRITE(x) [] skip
where
• READ(x) is a simple handshake protocol, acting as the read interface.
READ(x) =df x.req? → x.val!x → CELL(x)
To avoid deadlock on the newly introduced channels x.req and x.val, it
is required that the user of READ(x) obeys the handshake protocol
HP(x.req, x.val).
• The process WRITE(x) plays the role of the write interface, and is also a
handshake protocol:
WRITE(x) =df x.write?x → x.ack! → CELL(x)
The user of WRITE(x) is required to obey the handshake protocol on the
channels (x.write, x.ack).
CELL(x) provides a single user with the desired service of variable x.
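CELL(x)'s read and write handshakes can be sketched as a function over a list of messages; the message encoding used here is an assumption of this sketch.

```python
def cell(messages, value=False):
    """CELL(x) sketch: serve read and write handshakes over a stored Boolean."""
    responses = []
    for m in messages:
        if m == "req":                       # x.req? : a read request
            responses.append(("val", value)) # x.val!x : return the stored value
        else:                                # ("write", v) : x.write?v
            _, value = m
            responses.append("ack")          # x.ack! : the update is acknowledged
    return responses

print(cell([("write", True), "req", ("write", False), "req"]))
# -> ['ack', ('val', True), 'ack', ('val', False)]
```

Each request is answered before the next is served, mirroring the single-user discipline of CELL(x).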
Lemma 5.1 (Read and write)
Let Chan(CELL(x)) ⊆ Chan(Q). Then
(1) (x.req! → x.val?v → Q ) || V (x) = (v := x); (Q || V (x))
(2) (x.write!v → x.ack? → Q ) || V (x) = (x := v); (Q || V (x))
Proof: Direct from the expansion law, Law 9.2.
□
One difficulty with building a process to implement a program variable is that
it must support multiple readers and writers. We accomplish this by introducing
multiplexing processes RMUL and WMUL. In order to avoid interference among
multiple user requests, we shall treat the read and write actions as atomic.
For this purpose RMUL and WMUL are constructed as follows
where the sets I and J are both finite.
Putting the three processes in parallel, we finish the design of V (x)
which is equipped with the following channels:
Every user of the program variable x will be allocated a pair
of channels for its read operation, and it is asked to obey the handshake
protocol over the corresponding channels. The writers of x will be treated in a
similar way.
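The effect of the read multiplexer can be sketched with a lock that makes each read handshake atomic. This is a sketch only: RMUL itself is a process, not a class, and the names below are hypothetical.

```python
import threading

class ReadMux:
    """RMUL sketch: serialize several readers' handshakes onto one read port,
    so that each read appears atomic to the shared cell."""
    def __init__(self, read_port):
        self._read = read_port
        self._lock = threading.Lock()

    def read(self):
        with self._lock:         # one complete req/val handshake at a time
            return self._read()

mux = ReadMux(lambda: True)      # a port of a cell whose stored value is True
results = []
threads = [threading.Thread(target=lambda: results.append(mux.read()))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # -> [True, True, True, True]
```

Serializing the handshakes is what prevents two readers' req/val exchanges from interleaving on the single port of CELL(x).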
Suppose that Var(P) = {x1, x2, …, xn}. Then the following process
is included in D (P) to represent the variable state of P.
7.5.2 Expressions
The Boolean constants true and false are realized by the TRUE and FALSE
modules. The modules use the read interface and always return a particular
acknowledgement – val0 for false and val1 for true.
For each Boolean expression b of P, the data process D (P) contains a
component process E (b) to model its behavior. The process E (b) operates in a
similar way to the read port process READ. For example, the process OR
(defined below to implement x∨ y) first waits for a request signal from its user,
and then reads the value of x from V (x) into its local variable w. If w has the
value true, then this value is passed to its user; otherwise it reads the value of y
from V (y) and then passes it to the user.
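The short-circuit reading order of OR can be sketched in Python, with `read_x` and `read_y` standing for the handshakes with V (x) and V (y); the names are hypothetical.

```python
def or_module(read_x, read_y):
    """OR sketch: request x first; consult y only when x returned false."""
    w = read_x()          # read the value of x into the local w
    if w:
        return True       # pass w straight back to the user
    return read_y()       # otherwise the answer is y's value

reads = {"y": 0}
def read_y():
    reads["y"] += 1       # count how often y is actually consulted
    return False

print(or_module(lambda: True, read_y), reads["y"])   # -> True 0
print(or_module(lambda: False, read_y), reads["y"])  # -> False 1
```

When x is true, V (y) is never consulted, so the second handshake is simply not performed.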
Taking the multiple-user issue into account, we end up with the following design,
where RMUL(req, val) is the process RMUL(x.req, x.val) after proper channel
renaming.
The module AND, used to implement the Boolean expression x ∧ y, is defined
by
The module NEG, used to realize the negation ¬x, is defined by
A composite expression b = b1∨ b2 can be implemented in the same way as
x∨ y, except that the former communicates with the expression processes E (b1)
and E (b2) rather than the variable processes V (x) and V (y). To avoid name
clashes among the expression processes, we rename the channels req and val in the
process E (b) to b.req and b.val respectively. The definitions of the modules for
b1∧ b2 and ¬b are similar.
Let Exp(P) = {b1 ,...., bm }; we define
For a sequential program P without communication, its data process D (P) is
formed by
The following theorems validate our design.
Lemma 5.2 (Evaluation of expression)
Let Chan (D (P)) ⊆ Chan (Q). If i ∈ RI (b), then
Proof: From the expansion law, Law 9.2.
To be able to execute the expression processes in parallel with the master
control process, we must make sure that the processes representing expressions
have disjoint sets of channels. In particular, since most expression processes
may need to access the variable processes, the allocation of the channels x.req_i
and x.val_i turns out to be an important issue in the hardware compilation
scheme. For simplicity, we assume that there are index functions RI and WI.
For each variable process V (x), in addition to the channels used by expression
processes, the following set of channels
is at the disposal of the control process M (P). For an expression
process E(b), we take a similar convention that all channels in the set
can be employed to access the current value of b. As a result, the set of free
handshake channels of the data process D (P) is
To simplify the proof of our compiling function in the later sections, we exploit
the regularity of handshake protocols to construct SD (P), a sequential version
of D (P), which does not contain parallel composition.
Since all users of D (P) must obey the handshake protocol HP(B), from Lemma
4.3 we can replace D (P) by SD (P) within such a context.
Lemma 5.3 D (P) =HP(B) SD (P)
□
Lemma 5.4 SD (P); SD (P) = SD (P)
□
7.6 Control processes
Construction of control processes is based on a translator M whose task is to
replace each evaluation of a Boolean expression b by a sequence of
asynchronous communications with the process E (b), and to replace every
assignment to a program variable x by interactions with the variable process V
(x).
Control processes of the primitive commands have straightforward
definitions:
The definition of M (x := b) suggests that the choice of channels used to
communicate with V (x) and E (b) is irrelevant. In particular, it allows the use
of any available read interface to evaluate b and then write the result to x. This
non-determinism allows us later to allocate a specific pair (i, j) of channel
indices for implementation of M (x := b).
A control process of a sequential composition is formed by those of its
components:
The control process M (if b fi) evaluates the Boolean guard b by interacting
with the expression process E (b)
A conditional with multiple branches behaves as if its Boolean guards were
calculated sequentially.
The process M(do BG od) evaluates its guards in sequence and terminates when
none of them is true. As expected, it is implemented by a tail recursion
where
From the definition of M and Lemma 4.5 we conclude that all the control
processes M (P) obey the handshaking protocol HP(B), where the set B was
defined in the previous section.
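The control flow of M(do BG od) described above can be sketched as an ordinary loop. The following shows only the sequential skeleton; in the real control process each guard test is a handshake with E (b) rather than a direct function call:

```python
def do_od(guarded_commands, state):
    """M(do b1→Q1 [] ... [] bn→Qn od) read as a tail recursion, unrolled
    here into a loop: evaluate the guards in sequence, run the body of
    the first true one and restart; terminate when no guard holds."""
    while True:
        for guard, body in guarded_commands:
            if guard(state):
                body(state)
                break
        else:                  # no guard fired: the loop terminates
            return state

# do (n > 0 -> n := n - 1) od, starting from n = 3
trace = []
def dec(s):
    trace.append(s['n'])
    s['n'] -= 1

final = do_od([(lambda s: s['n'] > 0, dec)], {'n': 3})
print(final['n'], trace)   # 0 [3, 2, 1]
```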
Lemma 6.1
(M (P);Q) || SD (P) = (M (P) || SD (P)); (Q || SD (P))
□
Our compilation strategy is validated by the following theorem.
Theorem 6.2 (Correctness of the control processes)
P = M (P) || D (P)
Outline of proof: The proof is based on structural induction using the
following base case and inductive steps.
(1) Base case: P = (x := b)
(2) Inductive Step:
(2.1) P = Q ; R
(2.2) P = if (b → Q) fi
(2.3) P = if (b → Q [] BG) fi
(2.4) P = do (b → Q) od
(2.5) P = do (b → Q [] BG) od
□
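The structural induction above mirrors the shape of the translator M itself. As a sketch (using a hypothetical tuple encoding of programs, not the chapter's syntax), the following computes by the same case split the set of data-process channels that M (P) would use, with the RI/WI indices omitted:

```python
def channels_of(prog):
    """Structural recursion over a small program AST, following the
    case split of Theorem 6.2: base case x := b, then sequential
    composition, conditional and iteration."""
    tag = prog[0]
    if tag == 'skip':
        return set()
    if tag == 'asgn':                       # x := b
        _, x, b = prog
        return {f'{b}.req', f'{b}.val', f'{x}.write', f'{x}.ack'}
    if tag == 'seq':                        # P ; Q
        return channels_of(prog[1]) | channels_of(prog[2])
    if tag in ('if', 'do'):                 # guarded branches
        return set().union(*({f'{b}.req', f'{b}.val'} | channels_of(p)
                             for b, p in prog[1]))
    raise ValueError(f'unknown construct {tag!r}')

prog = ('do', [('b', ('seq', ('asgn', 'x', 'b1'), ('asgn', 'x', 'b2')))])
print(sorted(channels_of(prog)))
```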
Now we can establish the correctness of the compiling function M.
Theorem 6.3 (Correctness of the compiling function)
Proof: Using Lemmas 4.7 & 5.3, Law 9.1, Lemma 6.1, Lemmas 4.7 & 5.3
(again), Theorem 6.2 and Law 9.2.
The process Φ^r_a (P) enjoys a number of algebraic laws. The first one says that the
additional wires used to connect Φ^r_a (P) with its environment have no effect.
Lemma 6.4
Let r0 and a0 be fresh events. Then
The second law enables us to implement the sequential composition by the
parallel composition.
Lemma 6.5
Let Q and R be processes with disjoint sets of variables and channels. If h is a
fresh event, then
7.7 Hardware device
In this section we are going to divide the control process Φ^r_a (M (P)) into a set
of primitive handshake modules within an environment obeying both HP(r, a)
and HP(B), where B denotes the set of channels used to access the data process
D (P), and was defined in Section 4. We define
where S =df B ∪ {r, a}
and PROT =df HP(B) || HP(r,a) describes the behavior of the environment
which obeys the handshake protocols HP(B) and HP(r, a).
Lemma 7.1
If Q obeys the handshake protocols HP(B) and HP(r, a), then
The main objectives of our design are to preserve the modular structure of the
source program, and to generate a small number of hardware devices. We have
already presented a number of handshake modules such as CELL(x) and OR in
Section 4. Four further examples follow.
First, the command skip can be implemented by a wire.
Theorem 7.2 (skip)
Proof:
□
The control process of x := b can be implemented by a set of wires, where a pair
(b.reqi, b.vali) of channels is chosen as the read interface with E (b), and
(x.writej, x.ackj) as the write interface with V (x).
Theorem 7.3 (assignment)
, where
Proof: Similar to Theorem 7.2.
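The wiring claimed by Theorem 7.3 can be traced signal by signal. In this sketch the index pair (i, j) is fixed and dropped from the channel names, and the responses of E (b) and V (x) are supplied as a table:

```python
# One plausible wiring for M(x := b): the start request r is forwarded
# as b.req, the value returned on b.val is forwarded as x.write, and the
# store's x.ack comes back as the overall acknowledgement a.
WIRES = {'r': 'b.req', 'b.val': 'x.write', 'x.ack': 'a'}

def follow(sig, responses):
    """Chase one signal through the wires; `responses` says what signal
    E(b) or V(x) sends back when a request reaches it."""
    trace = [sig]
    while sig in WIRES or sig in responses:
        sig = WIRES[sig] if sig in WIRES else responses[sig]
        trace.append(sig)
    return trace

print(follow('r', {'b.req': 'b.val', 'x.write': 'x.ack'}))
# ['r', 'b.req', 'b.val', 'x.write', 'x.ack', 'a']
```

No computation happens in the control process itself; the wires merely route each handshake to the data process and back, which is why plain wires suffice.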
Surprisingly, the sequential composition operator can also be implemented by a
set of wires.
Theorem 7.4 (sequential composition)
Let P0 and P1 be processes with disjoint channels. If h is used by neither P0
nor P1, then
Proof: From Lemma 6.5.
The final two theorems show that the control processes of conditional and
iteration can be implemented by the hardware device M(in0, in1: out), which
merges signals received from its input ports in0 and in1.
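The wire devices of Theorems 7.2 and 7.4 and the merge device just introduced (port names assumed 0-indexed here) can be exercised with a small signal-propagation sketch, with P0, P1 and the branch bodies stubbed out as wires:

```python
def propagate(netlist, start):
    """Push handshake signals through an acyclic netlist.  Each entry is
    (inputs, output): a WIRE has one input, and a merge device fires its
    output whenever either of its two inputs fires."""
    queue, fired = [start], []
    while queue:
        sig = queue.pop(0)
        fired.append(sig)
        for ins, out in netlist:
            if sig in ins:
                queue.append(out)
    return fired

# skip as a single wire from request r to acknowledgement a (Theorem 7.2).
assert propagate([(('r',), 'a')], 'r') == ['r', 'a']

# P0 ; P1 joined by a wire on a fresh channel h (Theorem 7.4).
seq = [(('r',), 'p0.done'), (('p0.done',), 'h'), (('h',), 'a')]
assert propagate(seq, 'r')[-1] == 'a'

# The two branch acknowledgements a0, a1 of a conditional merged into
# the single acknowledgement a.
merge = [(('a0', 'a1'), 'a')]
assert propagate(merge, 'a0') == ['a0', 'a']
assert propagate(merge, 'a1') == ['a1', 'a']
print('ok')
```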
Theorem 7.5 (conditional)
Let i ∈ RI(b). If none of {r, b.reqi, b.vali, a0, a1, a} is used by Q and R, then
Proof: Define
We are going to establish the following fact
For notational convenience, we will drop the port parameters of M and Φ in the
proof below.
from which, together with Law 10.2, it follows that
(2) Similar to (1).
Theorem 7.6 (iteration)
Let i ∈ RI(b). If none of {r, â, a, b.reqi, b.vali} is used by Q, then
Proof: Define
We are going to show that (V1, W1) and (V2, W2) satisfy the same guarded
recursive equation.
From Laws 8 and 9 it follows that:
From the unique fixed-point theorem (Law 11) we obtain V1 = V2 as required.
7.8 Conclusion
Hardware compilation is an exciting development in the array of techniques
available to generate an implementation from a high-level description. It allows
hardware to be generated very quickly from software. What is more, it is
possible to formally prove that the relationship between a high-level program
(software) and a low-level netlist of components connected with wires
(hardware) is correct [3,5].
In this chapter we have presented a small programming language that can
be compiled into hardware. A set of algebraic laws is given for the language
that allow programs to be transformed in a provably correct manner. A
compilation strategy from this language to a hardware description in the form
of a netlist is given for each of the constructs in the language. These are posited
as theorems that can be proved correct in an algebraic style. Some sample
proofs are given.
For the future, it is expected that hardware will increasingly be generated
from high-level descriptions, especially when the available design time is
limited, costs need to be kept down and speed of execution is not an overriding
factor. In addition, this approach can help to raise the level of confidence in the
correctness of the implementation since, as demonstrated in this chapter, it is
possible to mathematically prove that the transformation from the high-level
description to the low-level implementation is correct, not just for individual
cases, but for all designs that are generated following such a compilation
scheme.
7.9 References
[1] K. van Berkel, J. Kessels, M. Roncken, R. W. J. J. Saeijs and F. Schalij. The VLSI language Tangram and its translation into handshake circuits. In Proceedings of the European Design Automation Conference, (1991).
[2] K. van Berkel. Handshake Circuits: An Asynchronous Architecture for VLSI Programming. Cambridge University Press, (1993).
[3] J. P. Bowen and J. He. An approach to the specification and verification of a hardware compilation scheme. The Journal of Supercomputing 19(1):23–39, (2001).
[4] S. D. Brookes, C. A. R. Hoare and A. W. Roscoe. A theory of communicating sequential processes. Journal of the ACM 31(3):560–599, (1984).
[5] J. He. An algebraic approach to the VERILOG programming. In Formal Methods at the Crossroads, Lecture Notes in Computer Science 2757, pages 65–80, Springer-Verlag, (2003).
[6] C. A. R. Hoare. Communicating Sequential Processes. Prentice Hall International Series in Computer Science, (1985).
[7] Inmos Limited. Occam 2 Reference Manual. Prentice Hall International, (1988).
[8] Intel Corporation. 1994 Annual Report. (1995).
[9] A. J. Martin. Programming in VLSI: From communicating processes to delay-insensitive circuits. In Developments in Concurrency and Communication, C. A. R. Hoare (ed.), pages 1–64, Addison-Wesley, (1990).
[10] I. Page and W. Luk. Compiling occam into field-programmable gate arrays. In Field-Programmable Gate Arrays, W. Moore and W. Luk (eds.), pages 271–283, Abingdon EE&CS Books, (1991).
[11] S. Palnitkar. Verilog HDL, 2nd edition. Prentice Hall PTR, (2003).
[12] V. A. Pedroni. Circuit Design with VHDL. The MIT Press, (2004).
[13] A. W. Roscoe and C. A. R. Hoare. Laws of occam programming. Theoretical Computer Science 60:177–229, (1988).
[14] A. W. Roscoe. The Theory and Practice of Concurrency. Prentice Hall International Series in Computer Science, (1997).
[15] A. Tarski. A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of Mathematics 5:285–309, (1955).
[16] S. W. Weber, B. Bloom and G. Brown. Compiling Joy into silicon. In Advanced Research in VLSI and Parallel Systems, T. Knight and J. Savage (eds.), MIT Press, (1992).
[17] Xilinx Inc. Programmable Logic Devices, FPGA & CPLD. www.xilinx.com, San Jose, California, USA, (2005).