
1 Introduction

As a field, secure compilation is the study of compilers that formally preserve abstractions across languages. Its roots can be traced back to the seminal observation made by Abadi [1], namely that compilers which do not protect high-level abstractions against low-level contexts may introduce security vulnerabilities. But it was the advent of secure architectures such as Intel SGX [15] and an ever-increasing need for computer security that eventually motivated researchers to work on formally proving compiler security.

The most prominent  [16, 18, 32, 35, 37, 45, 49] formal criterion for compiler security is full abstraction: A compiler is fully abstract if it preserves and reflects Morris-style contextual equivalence  [31], i.e. indistinguishability under all program contexts, which are usually defined as programs with a hole. The intuition is that contexts represent the ways an attacker can interact with programs and so full abstraction ensures that such interactions are consistent between languages.

Full abstraction is arguably a strong and useful property but it is also notoriously hard to prove for realistic compilers, mainly due to the inherent challenge of having to reason directly about program contexts  [9, 18, 24, 37]. There is thus a need for better formal methods, a view shared in the scientific community  [10, 33]. While recent work has proposed generalizing from full abstraction towards the so-called robust properties  [2, 36], the main challenge of quantifying over program contexts remains, which manifests when directly translating target contexts to the source (back-translation). Other techniques, such as trace semantics  [35] or logical relations  [17], require complex correctness and completeness proofs w.r.t. contextual equivalence in order to be applicable.

In this paper we introduce a novel, categorical approach to secure compilation. The approach has two main components: the elegant representation of Structural Operational Semantics (SOS) [38] using category-theoretic distributive laws [48], and maps of distributive laws [27, 40, 50] as secure compilers that preserve bisimilarity. Our method aims to be unifying, in that there is a general, shared formalism for operational semantics, and simplifying, in that the formal criterion for compiler security, the coherence criterion for maps of distributive laws, is straightforward and relatively easy to prove.

The starting point of our contributions is an abstract proof that coalgebraic bisimilarity under distributive laws carries contextual meaning in a manner similar to contextual equivalence (Sect. 4.3). We argue that this justifies the use of the coherence criterion for testing compiler security, as long as bisimilarity adequately captures the underlying threat model. We then demonstrate the effectiveness of our approach on four examples of compiler (in)security. The examples model classic, non-trivial problems in secure compilation:

  • An example of an extra processor register in the target language that conveys additional information about computations (Sect. 5).

  • A datatype mismatch in the type of variables (Sect. 6).

  • The introduction of illicit control flow in the target language (Sect. 7).

  • A case of incorrect local state encapsulation (Sect. 8).

For each of these examples we present an insecure compiler that fails the coherence criterion, then introduce security primitives in the target language and construct a secure compiler that respects it. We also examine how bisimilarity can be both a blessing and a curse as its strictness and rigidity sometimes lead to relatively contrived solutions. Finally, in Sect. 9, we discuss related work and point out potential avenues for further development of the underlying theory.

On the Structure and Style of the Paper. This work is presented mainly in the style of programming language semantics, but its ideas are deeply rooted in category theory. We follow an “on-demand” approach to the important categorical concepts: we begin the first example by introducing the base language used throughout the paper, While, and gradually present distributive laws as they are required. From the second example in Sect. 6 onward, we relax the categorical notation and mostly remain within the style of PL semantics.

2 The Basic While Language

2.1 Syntax and Operational Semantics

We begin by defining the set of arithmetic expressions.

  • \({\langle }expr{\rangle }\, {::}=\, \texttt {lit}\, \mathbb {N}\, |\, \texttt {var}\, \mathbb {N}\, |\, {\langle }expr{\rangle } \,{\langle }bin{\rangle }\, {\langle }expr{\rangle } \,| \,{\langle }un{\rangle } \,{\langle }expr{\rangle }\)

The constructors are, respectively, literals, a dereference operator var, binary arithmetic operations and unary operations. We let S be the set of lists of natural numbers. The role of S is that of a run-time store whose entries are referred to by their index in the list using the constructor var. We define the function \(\texttt {eval} : S \times E \rightarrow \mathbb {N}\), where E is the set of expressions, inductively on the structure of expressions.

Definition 1

(Evaluation of expressions in While)

$$\begin{aligned}&\mathtt {eval}~store~(\mathtt {lit}~n) = n \\&\mathtt {eval}~store~(\mathtt {var}~l) = \mathtt {get}~store~l\\&\mathtt {eval}~store~(e_{1}~b~e_{2}) = (\mathtt {eval}~store~e_{1})~[[b]]~(\mathtt {eval}~store~e_{2})\\&\mathtt {eval}~store~(u~e) = [[u]]~(\mathtt {eval}~store~e) \end{aligned}$$
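Definition 1 can be read directly as a recursive function. The following Python sketch is purely illustrative: expressions are encoded as tagged tuples, the store as a list of naturals, and the operator tables stand in for the interpretations \([[b]]\) and \([[u]]\) (the names and encodings are our own, not the paper's):

```python
# A direct functional reading of Definition 1.  Expressions are tagged
# tuples, the store is a list of naturals indexed by var.
OPS = {"+": lambda m, n: m + n, "*": lambda m, n: m * n}   # some choices of [[b]]
UN_OPS = {"succ": lambda n: n + 1}                         # a choice of [[u]]

def eval_expr(store, e):
    tag = e[0]
    if tag == "lit":            # eval store (lit n) = n
        return e[1]
    if tag == "var":            # eval store (var l) = get store l
        return store[e[1]]
    if tag == "bin":            # eval store (e1 b e2) = [[b]] of the sub-results
        _, b, e1, e2 = e
        return OPS[b](eval_expr(store, e1), eval_expr(store, e2))
    if tag == "un":             # eval store (u e) = [[u]] (eval store e)
        _, u, e1 = e
        return UN_OPS[u](eval_expr(store, e1))
    raise ValueError("unknown expression: %r" % (e,))
```
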

Programs in the While language are generated by the following grammar:

  • \({\langle }prog{\rangle }\,{::}=\, \texttt {skip}\, |\, \mathbb {N}\, \texttt {:=} \,{\langle }expr{\rangle }\, |\, {\langle }prog{\rangle } \, ; \, {\langle }prog{\rangle }\, | \, \texttt {while} \, {\langle }expr{\rangle }\, {\langle }prog{\rangle }\)

The operational semantics of our While language is introduced in Fig. 1. We use the notation \(s, x \Downarrow s^\prime \) to denote that program x, when supplied with store s : S, terminates producing store \(s^\prime \). Similarly, \(s, x \rightarrow s^\prime , x^\prime \) means that program x, supplied with s, evaluates to \(x^\prime \) and produces the new store \(s^\prime \).

Fig. 1.

Semantics of the While language.
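The rules of Fig. 1 amount to a one-step transition function: given a store and a program, the program either terminates (⇓) or yields a new store and a residual program (→). A minimal Python sketch, under an illustrative encoding of programs as tagged tuples (the expression evaluator is restricted to literals and variables for brevity):

```python
def ev(store, e):
    # minimal expression evaluator: literals and variables only
    return e[1] if e[0] == "lit" else store[e[1]]

def step(s, x):
    """Returns (s', None) for s, x ⇓ s' and (s', x') for s, x → s', x'."""
    tag = x[0]
    if tag == "skip":                     # s, skip ⇓ s
        return (s, None)
    if tag == "assign":                   # s, l := e ⇓ update s l (eval s e)
        _, l, e = x
        s2 = list(s)
        s2[l] = ev(s, e)
        return (s2, None)
    if tag == "while":                    # unroll once, or step to skip
        _, e, body = x
        return (s, ("seq", body, x)) if ev(s, e) != 0 else (s, ("skip",))
    _, p, q = x                           # ("seq", p, q): run p one step
    s2, p2 = step(s, p)
    return (s2, q) if p2 is None else (s2, ("seq", p2, q))

def run(s, x, fuel=10000):
    """Iterate step to termination (fuel guards against divergence)."""
    while x is not None and fuel > 0:
        s, x = step(s, x)
        fuel -= 1
    return s
```
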

2.2 While, Categorically

The categorical representation of operational semantics comes in various forms of increasing complexity, but for our purposes we only need the most important one, that of GSOS laws [48].

Definition 2

Given a syntax functor \(\Sigma \) and a behavior functor B, a GSOS law of \(\Sigma \) over B is a natural transformation \(\rho : \Sigma (\mathrm {Id}\times B) \Longrightarrow B \Sigma ^{*}\), where \((\Sigma ^{*}, \eta , \mu )\) is the monad freely generated by \(\Sigma \).

Example 1

Let E be the set of expressions of the While-language. Then the syntax functor \(\Sigma : \mathbf {Set}\rightarrow \mathbf {Set}\) for While is given by \(\Sigma X = \top \uplus (\mathbb {N} \times E) \uplus (X \times X) \uplus (E \times X)\) where \(\uplus \) denotes a disjoint (tagged) union. The elements could be denoted as \(\mathtt {skip}\), \(l := e\), \(x_1 ; x_2\) and \(\mathtt {while}~e~x\) respectively. The free monad \(\Sigma ^{*}\) satisfies \(\Sigma ^{*} X \cong X \uplus \Sigma \Sigma ^{*} X\), i.e. its elements are programs featuring program variables from X. Since While-programs run in interaction with a store and can terminate, the behavior functor is \(BX = S \rightarrow (S \times \mathrm {Maybe}~X)\), where S is the set of lists of natural numbers and \(X \rightarrow Y\) denotes the exponential object (internal Hom) \(Y^X\).

The GSOS specification of While determines \(\rho \). A premise \(s, p \rightarrow s', p'\) denotes an element \((p, b) \in (\mathrm {Id}\times B) X\) where \(b(s) = (s', \mathrm {just}~p')\), and a premise \(s, p \Downarrow s'\) denotes an element (pb) where \(b(s) = (s', \mathrm {nothing})\). A conclusion \(s, p \rightarrow s', p'\) (where \(p \in \Sigma X\) is further decorated above the line to \(\bar{p} \in \Sigma (\mathrm {Id}\times B) X\)) specifies that \(\rho (\bar{p}) \in B\Sigma ^{*}X\) sends s to \((s', \mathrm {just}~p')\), whereas a conclusion \(s, p \Downarrow s'\) specifies that s is sent to \((s', \mathrm {nothing})\). Concretely, \(\rho _X : \Sigma (X \times BX) \rightarrow B \Sigma ^{*} X\) is the function (partially from [47]):

$$\begin{aligned} \begin{array}{l c l} \mathtt {skip} &{}\mapsto &{} \lambda ~s.(s, \mathrm {nothing}) \\ l~\mathtt {:=}~e &{}\mapsto &{} \lambda ~s.(\mathtt {update}~s~l~(\mathtt {eval}~s~e), \mathrm {nothing}) \\ \mathtt {while}~e~(x,f) &{}\mapsto &{} \lambda ~s. {\left\{ \begin{array}{ll} (s, \mathrm {just}~(x~\mathtt {;}~\mathtt {while}~e~x)) &{} \mathrm {if}~\mathtt {eval}~s~e \ne 0 \\ (s, \mathrm {just}~(\mathtt {skip})) &{} \mathrm {if}~\mathtt {eval}~s~e = 0 \end{array}\right. } \\ (x,f)~\mathtt {;}~(y,g) &{}\mapsto &{} \lambda ~s. {\left\{ \begin{array}{ll} (s^\prime , \mathrm {just}~(x^\prime ~\mathtt {;}~y)) &{} \mathrm {if}~f(s) = (s^\prime , \mathrm {just}~x^\prime ) \\ (s^\prime , \mathrm {just}~y) &{} \mathrm {if}~f(s) = (s^\prime , \mathrm {nothing}) \end{array}\right. } \end{array} \end{aligned}$$

\(\lrcorner \)

It has been shown by Lenisa et al. [28] that there is a one-to-one correspondence between GSOS laws of \(\Sigma \) over B and distributive laws of the free monad \(\Sigma ^{*}\) over the cofree copointed endofunctor [28] \(\mathrm {Id}\times B\).

Definition 3

(In  [26]). A distributive law of a monad \((T,\eta ,\mu )\) over a copointed functor \((H,\epsilon )\) is a natural transformation \(\lambda : TH \Longrightarrow HT\) subject to the following laws: \(\lambda \circ \eta H = H\eta \), \(\epsilon T \circ \lambda = T\epsilon \) and \(\lambda \circ \mu H = H\mu \circ \lambda T \circ T\lambda \).

Given any GSOS law, it is straightforward to obtain the corresponding distributive law via structural induction (see [50], Propositions 2.7 and 2.8). By convention, we shall write \(\rho \) for GSOS laws and \(\rho ^{*}\) for the equivalent distributive laws unless stated otherwise.

A distributive law \(\lambda \) based on a GSOS law \(\rho \) gives a category \(\lambda \)-Bialg of \(\lambda \)-bialgebras [48], which are pairs \(\Sigma X \xrightarrow {h} X \xrightarrow {k} B X\) subject to the pentagonal law \( k \circ h = Bh^{*} \circ \rho _{X} \circ \Sigma \langle \mathrm {id},k\rangle \), where \(h^{*}\) is the inductive extension of h. Morphisms in \(\lambda \)-Bialg are arrows \(X \rightarrow Y\) that are algebra and coalgebra homomorphisms at the same time. The trivial initial B-coalgebra \(\bot \rightarrow B\bot \) lifts uniquely to the initial \(\lambda \)-bialgebra \(\Sigma \Sigma ^{*}\bot \xrightarrow {a} \Sigma ^{*}\bot \xrightarrow {h_{\lambda }} B \Sigma ^{*}\bot \), while the trivial final \(\Sigma \)-algebra \(\Sigma \top \rightarrow \top \) lifts uniquely to the final \(\lambda \)-bialgebra \(\Sigma B^{\infty }\top \xrightarrow {g_{\lambda }} B^{\infty }\top \xrightarrow {z} B B^{\infty }\top \). Since \(\Sigma ^{*}\bot \) is the set of programs generated by \(\Sigma \) and \(B^{\infty }\top \) the set of behaviors cofreely generated by B, the unique bialgebra morphism \(f : \Sigma ^{*}\bot \rightarrow B^{\infty }\top \) is the interpretation function induced by \(\rho \).

Remark 1

We write A for \(\Sigma ^{*}\bot \) and Z for \(B^{\infty }\top \), and refer to \(h_{\lambda } : A \rightarrow BA\) as the operational model for \(\lambda \) and to \(g_{\lambda } : \Sigma Z \rightarrow Z\) as the denotational model  [48]. Note also that \(a : \Sigma A \cong A\) and \(z : Z \cong BZ\) are invertible.

Example 2

Continuing Example 1, the initial bialgebra A is just the set of all While-programs. Meanwhile, the final bialgebra Z, which has the meaning of the set of behaviors, satisfies \(Z \cong (S \rightarrow S \times \mathrm {Maybe}~Z)\). In other words, our attacker model is that of an attacker who can count execution steps and moreover, between any two steps, read out and modify the state. In Sect. 9, we discuss how we hope to consider weaker attackers in the future.
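To make the attacker model of Example 2 concrete: since \(Z \cong (S \rightarrow S \times \mathrm {Maybe}~Z)\), an attacker drives a behavior by repeatedly supplying a store, and can therefore count how many steps a program takes. The sketch below (an illustrative fragment with only skip and sequencing) exhibits two programs with identical final stores that are nevertheless not bisimilar, because they differ in step count:

```python
def step(s, x):
    # s, skip ⇓ s ; and ("seq", p, q) steps by running p one step
    if x[0] == "skip":
        return (s, None)
    _, p, q = x
    s2, p2 = step(s, p)
    return (s2, q) if p2 is None else (s2, ("seq", p2, q))

def queries(s, x):
    """Feed stores into the behavior until termination, counting queries."""
    n = 0
    while x is not None:
        s, x = step(s, x)
        n += 1
    return (n, s)
```

An observer of final stores alone cannot separate `skip` from `skip ; skip`, but the step-counting attacker of Example 2 can.
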

3 An Extra Register (Part I)

Let us consider the scenario where a malicious party can observe more information about the execution state of a program, either because information is being leaked to the environment or the programs are run by a more powerful machine. A typical example is the presence of an extra flags register that logs the result of a computation  [8, 34, 35]. This is the intuition behind the augmented version of While with additional observational capabilities, .

The main difference is in the behavior, so the notation for transitions has to change slightly. The two main transition types, \(s, x \Downarrow _{v} s^\prime \) and \(s, x \rightarrow _{v} s^\prime , x^\prime \), work as in While except for the label \(v : \mathbb {N}\) produced when evaluating expressions. We also allow language terms to interact with the labels by introducing the constructor \(\texttt {obs}~\mathbb {N}~\langle prog \rangle \). When terms evaluate inside an \(\texttt {obs}\) block, the labels are sequentially placed in the run-time store. The rest of the constructors are identical, but the distinction between the two languages should be clear.

While the expressions are the same as before, the syntax functor is now , and the behavior functor is . The full semantics can be found in Fig. 2. As for While, they specify a GSOS law .

Fig. 2.

Semantics of .

Traditionally, the (in)security of a compiler has been a matter of full abstraction; a compiler is fully abstract if it preserves and reflects Morris-style  [31] contextual equivalence. For our threat model, where the attacker can directly observe labels, it makes sense to define contextual equivalence in  as:

Definition 4

 

Where C is the set of one-hole contexts, denotes the plugging function and we write \(p\Downarrow \) when p eventually terminates. Contextual equivalence for While is defined analogously. It is easy to show that the simple “embedding” compiler from While to is not fully abstract by examining terms \(a \triangleq \texttt {while}~(\texttt {var}[0])~(0~\texttt {:=}~0)\) and \(b \triangleq \texttt {while}~(\texttt {var}[0] * 2)~(0~\texttt {:=}~0)\), for which \(a \cong b\) but . A context \(c \triangleq (\mathtt {obs}~1~\_) ; \mathtt {while}~(\mathtt {var[1]}-1)~\mathtt {skip}\) will log the result of the \(\mathtt {while}\) condition in and in \(\mathtt {var}[1]\) and then either diverge or terminate depending on the value of \(\mathtt {var}[1]\). An initial \(\mathtt {var}[0]\) value of 1 will cause to terminate but to diverge.
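The counterexample can be replayed in a small model. The sketch below is not the paper's exact semantics: we simply assume, as an illustration, that the target leaks the value of every evaluated expression as a label v, which is the kind of information a context such as c can observe. The programs a and b then agree on final stores but produce different label traces:

```python
def ev(s, e):
    # expressions: lit, var, and binary * (all we need for a and b)
    if e[0] == "lit":
        return e[1]
    if e[0] == "var":
        return s[e[1]]
    _, _, e1, e2 = e                      # ("bin", "*", e1, e2)
    return ev(s, e1) * ev(s, e2)

def step(s, x):
    """One step, returning (store, residual or None, emitted label or None)."""
    tag = x[0]
    if tag == "skip":
        return (s, None, None)
    if tag == "assign":
        _, l, e = x
        v = ev(s, e)
        s2 = list(s)
        s2[l] = v
        return (s2, None, v)
    if tag == "while":
        _, e, body = x
        v = ev(s, e)
        return (s, ("seq", body, x) if v != 0 else ("skip",), v)
    _, p, q = x                           # ("seq", p, q)
    s2, p2, v = step(s, p)
    return (s2, q if p2 is None else ("seq", p2, q), v)

def run(s, x):
    """Run to termination, collecting the emitted labels."""
    labels = []
    while x is not None:
        s, x, v = step(s, x)
        if v is not None:
            labels.append(v)
    return s, labels

a = ("while", ("var", 0), ("assign", 0, ("lit", 0)))
b = ("while", ("bin", "*", ("var", 0), ("lit", 2)), ("assign", 0, ("lit", 0)))
```
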

Securely Extending . To deter malicious contexts from exploiting the extra information, we introduce sandboxing primitives to help hide it. We add an additional constructor in , , and the following inference rules to form the secure version of .

We now consider the compiler from While to which, along with the obvious embedding, wraps the translated terms in sandboxes. This looks to be effective as programs a and b are now contextually equivalent and the extra information is adequately hidden. We will show that this compiler is indeed a map of distributive laws between While and but to do so we need a brief introduction on the underlying theory.

4 Secure Compilers, Categorically

4.1 Maps of Distributive Laws

Assume two GSOS laws \(\rho _{1} : \Sigma _{1} (\mathrm {Id}\times B_{1}) \Longrightarrow B_{1} \Sigma _{1}^{*}\) and \(\rho _{2} : \Sigma _{2} (\mathrm {Id}\times B_{2}) \Longrightarrow B_{2} \Sigma _{2}^{*}\), where \((\Sigma _{1}^{*}, \eta _{1}, \mu _{1})\) and \((\Sigma _{2}^{*}, \eta _{2}, \mu _{2})\) are the monads freely generated by \(\Sigma _{1}\) and \(\Sigma _{2}\) respectively. We shall regard pairs of natural transformations \((\sigma : \Sigma _{1}^{*} \Longrightarrow \Sigma _{2}^{*}, b : B_{1} \Longrightarrow B_{2})\) as compilers between the two semantics, where \(\sigma \) acts as a syntactic translation and b as a translation between behaviors.

Remark 2

If \(A_{1}\) and \(A_{2}\) are the sets of terms freely generated by \(\Sigma _{1}\) and \(\Sigma _{2}\), then \(\sigma \) induces a compiler \(c : A_{1} \rightarrow A_{2}\) between them. On the other hand, b generates a function \(d : Z_1 \rightarrow Z_2\) between behaviors via finality.

Remark 3

We shall be writing \(B^{c}\) for the cofree copointed endofunctor \(\mathrm {Id}\times B\) over B and \(b^{c} : B^{c}_{1} \Longrightarrow B^{c}_{2}\) for \(\mathrm {id}\times b\).

Definition 5

(Adapted from  [50]). A map of GSOS laws from \(\rho _{1}\) to \(\rho _{2}\) consists of a natural transformation \(\sigma : \Sigma _{1}^{*} \Longrightarrow \Sigma _{2}^{*}\) subject to the monad laws \(\sigma \circ \eta _{1} = \eta _{2}\) and \(\sigma \circ \mu _{1} = \mu _{2} \circ \Sigma _{2}^{*}\sigma \circ \sigma \Sigma _{1}^{*}\), paired with a natural transformation \(b : B_{1} \Longrightarrow B_{2}\) that satisfies the following coherence criterion:

$$\begin{aligned} (b^{c} * \sigma ) \circ \rho _{1}^{*} = \rho _{2}^{*} \circ (\sigma * b^{c}) \,:\, \Sigma _{1}^{*} B^{c}_{1} \Longrightarrow B^{c}_{2} \Sigma _{2}^{*}, \end{aligned}$$

where \(*\) denotes horizontal composition of natural transformations.

\(\lrcorner \)

Remark 4

A natural transformation \(\sigma : \Sigma _{1}^{*} \Longrightarrow \Sigma _{2}^{*}\) subject to the monad laws is equivalent to a natural transformation \(t : \Sigma _{1} \Longrightarrow \Sigma _{2}^{*}.\)

Theorem 1

If \(\sigma \) and b constitute a map of GSOS laws, then we get a compiler \(c : A_1 \rightarrow A_2\) and behavior transformation \(d : Z_1 \rightarrow Z_2\) satisfying \(d \circ f_1 = f_2 \circ c : A_1 \rightarrow Z_2\). As bisimilarity is exactly equality in the final coalgebra (i.e. equality under \(f_i : A_{i} \rightarrow Z_{i}\)), c preserves bisimilarity  [50]. If d is a monomorphism (which, under mild conditions, is the case in \(\mathbf {Set}\) if every component of b is a monomorphism), then c also reflects bisimilarity.

Importantly, the well-behavedness properties of the two GSOS laws bestow contextual meaning on bisimilarity. Recall that the gold standard for secure compilation is contextual equivalence (Definition 4), which is precisely what is observable through program contexts. Bisimilarity is generally not the same as contextual equivalence, but we can instead show that, in the case of GSOS laws and other forms of distributive laws, bisimilarity defines the upper bound (the most fine-grained distinction) of observability up to program contexts. We shall do so abstractly in the next subsections.

4.2 Abstract Program Contexts

The informal notion of a context in a programming language is that of a program with a hole  [31]. Thus contexts are a syntactic construct that models external interactions with a program: a single context is an experiment whose outcome is the evaluation of the subject program plugged in the context.

Naïvely, one may hope to represent contexts by a functor H sending a set of variables X to the set HX of terms in \(\Sigma X\) that may have holes in them. A complication is that contexts may have holes at any depth (i.e. any number of operators may have been applied to a hole), whereas \(\Sigma X\) is the set of terms that have exactly one operator in them, immediately applied to variables. One solution is to think of Y in HY as a set of variables that do not stand for terms, but for contexts. This approach is fine for multi-hole contexts, but if we also want to consider single-hole contexts and a given single-hole context c is not the hole itself, then precisely one variable in c should stand for a single-hole context, and all other variables should stand for terms. Thus, in order to support both single- and multi-hole contexts, we make H a two-argument functor, where \(H_X Y\) is the set of contexts with term variables from X and context variables from Y.

Definition 6

Let \({\mathbb {C}}\) be a distributive category  [14] with products \(\times \), coproducts \(\uplus \), initial object \(\bot \) and terminal object \(\top \), as is the case for \(\mathbf {Set}\). A context functor for a syntax functor \(\Sigma : {\mathbb {C}} \rightarrow {\mathbb {C}}\) is a functor \(H : {\mathbb {C}} \times {\mathbb {C}} \rightarrow {\mathbb {C}}\) (with application to (X, Y) denoted as \(H_X Y\)) such that there exist natural transformations \(\mathrm {hole}: \forall (X, Y) . \top \rightarrow H_X Y\) and \(\mathrm {con}: \forall X . X \times H_X X \rightarrow X \uplus \Sigma X\) making the following diagram commute for all X:

$$\begin{aligned} \mathrm {con}\circ (\mathrm {id}\times \mathrm {hole}) = i_1 \circ \pi _1 : X \times \top \rightarrow X \uplus \Sigma X \end{aligned}$$

\(\lrcorner \)

The idea of the transformation \(\mathrm {con}\) is the following: it takes as input a variable \(x \in X\) to be plugged into the hole, and a context \(c \in H_X X\) with one layer of syntax. The functor \(H_X\) is applied again to X rather than Y because x is assumed to have been recursively plugged into the context placeholders \(y \in Y\) already. We then make a case distinction: if c is the hole itself, then \(i_1~x\) is returned. Otherwise, \(i_2~c\) is returned.

Definition 7

Let \(\mathbb {C}\) be a category as in Definition 6 and assume a syntax functor \(\Sigma \) with context functor H. If \(\Sigma \) has an initial algebra \((A,q_A)\) (the set of programs) and \(H_A\) has a strong initial algebra \((C_A,q_{C_A})\)  [23] (the set of contexts), then we define the plugging function as the “strong inductive extension”  [23] of the algebra structure \([\mathrm {id}_A,q_A] \circ \mathrm {con}_{A} : A \times H_A A \rightarrow A\) on A, i.e. as the unique morphism that makes the following diagram commute:

(diagram)

\(\lrcorner \)

The above definition of context functors is satisfied by both single-hole and multi-hole contexts, the constructions of which we discuss below.

Multi-hole Contexts. Given a syntax functor \(\Sigma \), its multi-hole context functor is simply \(H_X Y = \top \uplus \Sigma Y\). The contextual natural transformation \(\mathrm {con}\) is the obvious map that returns the pluggee if the given context is a hole, and otherwise the context itself (which is then a program):

$$\begin{aligned}&\mathrm {con}: \forall X . X \times (\top \uplus \Sigma X) \rightarrow X \uplus \Sigma X \\&\mathrm {con}\circ (\mathrm {id}\times i_1) = i_1 \circ \pi _1 : \forall X . X \times \top \rightarrow X \uplus \Sigma X \\&\mathrm {con}\circ (\mathrm {id}\times i_2) = i_2 \circ \pi _2 : \forall X . X \times \Sigma X \rightarrow X \uplus \Sigma X \end{aligned}$$

The ‘pattern matching’ is justified by distributivity of \({\mathbb {C}}\). For \(\mathrm {hole}= i_1 : \top \rightarrow \top \uplus \Sigma X\), we can see that \(\mathrm {con}\circ (\mathrm {id}\times \mathrm {hole}) = i_1 \circ \pi _1\) as required by the definition of a context functor.
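For a concrete reading of the multi-hole case, here is an illustrative Python sketch for a toy fragment with only skip and sequencing: a context is either the hole or one layer of syntax applied to sub-contexts, and plugging sends every hole to the same term, exactly as \(\mathrm {con}\) prescribes (hole \(\mapsto \) pluggee, otherwise the context's own syntax):

```python
HOLE = ("hole",)

def plug(term, ctx):
    if ctx == HOLE:                       # con (x, i1 *) = i1 x
        return term
    if ctx[0] == "skip":                  # a leaf with no holes
        return ctx
    _, p, q = ctx                         # con (x, i2 c) = i2 c, recursing
    return ("seq", plug(term, p), plug(term, q))
```
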

Single-Hole Contexts. It was observed by McBride  [29] that for inductive types, i.e. least fixpoints/initial algebras \(\mu F\) of certain endofunctors F called containers  [4] or simply polynomials, single-hole contexts are lists of \(\partial F(\mu F)\), where \(\partial F\) is the derivative of F. Derivatives for containers, which were developed by Abbott et al. in [5], enable us to give a categorical interpretation of single-hole contexts as long as the syntax functor \(\Sigma \) is a container.

It would be cumbersome to lay down the entire theory of containers and their derivatives, so we shall instead focus on the more restricted set of Simple Polynomial Functors  [22] (or SPF), used to model both syntax and behavior. Crucially, SPFs are differentiable and hence compatible with McBride’s construction.

Definition 8

(Simple Polynomial Functors). The collection of SPF is the least set of functors satisfying the following rules:

$$\begin{aligned} F \,{::}=\, \mathrm {Id}\,\, |\,\, K_J \,\,|\,\, F \uplus F \,\,|\,\, F \times F \,\,|\,\, F \circ F \end{aligned}$$

where \(\mathrm {Id}\) is the identity functor and \(K_J\) the constant functor at an object J.

\(\lrcorner \)

We can now define the differentiation action by structural induction. Interestingly, it resembles simple derivatives for polynomial functions.

Definition 9

(SPF derivation rules)

$$\begin{aligned} \partial \mathrm {Id}= \top , \quad \partial K_J = \bot , \quad \partial (G \uplus H) = \partial G \uplus \partial H, \end{aligned}$$
$$\begin{aligned} \partial (G \times H) = (\partial G \times H) \uplus (G \times \partial H), \quad \partial (G \circ H) = (\partial G \circ H) \times \partial H. \end{aligned}$$

\(\lrcorner \)
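Because Definition 9 is structural, it can be run as a symbolic computation. An illustrative Python sketch, with SPFs encoded as tagged tuples and \(\top \), \(\bot \) represented as constant functors:

```python
# Definition 9 as symbolic computation on SPF syntax trees.  Functors are
# tagged tuples: ("Id",), ("K", J), ("sum", F, G), ("prod", F, G), ("comp", F, G).
ID = ("Id",)
TOP = ("K", "1")     # terminal object as a constant functor
BOT = ("K", "0")     # initial object as a constant functor

def d(F):
    tag = F[0]
    if tag == "Id":
        return TOP                                     # ∂Id = ⊤
    if tag == "K":
        return BOT                                     # ∂K_J = ⊥
    _, G, H = F
    if tag == "sum":                                   # ∂(G ⊎ H) = ∂G ⊎ ∂H
        return ("sum", d(G), d(H))
    if tag == "prod":                                  # ∂(G × H) = ∂G×H ⊎ G×∂H
        return ("sum", ("prod", d(G), H), ("prod", G, d(H)))
    return ("prod", ("comp", d(G), H), d(H))           # ∂(G ∘ H) = (∂G∘H) × ∂H
```

On the binary-tree functor \(\top \uplus (\mathrm {Id}\times \mathrm {Id})\) of Example 3 below, the computed derivative is \(\bot \uplus (\top \times \mathrm {Id}\uplus \mathrm {Id}\times \top )\), which is naturally isomorphic to \(K_{\mathrm {Bool}} \times \mathrm {Id}\) as expected.
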

Example 3

The definition of \(\mathrm {con}\) for single-hole contexts might look a bit cryptic at first sight so we shall use a small example from  [29] to shed some light. In the case of binary trees, locating a hole in a context can be thought of as traversing through a series of nodes, choosing left or right according to the placement of the hole until it is found. At the same time a record of the trees at the non-chosen branches must be kept so that in the end the entire structure can be reproduced.

Now, considering that the set of binary trees is the least fixed point of the functor \(\mathtt {\top } \uplus (\mathrm {Id}\times \mathrm {Id})\), the type of “abstract” choice at each node is the functor \(K_{\mathrm {Bool}} \times \mathrm {Id}\), where \(K_{\mathrm {Bool}}\) stands for a choice of left or right and the \(\mathrm {Id}\) part represents the bypassed structure. Lists of \((K_{\mathrm {Bool}} \times \mathrm {Id})~\mathrm {BinTree}\) are exactly the sort of record we need to keep, i.e. they contain the same information as a tree with a single hole. And indeed \(K_{\mathrm {Bool}} \times \mathrm {Id}\) is (up to natural isomorphism) the derivative of \(\mathtt {\top } \uplus (\mathrm {Id}\times \mathrm {Id})\)! \(\lrcorner \)

Using derivatives we can define the context functor \(H_X Y = \top \uplus ((\partial \Sigma ~X) \times Y)\) for syntax functor \(\Sigma \). Then the initial algebra \(C_A\) of \(H_A\) is indeed \(\mathrm {List}~((\partial \Sigma )~A)\), the set of single-hole contexts for \(A \cong \Sigma A\).
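Concretely, for the binary trees of Example 3, \(\partial F~\mathrm {BinTree} \cong \mathrm {Bool} \times \mathrm {BinTree}\), so a single-hole context is a list of (side, bypassed subtree) pairs. The following sketch plugs a tree into such a context; the convention that the list is ordered from the layer nearest the hole outwards is our own choice:

```python
def plug(tree, ctx):
    """ctx: list of (side, bypassed subtree) pairs, nearest the hole first."""
    for side, other in ctx:
        # rebuild one node, reattaching the bypassed sibling on the other side
        tree = ("node", tree, other) if side == "L" else ("node", other, tree)
    return tree
```
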

Plugging. Before defining \(\mathrm {con}\), we define an auxiliary function \(\mathrm {conStep} : \partial \Sigma \times \mathrm {Id}\Longrightarrow \Sigma \). We refer the reader to  [29] for the full definition of \(\mathrm {conStep}\), which is inductive on the SPF \(\Sigma \); here we only spell out the case for coproducts. So, for \(\partial (F \uplus G) = \partial F \uplus \partial G\) we have:

$$\begin{aligned}&\mathrm {conStep}_{F \uplus G} : (\partial F \uplus \partial G) \times \mathrm {Id}\Longrightarrow F \uplus G \\&\mathrm {conStep}_{F \uplus G} \circ (i_1 \times \mathrm {id}) = i_1 \circ \mathrm {conStep}_F : \partial F \times \mathrm {Id}\Longrightarrow F \uplus G \\&\mathrm {conStep}_{F \uplus G} \circ (i_2 \times \mathrm {id}) = i_2 \circ \mathrm {conStep}_G : \partial G \times \mathrm {Id}\Longrightarrow F \uplus G \end{aligned}$$

We may now define \(\mathrm {con}: X \times H_X X \rightarrow X \uplus \Sigma X\) as follows:

$$\begin{aligned}&\mathrm {con}: \forall X . X \times (\top \uplus (\partial \Sigma ~X \times X)) \rightarrow X \uplus \Sigma X \\&\mathrm {con}\circ (\mathrm {id}\times i_1) = i_1 \circ \pi _1 : \forall X . X \times \top \rightarrow X \uplus \Sigma X \\&\mathrm {con}\circ (\mathrm {id}\times i_2) = i_2 \circ \mathrm {conStep}_{\Sigma } \circ \pi _2 : \forall X . X \times (\partial \Sigma ~X \times X) \rightarrow X \uplus \Sigma X \end{aligned}$$

By setting \(\mathrm {hole}= i_1 : \top \rightarrow \top \uplus (\partial \Sigma ~X \times X)\) we can see that \(\mathrm {con}\circ (\mathrm {id}\times \mathrm {hole}) = i_1 \circ \pi _1\) as required by Definition 6.

4.3 Contextual Coclosure

Having established a categorical notion of contexts, we can now move towards formulating contextual categorical arguments about bisimilarity. We assume a context functor H for \(\Sigma \) such that \(H_A\) has strong initial algebra \((C_A, q_{C_A})\) (the object containing all contexts).

First, since we prefer to work in more general categories than just \(\mathbf {Set}\), we will encode relations \(R \subseteq X \times Y\) as spans \(X \xleftarrow {r_1} R \xrightarrow {r_2} Y\). One may wish to consider only spans for which \((r_1, r_2) : R \rightarrow X \times Y\) is a monomorphism, though this is not necessary for our purposes.

We want to reason about contextually closed relations on the set of terms A: relations R such that \(a_1 \mathrel {R} a_2\) implies that the results of plugging \(a_1\) and \(a_2\) into c are again related, for all contexts \(c \in C_A\). Contextual equivalence will typically be defined as the co-closure of equitermination: the greatest contextually closed relation that implies equitermination. For spans, this becomes:

Definition 10

In a category as in Definition 6, a span \(A \xleftarrow {r_1} R \xrightarrow {r_2} A\) is called contextually closed if there is a morphism making the following diagram commute:

(diagram)

The contextual co-closure \(A \xleftarrow {\bar{r}_1} \bar{R} \xrightarrow {\bar{r}_2} A\) of an arbitrary span \(A \xleftarrow {r_1} R \xrightarrow {r_2} A\) is the final contextually closed span on A with a span morphism \(\bar{R} \rightarrow R\). \(\lrcorner \)

We call terms bisimilar if the operational semantics \(f : A \rightarrow Z\) assigns them equal behaviors:

Definition 11

We define (strong) bisimilarity \(\sim _{\mathrm {bis}}\) as the pullback of the equality span \((\mathrm {id}_Z, \mathrm {id}_Z) : Z \rightarrow Z \times Z\) along \(f \times f : A \times A \rightarrow Z \times Z\) (if it exists).

Theorem 2

Under the assumptions of Definition 7, bisimilarity (if it exists) is contextually closed.

Proof

We need to give a morphism of spans from \(C_A \times {(\sim _{\mathrm {bis}})}\) to \((\sim _{\mathrm {bis}})\):

(diagram)

By definition of \((\sim _{\mathrm {bis}})\), it suffices to give a morphism of spans to the equality span on Z, i.e. to prove that . To this end, consider the following diagram (parameterized by \(i \in \{1, 2\}\)), in which every polygon is easily seen to commute:

(diagram)

The bottom-right square stems from the underlying GSOS law: it is the algebra homomorphism part of the bialgebra morphism between the initial and the final bialgebras. Commutativity of the outer diagram reveals that is, regardless of i, the strong inductive extension of \([\mathrm {id}_Z, q_Z] \circ \mathrm {con}_Z \circ (w \times H_f \mathrm {id}_Z) : (\sim _{\mathrm {bis}})\times H_AZ \rightarrow Z\). Thus, it is independent of i.    \(\square \)

Corollary 1

In \(\mathbf {Set}\), bisimilarity is its own contextual coclosure: .

Corollary 2

In \(\mathbf {Set}\), bisimilarity implies contextual equivalence.

Proof

Bisimilarity implies equitermination. This yields an implication between their coclosures.    \(\square \)

Comparing Corollary 1 to contextual equivalence in Definition 4 reveals their key difference. Contextual equivalence makes minimal assumptions on the underlying observables, which are simply divergence and termination. On the other hand, the contextual coclosure of bisimilarity assumes maximum observability (as dictated by the behavior functor) and in that sense it represents the upper bound of what can be observed through contexts. Consequently, this criterion is useful if the observables adequately capture the threat model, which is true for the examples that follow.

This theorem echoes similar results in the broader study of coalgebraic bisimulation  [11, 41]. There are, however, two differences. The first is that our theorem allows for extra flexibility in the definition of contexts as the theorem is parametric on the context functor. Second, by making the context construction explicit we can directly connect (the contextual coclosure of) bisimilarity to contextual equivalence (Corollary 2) and so have a more convincing argument for using maps of distributive laws as secure compilers.

5 An Extra Register (Part II)

The next step is to define the syntax and behavior natural transformations. The first compiler, , is a very simple mapping of constructors in While to their counterparts. The second natural transformation, , is more complex as it involves an additional layer of syntax in .

Definition 12

(Sandboxing natural transformation). Consider the natural transformation which embeds \(\Sigma \) in . Using PL notation, we define . This yields a monad morphism (Remark 4).

Defining the natural translation between behaviors is a matter of choosing a designated value for the added observable label. The only constraint is that the chosen value has to coincide with the label that the sandbox produces. and are identical so we need a single natural transformation :

While to . We now have the natural translation pairs and , which allow us to check the coherence criterion from Sect. 4.1. We shall be using a graphical notation that provides good intuition as to what failure or success of the criterion means. For example, Fig. 3 shows failure of the coherence criterion for the first pair.

The horizontal arrows in the diagram represent the two semantics, \(\rho ^{*}\) and , while the vertical arrows are the two horizontal compositions of the natural translation pair. The top-left node holds an element of \(\Sigma ^{*} (\mathrm {Id}\times B)\), which in this case is an assignment operation. The two rightmost nodes represent behaviors, so the syntactic element is missing from the left side of the transition arrows.

Fig. 3. Failure of the criterion for .

In the upper path, the term is first applied to the GSOS law \(\rho ^{*}\) and the result is then passed to the translation pair, thus producing the designated label 0, typeset in blue for convenience. In the lower path, the term is first applied to the translation and then goes through the target semantics, , where the label v is produced. It is easy to find such an s so that .

While to . The same example is investigated for the second translation pair . Figure 4 shows what happens when we test the same case as before. Applying to is similar to acting twice. The innermost transition is the intermediate step and as it only appears in the bottom path it is typeset in red. This time the diagram commutes as the label produced in the inner layer, , is effectively erased by the sandboxing rules of .

Fig. 4. The coherence criterion for . (Color figure online)

An endo-compiler for . If is the set of closed terms for , the compiler , which “escapes” terms from their sandboxes, can be elegantly modeled using category theory. As before, it is not possible to express it using a simple natural transformation . We can, however, use the free pointed endofunctor  [28] over , . What we want is to map non-sandboxed terms to themselves and to lift the extra layer of syntax from sandboxed terms. Intuitively, for a set of variables X, is one layer of syntax “populated” with elements of X. If is the union of with the set of variables X, lifting the sandboxing layer amounts to mapping the X in to the left of and the rest to themselves on the right.
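The escaping compiler can be sketched the same way; again, names are our own toy stand-ins. The free pointed endofunctor sends X to the coproduct X + Sigma X, which we model as a tagged pair: sandboxed terms go to the bare X on the left (the sandbox layer is lifted off), and every other constructor goes to itself on the right.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Sandbox:
    body: Any

@dataclass
class Assign:
    loc: int
    rest: Any

def unsandbox(layer):
    """Map into X + Sigma X: sandboxed terms lose their sandbox
    (left injection), all other syntax maps to itself (right)."""
    if isinstance(layer, Sandbox):
        return ("left", layer.body)
    return ("right", layer)
```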

Fig. 5. Failure of the criterion for \(\sigma _{u}\).

This is obviously not a secure compiler as it allows discerning previously indistinguishable programs. As we can see in Fig. 5, the coherence criterion fails in the expected manner.

6 State Mismatch

Having established our categorical foundations, we shall henceforth focus on examples. The first one involves a compiler where the target machine is not necessarily more powerful than the source machine, but the target value primitives are not isomorphic to the ones used in the source. This is a well-documented problem  [37], which has led to failure of full abstraction before  [8, 18, 25].

For example, we can repeat the development of While except we substitute natural numbers with integers. We call this new version \( While _{{\mathbb {Z}}}\).

  • \(\langle expr\rangle ::= \texttt{lit}\;\mathbb{Z} \mid \texttt{var}\;\mathbb{N} \mid \langle expr\rangle\;\langle bin\rangle\;\langle expr\rangle \mid \langle un\rangle\;\langle expr\rangle\)

The behavior functor also differs in that the store type S is substituted with \(S_{{\mathbb {Z}}}\), the set of lists of integers. We can define the behavioral natural transformation \(b_{\mathbb {Z}}: B \Longrightarrow B_{\mathbb {Z}}\) as the best “approximation” between the two behaviors. In \(\mathbf {Set}\):

Where \(\text {to}\mathbb {N}\) replaces all negative numbers in the store with 0 and \(\text {to}\mathbb {Z}\) typecasts S to \(S_{{\mathbb {Z}}}\). It is easy to see that the identity compiler from While to \( While _{{\mathbb {Z}}}\) is not fully abstract. For example, the expressions 0 and \(\texttt {min}(\texttt {var}[0],0)\) are indistinguishable in While but can be distinguished in \( While _{{\mathbb {Z}}}\) (if \(\mathtt {var}[0]\) is negative). This is reflected in the coherence criterion diagram for the identity compiler in Fig. 6, when the store is initialized with a negative integer.
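The two store translations that make up \(b_{\mathbb {Z}}\) can be sketched directly, assuming stores are represented as (finite prefixes of) lists of numbers:

```python
def to_N(store_z):
    """Best approximation S_Z -> S: clamp negative entries to 0,
    mirroring what While_Z can never observe as a natural number."""
    return [max(0, v) for v in store_z]

def to_Z(store_n):
    """S -> S_Z is a plain typecast: every natural is an integer."""
    return list(store_n)
```

Note that `to_N` is lossy while `to_Z` is not, which is exactly why the identity compiler fails: a negative entry survives in \( While _{{\mathbb {Z}}}\) but is collapsed to 0 on the While side.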

Fig. 6. Failure of the criterion for \((\mathrm {id}, b_{{\mathbb {Z}}})\).

The solution is to create a special environment where \( While _{{\mathbb {Z}}}\) forgets about negative integers, in essence copying what \(b_{{\mathbb {Z}}}\) does on the store. This is a special kind of sandbox, written \(\langle \_\rangle \), for which we introduce the following rules:

figure ca
Fig. 7. The coherence criterion for \((\sigma _{{\mathbb {Z}}}, b_{{\mathbb {Z}}})\).

We may now repeat the construction from Definition 12 to define the compiler \(\sigma _{{\mathbb {Z}}}\). We can easily verify that the pair \((\sigma _{{\mathbb {Z}}}, b_{{\mathbb {Z}}})\) constitutes a map of distributive laws. For instance, Fig. 7 demonstrates how the previous failing case now works under \((\sigma _{{\mathbb {Z}}}, b_{{\mathbb {Z}}})\).

7 Control Flow

Many low-level languages support unrestricted control flow in the form of jumping or branching to an address. On the other hand, control flow in high-level languages is usually restricted (think if-statements or function calls). A compiler from the high-level to the low-level might be insecure as it exposes source-level programs to illicit control flow. This is another important and well-documented example of failure of full abstraction  [3, 8, 33, 37].

Fig. 8. Semantics of the \( Low \) language. Elements in square brackets are optional.

We introduce the low-level language \( Low \), whose programs are non-empty lists of instructions. \( Low \) differs significantly from While and its derivatives in both syntax and semantics. For the syntax, we define the set of instructions \(\langle inst \rangle \) and the set of programs \(\langle asm \rangle \).

  • \(\langle inst\rangle ::= \texttt{nop} \mid \texttt{stop} \mid \texttt{assign}\;\mathbb{N}\;\langle expr\rangle \mid \texttt{br}\;\langle expr\rangle\;\mathbb{Z}\)

  • \(\langle asm\rangle ::= \langle inst\rangle \mid \langle inst\rangle\;\texttt{;;}\;\langle asm\rangle\)

Instruction nop is the no-operation, stop halts execution and assign is analogous to the assignment operation in While. The br instruction is what really defines \( Low \), as it stands for bidirectional relative branching.

Semantics of Low. Figure 8 shows the operational semantics of \( Low \). The execution state of a running program consists of a run-time store and the program counter register \(\texttt {PC} \in \mathbb {Z}\), which points at the instruction being processed. If the program counter is zero, the leftmost instruction is executed. If the program counter is greater than zero, the current instruction is further to the right. Otherwise, the program counter is out of bounds and execution stops. The categorical interpretation suggests a GSOS law \(\rho _{L}\) of the syntax functor \(\Sigma _{L} X = \text {inst} \uplus (\text {inst} \times X)\) over the behavior functor .
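The program-counter discipline described above amounts to a simple fetch function, sketched here with instructions represented abstractly (the representation is our own assumption):

```python
def fetch(program, pc):
    """Return the instruction at offset pc in a non-empty program:
    pc == 0 selects the leftmost instruction, pc > 0 selects one
    further to the right, and anything out of bounds returns None,
    modeling the case where execution stops."""
    if 0 <= pc < len(program):
        return program[pc]
    return None
```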

An Insecure Compiler. This time we start with the behavioral translation, which is less obvious as we have to go from to . The increased arity in \(B_{L}\) poses an interesting question as to what the program counter should mean in While. It makes sense to consider the program counter in While as zero since a program in While is treated uniformly as a single statement.

When it comes to translating terms, a typical compiler from While to \( Low \) would untangle the tree-like structure of While and convert it to a list of \( Low \) instructions. For while statements, the compiler would use branching to simulate looping in the low-level.

Example 4

Let us look at a simple case of a loop. The While program

is compiled to

\(\lrcorner \)

This compiler, called \(c_{L}\), cannot be defined in terms of a natural transformation \(\Sigma \Longrightarrow \Sigma ^{*}_{L}\) as per Remark 4, but it is inductive on the terms of the source language. In this case we can directly compare the two operational models (where \(h : A \rightarrow BA\)) and and notice that is not a coalgebra homomorphism (Fig. 9). The key is that the program counter in \( Low \) allows for finer observations on programs. Take for example the case for \(\texttt {while}~(\texttt {lit}~0)~(0 := \texttt {lit}~0)\), where the loop is always skipped. In \( Low \), we can still access the loop body by simply pointing the program counter to it. This is a realistic attack scenario because \( Low \) allows manipulation of the program counter via the br instruction.
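The flattening scheme behind such a compiler can be sketched on the while case. All conventions here are our own assumptions rather than the paper's definition of \(c_{L}\): statements and instructions are tuples, and `("brz", e, k)` branches k instructions forward (backward if k is negative) when e evaluates to zero, falling through otherwise.

```python
def compile_stmt(stmt):
    """A toy flattening compiler: a while loop becomes a forward
    branch over the body plus an unconditional branch back to the
    test, which is exactly the code an attacker can jump into."""
    tag = stmt[0]
    if tag == "skip":
        return [("nop",)]
    if tag == "assign":                    # ("assign", loc, expr)
        _, loc, expr = stmt
        return [("store", loc, expr)]
    if tag == "seq":                       # ("seq", s, t)
        _, s, t = stmt
        return compile_stmt(s) + compile_stmt(t)
    if tag == "while":                     # ("while", e, body)
        _, e, body = stmt
        code = compile_stmt(body)
        n = len(code)
        return ([("brz", e, n + 2)]        # test: skip body and back-branch
                + code
                + [("brz", ("lit", 0), -(n + 1))])  # always branch back
    raise ValueError(f"unknown statement {tag}")
```

Even when the test is always false, the compiled body sits in the instruction list, which is why pointing the program counter at it constitutes a realistic attack.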

Fig. 9. \(c_{L}\) is not a coalgebra homomorphism.

Solution. By comparing the semantics of While in Fig. 1 and \( Low \) in Fig. 8 we find major differences. The first is the reliance of \( Low \) on a program counter which keeps track of execution, whereas While executes statements from left to right. Second, the sequencing rule in While dictates that statements are removed from the program stateFootnote 6 upon completion. On the other hand, \( Low \) keeps the program state intact at all times. Finally, there is a stark contrast between the two languages in the way they handle while loops.

To address the above issues we introduce a new sequencing primitive ;;\(_{c}\) and a new looping primitive loop for \( Low \), which prohibit illicit control flow and properly propagate the internal state. Furthermore, we change the semantics of the singleton assign instruction so that it mirrors the peculiarity of its While counterpart. The additions can be found in Fig. 10.

Fig. 10. Secure primitives for the \( Low \) language.

We may now define the simple “embedding” natural transformation \(\sigma _{E} : \Sigma \Longrightarrow \Sigma _{L}\), which maps skip to stop, assignments to assign, sequencing to ;;\(_{c}\) and while to loop.

Figure 11 shows success of the coherence criterion for the while case. Since the diagram commutes for all cases, \((\sigma _{E},b_{L})\) is a map of GSOS laws between While and the secure version of \( Low \). This guarantees that, remarkably, despite the presence of branching, a low-level attacker cannot illicitly access code that is unreachable at the high level. Regardless, the solution is somewhat contrived in that the new \( Low \) primitives essentially copy what While does. This is partly because the above are complex issues involving radically different languages, but also due to current limitations of the underlying theory. We elaborate on said limitations, as well as advantages and future improvements, in Sect. 9.

Fig. 11. The coherence criterion for \((\sigma _{E},b_{L})\).

8 Local State Encapsulation

High-level programming language abstractions often involve some sort of private state space that is protected from other objects. Basic examples include functions with local variables and objects with private members. Low-level languages do not offer such abstractions, but when it comes to secure architectures, there is some form of hardware sandboxingFootnote 7 that accommodates the need for local state encapsulation. Compilation schemes that respect confidentiality properties have been a central subject in secure compilation work  [8, 18, 37, 46], dating all the way back to Abadi’s seminal paper  [1].

In this example we will explore how local state encapsulation fails due to lack of stack clearing  [44, 46]. We begin by extending While to support blocks which have their own private state, thus introducing \( While _{B}\). More precisely, we add the frame and return commands that denote the beginning and end of a new block. We also have to modify the original behavior functor B to act on a stack of stores by simply specifying , where [S] denotes a list of stores. For reasons that will become apparent later on, we shall henceforth consider stores of a certain length, say L.

Fig. 12. Semantics of the \( While _{B}\) language.

The semantics for \( While _{B}\) can be found in Fig. 12. Command frame allocates a new private store by appending one to the stack of stores, while return pops the top frame from the stack. This built-in, automatic (de)allocation of frames guarantees that there are no traces of activity, in the form of stored values, of past blocks. The rest of the semantics is similar to While, only now evaluating an expression and updating the state act on a stack of stores instead of a single, infinite store, and var expressions act on the active, topmost frame.
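The frame and return commands thus act on the stack of stores as a push of a zero-initialized store and a pop, which we can sketch directly (representing the stack as a list with the active frame last; the representation is our own assumption):

```python
def frame(stack, frame_size):
    """frame: allocate a new private store, initialized to zeroes,
    as the new active (topmost) frame."""
    return stack + [[0] * frame_size]

def ret(stack):
    """return: pop the active frame, leaving no trace of the block's
    stored values."""
    return stack[:-1]
```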

Fig. 13. Semantics of the \( Stack \) language.

Low-Level Stack. In typical low-level instruction sets like the Intel x86  [21] or MIPS  [30] there is a single, contiguous memory which is partitioned into frames via processor registers. Figure 13 shows the semantics of \( Stack \), a variant of \( While _{B}\) with the same syntax that incorporates a simple low-level stack. The difference is that the stack frames are all of size L, the same size as each individual store in \( While _{B}\), so at each frame and return we need only increment and decrement the stack pointer. The presence of the stack pointer, which is essentially a natural number, means that the behavior of \( Stack \) is . The new evaluation function, evalSP, works similarly to eval in Definition 1, except for var l expressions, which dereference values at offset \(l + L * sp\).
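The dereferencing step of evalSP is just flat-store arithmetic, sketched below (store representation is our own assumption):

```python
def lookup_var(store, frame_size, sp, l):
    """var l in the active frame reads the flat store at offset
    l + L * sp, where L is the fixed frame size and sp the stack
    pointer."""
    return store[l + frame_size * sp]
```

For example, with two frames of size 2 laid out flat as `[10, 11, 20, 21]` and `sp = 1`, `var 0` reads the cell holding 20, i.e. the first cell of the second frame.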

An Insecure Compiler. \( While _{B}\) and \( Stack \) share the same syntax so we only need a behavioral translation, which is all about relating the two different notions of stack. We thus define natural transformation \(b_B : B_{B} \Longrightarrow B_{S}\):

figure cl

We “divide” an infinite list by the number of stack frames, feed the result to the behavior function f and join (“flatten”) it back together, keeping intact the part of the infinite list that extends beyond the active stack. Note that in the case of the frame command, f adds a new frame to the list of stores. The problem is that in \( While _{B}\) the new frame is initialized to 0, in contrast to \( Stack \), where frame does not initialize new frames. This leads to a failure of the coherence criterion for \((id, b_{B})\), as we can see in Fig. 14.
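The “divide” and “join” steps can be sketched on finite prefixes of the (conceptually infinite) store; the function names and the finite representation are our own assumptions:

```python
def divide(store, frame_size, frames):
    """Split the first frames * frame_size cells of the flat store
    into frame-sized stores; the remainder of the store, beyond the
    active stack, is kept aside untouched."""
    cut = frames * frame_size
    active, rest = store[:cut], store[cut:]
    stack = [active[i * frame_size:(i + 1) * frame_size]
             for i in range(frames)]
    return stack, rest

def join(stack, rest):
    """Flatten the stack of stores back onto the untouched
    remainder."""
    return [v for fr in stack for v in fr] + rest
```

By construction, `join` undoes `divide`, which is what lets \(b_B\) act only on the active stack while preserving the rest of the store.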

Failure of the criterion is meaningful in that it underlines key problems of this compiler which can be exploited by a low-level attacker. First, the low-level calling convention indirectly allows terms to access expired stack frames. Second, violating the assumption in \( While _{B}\) that new frames are properly initialized breaks behavioral equivalence. For example, programs \(a \triangleq \texttt {frame}~;~0~\texttt {:=}~\texttt {var}[0] + 1\) and \(b \triangleq \texttt {frame}~;~0~\texttt {:=}~1\) behave identically in \( While _{B}\) but not in \( Stack \).

Fig. 14. Failure of the criterion for \((id, b_{B})\).

Solution. It is clear that the lack of stack frame initialization in \( Stack \) is the leading cause of failure, so we introduce the following fix to the frame rule.

figure cm

The idea behind the new frame rule is that the L-sized block in position sp, which is going to be the new stack frame, has all its values replaced by zeroes. As we can see in Fig. 15, the coherence criterion is now satisfied and the example described earlier no longer works.
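The new frame rule amounts to zeroing one L-sized block of the flat store before it becomes active, which we can sketch as follows (representation is our own assumption):

```python
def clear_frame(store, frame_size, sp):
    """Zero out the L-sized block at position sp, i.e. the block that
    is about to become the new active stack frame."""
    base = frame_size * sp
    return store[:base] + [0] * frame_size + store[base + frame_size:]
```

With this in place, the residue of an expired frame can no longer leak into a freshly allocated one, matching the initialized-frames assumption of \( While _{B}\).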

Fig. 15. The coherence criterion for \((id, b_{B})\) under the new frame rule.

9 Discussion and Future Work

On Mathematical Operational Semantics. The cases we covered in this paper are presented using Plotkin’s Structural Operational Semantics  [38], yet their foundations are deeply categorical  [48]. Consequently, for one to use the methods presented in this paper, the semantics involved must fall within the framework of distributive laws, the generality of which has been explored in the past  [47, 50], albeit not exhaustively. To the best of our knowledge, Sect. 7 and Sect. 8 show the first instances of distributive laws as low-level machines.

Bialgebraic semantics are well-behaved in that bisimilarity is a congruence  [19]. We used that to show that two bisimilar programs will remain bisimilar irrespective of the context they are plugged into, which is not the same as contextual equivalence. However, full abstraction is but one of a set of proposed characterizations of secure compilation  [2, 36] and the key intuition is that our framework is suitable as long as bisimilarity adequately captures the threat model. While this is the case in the examples, we can imagine situations where the threat model is weaker than the one implied by bisimilarity.

For example, the language in Sect. 3 includes labels in its transition structure, and the underlying model is accurate in that terms can manipulate said labels. However, if we were to remove obs statements from the syntax, the threat model would become weaker than the one implied by bisimilarity. Similarly for Sect. 7 and \( Low \), we could remove the implicit assumption that the program counter can be manipulated by a low-level attacker.

This issue can be classified as part of the broader effort towards coalgebraic weak bisimilarity, a hard problem which has been an object of intense, ongoing scientific research  [12, 13, 20, 39, 42, 43]. Of particular interest is the work by Abou-Saleh and Pattinson  [6, 7] about bialgebraic semantics, where they use techniques introduced in  [20] to obtain a more appropriate semantic domain for effectful languages as a final coalgebra in the Kleisli category of a suitable monad. This method is thus a promising avenue towards exploring weaker equivalences in bialgebraic semantics, as long as these can be described by a monad.

On Maps of Distributive Laws. Maps of distributive laws were first mentioned by Power and Watanabe  [40], then elaborated as well-behaved translations by Watanabe  [50] and, more recently, by Klin and Nachyla  [27]. Despite the few examples presented in  [27, 50], this paper is the first major attempt at applying the theory of maps of distributive laws to a concrete problem, let alone to secure compilation.

From a theoretical standpoint, maps of distributive laws have remained largely the same since their introduction. This comes despite the interesting developments discussed in Sect. 9 regarding distributive laws, which of course are the subjects of maps of distributive laws. We speculate the existence of Kleisli maps of distributive laws that guarantee preservation of equivalences weaker than bisimilarity. We plan to develop this notion and explore its applicability in future work.

Conclusion. It is evident that the systematic approach presented in this work can markedly streamline proofs of compiler security, as it involves a single, simple coherence criterion. Explicit reasoning about program contexts is no longer necessary, but that does not mean that contexts are irrelevant. On the contrary, the guarantees are implicitly contextual due to the well-behavedness of the semantics. Finally, while the overall usability and eventual success of our method remains an open question, as it depends on the expressiveness of the threat model, the body of work on coalgebraic weak bisimilarity and distributive laws in Kleisli categories suggests many promising avenues for further progress.