We present ParTypes, a type discipline for parallel programs. The model we have in mind comprises a fixed number of processes running in parallel and communicating via collective operations or point-to-point synchronous message exchanges. A type describes a protocol to be followed by each process in a given program. We present the type theory, a core imperative programming language and its operational semantics, and prove that type checking is decidable (up to decidability of semantic entailment) and that well-typed programs do not deadlock and always terminate. The article is accompanied by a large number of examples drawn from the literature on parallel programming.
1 Introduction
Parallel programming allows tackling problems so large that solving them on a single computer would be impractical. It shortens time to completion, with the associated cost savings. And parallel hardware is today present in most machines, from a few cores in smartphones to hundreds of thousands in large supercomputers.
When compared to sequential applications, the difficulty of developing parallel applications is exacerbated by the fact that different processes running in parallel must synchronise in order to exchange data. It is not difficult to write a program that causes processes to exchange data of unexpected sorts or lengths, or that blocks indefinitely waiting for messages that will never arrive. Such faults can become quite costly if, e.g., the application is running on a large supercomputer. Verifying these sorts of properties for parallel programs is far from trivial. The most common techniques include runtime verification [30, 58, 65, 77], symbolic execution, and model checking [25, 30, 58, 68, 72].
Runtime verification can never guarantee the absence of faults. In addition, the testing process can become daunting due to the difficulty of producing meaningful tests, the time required to run the whole test suite, and the need to run the suite on hardware similar to that on which the final application will eventually be deployed. Model checking approaches, on the other hand, frequently stumble upon the problem of scalability, given that the search space grows exponentially with the number of processes. It is often the case that the verification of real applications limits the number of processes to fewer than a dozen [69].
In this article, we concentrate on a model of parallel programming featuring a fixed number of processes, each with its local memory, running its own program and communicating exclusively by point-to-point synchronous message exchanges or by synchronising via collective operations, such as broadcast or reduce. ParTypes is a type discipline for parallel programs; its programme is as follows. Types describe the interactive behaviour of programs: they will never assert that a given program effectively multiplies two matrices, but they will describe the pattern of interaction between the processes involved in a matrix multiplication program. Programs that conform to a type are guaranteed not to enter a deadlocked situation where, e.g., one process is trying to broadcast while another is trying to send a message, or where each process is trying to receive a message from the next in a ring topology. We show that there is an effective procedure to check whether a program conforms to a type, hence that a type checker can be built for an imperative language that uses ParTypes as the type language.
A cornerstone of our approach is that all processes in a parallel program must share the same type. One can write a distinct program for each individual process, a single program to be shared by all processes, or anything in between (such as one program for the even processes and another for the odd processes, or one program for a distinguished process and another for all the rest). In any case, the types of all programs must agree. All processes are thus bound to a single, global protocol (in the form of a type), forcing them to synchronise and interact according to the protocol, thus precluding deadlocks by typing [39].
ParTypes is heavily inspired by MPI [17], covering point-to-point communication (with primitives \(\mathsf {{\color{#800066}{send}}}\) and \(\mathsf {{\color{#800066}{receive}}}\)) as well as different sorts of collective communication (such as \(\mathsf {{\color{#800066}{broadcast}}}\) and \(\mathsf {{\color{#800066}{reduce}}}\)). Primitive data include integer scalars, arrays, and pairs. All process interaction is blocking (as opposed to buffered, where send operations are non-blocking). We provide an operational semantics according to the standard [17] as we interpret it, and propose a type system for the core language we have selected. We cover only a very small fragment of MPI; the discussion (Section 8) addresses some of the limitations. We introduce ParTypes progressively, up to a point where it can be used to model a large class of parallel programs found in the literature [19, 27, 57, 61]. ParTypes allows writing the same program for all processes (as in MPI) as well as different programs for different processes (unlike MPI).
We use \(\mathsf {{\color{#800066}{violet}}}\) to write code and boldface \({\bf \mathsf {{\color{#5475D9}{blue}}}}\) to describe types. The type language includes point-to-point communications such as
\begin{equation*} {\bf \mathsf {message}}~1~2 \end{equation*}
detailing an exchange of an integer value from process 1 to process 2. Types depend on terms, so that we often see dark violet bits in types, as in the \({\bf \mathsf {{\color{#5475D9}{message}}}}\) type above. Collective operation types such as
\begin{equation*} {\bf \mathsf {reduce}}~0 \end{equation*}
characterise a program where all processes propose an integer value and process 0 receives the sum of them all. ParTypes is a dependently-typed language along the lines of DML [81]. Dependent types allow protocols to depend on values transmitted before. If \({\color{#800066}{{\it size}}}\) denotes the number of processes, then the type
\begin{equation*} {\bf \mathsf {broadcast}}~0~{\it sink}{:}\lbrace x{:}\,{\bf \mathsf {int}} \mid 0 \le x \lt {\it size}\rbrace .~{\bf \mathsf {reduce}}~{\it sink} \end{equation*}
describes a program whose process 0 proposes a process number (the \({\color{#800066}{{\it sink}}}\)) and subsequently all processes agree to reduce on this process. The number of a process, an integer value between 0 and \({\color{#800066}{{\it size}}}- 1\), is often called the rank of the process. The type of identifier \({\color{#800066}{{\it sink}}}\) is a refinement datatype denoting an integer value greater than or equal to 0 and smaller than \({\color{#800066}{{\it size}}}\).
The base type language is completed with a sequential composition operator \({\color{#5475D9}{T;U}}\); type \({\bf \mathsf {{\color{#5475D9}{skip}}}}\), which describes processes without any interaction; primitive recursion \({\color{#5475D9}{\forall }}~{\color{#800066}{x}}{\color{#5475D9}{\lt }}{\color{#800066}{i}}.{\color{#5475D9}{T}}\); and a form of collective conditional \({\bf \mathsf {{\color{#5475D9}{ifc}}}}~{\color{#800066}{p}}~{\bf \mathsf {{\color{#5475D9}{then}}}}~{\color{#5475D9}{T}}~{\bf \mathsf {{\color{#5475D9}{else}}}}~{\color{#5475D9}{U}}\), allowing all processes in a program to take an identical decision based solely on information \({\color{#800066}{p}}\) exchanged in collective operations (and contained in types). The base language is then extended with a number of type constructors in order to accommodate arrays and array \({\bf \mathsf {{\color{#5475D9}{scatter}}}}\)/\({\bf \mathsf {{\color{#5475D9}{gather}}}}\) operations, further collective operations such as \({\bf \mathsf {{\color{#5475D9}{allreduce}}}}\) and \({\bf \mathsf {{\color{#5475D9}{allgather}}}}\), and pairs and \({\bf \mathsf {{\color{#5475D9}{allreducemaxloc}}}}\) (handy for the parallel implementation of Gaussian elimination [61]).
Type equivalence is at the core of ParTypes. It includes the monoidal rules for sequential composition and \({\bf \mathsf {{\color{#5475D9}{skip}}}}\), expansion and elimination of primitive recursion, and the elimination of the collective conditional \({\bf \mathsf {{\color{#5475D9}{ifc}}}}~{\color{#800066}{p}}~{\bf \mathsf {{\color{#5475D9}{then}}}}~{\color{#5475D9}{T}}~{\bf \mathsf {{\color{#5475D9}{else}}}}~{\color{#5475D9}{U}}\) when \({\color{#800066}{p}}\) can be shown to be true or false. But perhaps the most vital rule is the projection rule, which eliminates \({\bf \mathsf {{\color{#5475D9}{message}}}}\) types. The rule aligns the types of all processes with respect to point-to-point operations, even for processes that are not involved in the communication.
Imagine a program with three processes where process 0 sends an integer message to process 2, and process 1 does not engage in any interactive behaviour. The type of both processes 0 and 2 is \({\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{0}}~{\color{#800066}{2}}\). But the type of process 1 must be aligned with these two. Because the process does not engage in any interaction, one of its many types is \({\bf \mathsf {{\color{#5475D9}{skip}}}}\). But because 1 is different both from 0 and from 2, type \({\bf \mathsf {{\color{#5475D9}{skip}}}}\) is equivalent to \({\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{0}}~{\color{#800066}{2}}\) for process 1. In this way, the three processes all have the same type, thus guaranteeing the absence of deadlocks.
This article consolidates the theory of ParTypes introduced in López et al. [46], elaborates on the decidability of type equivalence and type checking, introduces strong normalization for typed processes, adds examples and proofs, and extends the type language with a large number of new type constructors. We do not cover the verification of C+MPI programs using the VCC tool [9, 50]. The detailed differences with respect to López et al. [46] are as follows:
• We introduce the complete set of rules for program and type formation and for type equivalence (Sections 2 and 3);
• We include a new strong normalization theorem (Section 4);
• We introduce an algorithm for type checking programs and prove its correctness against the declarative system (Section 5);
• We extend the base type language with new datatypes and new constructors for collective operations (Section 6).
Outline. The next section introduces a core programming language for parallel programs and its operational semantics. Section 3 presents the notion of types as protocols to discipline process interaction, type equivalence, and program typing. Section 4 proves progress, preservation, and strong normalisation for well-typed programs. Section 5 presents an algorithmic typing system and proves it sound with respect to the declarative system introduced in Section 3. Section 6 discusses a number of extensions to the type language towards a more encompassing modelling of extant protocols. Section 7 discusses related work, and Section 8 concludes the article.
2 A Message Passing Parallel Imperative Language
This section introduces the syntax and the operational semantics of programs, starting with a few illustrative examples.
2.1 Examples of Common Parallel Programs
Four examples highlight the salient features of ParTypes. To concentrate on the interactive behaviour, we make use of calls to undefined functions (such as \(\texttt {read_int()}\)). We also use a wildcard (\(\texttt {_}\)) in place of a variable when its value is not important.
The Monte Carlo Algorithm. We start with the Monte Carlo method to estimate the value of \(\pi\) [57]. Suppose we toss darts randomly at a circular target inscribed on a square board. If the points that are hit by the darts are uniformly distributed (and darts always hit the square board), then the number of darts that hit inside the circle should approximately satisfy the equation
\begin{equation*} \frac{\text{darts in circle}}{\text{total number of tosses}} = \frac{\pi }{4} \end{equation*}
since the ratio of the area of the circle to that of the square is \(\pi /4\).
Estimating the value of \(\pi\) can be easily parallelised by distributing dart tossing among a given number of parallel processes. Our implementation works by distinguishing one process, the root process (usually at rank 0), to \({\mathsf {\color{#800066} {broadcast}}}\) the number of darts each process is supposed to toss. Each individual process then computes the number of hits. A \({\mathsf {\color{#800066} {reduce}}}\) operation collects the sum of all such hits at the root process. To optimize resources, the root process also computes its share. Predefined variable \({\mathsf {\color{#800066} {size}}}\) denotes the number of processes. We start with the code for the root process, making use of functions to read an integer and to compute the number of hits given a number of darts.
The root process reads the number of darts per process and broadcasts it to all processes, including itself (line 1). The process then prepares a reference —\({\mathsf {darts_in_circle}}\)— to receive the contributions from all processes, including itself (line 2). The \({\mathsf {\color{#800066} {reduce}}}\) operation stores in the reference the sum of the contributions; line 4 prints the approximate value of \(\pi\). The number \({\mathsf {0}}\) in both the \({\mathsf {\color{#800066} {broadcast}}}\) and the \({\mathsf {\color{#800066} {reduce}}}\) operations denotes the rank of the process that distributes and collects data (the root process).
The code for the remaining processes is somewhat simpler: each process first receives the number of darts to toss, stores it in variable \({\mathsf {\color{#800066} {darts}}}\) (line 1), and then computes and proposes the number of hits (line 2). ParTypes (like MPI) provides a single broadcast operation shared by all processes, so the non-sending partners must provide a mock value (\({\mathsf {\color{#800066} {-1}}}\) in this case) in the place where the root process provides a concrete value.
Similarly, there is a single \({\mathsf {\color{#800066} {reduce}}}\) operation for all processes, hence the mock \({\mathsf {\color{#800066} {mkref 0}}}\) in line 2. The operational semantics (Section 2.3) will not evaluate the last expressions in either the \({\mathsf {\color{#800066} {broadcast}}}\) or the \({\mathsf {\color{#800066} {reduce}}}\) operations on non-root processes. By not distinguishing the root process from the remaining processes in collective operations, this unusual syntax and behaviour (standard in MPI) allows the same source code to be run on all processes.
Combining the two snippets of code above, and taking advantage of the predefined variable \({\mathsf {\color{#800066} {myrank}}}\) denoting the rank of a process, we obtain code that can be run on any process:
In this case, all processes allocate memory in line 2, but only the root process actually uses it.
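For concreteness, the single-program version maps onto C+MPI along the following lines. This is a sketch of our own, not the article's core-language listing; the helpers read_int and compute_hits stand in for the undefined functions mentioned above.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in helpers (assumptions of this sketch). */
static int read_int(void) {
  int n = 1000;
  if (scanf("%d", &n) != 1) n = 1000;   /* darts per process */
  return n;
}

static int compute_hits(int darts) {
  int hits = 0;
  for (int i = 0; i < darts; i++) {
    double x = rand() / (double)RAND_MAX, y = rand() / (double)RAND_MAX;
    if (x * x + y * y <= 1.0) hits++;   /* dart landed inside the circle */
  }
  return hits;
}

int main(int argc, char **argv) {
  int rank, size, darts = -1, total = 0;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  srand(rank + 1);                      /* different tosses on each process */
  if (rank == 0) darts = read_int();    /* non-roots keep the mock value -1 */
  MPI_Bcast(&darts, 1, MPI_INT, 0, MPI_COMM_WORLD);
  int hits = compute_hits(darts);       /* every process, root included, tosses */
  MPI_Reduce(&hits, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
  if (rank == 0)
    printf("pi ~ %f\n", 4.0 * total / ((double)darts * size));
  MPI_Finalize();
  return 0;
}

As in the article's listings, the non-root processes supply a mock value to the broadcast and ignore the result buffer of the reduce.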
Message Exchange on a Ring Topology. This simple example illustrates the exchange of messages on a topology where all processes receive a value from their left neighbour and send another value to their right neighbour. The process at the “left” of rank \({\mathsf {0}}\) is rank \({\mathsf {\color{#800066} {size}-1}}\). The example computes the factorial of \({\mathsf {\color{#800066} {size}}}\). The invariant is that the process at rank i receives the factorial of i from its left neighbour and sends to its right neighbour the factorial of \(i+1\). At the end, process at rank 0 receives the factorial of \({\mathsf {\color{#800066} {size}}}\) from process at rank \({\mathsf {\color{#800066} {size}-1}}\).
Our implementation distinguishes three classes of processes: the process at rank 0 (the root), the processes at ranks 1 to \({\mathsf {\color{#800066} {size}-2}}\), and the process at rank \({\mathsf {\color{#800066} {size}-1}}\). The root process starts the computation by sending number 1 (the factorial of its rank) to its right neighbour. Then, the process at rank i (with i\({\mathsf {\color{#800066} {\lt size}-1}}\)) receives the factorial of i from its left neighbour and sends the factorial of \(i+1\) to its right neighbour. Finally, the last process (at rank \({\mathsf {\color{#800066} {size}-1}}\)) receives the factorial of \({\mathsf {\color{#800066} {size}-1}}\) from its left neighbour and sends the factorial of \({\mathsf {\color{#800066} {size}}}\) to the root, which prints it. The source code for the root process is as follows:
The source code for processes at intermediate ranks is:
and source code for the last process is as follows:
The first argument to a \({\mathsf {\color{#800066} {send}}}\) or \({\mathsf {\color{#800066} {receive}}}\) operation is the target or the source rank, respectively; the second is the integer to be sent or the reference in which to store the value received. Notice that the communication pattern for the root process is different from that of the other processes: it first sends to its right neighbour and then receives from its left neighbour, while all other processes first receive from their left neighbours and then send to their right neighbours (with “left” and “right” understood as in a ring topology). This asymmetry is sufficient to break the communication deadly embrace; otherwise, the complete program would run into a deadlock with all processes blocked at either sending or receiving operations.
ParTypes also allows for using the same source code for all processes, as opposed to different source code for different (classes of) processes. Next, we illustrate how to combine the source code of all processes but the root, making use of the modulus operation. In fact, all processes with \({\mathsf {\color{#800066} {0 \lt myrank \lt size}-1}}\) exhibit the same behaviour.
Lastly, we show how to combine the code in a single program. The behaviour of the root process is distinguished from all others by checking the value of \({\mathsf {\color{#800066} {myrank}}}\).
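As an illustration only, the combined single program maps onto C+MPI along the following lines; the asymmetry at rank 0 (send first, then receive) mirrors the discussion above. This is a sketch under the assumption of at least two processes, not the article's listing.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  int rank, size, v;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);   /* assumes size >= 2 */
  int left = (rank + size - 1) % size, right = (rank + 1) % size;
  if (rank == 0) {
    /* the root breaks the symmetry: send first, then receive */
    v = 1;                                 /* factorial of rank 0 */
    MPI_Send(&v, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
    MPI_Recv(&v, 1, MPI_INT, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("%d! = %d\n", size, v);         /* the factorial of size */
  } else {
    /* everyone else receives from the left, then sends right */
    MPI_Recv(&v, 1, MPI_INT, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    v = v * (rank + 1);                    /* factorial of rank+1 */
    MPI_Send(&v, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
  }
  MPI_Finalize();
  return 0;
}

The invariant of the text is visible in the code: rank i receives the factorial of i and forwards the factorial of i+1.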
Fig. 1.
Odd-Even Sort. Our next example is the parallel odd-even transposition sorting algorithm [57]. The algorithm consists of a sequence of odd-even phases. Given an array a to sort, during the even phases compare-swap operations are executed on the pairs \((a[0], a[1]), (a[2], a[3]), \ldots\), while during the odd phases they are executed on the pairs \((a[1], a[2]), (a[3], a[4]), \ldots\)
Figure 1 shows how the algorithm proceeds in the particular case when the length of the array matches the number of processes. For simplicity, we assume \({\mathsf {\color{#800066} {size}}}\) to be even. It should be clear that distinct source code must be run at rank \({\mathsf {0}}\), at ranks \({\mathsf {1}}\) to \({\mathsf {\color{#800066} {size}-2}}\), and at rank \({\mathsf {\color{#800066} {size}-1}}\). By taking advantage of the distinguished variable \({\mathsf {\color{#800066} {myrank}}}\) we can write a single expression that runs on all processes.
The initial key of each process is obtained via function \({\mathsf {read_int()}}\) (line 1). Line 2 introduces a downwards loop controlling the number of phases. At even phases, even processes exchange keys with the process at the right (line 5); the smaller key is retained by the process at the left (line 7) and the larger by the process at the right (line 9). At odd phases, the same procedure applies, this time to odd processes: keys are exchanged with the right process (line 13) and the smaller key is retained by the left process (line 14). Because we are assuming that \({\mathsf {\color{#800066} {size}}}\) is even, the last process is of odd rank and, therefore, does not have a right neighbour to exchange keys with. The conditional at line 11 controls this situation.
Lines 3 and 4 exhibit two different flavours of the conditional instruction: \({\mathsf {\color{#800066} {ifc}}}\) stands for a conditional where all processes decide equally (the \({\mathsf {{c}}}\) is for collective); \({\mathsf {\color{#800066} {if}}}\) is a conventional conditional. The conventional conditional is evaluated locally, at the process level, and is independent of the remaining processes. The collective conditional is performed by all processes at the same time so that they all decide equally on the phase.
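The following C+MPI sketch captures the same phase structure. It is an analogue of our own rather than the article's listing: MPI has no collective conditional, so the phase parity test (which evaluates identically on every rank) plays the role of \({\mathsf {\color{#800066} {ifc}}}\), and MPI_Sendrecv replaces the explicit send/receive pair; read_int_key is an assumed stand-in for read_int().

#include <mpi.h>
#include <stdio.h>

static int read_int_key(int rank) { return (rank * 7919) % 101; } /* stand-in */

int main(int argc, char **argv) {
  int rank, size;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);    /* assumed even, as in the text */
  int key = read_int_key(rank), other;
  for (int phase = size; phase > 0; phase--) {  /* downwards loop over phases */
    int partner;
    if (phase % 2 == 0)                    /* "collective" decision: same on all ranks */
      partner = (rank % 2 == 0) ? rank + 1 : rank - 1;  /* pairs (0,1), (2,3), ... */
    else
      partner = (rank % 2 == 1) ? rank + 1 : rank - 1;  /* pairs (1,2), (3,4), ... */
    if (partner < 0 || partner >= size) continue;  /* ranks 0 and size-1 may idle */
    /* exchange keys; the lower-ranked process keeps the smaller key */
    MPI_Sendrecv(&key, 1, MPI_INT, partner, 0, &other, 1, MPI_INT, partner, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    if ((rank < partner && other < key) || (rank > partner && other > key))
      key = other;
  }
  printf("rank %d key %d\n", rank, key);
  MPI_Finalize();
  return 0;
}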
The n-Body Problem. Our last example looks at another classical problem of parallel programming, the n-body problem, which computes the trajectories of n bodies that influence each other through gravitational forces. The algorithm computes the forces between all pairs of bodies, applying a pipeline technique to distribute and balance the work on a parallel architecture [28, 29, 57]. It then determines the bodies’ positions.
The overall idea of the algorithm is as follows: each process obtains the position and velocity of a particle. Then, process rank 0 broadcasts the number of iterations for the simulation. Thereafter, each process enters a loop that computes that many discrete steps. In each iteration, the algorithm computes the forces between all pairs of particles by having each process compute the forces between its particle and those of the neighbouring processes. Towards this end, the algorithm relies on a ring topology where each process passes one particle’s data to the process at the right and receives a new particle’s data from the left. It then computes the forces against the received particle. After \({\mathsf {\color{#800066} {size}-1}}\) steps, all processes have visited all particles. Then, each process computes the position of its particle. A more realistic example has each process manipulating an array of particles, a notion we introduce in Section 6.1. Economising on primitive types, we use integer rather than floating point values.
Our implementation builds on a ring topology. We distinguish the root process (process \({\mathsf {0}}\)) from the other processes (processes \({\mathsf {1}}\) to \({\mathsf {\color{#800066} {size}-1}}\)). Once again, taking advantage of variable \({\mathsf {\color{#800066} {myrank}}}\), we write the behaviour of all processes with a single expression.
Process \({\mathsf {0}}\) reads and broadcasts the number of iterations (line 1). Each process then reads its particle’s position and velocity (line 2) and embarks on the simulation loop (lines 7–21). At each step of the loop, process \({\mathsf {0}}\) sends and receives (in this order) messages with particle data (lines 10–13). All other processes invert the message passing order: they first receive and then send (lines 15–18). Notice the modulo-\({\mathsf {\color{#800066} {size}}}\) arithmetic, which ensures that process \({\mathsf {\color{#800066} {size}-1}}\) correctly communicates with process \({\mathsf {0}}\).
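A compressed C+MPI analogue of the pipeline is sketched below. It is our own sketch, with a single integer standing in for a particle's data and a mock force computation; the article's code manipulates position and velocity as described above, and read_int is again an assumed stand-in.

#include <mpi.h>
#include <stdio.h>

static int read_int(void) { return 3; }   /* stand-in: 3 iterations */

int main(int argc, char **argv) {
  int rank, size, iters;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  int left = (rank + size - 1) % size, right = (rank + 1) % size;
  if (rank == 0) iters = read_int();
  MPI_Bcast(&iters, 1, MPI_INT, 0, MPI_COMM_WORLD); /* root broadcasts #iterations */
  int pos = rank, vel = 0;                 /* integer particle data, as in the text */
  for (int it = 0; it < iters; it++) {
    int travelling = pos, incoming;
    for (int step = 0; step < size - 1; step++) {  /* pipeline: visit all other particles */
      if (rank == 0) {                     /* rank 0 sends first, breaking the symmetry */
        MPI_Send(&travelling, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
        MPI_Recv(&incoming, 1, MPI_INT, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      } else {                             /* all others receive first, then send */
        MPI_Recv(&incoming, 1, MPI_INT, left, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&travelling, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
      }
      vel += pos - incoming;               /* mock force between the two particles */
      travelling = incoming;               /* pass the received particle on */
    }
    pos += vel;                            /* update the particle's position */
  }
  printf("rank %d pos %d\n", rank, pos);
  MPI_Finalize();
  return 0;
}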
Fig. 2.
2.2 The Syntax of Programs
Figure 2 summarises the syntax of programs. Let \({\color{#800066}{r}}\), \({\color{#800066}{x}}\), \({\color{#800066}{y}}\), \({\color{#800066}{z}}\) range over variables (a countable set) and let \({\color{#800066}{k}}, {\color{#800066}{l}}, {\color{#800066}{m}}, {\color{#800066}{n}}, {\color{#800066}{o}}\) range over integer literals. Let \({\color{#800066}{i}}, {\color{#800066}{j}}\) range over index terms, which in addition to variables and literals, further include integer arithmetic operations generically denoted as \({\color{#800066}{i\oplus j}}\) (such as \(+\), \(-\), \(*\), \(/\), \(\%\)). Index terms also comprise the standard operations on references: reference creation, \({\color{#800066}{\mathsf {{\color{#800066}{mkref}}}~{i}}}\), dereference, \({\color{#800066}{!i}}\), and update, \({\color{#800066}{i:=j}}\). Rather than introducing a new syntactic category for reference identifiers, we use variables, for this simplifies the theory. We nevertheless use identifier \({\color{#800066}{r}}\) to refer to a (variable that denotes a) reference.
Let \({\color{#800066}{p}}\), \({\color{#800066}{q}}\) range over propositions, which include the standard boolean operations, generically denoted by \({\color{#800066}{p\bigcirc\!\!\!\!\!\!\!\wedge \ q}}\) (such as \(||\) and \(\&\&\)), and equality and relational integer operations, generically denoted by \({\color{#800066}{i \bigcirc\!\!\!\!\!\!\!< j}}\) (such as \(\gt\), \(\lt\), \(\le\), \(\ge\), \(=\), \(\ne\)). Rather than introducing literals for Boolean values, we distinguish two closed propositions, \({\color{#800066}{p}}\) and \({\color{#800066}{q}}\), such that \({\color{#800066}{p}}\) can be proved to be true and \({\color{#800066}{q}}\) cannot. We call such propositions \(\mathsf {{\color{#800066}{true}}}\) and \(\mathsf {{\color{#800066}{false}}}\) and denote them collectively by \({\color{#800066}{b}}\). The details are in Section 3.2. Let \({\color{#5475D9}{D}}\), \({\color{#5475D9}{E}}\) range over datatypes, which include \({\bf \mathsf {{\color{#5475D9}{int}}}}\) and refinement datatypes \({\color{#5475D9}{\lbrace }}{\color{#800066}{x}}{\color{#5475D9}{:{D} \mid }}~{\color{#800066}{p}}{\color{#5475D9}{\rbrace }}\). To handle references, datatypes include a \({\color{#5475D9}{{D}~{\bf \mathsf {{\color{#5475D9}{ref}}}}}}\) constructor. There are two distinguished variables: \({\color{#800066}{{\it size}}}\), denoting the number of parallel processes, and \({\color{#800066}{{\it myrank}}}\), denoting the rank of a given process. Ranks are 0-based, ranging from 0 to \({\color{#800066}{{\it size}}}-1\).
Let \({\color{#800066}{e}}\), \({\color{#800066}{f}}\) range over expressions. \(\mathsf {{\color{#800066}{skip}}}\) denotes the terminated expression. The \(\mathsf {{\color{#800066}{send}}}\) and \(\mathsf {{\color{#800066}{receive}}}\) operations both expect two index terms: the first represents the target or the source rank, respectively; the second, the value to be sent or the reference in which to store the value received, again respectively. The target and the source ranks must be different from \({\color{#800066}{{\it myrank}}}\), so that a process does not deadlock by trying to send a message to, or receive one from, itself. The \(\mathsf {{\color{#800066}{reduce}}}\) operation expects three index terms: the first denotes the root process, the second the value to send, and the last the reference where the root process stores the sum of the values. The \(\mathsf {{\color{#800066}{broadcast}}}\) operation requires two index terms: the root process and the value to be broadcast. Variable \({\color{#800066}{x}}\) denotes the value broadcasted and can be further used in the continuation expression \({\color{#800066}{e}}\). The collective conditional, \({\color{#800066}{\mathsf {{\color{#800066}{ifc}}}~{p}~\mathsf {{\color{#800066}{then}}}~{e}~\mathsf {{\color{#800066}{else}}}~{f}}}\), denotes a conditional operation on which all processes must decide equally. The remaining expression constructs are fairly standard: sequential composition, downwards-for loop, and conditional expression.
To simplify type checking, we provide two forms of let-expressions. In expressions of the form \(\mathsf {{\color{#800066}{let}}}~{\color{#800066}{{x}:}}{\color{#5475D9}{D}}~{\color{#800066}{=~{i}}}~\mathsf {{\color{#800066}{in}}}~{\color{#800066}{e}}\), index term \({\color{#800066}{i}}\) cannot contain operations on references. As such, its datatype, \({\color{#5475D9}{D}}\), may be an expressive type, containing refinement types. In turn, in expressions of the form \(\mathsf {{\color{#800066}{lets}}}~{\color{#800066}{{x}:}}{\color{#5475D9}{D}}~{\color{#800066}{=~{i}}}~\mathsf {{\color{#800066}{in}}}~{\color{#800066}{e}}\) (the \({\color{#800066}{s}}\) is for store operations), index term \({\color{#800066}{i}}\) is arbitrary and hence may contain operations on references. The price to pay is that \({\color{#5475D9}{D}}\) may not contain refinement types. Concrete programming languages are expected to present a uniform syntax to programmers (with \(\mathsf {{\color{#800066}{let}}}\) only, for example). Compilers can easily convert the surface language into the appropriate form according to the presence or absence of operations on references. That is the approach we follow in all examples in this article.
Apart from the wildcard (\({\color{#800066}{\_}}\)), the examples in Section 2.1 lump together multiple \(\mathsf {{\color{#800066}{let}}}\) expressions and multiple assignments. Furthermore, expressions of the form \({\color{#800066}{x:=i;e}}\) abbreviate \(\mathsf {{\color{#800066}{let}}}~{\color{#800066}{{\_}:}}{\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{int}}}}}}~{\color{#800066}{=~{x:=i}}}~\mathsf {{\color{#800066}{in}}}~{\color{#800066}{e}}\) or \(\mathsf {{\color{#800066}{lets}}}~{\color{#800066}{{\_}:}}{\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{int}}}}}}~{\color{#800066}{=~{x:=i}}}~\mathsf {{\color{#800066}{in}}}~{\color{#800066}{e}}\), as appropriate. Once again, compilers can easily convert the surface language into the core language discussed in this article.
Stores associate reference identifiers \({\color{#800066}{r}}\) to values \({\color{#800066}{v}}\) (integer or Boolean values or variables denoting references). Processes are pairs composed of a store \({\color{#800066}{\rho }}\) and an expression \({\color{#800066}{e}}\). Programs are ordered lists of \({\color{#800066}{{\it size}}}\) processes.
Fig. 3.
2.3 Program Reduction
We introduce index term, proposition, and process evaluation in Figure 3, and program reduction in Figure 4. Notation \({\color{#800066}{\rho }}{{[}}{\color{#800066}{r}}{{:=}}{\color{#800066}{v}}{{]}}\) denotes store \({\color{#800066}{\rho }}\) where the entry for \({\color{#800066}{r}}\), of the form \({\color{#800066}{r}}:={\color{#800066}{v^{\prime }}}\), is replaced by \({\color{#800066}{r}}:={\color{#800066}{v}}\).
Index term evaluation takes a store \({\color{#800066}{\rho }}\), a term \({\color{#800066}{i}}\), the number n of processes, and the number m of a given process; it returns a new store \({\color{#800066}{\sigma }}\) and a value \({\color{#800066}{v}}\). Integer n is used to resolve variable \({\color{#800066}{{\it size}}}\); integer m to resolve variable \({\color{#800066}{{\it myrank}}}\). In rule IE-Mkref, it is insufficient to take \({\color{#800066}{r}}\) from the complement of the domain of \({\color{#800066}{\sigma }}\) alone, for references are plain variables and an unwise choice would clash with some variable in the expression from which the index term is taken. Index term evaluation imposes a strict consistency model in which write and read operations are performed in order [52].
Proposition evaluation takes a store \({\color{#800066}{\rho }}\), a proposition \({\color{#800066}{p}}\), the number n of processes and the number m of the rank of the process, and returns a new store \({\color{#800066}{\sigma }}\) and a simplified proposition \({\color{#800066}{q}}\). The values of n and m are used to evaluate index terms. The rules should be straightforward.
Process evaluation describes the local behaviour of processes, that is, computations that do not include synchronisation or communication with other processes. In Figure 3, one finds two rules for the conditional, to be used when the guard evaluates to \(\mathsf {{\color{#800066}{true}}}\) or to \(\mathsf {{\color{#800066}{false}}}\). The figure further includes rules for loop expansion and termination (to be used when the loop upper bound is larger than 0 and when it is lower than 1, respectively), for \(\mathsf {{\color{#800066}{let}}}\)-processes, and for sequential composition. Lemma 4.13 ensures that if \({\color{#800066}{{P}}}\!\Downarrow ^{n}_{m}{\color{#800066}{{{\color{#800066}{\langle {\rho },{e}\rangle }}}}}\), then \({\color{#800066}{e}}\) is \(\mathsf {{\color{#800066}{skip}}}\) or of the form \({\color{#800066}{(c;f)}}\) with \({\color{#800066}{c}}\) a collective operation ready to synchronise at program-level.
Fig. 4.
Program reduction is in Figure 4. Rule PR-Msg deals with message passing. It requires a \(\mathsf {{\color{#800066}{send}}}\) process and a \(\mathsf {{\color{#800066}{receive}}}\) process. The \(\mathsf {{\color{#800066}{send}}}\) process is l-ranked and that must be the value of the “from” index (\({\color{#800066}{i_m}}\)) in the \(\mathsf {{\color{#800066}{receive}}}\) process. The \(\mathsf {{\color{#800066}{receive}}}\) process is m-ranked and that must be the value of the “to” index (\({\color{#800066}{i_l}}\)) in the \(\mathsf {{\color{#800066}{send}}}\) process. The index to be sent (\({\color{#800066}{j_l}}\)) is evaluated in the \(\mathsf {{\color{#800066}{send}}}\) process. The index denoting the reference to hold the incoming value (\({\color{#800066}{j_m}}\)) is evaluated at the \(\mathsf {{\color{#800066}{receive}}}\) process. After reduction, both processes progress to their continuations (\({\color{#800066}{e_l}}\) and \({\color{#800066}{e_m}}\)), and the value (\({\color{#800066}{v}}\)) in the message is stored in the store of the \(\mathsf {{\color{#800066}{receive}}}\) process. All other processes remain still. There is an unwritten proviso in rule PR-Msg that program \({\color{#800066}{S_1}}\) is of length l, program \({\color{#800066}{S_2}}\) of length \(m-l-1\), and program \({\color{#800066}{S_3}}\) of length \(n-m+1\).
Rule PR-Red engages all processes. Each process locally evaluates the first index term to identify the root process (\({\color{#800066}{l}}\)) and the second index term to identify the value to send (\({\color{#800066}{v_k}}\)). The root process alone evaluates the third index term, to obtain the reference \({\color{#800066}{r}}\) on which to store the sum of the values. All processes progress to their continuations (\({\color{#800066}{e_k}}\)) with stores unchanged, except for the root process, which records in \({\color{#800066}{r}}\) the sum of the values.
PR-BCast is another rule that engages all processes. The first index term (\({\color{#800066}{i_k}}\)) identifies the root process (\({\color{#800066}{l}}\)). The root process alone evaluates the second index term (\({\color{#800066}{j_k}}\)) to obtain the value \({\color{#800066}{v}}\) to transmit. All processes progress to their continuations with variable \({\color{#800066}{x}}\) replaced by the transmitted value, that is, \({\color{#800066}{e_k}}{{\lbrace }}{{\color{#800066}{v}}}{{/}}{{\color{#800066}{x}}}{{\rbrace }}\).
Rule PR-IfcT requires all processes to evaluate their propositions (\({\color{#800066}{p_k}}\)) to \(\mathsf {{\color{#800066}{true}}}\). All processes then advance to their then-branches (\({\color{#800066}{e_k}}\)). The last rule in the figure allows one process (\({\color{#800066}{P_l}}\)) to progress individually, via local process reduction (Figure 3).
Let us follow a program that computes \(\Sigma ^{8}_{i=5}i\) by using two processes and three \(\mathsf {{\color{#800066}{reduce}}}\) operations, leaving the result in reference \({\color{#800066}{r_0}}\) at process 0.
Reduction alternates between the rule that allows processes to progress individually (rule PR-Proc) and rule PR-Red. In this case, local reduction simply rearranges the expression in order to expose the next collective operation (\(\mathsf {{\color{#800066}{reduce}}}\)). Notice that there is no global synchronisation, except on collective operations; each process locally knows what to do: expose the next collective operation.
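One way to realise this scheme, sketched here in C+MPI, has process 0 feed the running total back into each reduction while process 1 proposes the successive summands; this division of labour is an assumption of the sketch. Each call to MPI_Reduce corresponds to one application of rule PR-Red.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  int rank, total = 5;                    /* process 0 starts from 5 */
  int own[3] = {6, 7, 8};                 /* process 1 contributes 6, 7, 8 */
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  for (int k = 0; k < 3; k++) {
    /* p0 proposes the running total, p1 the next summand; p0 collects the sum */
    int contrib = (rank == 0) ? total : own[k];
    MPI_Reduce(&contrib, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
  }
  if (rank == 0) printf("sum 5..8 = %d\n", total);   /* prints 26 */
  MPI_Finalize();
  return 0;
}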
3 Types for Message Passing Parallel Programs
This section introduces the notion of types, type equivalence, and type assignment to programs, starting with examples of types for the source code introduced in the previous section.
3.1 Typing Common Parallel Programs
We discuss types for all four examples introduced in Section 2.1.
The Monte Carlo Algorithm. Recall that process \({\mathsf {0}}\) broadcasts the number of darts and then collects, via a reduce operation, the sum of the darts (in all processes) that hit the target. Here is a possible type:
\begin{equation*} {\bf \mathsf {broadcast}}~0~n{:}\,{\bf \mathsf {nat}}.~{\bf \mathsf {reduce}}~0 \end{equation*}
Section 2.1 introduces three expressions: one for the root process, one for all non-root processes, and one for all processes (including the root). The type above characterises the interactive behaviour of all these expressions, given that the \({\mathsf {\color{#800066} {broadcast/reduce}}}\) pattern is common to all three. Variable \({\mathsf {\color{#800066} {n}}}\) denotes the value broadcasted and could be used in the rest of the protocol, since it denotes a value equally known to all ranks. This dependence is exploited in the following variant, where process \({\mathsf {0}}\) further chooses (and broadcasts) the \({\mathsf {\color{#800066} {sink}}}\), the process that must collect the number of hits:
\begin{equation*} {\bf \mathsf {broadcast}}~0~n{:}\,{\bf \mathsf {nat}}.~{\bf \mathsf {broadcast}}~0~{\it sink}{:}\,{\bf \mathsf {rank}}.~{\bf \mathsf {reduce}}~{\it sink} \end{equation*}
Notice that the value of the \({\mathsf {sink}}\), broadcast by process \({\mathsf {0}}\) to all processes including itself, influences the rest of the protocol: all processes agree to reduce on this value. Datatype \({\mathsf {\color{#5475D9} {rank}}}\) abbreviates a natural number smaller than \({\mathsf {\color{#800066} {size}}}\) and could be written as \(\{ {\mathsf {x: {\color{#5475D9} {nat}} |x\lt {\color{#800066}{size}}}}\}\), with \({\mathsf {\color{#800066} {nat}}}\) (another handy type) being \(\{{\mathsf {x:{\color{#5475D9}{int}}|x\gt =0}}\}\).
Message Exchange on a Ring Topology. This example highlights message passing between processes. A type for the code of each process is as follows:
\begin{equation*} {\bf \mathsf {forall}}~i \lt {\it size}.~{\bf \mathsf {message}}~({\it size}-1-i)~\,(({\it size}-i)~\%~{\it size}) \end{equation*}
Each individual process sends one message and receives another, so that exactly \({\mathsf {\color{#800066} {size}}}\) messages are exchanged in total. This fact is captured by the \({\mathsf {\color{#5475D9} {forall}} \ \mathsf {i\lt} \ \mathsf {\color{#800066}{size}}}\) type, where \({\mathsf {\color{#800066} {i}}}\) ranges from \({\mathsf {\color{#800066} {size}-1}}\) down to \({\mathsf {0}}\). In this case, each “step” in the \({\mathsf {\color{#5475D9} {forall}}}\) type describes one message exchange between two different processes. Because \({\mathsf {\color{#800066} {size}}}\) messages are exchanged and there are \({\mathsf {\color{#800066} {size}}}\) processes, we expect to see exactly one \({\mathsf {\color{#800066} {send}}}\) and one \({\mathsf {\color{#800066} {receive}}}\) operation in the code of each process. Referring to the code in Section 2.1, we can see that this \({\mathsf {\color{#5475D9} {forall}}}\) type does not correspond to any loop in the source code.
Messages are exchanged to the right in all cases but one: rank \({\mathsf {0}}\) sends to rank \({\mathsf {\color{#800066} {1}}}\), which in turn sends to rank \({\mathsf {\color{#800066} {2}}}\), ...which sends to rank \({\mathsf {\color{#800066} {size}-1}}\) which sends to rank \({\mathsf {0}}\). The “wrap-around” happens for \({\mathsf {\color{#800066} {i}}}\) = \({\mathsf {0}}\) in which case a message is sent to the left. This reversal with respect to the receive-first-send-then pattern of all other processes breaks the potential communication deadlock and is in clear sync with the code.
Odd-even Sort. The type associated with the algorithm can be easily derived:
\begin{align*} &{\bf \mathsf {forall}}~{\it phase} \lt {\it size}.\\ &\quad {\bf \mathsf {ifc}}~{\it phase}~\%~2 = 0\\ &\quad {\bf \mathsf {then}}~{\bf \mathsf {forall}}~i \lt {\it size}/2.~({\bf \mathsf {message}}~2i~(2i{+}1);~{\bf \mathsf {message}}~(2i{+}1)~2i)\\ &\quad {\bf \mathsf {else}}~{\bf \mathsf {forall}}~i \lt {\it size}/2.~{\bf \mathsf {ifc}}~2i{+}2 \lt {\it size}\\ &\qquad {\bf \mathsf {then}}~({\bf \mathsf {message}}~(2i{+}1)~(2i{+}2);~{\bf \mathsf {message}}~(2i{+}2)~(2i{+}1))~{\bf \mathsf {else}}~{\bf \mathsf {skip}} \end{align*}
The outer \({\mathsf {\color{#5475D9} {forall}}}\) type describes the number of iterations in the program and matches the \({\mathsf {\color{#800066} {for}}}\) loop in line 2 of the program. In each phase, messages are either sent left or right, but this decision must be taken unanimously by all processes, otherwise a deadlock will occur. The decision is based on the \({\mathsf {\color{#800066} {phase}}}\), a variable that is introduced by the \({\mathsf {\color{#5475D9} {forall}}}\) type (and the \({\mathsf {\color{#800066} {for}}}\) loop in the program) and is hence common to all ranks. As such, both the type and the program may use the \({\mathsf {\color{#5475D9} {ifc}}}\)/\({\mathsf {\color{#800066} {ifc}}}\) constructor. Notice that the \({\mathsf {\color{#800066} {for}}/{\mathsf {\color{#800066} {ifc}}}}\) dependency on program variable \({\mathsf {{phase}}}\) (lines 2–3 in the source code) is shared by the \({\mathsf {\color{#5475D9} {forall}}}\)/\({\mathsf {\color{#5475D9} {ifc}}}\) part of the type (lines 1–2). Each branch features a further \({\mathsf {\color{#5475D9} {forall}}}\) stating that each process must exchange two messages. This type, however, does not correspond to a loop in the program, as discussed in the example above.
A little arithmetic shows that the indices in the \({\mathsf {\color{#5475D9} {message}}}\) type correspond to those in the \({\mathsf {\color{#800066} {send}}}\) and \({\mathsf {\color{#800066} {receive}}}\) operators in the program. From the type one may infer the number of messages exchanged in the program: \({\mathsf {\color{#800066} {size}}}\) in the outer loop times \({\mathsf {2 *{\color{#800066}{size}}/ 2}}\) (in the worst case) in the inner loop, that is, \({\mathsf {\color{#800066} {size}}*{\mathsf {\color{#800066}{size}}}}\), as expected.
The n-Body Problem. The type below captures the interactive behaviour of the code for the n-body problem in Section 2.1.
One can easily notice the alignment of the type with the code: type \({\mathsf {\color{#5475D9} {broadcast}}}\) corresponds to the \({\mathsf {\color{#800066} {broadcast}}}\) operation in the source code (line 1); the \({\mathsf {\color{#5475D9} {forall}}}\) type controlling the number of iterations corresponds to the \({\mathsf {\color{#800066} {for}}}\) loop in line 7, and the one controlling the pipe corresponds to the \({\mathsf {\color{#800066} {for}}}\) loop in line 8. The inner \({\mathsf {\color{#5475D9} {forall}}}\) type (in line 4 above) simply says that each of the \({\mathsf {\color{#800066} {size}}}\) processes is to exchange four messages (lines 5–8 above), and does not correspond to any loop in the source code, as discussed in the examples above. The modulo-\({\mathsf {\color{#800066} {size}}}\) arithmetic enforces the reversal of rank 0 with respect to the receive-first-send-then pattern of all other processes, thus breaking the communication deadlock.
Fig. 5.
Fig. 6.
3.2 Type Formation
The syntax of datatypes, propositions, and index terms is in Figure 2; the associated intuitions are discussed in Section 2. Figure 5 introduces the formation rules for all these syntactic categories. In addition, it introduces typing contexts as lists of \({\color{#800066}{x}}:{\color{#5475D9}{D}}\) bindings. The syntax of types and the type formation rules are in Figure 6.
Metavariables \({\color{#5475D9}{T}}\), \({\color{#5475D9}{U}}\) range over types. The type of terminated processes is \({\bf \mathsf {{\color{#5475D9}{skip}}}}\). Sequential composition is denoted by \({\color{#5475D9}{T;U}}\). The primitive recursion operator is \({\color{#5475D9}{\forall }}~{\color{#800066}{x}}{\color{#5475D9}{\lt }}{\color{#800066}{i}}.{\color{#5475D9}{T}}\) denoting the sequential composition of \({\color{#800066}{i}}\) copies of type \({\color{#5475D9}{T}}\) (modulo a substitution introduced below). Point-to-point communication is denoted by type \({\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{i}}~{\color{#800066}{j}}\) describing a message from the process at rank \({\color{#800066}{i}}\) to the process at rank \({\color{#800066}{j}}\). Reduce operations are introduced by type \({\bf \mathsf {{\color{#5475D9}{reduce}}}}~{\color{#800066}{i}}\) whose value is to be collected by the process at rank \({\color{#800066}{i}}\). Broadcast operations are denoted by \({\bf \mathsf {{\color{#5475D9}{broadcast}}}}~{\color{#800066}{i}}~{\color{#800066}{x}}{\color{#5475D9}{:{D}. {T}}}\), issued from process rank \({\color{#800066}{i}}\) and distributing a value of datatype \({\color{#5475D9}{D}}\) denoted by variable \({\color{#800066}{x}}\) in the continuation type \({\color{#5475D9}{T}}\). Finally, collective conditional types are denoted by \({\bf \mathsf {{\color{#5475D9}{ifc}}}}~{\color{#800066}{p}}~{\bf \mathsf {{\color{#5475D9}{then}}}}~{\color{#5475D9}{T}}~{\bf \mathsf {{\color{#5475D9}{else}}}}~{\color{#5475D9}{U}}\) which behave as \({\color{#5475D9}{T}}\) or \({\color{#5475D9}{U}}\) depending on the truth value of proposition \({\color{#800066}{p}}\).
A few constructors introduce bindings for type variables. In datatype \({\color{#5475D9}{\lbrace }}{\color{#800066}{x}}{\color{#5475D9}{:{D} \mid }}~{\color{#800066}{p}}{\color{#5475D9}{\rbrace }}\), variable \({\color{#800066}{x}}\) is bound in proposition \({\color{#800066}{p}}\). In types \({\color{#5475D9}{\forall }}~{\color{#800066}{x}}{\color{#5475D9}{\lt }}{\color{#800066}{i}}.{\color{#5475D9}{T}}\) and \({\bf \mathsf {{\color{#5475D9}{broadcast}}}}~{\color{#800066}{i}}~{\color{#800066}{x}}{\color{#5475D9}{:{D}. {T}}}\), variable \({\color{#800066}{x}}\) is bound in type \({\color{#5475D9}{T}}\). The set of free variables in object A, denoted by \(\operatorname{fv}(A)\), is defined accordingly.
The distinguished variable \({\color{#800066}{{\it size}}}\) belongs to datatype \({\color{#5475D9}{\lbrace }}{\color{#800066}{x}}{\color{#5475D9}{:{{\bf \mathsf {{\color{#5475D9}{int}}}}} \mid }}~{\color{#800066}{x\ge 0}}{\color{#5475D9}{\rbrace }}\). Contexts \(\Gamma , \Delta\) keep track of variables and their datatypes. The language of datatypes allows defining a few important derived constructors, including \({\bf \mathsf {{\color{#5475D9}{nat}}}}\) standing for the datatype \({\color{#5475D9}{\lbrace }}{\color{#800066}{x}}{\color{#5475D9}{:{{\bf \mathsf {{\color{#5475D9}{int}}}}} \mid }}~{\color{#800066}{x\ge 0}}{\color{#5475D9}{\rbrace }}\), and \({\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{rank}}}}}}\) abbreviating datatype \({\color{#5475D9}{\lbrace }}{\color{#800066}{x}}{\color{#5475D9}{:{{\bf \mathsf {{\color{#5475D9}{nat}}}}} \mid }}~{\color{#800066}{x\lt {\color{#800066}{{\it size}}}}}{\color{#5475D9}{\rbrace }}\). As discussed in Section 2.2, \(\mathsf {{\color{#800066}{true}}}\) is a distinguished proposition such that \({}\vdash {\color{#800066}{\mathsf {{\color{#800066}{true}}}}}\ \mathbf {true}\) holds, and \(\mathsf {{\color{#800066}{false}}}\) is another distinguished proposition such that \({}\vdash {\color{#800066}{\mathsf {{\color{#800066}{false}}}}}\ \mathbf {true}\) does not hold.
All these notions—index terms, propositions, datatypes, types, and contexts—are subject to formation rules that rule out ill-formed syntax. The rules should be self-explanatory. We essentially require that all variables (in index terms, propositions, or datatypes) be declared in the context. For example, \({\color{#5475D9}{\lbrace }}{\color{#800066}{x}}{\color{#5475D9}{:{{\bf \mathsf {{\color{#5475D9}{int}}}}} \mid }}~{\color{#800066}{x=y}}{\color{#5475D9}{\rbrace }}\) is not a datatype when considered under the empty context, \(\varepsilon\), but becomes so under a context of the form \({\color{#800066}{y}}:{\bf \mathsf {{\color{#5475D9}{int}}}}\).
The formation rules for types are new and deserve a brief explanation. All type constructors that use an index term \({\color{#800066}{i}}\) to talk about process ranks require \({\color{#800066}{i}}\) to be a valid rank. That is the case for \({\bf \mathsf {{\color{#5475D9}{message}}}}\), \({\bf \mathsf {{\color{#5475D9}{reduce}}}}\), and \({\bf \mathsf {{\color{#5475D9}{broadcast}}}}\). In addition, rule T-Msg requires the source and the target rank, \({\color{#800066}{i}}\) and \({\color{#800066}{j}}\), to be distinct, for an attempt by a process to send a message to itself would lead to a deadlocked situation. Types \({\bf \mathsf {{\color{#5475D9}{message}}}}\) and \({\bf \mathsf {{\color{#5475D9}{reduce}}}}\) do not explicitly mention the value transmitted, for programs cannot rely on this information. The case of \({\bf \mathsf {{\color{#5475D9}{broadcast}}}}\) is different: the rest of the protocol may rely on the value transmitted.
Proposition entailment includes a hypothesis of the form \({\Gamma }\vDash {\color{#800066}{p}}\) that refers to logical deducibility. This proof obligation is usually discharged by an SMT solver. A few assumptions are required of deducibility, namely concerning subtyping, weakening, strengthening, and substitution (Lemmas 3.1, 4.1, 4.2, and 4.4). The following assumptions are expected to hold for most SMT solvers.
Decidability of logical deducibility is discussed together with algorithmic type checking (Section 5.2).
Datatype subtyping is defined as usual for refinement types [26], and equivalence is defined from subtyping again as usual. Informally, a refinement type \({\color{#5475D9}{D}}\) is a subtype of a refinement type \({\color{#5475D9}{E}}\) if the formulas in \({\color{#5475D9}{D}}\) are “more precise” than those in \({\color{#5475D9}{E}}\). In rule S-RefinL, proposition \({\color{#800066}{p}}\) appears in the “more precise” side, hence we require only its good formation. In the case of rule S-RefinR, proposition \({\color{#800066}{p}}\) appears in the “less precise” side, hence we require \({\color{#800066}{p}}\) to hold (which in turn implies that it is well formed).
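As a concrete instance (writing \(<:\) for datatype subtyping), one expects the judgement \(\varepsilon \vdash \lbrace x{:}\,{\bf \mathsf {int}} \mid x \gt 5\rbrace <: \lbrace x{:}\,{\bf \mathsf {int}} \mid x \gt 0\rbrace\) to hold via rule S-RefinR, since \(x \gt 5\) entails \(x \gt 0\), while the converse judgement fails.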
We conclude this section with an example attesting the importance of type formation. Consider a program where all processes are to send some integer value to a fixed process. Process \({\mathsf {0}}\) decides and broadcasts the rank of this process. One could try a type of the form
\begin{equation*} {\bf \mathsf {broadcast}}~0~n{:}\,{\bf \mathsf {rank}}.~{\bf \mathsf {forall}}~i \lt {\it size}.~{\bf \mathsf {message}}~i~n \end{equation*}
but it would not be well formed, for rank \({\mathsf {\color{#800066} {n}}}\) would send a message to itself, which would lead to a deadlocked situation. Sketching a derivation for the formation of the type above, one stumbles at an unsatisfiable premise to rule T-Msg, namely the requirement that the source and target ranks be provably distinct, which yields the unsatisfiable verification condition \({\color{#800066}{({\color{#800066}{{\it size}}}\gt 1 \wedge 0\le n\lt {\color{#800066}{{\it size}}}\wedge i\le {\color{#800066}{{\it size}}}) \rightarrow i \ne n}}\).
We can fix this type in a number of different ways. But before we do that, we introduce two handy derived constructs. Notation \({\mathsf {\color{#5475D9} {forall}} \ \mathsf{i: j..k. T}}\) abbreviates the type \({\mathsf {\color{#5475D9} {forall}} \ {\mathsf {i \lt k-j+1. U}}}\), where type \({\mathsf {{U}}}\) is obtained from type \({\mathsf {{T}}}\) via the appropriate translation on index \({\mathsf {{i}}}\). Notation \({\mathsf {\color{#5475D9} {ifc}} \ \mathsf {p} \ \mathsf {\color{#5475D9}{then}} \ \mathsf {T}}\) abbreviates the type \({\mathsf {\color{#5475D9} {ifc}}\ \mathsf{p}\ {\mathsf{\color{#5475D9}{then}}}\ \mathsf{T}\ {\mathsf{\color{#5475D9}{else\quad skip}}}}\).
So, here is one possible fix, a solution using two loops:
\begin{equation*} {\bf \mathsf {broadcast}}~0~n{:}\,{\bf \mathsf {rank}}.~({\bf \mathsf {forall}}~i{:}\,0..n{-}1.~{\bf \mathsf {message}}~i~n);~({\bf \mathsf {forall}}~i{:}\,n{+}1..{\it size}{-}1.~{\bf \mathsf {message}}~i~n) \end{equation*}
and another that makes use of a collective conditional:
\begin{equation*} {\bf \mathsf {broadcast}}~0~n{:}\,{\bf \mathsf {rank}}.~{\bf \mathsf {forall}}~i \lt {\it size}.~{\bf \mathsf {ifc}}~i \ne n~{\bf \mathsf {then}}~{\bf \mathsf {message}}~i~n \end{equation*}
Fig. 7.
3.3 Type Equivalence
The complete set of rules is in Figure 7. Premises in parentheses are used solely for agreement purposes (Lemma 4.3). Recall that variable \({\color{#800066}{{\it myrank}}}\) denotes the rank of a given process. The variable is of datatype \({\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{rank}}}}}}\) and should be present in the typing context if ever used in a derivation. Type equivalence rules can be grouped into six different categories.
Congruence Rules. Align propositions, index terms, and datatypes in types, allowing one, for example, to show equivalences such as \({\Gamma }\vdash {\bf \mathsf {message}}~(1+1)~3 \equiv {\bf \mathsf {message}}~2~3\).
The group includes the first seven rules in Figure 7, one for each of the seven type constructors in basic ParTypes (Figure 6). In all these rules, we require propositions, index terms, and datatypes to be equivalent. In rules that introduce new entries in the typing context (\(\mathsf {{\color{#800066}{broadcast}}}\)), we push one of the datatypes into the context. This group of rules ensures that type equivalence is reflexive.
Monoidal Rules. Turn the set of types into a monoid with respect to sequential composition and the terminated type, allowing one to show that
\begin{equation*} {\Gamma }\vdash {\bf \mathsf {skip}};T \equiv T \qquad {\Gamma }\vdash T;{\bf \mathsf {skip}} \equiv T \qquad {\Gamma }\vdash (T;U);V \equiv T;(U;V). \end{equation*}
There are two rules that define the semantics of primitive recursion. The former unfolds the type, making use of sequential composition; the latter rewrites primitive recursion into \({\bf \mathsf {{\color{#5475D9}{skip}}}}\) when the limit, \({\color{#800066}{i}}\), is 0. Primitive recursion unrolls loops at the left; unless loops can be fully unrolled, right unroll is not valid. For example, the type
\begin{equation*} {\color{#5475D9}{\forall }}~x{\color{#5475D9}{\lt }}2.~{\bf \mathsf {message}}~(2x)~({\it size}-1) \end{equation*}
can be equated to a sequence of two messages \({\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{2}}~{\color{#800066}{4}}; {\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{0}}~{\color{#800066}{4}}\) under a context that entails \({\color{#800066}{{\it size}}}={\color{#800066}{5}}\).
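To make the left unrolling concrete, the equivalence can be derived step by step:
\begin{align*} \forall ~x\lt 2.~{\bf \mathsf {message}}~(2x)~({\it size}-1) &\equiv {\bf \mathsf {message}}~2~({\it size}-1);~\forall ~x\lt 1.~{\bf \mathsf {message}}~(2x)~({\it size}-1)\\ &\equiv {\bf \mathsf {message}}~2~({\it size}-1);~{\bf \mathsf {message}}~0~({\it size}-1);~\forall ~x\lt 0.~{\bf \mathsf {message}}~(2x)~({\it size}-1)\\ &\equiv {\bf \mathsf {message}}~2~4;~{\bf \mathsf {message}}~0~4 \end{align*}
where the last step rewrites the exhausted recursion (limit 0) into \({\bf \mathsf {skip}}\), removes it via the monoidal rules, and evaluates \({\it size}-1\) to 4 under the assumption that the context entails \({\it size}=5\).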
Projection Rule. Implements projection in the sense of multiparty session types [39]. This is where the distinguished variable \({\color{#800066}{{\it myrank}}}\) comes into play. The projection rule rewrites a message type into \({\bf \mathsf {{\color{#5475D9}{skip}}}}\) if both the target and the source can be proved to be different from \({\color{#800066}{{\it myrank}}}\). For example, \({\Gamma }\vdash {\bf \mathsf {message}}~0~2 \equiv {\bf \mathsf {skip}}\) whenever \(\Gamma\) entails \({\it myrank} \ne 0\) and \({\it myrank} \ne 2\).
Preorder Rules. Include the symmetry and transitivity rules. Reflexivity is attained via the congruence rules. Notice that transitivity is not derivable from the other rules. We have \({\Gamma }\vdash {\color{#5475D9}{{\color{#5475D9}{\forall }}~{\color{#800066}{x}}{\color{#5475D9}{\lt }}{\color{#800066}{i}}.{\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{skip}}}}}}}}\equiv {\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{skip}}}}}}\) and \({\Gamma }\vdash {\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{skip}}}}}}\equiv {\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{0}}~{\color{#800066}{1}}}}\) under an appropriate \(\Gamma\), but cannot show \({\Gamma }\vdash {\color{#5475D9}{{\color{#5475D9}{\forall }}~{\color{#800066}{x}}{\color{#5475D9}{\lt }}{\color{#800066}{i}}.{\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{skip}}}}}}}}\equiv {\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{0}}~{\color{#800066}{1}}}}\) without the rule for transitivity.
We can easily show that what we call datatype and type equivalence are indeed equivalence relations.
One might wonder why we have not lifted subtyping from datatypes to types, and decided instead to define an equality relation on types. Languages with subtyping and some form of input/output always behave co/contravariantly with respect to subtyping. This is the case with functions, where the \(T\rightarrow U\) type constructor is contravariant in the input position T and covariant in the output position U [62], and with binary session types [33, 37], where the message types \(?T.S\) and \(!U.S\) are covariant in the input position T and contravariant in output position U [23]. Our types do not talk about input/output. Rather than talking about message-sending and message-reception (as is the case with binary session types), a type \({\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{i}}~{\color{#800066}{j}}\) describes a message exchanged between two processes \({\color{#800066}{i}}\) and \({\color{#800066}{j}}\), taking no side (as is the case with global session types [38, 39]). Any attempt to subtype a message type would violate safe substitutability either on the sending or on the receiving side.
Flexible as it is, type equality is still quite intensional. We have seen above that right loop-unroll does not yield an equivalent type. Another instance is the example at the end of Section 3.2, where the two alternative types (two loops in a row and a loop with a conditional) are not equivalent, yet they describe the same pattern of interaction among processes. A third example is given by the two types \({\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{0}}~{\color{#800066}{1}}{\color{#5475D9}{;}} {\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{2}}~{\color{#800066}{3}}\) and \({\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{2}}~{\color{#800066}{3}}{\color{#5475D9}{;}} {\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{0}}~{\color{#800066}{1}}\), which denote two independent message exchanges (that may happen in any order or even simultaneously), yet are not equivalent.
Fig. 8.
Fig. 9.
3.4 Program Formation
Expression formation is in Figure 9. Syntax \({\color{#5475D9}{\lbrace }}{\color{#800066}{p}}{\color{#5475D9}{\rbrace }}\) abbreviates datatype \({\color{#5475D9}{\lbrace }}{\color{#800066}{\_}}{\color{#5475D9}{:{{\bf \mathsf {{\color{#5475D9}{int}}}}} \mid }}~{\color{#800066}{p}}{\color{#5475D9}{\rbrace }}\) [26], and is used in the rule for the collective conditional to push a proposition into the context. The \(\mathsf {{\color{#800066}{send}}}\) and the \(\mathsf {{\color{#800066}{receive}}}\) operators both expect two index terms: the first represents the target or the source rank, respectively; the second the value to be sent or the reference where to store the value received, again respectively. The target and the source ranks must be different from \({\color{#800066}{{\it myrank}}}\) as witnessed by premise \({\Gamma }\vdash {\color{#800066}{i \ne {\color{#800066}{{\it myrank}}}}}\ \mathbf {true}\) where, we recall, \({\color{#800066}{{\it myrank}}}\) is a distinguished variable that denotes the rank of the process. The formation of the \(\mathsf {{\color{#800066}{reduce}}}\) operation follows the same principles.
The \(\mathsf {{\color{#800066}{broadcast}}}\) operation requires two index terms: the root process \({\color{#800066}{i}}\) and the value \({\color{#800066}{j}}\) to be broadcast. Variable \({\color{#800066}{x}}\) denotes the value broadcasted and can be further used both in the continuation expression \({\color{#800066}{e}}\) and in type \({\color{#5475D9}{T}}\), hence it is added to the context when checking the formation of \({\color{#800066}{e}}\).
In the \(\mathsf {{\color{#800066}{broadcast}}}\) operation, the value of index term \({\color{#800066}{j}}\) is shared among all processes, so that these may use it (through variable \({\color{#800066}{x}}\)) in the continuation. The same does not apply to \(\mathsf {{\color{#800066}{receive}}}\) or \(\mathsf {{\color{#800066}{reduce}}}\), where the value exchanged is known only to a subset of the processes, and therefore cannot be used in types. We must also make sure that values exchanged among processes are not references; we use judgement \({\Gamma }\vdash {\color{#800066}{i}}:{\color{#5475D9}{D}}\) for this purpose.
The rule for the collective conditional, E-Ifc, copies \({\color{#800066}{p}}\) or \({\color{#800066}{\lnot p}}\) to the context before checking the then and the else branches. The rule for the conventional conditional, E-If, is standard. Rules E-All and E-Let introduce in the context an entry for the bound variable \({\color{#800066}{x}}\) with the appropriate refinement type in each case. Rule E-LetS simply introduces type \({\color{#5475D9}{D}}\) (which may contain \({\bf \mathsf {{\color{#5475D9}{ref}}}}\)-types) in the context. In rule E-All, variable \({\color{#800066}{x}}\) may appear in \({\color{#5475D9}{T}}\) because \({\color{#5475D9}{\forall }}~{\color{#800066}{x}}{\color{#5475D9}{\lt }}{\color{#800066}{i}}.{\color{#5475D9}{T}}\) introduces a binding for it. This is not the case for the two let-rules, and so we require \({\color{#800066}{x}}\notin \operatorname{fv}({\color{#5475D9}{T}})\). Finally, rule E-Eq includes type equivalence in expression formation.
The following example shows a typical usage of rule E-Eq. Consider an expression \({\color{#800066}{e}}\) of the form \(\mathsf {{\color{#800066}{let}}}~{\color{#800066}{{x}:}}{\color{#5475D9}{{\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{rank}}}}}}}}~{\color{#800066}{=~{i}}}~\mathsf {{\color{#800066}{in}}}~{\color{#800066}{{\color{#800066}{\mathsf {{\color{#800066}{reduce}}}~{x}~{5}~{r}}}}}\) and take a context \(\Gamma\) such that \({\Gamma }\vdash {\color{#800066}{i}}:{\color{#5475D9}{{\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{rank}}}}}}}}\) and \({\Gamma }\vdash _{\!\mathbf {s}}{\color{#800066}{r}}:{\color{#5475D9}{{\color{#5475D9}{{{\bf \mathsf {{\color{#5475D9}{int}}}}}~{\bf \mathsf {{\color{#5475D9}{ref}}}}}}}}\). Then, using rules E-Red and E-Let, we know that \({\Gamma ,{\color{#800066}{x}}:{\color{#5475D9}{\lbrace }}{\color{#800066}{y}}{\color{#5475D9}{:{{\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{rank}}}}}}} \mid }}~{\color{#800066}{y=i}}{\color{#5475D9}{\rbrace }}}\vdash {\color{#800066}{{\color{#800066}{\mathsf {{\color{#800066}{reduce}}}~{x}~{5}~{r}}}}}:{\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{reduce}}}}~{\color{#800066}{x}}}}\), but we cannot conclude that \({\color{#800066}{e}}\) has type \({\bf \mathsf {{\color{#5475D9}{reduce}}}}~{\color{#800066}{x}}\) for \({\color{#800066}{x}}\) is bound in the \(\mathsf {{\color{#800066}{let}}}\)-expression (and not present in the final context). Instead, between the two rules, we use rule E-Eq to conclude \({\Gamma ,{\color{#800066}{x}}:{\color{#5475D9}{\lbrace }}{\color{#800066}{y}}{\color{#5475D9}{:{{\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{rank}}}}}}} \mid }}~{\color{#800066}{y=i}}{\color{#5475D9}{\rbrace }}}\vdash {\color{#800066}{{\color{#800066}{\mathsf {{\color{#800066}{reduce}}}~{x}~{5}~{r}}}}}:{\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{reduce}}}}~{\color{#800066}{i}}}}\), and then use rule E-Let to conclude \({\Gamma }\vdash {\color{#800066}{e}}:{\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{reduce}}}}~{\color{#800066}{i}}}}\).
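Schematically, writing \(\Gamma'\) for the extended context \(\Gamma ,{\color{#800066}{x}}:{\color{#5475D9}{\lbrace }}{\color{#800066}{y}}{\color{#5475D9}{:{{\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{rank}}}}}}} \mid }}~{\color{#800066}{y=i}}{\color{#5475D9}{\rbrace }}\), the derivation chains the three rules as follows (a compact rendering of the steps just described):
\[
\Gamma' \vdash \mathsf{reduce}~x~5~r : \mathsf{reduce}~x
\;\stackrel{\text{E-Eq}}{\longrightarrow}\;
\Gamma' \vdash \mathsf{reduce}~x~5~r : \mathsf{reduce}~i
\;\stackrel{\text{E-Let}}{\longrightarrow}\;
\Gamma \vdash e : \mathsf{reduce}~i
\]
The E-Eq step is justified by the entailment \(\Gamma' \vDash x = i\) provided by the refinement on \(x\); the E-Let step requires \(x \notin \operatorname{fv}(\mathsf{reduce}~i)\), which now holds.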
In Section 4.3, we show that all well-formed programs terminate. By using a \(\mathsf {{\color{#800066}{for}}}\)-loop in conjunction with references, we might try to write a program that diverges.
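Rendered in C (a sketch of ours; the core language writes the bound with a refinement type instead), the offending pattern is a loop whose bound is read from a reference that the body keeps increasing:

```c
int r = 1;                   /* the reference r of the text      */
for (int x = 0; x < r; x++)  /* the bound re-reads the reference */
    r = r + 1;               /* the body grows it: divergence    */
```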
Any attempt to write a derivation for this program will necessarily fail. Consider a context \({\color{#800066}{r}}:{\color{#5475D9}{{\color{#5475D9}{{{\bf \mathsf {{\color{#5475D9}{int}}}}}~{\bf \mathsf {{\color{#5475D9}{ref}}}}}}}}\) binding \({\color{#800066}{r}}\), the only free variable in the expression. We first apply rule E-All to obtain a premise featuring the candidate context \({\color{#800066}{r}}:{\color{#5475D9}{{\color{#5475D9}{{{\bf \mathsf {{\color{#5475D9}{int}}}}}~{\bf \mathsf {{\color{#5475D9}{ref}}}}}}}}, {\color{#800066}{x}}:{\color{#5475D9}{{\color{#5475D9}{\lbrace }}{\color{#800066}{y}}{\color{#5475D9}{:{{\bf \mathsf {{\color{#5475D9}{nat}}}}} \mid }}~{\color{#800066}{y\lt !r}}{\color{#5475D9}{\rbrace }}}}\). There is now one path in the candidate derivation that leads to an unsatisfiable judgement. The path is composed of rules E-Let, IS-Assign, I-Var, C-Bind, D-Refin, and P-Rel (possibly with occurrences of rule E-Eq). The crucial observation is that a refinement type for variable \({\color{#800066}{x}}\) is added to the context, and refinement types may not contain operations on references.
Despite bearing a similar syntax, expressions \({\color{#800066}{\mathsf {{\color{#800066}{if}}}~{p}~\mathsf {{\color{#800066}{then}}}~{e}~\mathsf {{\color{#800066}{else}}}~{f}}}\) and \({\color{#800066}{\mathsf {{\color{#800066}{ifc}}}~{p}~\mathsf {{\color{#800066}{then}}}~{e}~\mathsf {{\color{#800066}{else}}}~{f}}}\) are quite different. The former is evaluated within a process (Figure 3); the latter is a collective operation evaluated at the level of programs (Figure 4). The conventional conditional expression cannot be seen as a particular case of the collective conditional (of type \({\bf \mathsf {{\color{#5475D9}{ifc}}}}~{\color{#800066}{p}}~{\bf \mathsf {{\color{#5475D9}{then}}}}~{\color{#5475D9}{T}}~{\bf \mathsf {{\color{#5475D9}{else}}}}~{\color{#5475D9}{T}}\)), for it would require a global synchronisation among all processes, which is not what a local conditional suggests.
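In MPI terms (a sketch of ours), the collective conditional branches on data that every rank holds, so all ranks take the same branch and may safely run collective operations inside it; the local conditional may split the ranks, so only point-to-point behaviour matched elsewhere is safe:

```c
/* inside main, after MPI_Init */
int rank, n = 0;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0) n = 4;                          /* the root chooses a value */
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* now every rank knows n   */
if (n % 2 == 0)                  /* collective: all ranks evaluate the same */
    MPI_Barrier(MPI_COMM_WORLD); /* test, so this collective is safe        */
if (rank == 0) {                 /* local: ranks disagree on the test;      */
    /* point-to-point code for the root only, no collectives in here        */
}
```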
Fig. 10. Store, process, and program formation.
We now turn our attention to programs. Store, process, and program formation are in Figure 10. A separate, yet conventional, typing context \(\Delta\) is used to type the store. The expression in a process is typed under a pair of contexts: \(\Delta\) for the references and \(\Gamma\) for the variables. The latter context typically contains bindings for variables \({\color{#800066}{{\it size}}}\) and \({\color{#800066}{{\it myrank}}}\). The variables in context \(\Delta\)—reference identifiers—cannot show up in the type \({\color{#5475D9}{T}}\) of the expression, an assumption captured by premise \({\Gamma }\vdash {\color{#5475D9}{T}}~\mathbf {type}\).
Programs, ordered lists of \({\color{#800066}{{\it size}}}\) processes, are typed under a context that introduces n for the number of processes, that is, \({\color{#800066}{{\it size}}}:{\color{#5475D9}{\lbrace }}{\color{#800066}{x}}{\color{#5475D9}{:{{\bf \mathsf {{\color{#5475D9}{int}}}}} \mid }}~{\color{#800066}{x=n}}{\color{#5475D9}{\rbrace }}\), denoted by \(\Gamma ^{n}\). Each process adds to this context an entry for variable \({\color{#800066}{{\it myrank}}}\) with the corresponding process number. Notation \(\Gamma ^{n}_{m}\) denotes the context \(\Gamma ^{n}, {\color{#800066}{{\it myrank}}}:{\color{#5475D9}{\lbrace }}{\color{#800066}{y}}{\color{#5475D9}{:{{\bf \mathsf {{\color{#5475D9}{int}}}}} \mid }}~{\color{#800066}{y=m}}{\color{#5475D9}{\rbrace }}\). All processes in a program share the same type \({\color{#5475D9}{T}}\); this is the type of the program itself.
We complete this section by discussing the last two premises of the rule for programs. Premise \({\Delta _0,\dots ,\Delta _{n-1}}~\mathbf {context}\) forces store locality by not allowing the same reference name to appear in two distinct processes. Premise \({\Gamma ^{n}}\vdash {\color{#5475D9}{T}}~\mathbf {type}\) makes sure variable \({\color{#800066}{{\it myrank}}}\) does not appear in type \({\color{#5475D9}{T}}\). Suppose that we allowed variable \({\color{#800066}{{\it myrank}}}\) in the type of a program. Then a program in which every process sends to its right neighbour and no process ever receives
would be typable at type \({\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{{\color{#800066}{{\it myrank}}}}}~{\color{#800066}{(({\color{#800066}{{\it myrank}}}+1)\%{\color{#800066}{{\it size}}})}}\), yet it would deadlock, for no \(\mathsf {{\color{#800066}{send}}}\) operation has a matching receive.
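Concretely (our sketch), the rejected behaviour amounts to every rank posting a synchronous send to its right neighbour while no rank ever posts a receive:

```c
/* inside main, after MPI_Init */
int rank, size, v = 42;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
/* every rank sends, nobody receives: a guaranteed deadlock */
MPI_Ssend(&v, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
```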
4 Progress and Strong Normalisation
This section presents the main results of ParTypes: progress, preservation, and strong normalisation. In the process, we introduce a notion of normal forms for types and an algorithm to convert types to normal forms.
4.1 Administrative Results for Programs and Types
We start by addressing weakening, strengthening and agreement for the relevant syntactic constructions introduced in Figures 5 to 10.
Strengthening allows removing an entry \({\color{#800066}{x}}:{\color{#5475D9}{D}}\) from a context \(\Gamma ,{\color{#800066}{x}}:{\color{#5475D9}{D}},\Delta\) under the conditions that \({\color{#800066}{x}}\) does not occur in the types in \(\Delta\) and that \({\color{#5475D9}{D}}\) is closed (so that information in refinement types in \({\color{#5475D9}{D}}\) cannot be used by semantic entailment).
The substitution lemma is a cornerstone in the proof of preservation (Theorem 4.17).
The proofs of normalisation for evaluation, progress, and preservation both for evaluation and reduction (Lemmas 4.13, 4.14, and 4.12, and Theorem 4.17) require extracting the general form of the type for a given expression.
The last result in this section tells us the possible shapes of expressions that belong to a certain type.
4.2 Type Conversion and List Normal Forms
The notion of type equivalence introduced in Section 3.3 induces equivalent types that can be quite different, syntactically. Fortunately, types can be converted into a normal form that allows for quick verification of type equivalence and for a quick assessment of the size of a type. The former notion is used in algorithmic type equivalence (Section 5.1); the latter in strong normalisation (Section 4.4).
Fig. 11. List normal forms.
The normal forms we are interested in resemble lists, hence the name list normal form. The inductive definition is in Figure 11 and comprises one rule for each of the type constructors introduced so far. We can easily see that objects in list normal form are types (Lemma 4.8); the premises in parentheses work towards this end. Intuitively, a type is in list normal form if it is of the form \({\color{#5475D9}{T_1;T_2;\dots ;T_n;{\bf \mathsf {{\color{#5475D9}{skip}}}}}}\), for \(n\ge 0\), and each type \({\color{#5475D9}{T_i}}\) is neither \({\bf \mathsf {{\color{#5475D9}{skip}}}}\) nor of the form \({\color{#5475D9}{\_;\_}}\). In addition, types that occur in \({\color{#5475D9}{T_i}}\) are themselves in list normal form. There are three more conditions for a type to be in list normal form: message types must be genuine (not equivalent to \({\bf \mathsf {{\color{#5475D9}{skip}}}}\)), conditional types cannot be simplified, and primitive recursion cannot be unfolded or eliminated. For messages \({\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{i}}~{\color{#800066}{j}}\) we make sure that neither the sender \({\color{#800066}{i}}\) nor the recipient \({\color{#800066}{j}}\) can be proved equal to \({\color{#800066}{{\it myrank}}}\); for conditional types \({\bf \mathsf {{\color{#5475D9}{ifc}}}}~{\color{#800066}{p}}~{\bf \mathsf {{\color{#5475D9}{then}}}}~{\color{#5475D9}{T}}~{\bf \mathsf {{\color{#5475D9}{else}}}}~{\color{#5475D9}{U}}\) we require that neither \({\color{#800066}{p}}\) nor \({\color{#800066}{\lnot p}}\) can be proved; and for primitive recursion \({\color{#5475D9}{\forall }}~{\color{#800066}{x}}{\color{#5475D9}{\lt }}{\color{#800066}{i}}.{\color{#5475D9}{T}}\) we require that the upper bound \({\color{#800066}{i}}\) can be proved neither equal to nor greater than 0.
Fig. 12. Conversion to list normal form.
Conversion to normal form is performed by the algorithm in Figure 12. The first rule builds an empty list (\({\bf \mathsf {{\color{#5475D9}{skip}}}}\)) from the terminated type (\({\bf \mathsf {{\color{#5475D9}{skip}}}}\)). As before, premises in parentheses account for agreement (Lemma 4.8). The first rule for messages builds a singleton list. The second is for non-genuine messages, that is, messages that are equivalent to \({\bf \mathsf {{\color{#5475D9}{skip}}}}\) because both the sender and the recipient of the message are different from \({\color{#800066}{{\it myrank}}}\). In this case the message is converted to \({\bf \mathsf {{\color{#5475D9}{skip}}}}\). The rules for \({\bf \mathsf {{\color{#5475D9}{reduce}}}}\) and \({\bf \mathsf {{\color{#5475D9}{broadcast}}}}\) build singleton lists from types. The rule for sequential composition recursively converts the two components and concatenates the results. Type concatenation, \({\color{#5475D9}{T}}+\!\!+~{\color{#5475D9}{U}}\), is a standard list concatenation function. There are three rules for converting primitive recursion. When the bound can be proved neither equal to nor greater than 0, the function builds a singleton list. When the upper bound can be proved to be 0, the type is converted into \({\bf \mathsf {{\color{#5475D9}{skip}}}}\), and when the upper bound can be proved to be larger than 0, the type is expanded into a sequential composition, as in the corresponding rules of type equivalence (Figure 7). Finally, the rules for the conditional type follow an approach similar to primitive recursion, featuring three cases: when neither \({\color{#800066}{p}}\) nor \({\color{#800066}{\lnot p}}\) can be proved, when \({\color{#800066}{p}}\) is true, and when \({\color{#800066}{p}}\) is false.
For example, let \({\color{#5475D9}{T}}\) be the type \({\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{0}}~{\color{#800066}{1}}\) and recall that \(\Gamma ^{3}_{2}\) is the context stating that \({\color{#800066}{{\it size}}}\) is 3 and \({\color{#800066}{{\it myrank}}}\) is 2. Then, \({\Gamma ^{3}_{2}}\vdash {\color{#5475D9}{T}} \Downarrow {\bf \mathsf {{\color{#5475D9}{skip}}}}\),
because \({\Gamma ^{3}_{2}}\not\vDash {\color{#800066}{0 = {\color{#800066}{{\it myrank}}}}}\) and \({\Gamma ^{3}_{2}}\not\vDash {\color{#800066}{1 = {\color{#800066}{{\it myrank}}}}}\). For an example with primitive recursion, let \({\color{#5475D9}{U}}\) be the type \({\color{#5475D9}{\forall }}~{\color{#800066}{x}}{\color{#5475D9}{\lt }}{\color{#800066}{y}}.{\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{reduce}}}}~{\color{#800066}{x}}}}\). Under a context \(\Gamma\) for which \({\Gamma }\vDash {\color{#800066}{y\gt 0}}\), \({\Gamma }\vDash {\color{#800066}{y-1\gt 0}}\), and \({\Gamma }\vDash {\color{#800066}{y-1-1=0}}\), the recursion is unfolded twice and we have \({\Gamma }\vdash {\color{#5475D9}{U}} \Downarrow {\bf \mathsf {{\color{#5475D9}{reduce}}}}~({\color{#800066}{y-1}}); {\bf \mathsf {{\color{#5475D9}{reduce}}}}~({\color{#800066}{y-1-1}}); {\bf \mathsf {{\color{#5475D9}{skip}}}}\). Under a context \(\Gamma\) for which \({\Gamma }\not\vDash {\color{#800066}{y\gt 0}}\) and \({\Gamma }\not\vDash {\color{#800066}{y=0}}\), conversion instead builds a singleton list containing the primitive recursion itself.
The following lemma states that the judgements introduced in Figures 11 and 12 respect agreement.
We now look into properties of concatenation: type concatenation is defined for types in normal form; the result of concatenation is type equivalent to the sequential composition of the inputs; and given two types in list normal form, the result of the concatenation is in list normal form.
We now show that type conversion is strongly normalising. We also show that type conversion yields a type in normal form that is equivalent to the original type. We finally show that type conversion is defined on all well-formed types.
The notion of structural equivalence (Figure 12) equates types that are syntactically equal except possibly on their index terms and propositions, which must be equal or have the same truth-value. For example, \({\Gamma ^{3}_{2}}\vdash {\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{reduce}}}}~{\color{#800066}{2}}}}\equiv _s{\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{reduce}}}}~{\color{#800066}{1+1}}}}\). From the definition of structural equivalence, one easily concludes that structurally equivalent types are equivalent. Further, if two types are equivalent and one is in normal form, then the two types are structurally equivalent.
4.3 Progress
Local reduction on processes is captured by evaluation. The next lemma states that evaluation preserves the good formation of stores and processes, as well as the datatypes of the index terms under evaluation. Notice that the lemma requires contexts with entries only for \({\color{#800066}{{\it size}}}\) and \({\color{#800066}{{\it myrank}}}\) (that is, \(\Gamma ^{n}_{m}\)) and for references (\(\Delta\)), for these are the only variables that may evaluate to a value.
The next lemma states that evaluation for processes terminates and that datatypes and types are preserved during the operation.
The following axiom determines the meaning of assertions of the form \({\color{#800066}{S}}~\mathbf {halted}\).
Our first main result ensures that well-typed programs are either halted or can be further reduced.
Fig. 13. The size of programs and types.
4.4 Strong Normalisation
The second main result states that each program reduction step preserves typing. In preparation for strong normalisation, we also show that the size of a program strictly decreases with reduction. We define the size of a program by taking into consideration both its type and the expressions in each process; the details are in Figure 13. The size of a program denotes an upper bound on the number of collective operations the program will perform when run. The size of a type may or may not decrease with reduction. In the example below, the size of the type reduces from 1 to 0.
For programs that may reduce locally, we take into consideration the size of expressions in processes. Processes whose first operation is a collective operation or \(\mathsf {{\color{#800066}{skip}}}\) have size 0, for they are ready to engage in a collective operation or are halted. All other processes have size 1, for they will eventually engage in a local reduction.
In general, type equivalence does not preserve the size of a type. Consider context \(\Gamma ^{3}_{0}\) for a program of \({\color{#800066}{{\it size}}}\) 3 and a process rank 0. Then, \({\Gamma }\vdash {\color{#800066}{1,2\ne {\color{#800066}{{\it myrank}}}}}\ \mathbf {true}\), hence \({\Gamma }\vdash {\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{1}}~{\color{#800066}{2}}}}\equiv {\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{skip}}}}}}\) but \({{\parallel }}{\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{1}}~{\color{#800066}{2}}{{\parallel }} = 1 \ne 0 = {{\parallel }}{\bf \mathsf {{\color{#5475D9}{skip}}}}{{\parallel }}\). However, one can show that size is preserved for equivalent types in normal form.
The following technical result is used in preservation for reduction (Theorem 4.17).
Equipped with the notion of the size of a program, we are in a position to state and prove the first main result of this section.
Process evaluation is deterministic; that should be apparent from the rules in Figure 3. Program reduction, on the other hand, is not deterministic. For example, processes that do not synchronise may evolve locally; program reduction for such processes encompasses all possible interleavings. In the case of message exchange, two processes that reach a rendez-vous (processes l and m in rule PR-Msg, Figure 4) may reduce immediately or else be delayed by message exchanges involving different processes or by local reductions in other processes (via rule PR-Proc in the same figure). In any case, if a program is capable of two distinct reductions, then there exists a common program that is reachable from both results.
5 Algorithmic Type Checking
This section introduces sound and complete algorithms for checking type equivalence and expression formation.
Fig. 14. Algorithmic type equivalence.
5.1 Checking Type Equivalence
Algorithmic type equivalence, in Figure 14, works by converting types to normal forms. Given two types, the algorithm converts each to a normal form and then compares the thus obtained types for structural type equivalence. Equipped with the previous results, we are in a position to prove the main result of this section.
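To make the two phases concrete, the following self-contained C sketch (ours; the article's algorithm covers the full type language and defers entailment to a solver) implements conversion and comparison for a tiny fragment, namely skip, message, and sequential composition, with concrete ranks, so that semantic entailment degenerates to integer comparison:

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum { SKIP, MSG, SEQ } Tag;
typedef struct Type { Tag tag; int from, to; struct Type *fst, *snd; } Type;

static Type *mk(Tag tag, int from, int to, Type *fst, Type *snd) {
    Type *t = malloc(sizeof *t);
    t->tag = tag; t->from = from; t->to = to; t->fst = fst; t->snd = snd;
    return t;
}

/* Conversion to list normal form: messages not involving myrank are
   non-genuine and become skip; sequential compositions are flattened. */
static Type *norm(Type *t, int myrank) {
    switch (t->tag) {
    case SKIP:
        return mk(SKIP, 0, 0, NULL, NULL);
    case MSG:
        if (t->from != myrank && t->to != myrank)        /* non-genuine */
            return mk(SKIP, 0, 0, NULL, NULL);
        return mk(SEQ, 0, 0, t, mk(SKIP, 0, 0, NULL, NULL));
    case SEQ: {                                          /* T1;T2 => ++ */
        Type *a = norm(t->fst, myrank), *b = norm(t->snd, myrank);
        if (a->tag == SKIP) return b;
        Type *p = a;
        while (p->snd->tag == SEQ) p = p->snd;
        p->snd = b;                         /* replace the trailing skip */
        return a;
    }
    }
    return NULL;
}

/* Structural equivalence of two list normal forms. */
static bool structural(Type *a, Type *b) {
    if (a->tag != b->tag) return false;
    if (a->tag == MSG) return a->from == b->from && a->to == b->to;
    if (a->tag == SEQ) return structural(a->fst, b->fst) && structural(a->snd, b->snd);
    return true;                                         /* both skip   */
}

/* Algorithmic equivalence: normalise both types, then compare. */
static bool equiv(Type *t, Type *u, int myrank) {
    return structural(norm(t, myrank), norm(u, myrank));
}

int main(void) {
    Type *t = mk(MSG, 1, 2, NULL, NULL);    /* message 1 2              */
    Type *u = mk(SKIP, 0, 0, NULL, NULL);   /* skip                     */
    printf("%d\n", equiv(t, u, 0));         /* 1: non-genuine at rank 0 */
    printf("%d\n", equiv(t, u, 1));         /* 0: genuine at rank 1     */
    return 0;
}
```

The example in the last lines reproduces the observation of Section 4.4: at rank 0, \({\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{1}}~{\color{#800066}{2}}\) normalises to \({\bf \mathsf {{\color{#5475D9}{skip}}}}\), whereas at rank 1 it does not.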
5.2 Checking Expression Formation
In a concrete implementation of our system, programmers write a sequence of \({\color{#800066}{{\it size}}}\) expressions, not necessarily distinct. A program is then assembled from a series of processes, each composed of an expression and an empty store. We must therefore provide algorithmic checking for expressions only, an easy task once equipped with algorithmic type equivalence.
We first notice that if proposition entailment is decidable, then datatype formation \({\Gamma }\vdash {\color{#5475D9}{D}}~\mathbf {dtype}\), context formation \({\Gamma }~\mathbf {context}\), datatype subtyping \({\Gamma }\vdash {\color{#5475D9}{D}}\lt :{\color{#5475D9}{E}}\) and equivalence \({\Gamma }\vdash {\color{#5475D9}{D}}\equiv {\color{#5475D9}{E}}\), type formation \({\Gamma }\vdash {\color{#5475D9}{T}}~\mathbf {type}\), store-related proposition formation \({\Gamma }\vdash _{\!\mathbf {s}}{\color{#800066}{p}}~\mathbf {prop}\), and index term formation \({\Gamma }\vdash _{\!\mathbf {s}}{\color{#800066}{i}}:{\color{#5475D9}{D}}\) (in Figures 5, 6, and 8) are all decidable relations, given that the rules in each of these judgements are directed by the syntax. For these relations, we shall not require algorithmic counterparts. Figure 15 introduces the algorithmic counterparts of the remaining relations.
Due to the presence of subtyping in datatype formation (rule I-Sub in Figure 5) and that of type equivalence in expression formation (rule E-Eq in Figure 9), algorithmic index term and algorithmic expression formation are bidirectional systems [13, 14, 60]. Index term formation is split into a synthesis judgement (\({\Gamma }\vdash {\color{#800066}{i}}\Rightarrow {\color{#5475D9}{D}}\)) and a check-against judgement (\({\Gamma }\vdash {\color{#800066}{i}}\Leftarrow {\color{#5475D9}{D}}\)), and similarly for expression formation (\({\Gamma }\vdash {\color{#800066}{e}}\Leftarrow {\color{#5475D9}{T}}\) and \({\Gamma }\vdash {\color{#800066}{e}}\Rightarrow {\color{#5475D9}{T}}\)). In general, all objects in these judgements are considered input, except those to the right of a \(\Rightarrow\) arrow (that is, the \({\color{#5475D9}{D}}\) and \({\color{#5475D9}{T}}\) in the synthesis judgements), which are considered output.
The rules in the datatype synthesis judgement are obtained from the syntax-directed counterparts (I-Var, I-Int, I-Op in Figure 5) by replacing the colons in the premises by \(\Leftarrow\) or \(\Rightarrow\) as appropriate. Notice that the rules for integer values and arithmetic operations yield refinement types that describe subtypes of all possible datatypes for the index term, thus compensating for the absence of rule I-Refin. There is only one rule for the check-against datatype judgement: it synthesises a datatype \({\color{#5475D9}{E}}\) for the given index term \({\color{#800066}{i}}\) and checks that \({\color{#5475D9}{E}}\) is a subtype of the given \({\color{#5475D9}{D}}\).
Algorithmic expression formation is obtained similarly. The only rule for check-against calls expression synthesis to obtain a type \({\color{#5475D9}{U}}\) and checks whether the given type \({\color{#5475D9}{T}}\) and \({\color{#5475D9}{U}}\) are equivalent, using algorithmic type equivalence (Figure 14). The rules for type synthesis are obtained from the syntax-directed rules in Figure 9 by replacing the colons in the premises by \(\Leftarrow\) or \(\Rightarrow\) as appropriate.
Fig. 15. Algorithmic index term and expression formation.
Proposition entailment, \({\Gamma }\vdash {\color{#800066}{p}}\ \mathbf {true}\), calls logical deducibility, \({\Gamma }\vDash {\color{#800066}{p}}\). Deducibility is usually transformed into a verification condition and passed to an SMT solver. Decidability of type checking ultimately depends on the decidability of logical deducibility, and this depends on the concrete language of propositions (Figure 5). The inclusion of multiplicative integer operators in the language of propositions (as instances of the generic binary operator \({\color{#800066}{\oplus }}\) in Figure 5) renders logical deducibility undecidable in general. We sometimes use modular arithmetic in examples and that can pose problems to SMT solvers when the value of \({\color{#800066}{{\it size}}}\) cannot be determined.
We can now show that algorithmic type checking is sound and complete with respect to its declarative counterpart.
We complete this section with a brief discussion of the running time of type checking. The procedures for checking the formation of index terms, propositions, datatypes and expressions make one call to each subterm. On what concerns type equivalence (\({\Gamma }\vdash _{\!a}{\color{#5475D9}{T}}\equiv {\color{#5475D9}{U}}\)), structural type equivalence checking (\({\Gamma }\vdash {\color{#5475D9}{T}}\equiv _s{\color{#5475D9}{U}}\)) is linear in the size of the types. Type conversion (\({\Gamma }\vdash {\color{#5475D9}{T}}\Downarrow {\color{#5475D9}{U}}\)) makes one call to each subterm of the input type, except for primitive recursion. In this case, the number of calls is constant, for the number of goals \({\Gamma }\vDash {\color{#800066}{i}}\), \({\Gamma }\vDash {\color{#800066}{i-1}}\), ...that succeed is finite (and fixed by \(\Gamma\)). There remains logical deducibility, \({\Gamma }\vDash {\color{#800066}{p}}\). Even though logical deducibility for equality, uninterpreted functions and linear arithmetic is NP-hard, state-of-the-art solvers for this theory can be quite efficient on the simple queries that arise in the context of type checking ParTypes expressions.
6 Extensions
This section sketches the introduction of a number of extensions to expressions and types. The extensions allow modelling many protocols usually found in the literature of parallel programming. We illustrate the new constructions with examples.
6.1 Scattering and Gathering Arrays
This section introduces arrays and operations for communicating arrays. The \({\mathsf {\color{#800066} {scatter}}}\) operator distributes arrays among processes. A root process proposes an array. The array is divided into \({\mathsf {\color{#800066} {size}}}\) equal parts, each delivered to a different process. Each process (including the root) puts forward a reference (to an array) where to receive its part. The \({\mathsf {\color{#800066} {gather}}}\) operator performs the inverse operation: each process proposes an array and the root process collects all parts into a large array. In both cases, the typing rules enforce that \({\mathsf {\color{#800066} {size}}}\), the number of processes, evenly divides the length of the large array.
We start with a few examples. The first is the parallel scalar product (or dot product) of two arrays. The code is as follows:
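The article's listing is written in the core language; the following C+MPI rendering (ours, with stub implementations for the functions named in the text; the line numbers in the discussion below refer to the original listing) conveys the same structure:

```c
#include <mpi.h>
#include <stdlib.h>

/* stubs standing in for the functions named in the text */
static int  read_problem_size(void) { return 8; }
static int *read_array(int n) { return calloc(n, sizeof(int)); }
static int  serial_dot(const int *x, const int *y, int m) {
    int s = 0;
    for (int i = 0; i < m; i++) s += x[i] * y[i];
    return s;
}

int main(int argc, char **argv) {
    int rank, size, n = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) n = read_problem_size();    /* n % size == 0 assumed */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    int *gx = NULL, *gy = NULL;
    if (rank == 0) { gx = read_array(n); gy = read_array(n); }
    int *x = malloc(n / size * sizeof(int));
    int *y = malloc(n / size * sizeof(int));
    MPI_Scatter(gx, n / size, MPI_INT, x, n / size, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Scatter(gy, n / size, MPI_INT, y, n / size, MPI_INT, 0, MPI_COMM_WORLD);

    int local = serial_dot(x, y, n / size), result = 0;
    MPI_Reduce(&local, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```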
The root process (\({\mathsf {0}}\) in our example) starts by reading the problem size (the length of the arrays). It then broadcasts this value to all processes, so that each process may allocate space for its local arrays. Recall that the semantics of \({\mathsf {\color{#800066} {broadcast}}}\) ensures that \({\mathsf {read_problem_size()}}\) is evaluated by rank \({\mathsf {0}}\) alone (rule PR-BCast in Figure 4). The root process then reads and scatters the two arrays among all processes including itself (lines 4–5). Similarly to \({\mathsf {\color{#800066} {broadcast}}}\), the \({\mathsf {read_array(n)}}\) expression is only evaluated by the root process. Each process stores its sub-array locally, in \({\mathsf {x}}\) and \({\mathsf {y}}\). Finally, each process computes a sequential version of the dot product on its local arrays, and the root collects the result via a \({\mathsf {\color{#800066} {reduce}}}\) operation (line 7). The number of processes must evenly divide the length of the arrays, so that each process receives parts of the original arrays of equal size. We capture this requirement with the refined type for \({\mathsf {n}}\) in line 1.
The following type describes the program’s protocol.
We can easily notice the alignment between the code and the type. For example, the \({\mathsf {\color{#800066} {broadcast}}}\) operation in line 1 of the code corresponds to the \({\mathsf {\color{#5475D9} {broadcast}}}\) type in line 1 of the type, and similarly for \({\mathsf {\color{#800066} {scatter}}}\) and \({\mathsf {\color{#800066} {reduce}}}\). The type of the value broadcast, \(\{ \mathsf {z: {\color{#5475D9}{nat}}|z \% size==0} \}\), is the same in both the code and the type. Then, the array declarations in lines 2–3 make sure that the \({\mathsf {\color{#800066} {scatter}}}\) operations type check against the \({\mathsf {scatter}}\) types: function \({\mathsf {\color{#800066} {read_array}}}\) returns an int array of length \({\mathsf {n}}\) that is scattered into \({\mathsf {\color{#800066}{size}}}\) sub-arrays of length \({\mathsf {n/}\mathsf {\color{#800066}{size}}}\), one allocated on each process.
Our next example takes advantage of dependent types to implement topology passing algorithms. The topology underlying the protocol for the finite differences is that of a ring: a linear array with a wraparound link. If a different mapping of ranks to processes is to be used, a new protocol must be derived. It turns out that the language of datatypes is flexible enough to encode topologies in integer arrays. Such a topology may then be made known to all processes in such a way that processes exchange messages as per the particular topology.
A refined array datatype, say \(\{a: {\color{#5475D9}{\mathsf{int[]}}} \mid \mathsf{len}(a) = {\it size} \wedge \forall i<{\it size}.\,(0 \le a[i] < {\it size} \wedge a[i] \ne i)\}\), encodes a one-dimensional network topology, where \({\color{#800066}{j=a[i]}}\) means \({\color{#800066}{j}}\) is a direct neighbour of \({\color{#800066}{i}}\): each node has exactly one direct neighbour (a number between \({\color{#800066}{0}}\) and \({\color{#800066}{{\color{#800066}{{\it size}}}-1}}\)) that is different from itself. At this point, we assume that the language of propositions is extended with quantifiers. Call such a datatype \({\mathsf {D}}\) and consider the type below. The topology is first distributed among all processes by the root process (\({\mathsf {0}}\)). Thereafter, each process can exchange a message with its neighbour.
For example, a right-to-left ring topology of length five can be encoded as \([4,0,1,2,3]\).
The encoding above requires all processes to have exactly one direct neighbour. How can one encode a topology when this is not the case? Think of a star or a line. One possibility is to weaken the above condition on the elements of the array, while strengthening the subsequent message-passing loop. We could, for example, drop the restriction that \({\color{#800066}{a[i] \ne i}}\) and encode a right-to-left line of length five as \([0,0,1,2,3]\), a 0-centred star as \([0,0,0,0,0]\), and a full binary 0-rooted tree of depth 3 as \([0,0,0,1,1,2,2]\). In all cases, the topology maps 0 to 0. And this causes a problem if we try to send a message from \({\mathsf {i}}\) to \({\mathsf {topology [i]}}\), as in the above example. The example can be rewritten with a conditional type that guards against ranks without neighbours, say a loop of the form \({\color{#5475D9}{\forall }}~{\color{#800066}{i}}{\color{#5475D9}{\lt }}{\color{#800066}{{\it size}}}.\,{\bf \mathsf {{\color{#5475D9}{ifc}}}}~({\color{#800066}{topology[i] \ne i}})~{\bf \mathsf {{\color{#5475D9}{then}}}}~{\bf \mathsf {{\color{#5475D9}{message}}}}~{\color{#800066}{i}}~{\color{#800066}{topology[i]}}~{\bf \mathsf {{\color{#5475D9}{else}}}}~{\bf \mathsf {{\color{#5475D9}{skip}}}}\).
Arbitrary topologies can be encoded, e.g., using an adjacency matrix. For example, a partially connected mesh topology can be encoded with two arrays, one for the sources, and another for the targets. We leave it to the reader to adapt the above protocols to this situation.
The formation rules for arrays and array operations are in Figure 16. Index terms are extended to include array constants, \({\color{#800066}{[i_1,\dots ,i_n]}}\), array length, \({\color{#800066}{\mathsf {{\color{#800066}{len}}}(i)}}\), array access, \({\color{#800066}{i_1[i_2]}}\), array creation, \({\color{#800066}{\mathsf {{\color{#800066}{mkarray}}}~{i}~{j}}}\), and array update, \({\color{#800066}{i[j]:=k}}\). To datatypes, we add \({\color{#5475D9}{{D}{\bf \mathsf {{\color{#5475D9}{[]}}}}}}\) to describe arrays of a given datatype \({\color{#5475D9}{D}}\). The shorthand notation \({\color{#5475D9}{{D}[}}{\color{#800066}{i}}{\color{#5475D9}{]}}\) stands for the datatype \({\color{#5475D9}{\lbrace }}{\color{#800066}{a}}{\color{#5475D9}{:{{\color{#5475D9}{{D}{\bf \mathsf {{\color{#5475D9}{[]}}}}}}} \mid }}~{\color{#800066}{\mathsf {{\color{#800066}{len}}}(a) = i}}{\color{#5475D9}{\rbrace }}\). As before, we distinguish index terms that may be present in types (\({\Gamma }\vdash {\color{#800066}{i}}:{\color{#5475D9}{D}}\)) from those that may manipulate the store (\({\Gamma }\vdash _{\!\mathbf {s}}{\color{#800066}{i}}:{\color{#5475D9}{D}}\)). Operators that update the store (\(\mathsf {{\color{#800066}{mkarray}}}\) and array update) are present in the latter only. Other operators are present in both relations.
The formation rules for the new index terms and datatypes should be straightforward; notice that the rule for array access performs array-bounds checking. The type formation rules for \({\bf \mathsf {{\color{#5475D9}{scatter}}}}\) and \({\bf \mathsf {{\color{#5475D9}{gather}}}}\) require the root process, \({\color{#800066}{i}}\), to be a valid rank and the values to be scattered/gathered, \({\color{#5475D9}{D}}\), to be arrays whose length is a multiple of the number of processes, \({\color{#800066}{{\it size}}}\). This restriction ensures that each process receives/sends sub-arrays of equal length, so that they may share the same code.
On what concerns expressions, \({\color{#800066}{\mathsf {{\color{#800066}{scatter}}}~{i}~{j}~{k}}}\) denotes a collective operator where \({\color{#800066}{i}}\) is the root process (the process holding the array), \({\color{#800066}{j}}\) is the array to be scattered (held by the root), and \({\color{#800066}{k}}\) is the reference, local to each process, that will receive a part of the array. The expression formation rule for \(\mathsf {{\color{#800066}{scatter}}}\) ensures that the number of processes (\({\color{#800066}{{\it size}}}\)) evenly divides the length of the initial array, so that each process gets an array of equal length, as requested by type \({\bf \mathsf {{\color{#5475D9}{scatter}}}}\). The type for expression \({\color{#800066}{\mathsf {{\color{#800066}{gather}}}~{i}~{j}~{k}}}\) sees the roles of index terms \({\color{#800066}{j}}\) and \({\color{#800066}{k}}\) reversed: the former denotes the local arrays proposed by each process, and the latter the reference, at the root process, that will collect the result. In both expressions, \({\color{#800066}{\mathsf {{\color{#800066}{scatter}}}~{i}~{j}~{k}}}\) and \({\color{#800066}{\mathsf {{\color{#800066}{gather}}}~{i}~{j}~{k}}}\), the root process \({\color{#800066}{i}}\) goes into the type, so we use pure index term formation (\({\Gamma }\vdash {\color{#800066}{i}}:{\color{#5475D9}{{\color{#5475D9}{{\bf \mathsf {{\color{#5475D9}{rank}}}}}}}}\)); the two other indices, \({\color{#800066}{j}}\) and \({\color{#800066}{k}}\), are arbitrary index terms that may manipulate the store and hence we use the store-related relation (\({\Gamma }\vdash _{\!\mathbf {s}}{\color{#800066}{j}}:{\color{#5475D9}{{\color{#5475D9}{{{\color{#5475D9}{{D}[}}{\color{#800066}{n}}{\color{#5475D9}{]}}}~{\bf \mathsf {{\color{#5475D9}{ref}}}}}}}}\) and \({\Gamma }\vdash _{\!\mathbf {s}}{\color{#800066}{k}}:{\color{#5475D9}{{\color{#5475D9}{{D}[}}{\color{#800066}{n * {\color{#800066}{{\it size}}}}}{\color{#5475D9}{]}}}}\) in the case of \({\bf \mathsf {{\color{#5475D9}{scatter}}}}\)).
Notice that array subtyping is invariant on the type of the elements in the array. This is justified by the fact that array elements are both read and updated as usual in imperative languages [59] (cf. rule S-Ref in Figure 5). We can easily check that agreement (Lemma 4.3) holds for the extensions and that subtyping is still a preorder (Lemma 3.1). Theories of arrays are presently built into many SMT solvers.
6.2 More Collective Operations
This section presents a few extra useful collective operations. Operator \({\mathsf {\color{#800066} {allreduce}}}\) behaves very much like \({\mathsf {\color{#800066} {reduce}}}\), except that the value obtained from the reduction process is disseminated to all processes (including the root, as usual). The \(\mathsf {{\color{#800066}{allgather}}}\) collective operator is similar to \(\mathsf {{\color{#800066}{gather}}}\), only that all processes get the whole array (hence, we allow protocol dependency on this value). We could alternatively define \(\mathsf {{\color{#800066}{allreduce}}}\) and \(\mathsf {{\color{#800066}{allgather}}}\) as operators derived from \(\mathsf {{\color{#800066}{reduce}}}\) or \(\mathsf {{\color{#800066}{gather}}}\) (respectively) followed by a \(\mathsf {{\color{#800066}{broadcast}}}\). MPI provides them as primitives to allow performance optimisations by implementors.
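The encoding is direct; in MPI terms (our sketch) the derived form and the primitive compute the same value, the primitive merely leaving room for library-level optimisation:

```c
/* inside main, after MPI_Init */
int rank = 0;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
int local = rank + 1, sum = 0;
/* derived form: reduce to the root, then broadcast the result */
MPI_Reduce(&local, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
MPI_Bcast(&sum, 1, MPI_INT, 0, MPI_COMM_WORLD);
/* primitive form: a single collective with the same outcome */
MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
```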
As an example, let us consider a program that computes the projection of a vector \(\vec{y}\) onto a vector \(\vec{x}\), notation \(\operatorname{proj}_{\vec{x}}\vec{y}\). The formula uses the dot product introduced in the previous section as follows:
\[
\operatorname{proj}_{\vec{x}}\vec{y} \;=\; \frac{\vec{x}\cdot\vec{y}}{\vec{x}\cdot\vec{x}}\,\vec{x}.
\]
A simple algorithm to compute \(\operatorname{proj}_{\vec{x}}\vec{y}\) starts by scattering both vectors as in the previous section, but now processes call the \(\mathsf {{\color{#800066}{allreduce}}}\) operation (twice) so that \(\vec{x} \cdot \vec{y}\) and \(\vec{x} \cdot \vec{x}\) become known to all processes. After that, each process computes its part of the projection vector and, eventually, the root process gathers the resulting vector. The program is as follows:
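Continuing the C+MPI rendering of the dot-product example (ours; \({\mathsf {project}}\) is the function named in the text, with an assumed signature), the remaining code reads:

```c
/* after the broadcast and the two scatters of the dot-product sketch */
int loc_xy = serial_dot(x, y, n / size), x_dot_y = 0;
int loc_xx = serial_dot(x, x, n / size), x_dot_x = 0;
MPI_Allreduce(&loc_xy, &x_dot_y, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
MPI_Allreduce(&loc_xx, &x_dot_x, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
int *local_proj = project(x, n / size, x_dot_y, x_dot_x); /* length n/size */
int *proj_y_x = rank == 0 ? malloc(n * sizeof(int)) : NULL;
MPI_Gather(local_proj, n / size, MPI_INT,
           proj_y_x, n / size, MPI_INT, 0, MPI_COMM_WORLD);
```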
The first five lines coincide with the first lines of the example in the previous section. The remaining code instructs processes to compute the dot product of \({\mathsf {x}}\) by \({\mathsf {y}}\) and of \({\mathsf {x}}\) by itself, both locally and using function \({\mathsf {serial\_dot}}\), and then to engage in two \({\mathsf {\color{#800066} {allreduce}}}\) operations where each process contributes its local portion of the dot products, the sum of which is then provided to each process (lines 6–7). Because all processes now know both \(\vec{x} \cdot \vec{y}\) and \(\vec{x} \cdot \vec{x}\), they can proceed with the computation of the local projection of \({\mathsf {y}}\) onto \({\mathsf {x}}\), using function \({\mathsf {project}}\). In the end, process rank \({\mathsf {0}}\) gathers the contribution of all processes and stores the result in reference \({\mathsf {proj\_y\_x}}\) (line 9).
The type for this code is as follows:
We can easily check that the code type checks against the type. The first three lines coincide with the type of the previous section and we already explained how they match the first five lines of the program. As for the remaining code, the \({\mathsf {\color{#800066} {allreduce}}}\) operations in lines 6 and 7 yield the \({\mathsf {\color{#5475D9} {allreduce}}}\) types in lines 4 and 5. In both cases, variables \({\mathsf {x\_dot\_y}}\) and \({\mathsf {{x\_dot\_x}}}\) of type \({\mathsf {\color{#5475D9} {int}}}\) are introduced. Then, line 9 in the source code corresponds to line 6 in the type. The \({\mathsf {{project}}}\) function provides an \({\mathsf {\color{#5475D9} {int}}}[{\mathsf {n/}}{\mathsf {\color{#5475D9} {size}}}]\) array that is gathered into an \({\mathsf {\color{#800066} {int}}}[{\mathsf {n}}]\) array as prescribed by the type. The restrictions on array lengths imposed by the typing rule for \({\mathsf {\color{#800066} {gather}}}\) are met due to the return type of the \({\mathsf {{project}}}\) function (which we assume to be \({\mathsf {\color{#800066} {int}}}[{\mathsf {n/}}{\mathsf{\color{#800066}size}}]\)) and by the type of array \({\mathsf {{proj\_y\_x}}}\).
Fig. 17. More collective operations: allreduce and allgather.
The extensions introduced in this section are summarised in Figure 17. It is instructive to compare the type formation rule of \({\bf \mathsf {{\color{#5475D9}{allreduce}}}}\) against that of \({\bf \mathsf {{\color{#5475D9}{reduce}}}}\) (Figure 6): the former allows the continuation type \({\color{#5475D9}{T}}\) to use the value obtained by reduction (via variable \({\color{#800066}{x}}\)), whereas the latter does not even mention the continuation type. The difference is that, in \({\bf \mathsf {{\color{#5475D9}{allreduce}}}}\), all processes share a common value, whereas in \({\bf \mathsf {{\color{#5475D9}{reduce}}}}\) this value is held by the root process alone and therefore cannot constitute a dependency for the global protocol.
More collective operations in the spirit of MPI [17] can easily be added. Examples include a simple \(\mathsf {{\color{#800066}{barrier}}}\) where processes synchronise and progress only when all have arrived at the barrier, \(\mathsf {{\color{#800066}{sendreceive}}}\) where two processes exchange a pair of bidirectional messages, and \(\mathsf {{\color{#800066}{alltoall}}}\) where each process proposes a value to each other process and then collects values from all processes.
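In MPI these correspond, respectively, to the following calls (our sketch; the pairwise exchange assumes an even number of processes):

```c
/* inside main, after MPI_Init */
int rank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Barrier(MPI_COMM_WORLD);                     /* barrier                */
int out = rank, in = 0, peer = rank ^ 1;         /* pair up adjacent ranks */
MPI_Sendrecv(&out, 1, MPI_INT, peer, 0,          /* sendreceive: a         */
             &in,  1, MPI_INT, peer, 0,          /* bidirectional exchange */
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
int *sbuf = calloc(size, sizeof(int));           /* alltoall: one value    */
int *rbuf = calloc(size, sizeof(int));           /* to and from each rank  */
MPI_Alltoall(sbuf, 1, MPI_INT, rbuf, 1, MPI_INT, MPI_COMM_WORLD);
```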
6.3 Pairs and allreducemaxloc
This section introduces pairs as a new datatype as well as a new collective communication primitive. Operator \(\mathsf {{\color{#800066}{allreducemaxloc}}}\) behaves very much like \(\mathsf {{\color{#800066}{allreduce}}}\) (Figure 17), only that each process contributes a value-rank pair and the value obtained from the reduction process, a pair with the max value and its rank, is disseminated to all processes (including the root, as usual).
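In MPI, the primitive corresponds to an \({\mathsf {MPI\_Allreduce}}\) over value-rank pairs with the built-in \({\mathsf {MPI\_MAXLOC}}\) operation; a minimal sketch:

```c
/* inside main, after MPI_Init; each rank contributes a (value, rank) pair */
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
struct { double val; int rank; } in, out;
in.val  = 3.14 * rank;        /* stand-in for a locally computed value */
in.rank = rank;
MPI_Allreduce(&in, &out, 1, MPI_DOUBLE_INT, MPI_MAXLOC, MPI_COMM_WORLD);
/* every rank now knows the maximum (out.val) and its owner (out.rank) */
```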
As an example, we investigate a parallel implementation of the Gaussian elimination algorithm based on a data distribution of a matrix A and a sequence of matrices \(A^k\), \(k=1,\dots ,n-1\), organised in a row-cyclic distribution with column-oriented pivoting [61]. If n is the dimension of the matrix, each process q owns rows \(q, q+{\color{#800066}{{\it size}}}, q+2\cdot {\color{#800066}{{\it size}}},\dots\), that is, it owns all rows i with \(i\%{\color{#800066}{{\it size}}}=q\) for \(0 \le i \lt n\). The algorithm works as follows: for the forward elimination phase, each process takes its elements of column k (\(0 \le k \lt n-1\)) and determines the local pivot by computing the element with the largest absolute value. The global pivot is the largest absolute value of the local pivots. In case the pivot is owned by a process other than the one holding the row under elimination, the row is exchanged between these two processes. In order for elimination to happen locally, the pivot row is broadcast to all processes. For the backward substitution, the process owning row \(n-1\) computes the last unknown locally and informs all other processes of the value computed; then the owner of row \(n-2\) does its part, and so on, until row 0 is processed.
A possible implementation of the Gauss elimination algorithm is as follows:
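The original listing is in the core language; the following C+MPI skeleton (ours) keeps only the communication structure discussed below, eliding local computation into comments (the line numbers in the discussion refer to the original listing):

```c
/* inside main, after MPI_Init; this rank's rows are assumed loaded */
int rank, size, n = 0;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
if (rank == 0) n = 8;                            /* problem size           */
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
double *buf = malloc((n + 1) * sizeof(double));  /* significant pivot part */
for (int col = 0; col < n - 1; col++) {          /* forward elimination    */
    struct { double val; int rank; } in = { 0.0, rank }, out;
    /* in.val := local pivot for column col over the rows this rank owns  */
    MPI_Allreduce(&in, &out, 1, MPI_DOUBLE_INT, MPI_MAXLOC, MPI_COMM_WORLD);
    int owner = col % size;                      /* rank owning row col    */
    if (out.rank != owner) {                     /* all ranks agree here   */
        if (rank == out.rank)                    /* ship the pivot row     */
            MPI_Send(buf, n - col + 1, MPI_DOUBLE, owner, 0, MPI_COMM_WORLD);
        else if (rank == owner)
            MPI_Recv(buf, n - col + 1, MPI_DOUBLE, out.rank, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    MPI_Bcast(buf, n - col + 1, MPI_DOUBLE, owner, MPI_COMM_WORLD);
    /* each rank eliminates column col from the rows it owns              */
}
for (int k = n - 1; k >= 0; k--) {               /* backward substitution  */
    double xk = 0.0;                             /* owner computes unknown */
    MPI_Bcast(&xk, 1, MPI_DOUBLE, k % size, MPI_COMM_WORLD);
    /* each rank updates its copy of the x vector                         */
}
```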
Since our loops unroll downwards, we use expression \({\mathsf {\color{#800066} {n-2-k}}}\) to force column indices to move upwards. The program starts by informing all processes of the problem size \({\mathsf {\color{#800066} {n}}}\) and then allocates and initialises the arrays that hold the coefficient matrix A, vector b, and the unknowns vector x. The program focuses on the communication between processes during forward elimination and backward substitution, so we assume that the row-cyclic distribution has taken place and is stored at every process. Functions \({\mathsf {\color{#800066} {read\_A\_row\_cyclic\_distribution}}}\) and \({\mathsf {\color{#800066} {read\_b\_row\_cyclic\_}}}\)\({\mathsf {\color{#800066} {distribution}}}\) load the data appropriately into arrays a and b, respectively (lines 2–5).
The forward elimination operates on one column at a time and is anchored around the selection of a pivot. For that, each process computes the (local) pivot for the rows it holds (function \({\mathsf {\color{#800066} {pivot}}}\)) and then engages in a reduction operation (\(\mathsf {{\color{#800066}{allreducemaxloc}}}\)) to determine which process owns the (largest local) pivot (lines 8–10). This operation communicates to all processes a pair containing the pivot and the rank of the process holding it.
To eliminate row \(n-2-k\), two situations may happen: either the row to eliminate and the pivot row are on the same process, in which case the elimination can proceed without any communication (lines 18–19); or rows are on different processes. In this case, the process that owns the pivot row must send the significant part of the row (the \(k+2\) elements that are not 0) to the process that owns row \(n-2-k\) where the elimination is going to take place (lines 12–16). The collective conditional in line 11 ensures that all processes decide equally. To conclude the elimination, the significant portion of the pivot row is distributed among all processes so they can apply the elimination procedure to the rows they own (lines 20–21).
The backward substitution phase, starting at \(n-1\), computes the unknowns. The program relies on function \({\mathsf {\color{#800066} {solve\_k}}}\) to compute the \({\mathsf {\color{#800066} {k}}}\)th unknown. The process that owns the unknown broadcasts it to all processes so that they may update their \({\mathsf {\color{#800066} {x}}}\) vector.
A possible type for the Gauss elimination algorithm is as follows:
where we have omitted the variables (\({\mathsf {\color{#800066} {\_}}}\)) as well as the continuations (\({\mathsf {\color{#5475D9} {skip}}}\)) of the two \({\mathsf {\color{#5475D9} {broadcast}}}\) operators. We can easily see that the expression above conforms to the type. The \({\mathsf {\color{#800066} {broadcast}}}\) expression in line 1 corresponds to the \({\mathsf {\color{#5475D9} {broadcast}}}\) in line 1 of the type; they both introduce variable \({\mathsf {{n}}}\) of datatype \({\mathsf {\color{#5475D9} {nat}}}\). The \({\mathsf {\color{#800066} {allreducemaxloc}}}\), the \({\mathsf {\color{#800066} {for}}}\) and the \({\mathsf {\color{#800066} {ifc}}}\) in the code correspond to their counterparts in the type. The \({\mathsf {\color{#800066} {send}}}\) to \({\mathsf {{node}}}\) and the matching \({\mathsf {\color{#800066} {receive}}}\) in the code (lines 14 and 16) correspond to the \({\mathsf {\color{#5475D9} {message}}}\) type involving rank \({\mathsf {(n-2-k)\,\%\,size}}\). The rest of the expression conforms to the rest of the type. For the \({\mathsf {\color{#5475D9} {broadcast}}}\) in line 6, notice that \({\mathsf {{buf}}}\) is an integer array of size \({\mathsf {{k+2}}}\). For that in line 8, we assume that function \({\mathsf {{solve\_k}}}\) returns an \({\mathsf {\color{#5475D9} {int\ ref}}}\).
Fig. 18. Pairs and the allreducemaxloc collective operation.
Figure 18 describes the extensions introduced in this section. The pair constructor, \({\color{#800066}{(i,j)}}\), and its two deconstructors, \({\color{#800066}{\mathsf {{\color{#800066}{fst}}}(i)}}\) and \({\color{#800066}{\mathsf {{\color{#800066}{snd}}}(i)}}\), are standard, and so is the datatype \({\color{#5475D9}{D\times E}}\) for pairs. The formation and subtyping rules are also standard. The figure introduces a new type constructor, \({\bf \mathsf {{\color{#5475D9}{allreducemaxloc}}}}\), similar to \({\bf \mathsf {{\color{#5475D9}{allreduce}}}}\) (Figure 17), except that each process provides a pair with the value and its rank (rather than just the value), and a pair is collected, composed of the maximum (rather than the sum) element and the rank of the process that proposed the element. In the examples, and for the sake of readability, we use pattern matching in the \({\mathsf {\color{#5475D9} {allreducemaxloc}}}\) expression to deconstruct pairs, obviating the use of \({\mathsf {\color{#800066} {fst}}}\) and \({\mathsf {\color{#800066} {snd}}}\) deconstructors in the subsequent code (and similarly for the type).
In order to simplify the type theory, we decided that messages carry integer values only (Figure 9). The \({\mathsf {\color{#800066} {send}}}\) and \({\mathsf {\color{#800066} {receive}}}\) expressions in the example exchange arrays (of datatype \({\mathsf {\color{#800066} {int}}} \ \mathsf{[k+2]}\)). In order to type check the example, we need to generalise rules E-Send and E-Recv so as to allow exchanging arrays as well.
We can easily check that agreement (Lemma 4.3) holds for the extensions and that subtyping is still a preorder (Lemma 3.1). A theory for pairs is not difficult to develop, if not present as primitive in SMT solvers.
7 Related Work
This section compares ParTypes to related approaches found in the literature.
Session Type Theories. Among all theoretical works on session types, the closest to ours is probably that of Deniélou et al. [11], introducing dependent types and a form of primitive recursion into session types. ParTypes aims at verifying real parallel programming languages, so it provides various communication primitives (in contrast to message passing only in [11]) and incorporates dependent collective choices. On the other hand, we do not allow session delegation, which is not needed in parallel programming languages such as MPI. At the term level, we work with an imperative language, as opposed to the variant of the \(\pi\)-calculus used by Deniélou et al. [11]. Kouzapas et al. [41] introduce a notion of broadcast in the setting of session types. A novel operational semantics provides for the description of 1-to-n and n-to-1 message passing, where n is not fixed a priori, meaning that a non-deterministic number of processes may join the message-passing operation, the others being left waiting. Types, however, do not distinguish point-to-point from broadcast operations. We work in a deterministic setting and provide a much richer choice of type operators.
Toninho and Yoshida [74] propose a value-dependent multiparty session calculus and study several conditions for ensuring the well-formedness of global types. The base syntax is a variant of the (multiparty session) \(\pi\)-calculus; parameterisation over the number of participants (as in ParTypes and others [8, 53, 54]) is not studied. See the following paragraph for comparison with multiparty session-based approaches.
Toninho and Yoshida [75] propose a dependent typed calculus that combines functions and linear logic-based session-typed processes, allowing full value dependency (i.e., higher-order functional dependency). The base syntax is the linear logic-based \(\pi\)-calculus, and its expressiveness is limited to a strongly normalising (linear logic-based) fragment of the \(\pi\)-calculus.
Scribble. Based on the theory of multiparty session types by Honda et al. [38, 39], Scribble [34, 36, 66, 82] is a language to describe protocols for message-passing programs. Protocols written in Scribble include explicit senders and receivers, thus ensuring that all senders have a matching receiver and vice versa. Global protocols are projected onto each participant, yielding one local protocol for each participant present in the global protocol. Developers can then implement programs based on the local protocols and using standard message-passing libraries, as in Multiparty Session C [56].
Pabble [54] is a parametric extension of Scribble, which adds indices to participants and represents Scribble protocols in a compact and concise notation for parallel programming. Pabble protocols can represent interaction patterns of MPI programs where the number of participants in a protocol is decided at runtime. Pabble was applied to generate communication-safe-by-construction MPI programs [53, 55], leveraging the close affinity between Pabble protocols and MPI programs. These works show how protocol languages can be used for verifying or constructing MPI programs. Recently, an indexed dependent type theory for Scribble that allows more flexible projection was proposed by Castro-Perez et al. [8] and applied to generating APIs for the Go programming language.
In ParTypes, we depart from multiparty session types along two distinct dimensions: (a) our protocol language is specifically built for MPI primitives, and (b) we neither explicitly project a protocol nor generate the MPI code, but instead check the conformance of code against a global protocol. In contrast to ParTypes, works on parameterised session types [8, 53, 54] cannot deal with:
• Protocols where a given communication (say the source or the target) depends on the contents of previously exchanged data;
• Protocols whose behaviour does not depend directly on message passing, but rather on a data-dependent common agreement among all processes (what we call collective operations); and
• Most of the collective operation primitives (broadcast, gather, scatter, reduce), as well as general value and array passing.
Recent work by Zhou et al. [84] studies refinement types in multiparty session types and proposes a toolchain for F⋆ [73]. This work does not treat parameterisation of participants, as ours and [8], [53], and [54] do, but handles a general value dependency which can be updated when unrolling recursions.
Choreographic and Multitier Programming. Choreographic programming [6, 7] is another formalism rooted in the theory of the \(\pi\)-calculus, which can express multiparty communication protocols by providing primitives for programming global interactions. One of the early examples is the W3C's Web Services Choreography Description Language [78], targeting mainly web services and distributed systems. Multitier programming [79], on the other hand, is rooted in the theory of the \(\lambda\)-calculus and enables programming computations that span several tiers of a distributed system, by supporting primitives for changing the location of program execution. The work by Giallorenzo et al. [24] explains the similarities between these two paradigms. The main difference between our work and choreographic programming can be seen from the following perspective: ParTypes starts with code for each participant and looks for a type that makes each program typable, whereas choreographic programming starts with a single program that is then projected onto the different participants.
Dependent Type Systems. Following Martin-Löf's work on constructive type theory [48], a number of programming languages have made use of dependent type systems. Rather than taking advantage of the power of full dependent type systems (which brings undecidability to type checking), Freeman and Pfenning [21] and Xi and Pfenning [81] introduce a restricted form of dependent types, where types may refer to values of a restricted domain, as opposed to values of the term language. The type checking problem is then reduced to constraint satisfiability, for which different tools are nowadays available. ParTypes follows this approach. Xanadu [80] incorporates these ideas in an imperative C-like language. F⋆ [73], Omega [67], Liquid Types [63], and Liquid Haskell [76] are further examples of pure functional languages that resort to theorem provers. All these languages share refinement types with ParTypes, but are functional, so that their type systems cannot abstract a program's communication patterns.
The ParTypes Family. This work constitutes the result of several attempts at verifying C+MPI programs using a type-based approach. Starting with Honda et al. [35], key session type abstractions to capture the general traits of MPI programs were identified, for instance: rank-based communication, collective operations, typical communication patterns (e.g., ring, mesh), and the possible choice/coexistence between blocking and nonblocking operations. Subsequent work proposed a preliminary evaluation of the approach and experiments [47], where we did not make use of a protocol language; verification did not scale and required an a priori fixed number of processes. We also considered the type-based verification of WhyML parallel programs [64] and the synthesis of correct-by-construction C+MPI programs from protocol specifications [43].
López et al. [46] first presented ParTypes in its maturity, introducing a type-based methodology for imperative message-passing parallel programs, the implementation of a protocol verification toolchain, and an experimental evaluation of type-based verification against model checking and runtime verification.
ParTypes has influenced different research works. Martins et al. [49] extend ParTypes with type inference: from the source code of each individual process, the analysis attempts to generate its type; it then gradually merges the types of the individual processes into a single type. If the type can be generated, the program is deadlock-free and assured to behave as prescribed by the type. The verification method by Martins et al. [49] depends on the number of processes, which limits its scalability. An additional difference is the omission of parametric types. Finally, ParTypes influenced further extensions of session type theories in other application fields. For example, López et al. [45] extend the primitives for collective communication taking into consideration the availability of nodes in a network, which served to model coordination protocols used in electrical distribution networks [44].
Static Analysis for MPI-like Programs. Fu et al. [22] present MPISE, a symbolic checker for MPI programs written in C. The analysis extracts a subset of the MPI interfaces and checks whether there is a path that deadlocks. The set of MPI primitives includes synchronous, point-to-point communication, collectives, as well as wildcard receives, but no scatter-gather operations. The symbolic execution takes the state of the program and uses a scheduler to analyse the result of possible continuations. If one of the continuations reports a deadlock, the process stops. MPISE offers deadlock-freedom guarantees, but protocol conformance for endpoints has not been studied. Droste et al. [12] present an extension of the Clang static analyzer for MPI programs. It performs a combination of abstract syntax tree analysis and symbolic execution to check for properties including type mismatches, buffer referencing, protocol well-formedness and deadlock freedom. In comparison to our work, MPI-Checker does not consider errors coming from the use of collective communications. Botbol et al. [2] apply abstract interpretation techniques to MPI communication. Their technique represents sets of program states as lattice automata, and the program semantics is encoded as symbolic transducers. A safety property is encoded as a lattice automaton, and verification computes whether the intersection of the languages generated by the program and the safety property is empty. The technique was tested on a subset of the MPI language considered by ParTypes, including point-to-point message passing, broadcast, and reduce. In comparison, the method proposed by Botbol et al. [2] allows the verification of properties regarding the state of the MPI program, and can be used to check whether reachable program states are not matched by any transition, thus detecting deadlocks.
Model Checking. Verification techniques based on model checking require tracking the state of each of the processes involved, which easily becomes non-scalable. Symbolic model checkers such as TASS [71] and CIVL [70] employ symbolic model checking techniques to verify safety properties such as the absence of deadlocks, buffer overflows, and memory leaks. TASS also checks for functional equivalence between MPI programs and their sequential counterparts [71]. Böhm et al. [1] check for deadlocks by implementing partial-order reduction techniques to construct the state space directly. Yu [83] presents a symbolic model checking technique for detecting deadlocks in MPI programs with non-deterministic sends and non-blocking receives: it first collects all the paths where processes block or terminate and, for each non-deterministic communication action, generates symbolic states representing the different message matchings. The path-level models are then used to generate a CSP [32] model representing the possible interleavings, which is model-checked against a global reachability property representing deadlock freedom.
Tools for the Verification of MPI Programs. The MPI standard has existed for more than 20 years, and it is not surprising that it has generated a large number of tools for validating and verifying programs. Gopalakrishnan et al. [25] provide an initial perspective on the kinds of formal techniques used to ensure the reliability of message-passing programs built on the MPI standard.
Tools such as the in-situ partial order (ISP) [58], the distributed analyser for MPI (DAMPI) [77], and Marmot [3, 30, 31, 42] are runtime verifiers that aim at detecting deadlocks and are therefore dependent on the quality of the tests. These tools typically impose a runtime overhead, manifested either as wrappers that intercept every MPI call issued by the application (e.g., Marmot [42]) or as an additional thread per MPI process (e.g., Umpire [3]). Even with architectural changes that decrease the runtime overhead (cf. MUST [31]), the performance of runtime verifiers depends on the number of processes and on the number of loop iterations that the monitored programs execute. In contrast, the running time of our type checker does not depend on the number of loop iterations; its reliance on an SMT solver, however, renders type checking exponential in the number of variables in verification conditions, as opposed to the number of processes.
The Hermes tool [40] implements a hybrid verification technique to limit the growth of the state space when verifying multipath MPI programs, for instance those using wildcard operations. The tool combines explicit-state model checking and symbolic analysis: the explicit part encodes the program as constraint rules that are fed to an SMT solver; in case of unsatisfiability, the symbolic analysis modifies the scheduler to infer the existence of another executable control-flow path. It is difficult to compare the approach taken by Hermes with ours, since most of that work assumes that data is received using wildcard receives, and collective communications are not supported.
The MOPPER tool [15, 16] is a verifier that detects deadlocks by analysing execution traces of MPI programs. It executes an MPI program, collects the trace, builds from the trace a formula that determines whether there exists a valid MPI run that respects the matches-before order and yields a deadlock, and checks the formula for satisfiability. This technique differs from our solution, which performs verification without running the program.
Errors and Exceptions. Large-scale parallel applications can suffer from different kinds of errors that cannot be overlooked. Exception handling at the protocol level has been studied in the context of session and behavioural types [4, 5, 20, 51]. These works allow communicating peers to escape from one point of a protocol to another in a coordinated fashion, so that processes are guaranteed to follow a protocol even in the presence of errors. On the MPI side, little work has been done on error/exception verification, apart from DeFreez et al. [10], who study error-code propagation in MPI library implementations, notably MPICH. The current version of ParTypes does not address errors or exceptions; Section 8 discusses ideas for future work.
8 Conclusion and Discussion
This article presents ParTypes, a theory of types for parallel programming whereby programs that can be given a type are guaranteed to be strongly normalising, hence free from deadlocks. Furthermore, in ParTypes, checking program formation against a type is decidable, and we present an algorithm for this purpose. The ParTypes language is quite rich, allowing the description of a large number of algorithms usually found in textbooks on parallel programming. Type-based approaches have clear advantages over competing solutions for the verification of the sort of functional properties that can be captured by types. With respect to testing, not only do they forego the writing (and running) of test suites, but they also give far greater confidence. When compared to model checking, they do not suffer from scalability problems concerning, e.g., the problem size or the number of loop iterations in a program. There are, however, limitations to the current version of ParTypes, which we discuss in turn.
Type Equivalence Is Too Intensional. Type equivalence, as introduced in this article, is at times not flexible enough, requiring source code to be “too aligned” with the type. This restrains developers’ options when writing code for a given type. An example features one type with a “fat” loop inside which a conditional avoids sending a message to a fixed process, and another type with two consecutive loops achieving the same effect (see Section 3.3 and the sketch below). The types are not equivalent, yet the programs exhibit the same interactive behaviour: they both send the same number of messages, in the same order. A bisimulation-based notion of type equivalence could alleviate this problem.
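To make the mismatch concrete, here is a hypothetical C+MPI sketch (the function names and the skipped rank k are ours, for illustration only): the two functions send exactly the same messages in the same order, yet their natural type transcriptions, a single loop with a conditional versus two consecutive loops, are not equivalent under the intensional equivalence of this article.

```c
#include <mpi.h>

/* One "fat" loop; a conditional skips rank k. */
void fat_loop(int k, int size, int value) {
  for (int i = 1; i < size; i++)
    if (i != k)
      MPI_Send(&value, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
}

/* Two consecutive loops around rank k; same messages, same order. */
void two_loops(int k, int size, int value) {
  for (int i = 1; i < k; i++)
    MPI_Send(&value, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
  for (int i = k + 1; i < size; i++)
    MPI_Send(&value, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
}
```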
Non-Blocking Message Passing. The sort of point-to-point messages that we consider are synchronous, blocking, and non-buffered: the sending and the receiving processes wait for each other, exchange a value, and then continue their work. Asynchronous, non-blocking, or buffered messages allow the overlap of computation and communication: the sender writes to the buffer and continues; the receiver blocks only if the buffer is empty. Many efficient algorithms use buffered message passing. Modelling buffered messages at the type level poses subtle difficulties that we did not manage to overcome. We present two of these hurdles: keeping track of the message ordering between pairs of sender-receiver processes, and accounting for the effective transmission of non-blocking messages. The former may cause deadlocks, while the latter may originate data races.
For the former, the order of the messages exchanged between the same pair of processes must be maintained, regardless of the blocking/non-blocking semantics. Messages not related to the same sender-receiver pair, on the other hand, can be interleaved. Capturing such message intertwining in a satisfactory manner in a type system turns out to be quite a challenge.
As for the latter, a sender or receiver process may only use the data in a message after its effective transmission. Otherwise, the process may enter a data race with the underlying middleware responsible for the actual movement of the data between processes. Therefore, every non-blocking send/receive operation needs to be matched by a wait operation, which marks the point in the program after which the data may be safely used, as the sketch below illustrates. Finding an adequate representation in the type system for uniquely identifying such matches proved a challenge, in particular in the presence of primitive recursion.
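The hazard can be seen in a minimal hypothetical C+MPI sketch (the peer rank and the buffer contents are ours, for illustration only):

```c
#include <mpi.h>

void nonblocking_hazard(int peer) {
  int buf = 42;
  MPI_Request req;
  /* Start a non-blocking send; the middleware may still be reading buf. */
  MPI_Isend(&buf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &req);
  /* buf = 0;  <-- data race: the transmission may still be in flight */
  MPI_Wait(&req, MPI_STATUS_IGNORE);  /* matching wait completes the send */
  buf = 0;  /* safe: the data may be reused only after this point */
}
```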
Receive from Any. MPI includes a receive-from-any primitive (wildcard receive), which leads directly to race conditions, and race conditions may easily lead to deadlocks. Just think of process 0 receiving from any source while processes 1 and 2 both send to 0: one of the sending processes will deadlock, a clearly undesirable situation (see the sketch below). We analysed uses of this primitive in textbooks and found many cases where the sender can be easily identified, making the wildcard a mere convenience that spares the programmer from explicitly calculating the sender. A notable example is the solution to the parallel Gaussian elimination algorithm by Rauber and Rünger [61] (discussed in Section 6.3), where the sending process can be identified. Genuine races not leading to deadlocks (as when process 0 receives twice from any source and processes 1 and 2 each send to 0) constitute a challenge we leave for further work.
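A hypothetical C+MPI rendering of the deadlocking scenario (the tag and payload are ours, for illustration only); under synchronous semantics the wildcard receive matches exactly one sender, and the other blocks forever:

```c
#include <mpi.h>

void wildcard_race(int rank) {
  int v = rank;
  if (rank == 0)
    /* One receive, two candidate senders: a race on the match. */
    MPI_Recv(&v, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
  else if (rank == 1 || rank == 2)
    /* Synchronous send: the unmatched sender deadlocks. */
    MPI_Ssend(&v, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
}
```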
Process Topologies. Topologies restrict collective communications to subgroups of processes, imposing a logical organisation of processes that mimics the structure of particular problems, for instance by renumbering ranks so that communicating processes become physically closer to each other. MPI offers n-dimensional Cartesian topologies, with primitives to create a new topology, to obtain the rank of a process in the new topology, and to obtain the neighbours of a process by specifying an offset in one of the dimensions (see the sketch below). Given the discrete nature of Cartesian topologies, we could think of adding a datatype to describe a topology and the length of each dimension (as a \(\mathsf{nat[]}\)), together with (expression and type) primitives to encode the topology-specific operators. We leave this for future work.
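For reference, a minimal sketch of the MPI primitives in question (the periodic 4×4 grid is our assumption; the code expects to run with exactly 16 processes):

```c
#include <mpi.h>

void cartesian_example(void) {
  int dims[2] = {4, 4}, periods[2] = {1, 1}, coords[2];
  int rank, up, down;
  MPI_Comm grid;
  /* Create a periodic 4x4 Cartesian topology over MPI_COMM_WORLD. */
  MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, /*reorder=*/1, &grid);
  MPI_Comm_rank(grid, &rank);              /* rank in the new topology */
  MPI_Cart_coords(grid, rank, 2, coords);  /* our (row, column) */
  /* Neighbours at offset 1 along dimension 0. */
  MPI_Cart_shift(grid, /*direction=*/0, /*disp=*/1, &up, &down);
}
```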
Further Datatypes. A single base datatype is enough to exhibit the versatility of ParTypes. The propositions that refine datatypes are those of the programming language and deal with integer values, references, pairs, and arrays. Decidability of type checking depends on the availability (and decidability) of theories for the different datatypes. Realistic programming languages require further base types, including booleans, characters, strings, and floating-point numbers. Floating-point values are particularly important for scientific computing, a notable application of parallel programming. They can easily be added to store-related indices, together with the relevant operators (Figure 8).
General Recursion. We have deliberately eschewed while loops in our term language. The addition of such a loop construct at the term level would undermine strong normalisation (just think of a process running a while-true loop). In a full-fledged programming language, one would probably accept such a tradeoff. General recursion at the type level is a completely different story: type equivalence would become undecidable, undermining the whole ParTypes metatheory.
Errors and Exceptions. MPI v4 [18] makes available three run-time error handlers with different semantics (abort all executing processes, abort the processes within a communicator, or return an error code without aborting); the sketch below shows the last mode. In most cases, error handling allows the application to take appropriate action based on the error code. Typically, before terminating, applications flush buffers and save internal state that allows the algorithm to resume later. Ideas put forward on session and behavioural types [4, 5, 20, 51] could be used to enrich ParTypes, although we anticipate challenges due to the rudimentary support for error handling made available by the MPI library. ParTypes could also explore the new concept of session introduced in MPI v4, which allows processes to rejoin a session after a failure and resume the computation, thus ensuring that it proceeds according to the protocol.
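As a point of reference, a minimal hypothetical sketch of the return-an-error-code mode (the invalid destination rank is ours, chosen to force an error):

```c
#include <mpi.h>
#include <stdio.h>

void handle_error_codes(void) {
  /* Replace the default abort-on-error handler by error-code returns. */
  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  int v = 0;
  int err = MPI_Send(&v, 1, MPI_INT, /*dest=*/9999, 0, MPI_COMM_WORLD);
  if (err != MPI_SUCCESS) {
    char msg[MPI_MAX_ERROR_STRING];
    int len;
    MPI_Error_string(err, msg, &len);
    fprintf(stderr, "MPI error: %s\n", msg);  /* e.g., invalid rank */
  }
}
```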
Acknowledgments
We thank the anonymous reviewers for the detailed comments and valuable suggestions that greatly improved the quality of the article.
Footnote
1. Variants of the \(\mathsf{reduce}\) operator can easily be added (e.g., one that computes the maximum). Alternatively, the operator could be parametric on the reducing function, at the expense of requiring support for function types.
References
[1] Stanislav Böhm, Ondřej Meca, and Petr Jančar. 2016. State-space reduction of non-deterministically synchronizing systems applicable to deadlock detection in MPI. In FM 2016: Formal Methods, Lecture Notes in Computer Science (LNCS). Springer, 102–118.
Vincent Botbol, Emmanuel Chailloux, and Tristan Le Gall. 2017. Static analysis of communicating processes using symbolic transducers. In Verification, Model Checking, and Abstract Interpretation, Lecture Notes in Computer Science (LNCS). Springer, 73–90.
Luís Caires and Jorge A. Pérez. 2017. Linearity, control effects, and behavioral types. In Programming Languages and Systems, Hongseok Yang (Ed.). Springer, Berlin, 229–259.
David Castro-Perez, Raymond Hu, Sung-Shik Jongmans, Nicholas Ng, and Nobuko Yoshida. 2019. Distributed programming using role-parametric session types in Go: Statically-typed endpoint APIs for dynamically-instantiated communication structures. Proc. ACM Program. Lang. 3, POPL (2019), 29:1–29:30.
E. Cohen, M. Dahlweid, M. Hillebrand, D. Leinenbach, M. Moskal, T. Santen, W. Schulte, and S. Tobies. 2009. VCC: A practical system for verifying concurrent C. In TPHOLs, Lecture Notes in Computer Science (LNCS), Vol. 5674. Springer, 23–42.
Daniel DeFreez, Antara Bhowmick, Ignacio Laguna, and Cindy Rubio-González. 2020. Detecting and reproducing error-code propagation bugs in MPI implementations. In PPoPP (PPoPP’20). ACM, New York, 187–201.
Alexander Droste, Michael Kuhn, and Thomas Ludwig. 2015. MPI-checker: Static analysis for MPI. In Workshop on the LLVM Compiler Infrastructure in HPC. ACM, 3:1–3:10.
Joshua Dunfield and Neelakantan R. Krishnaswami. 2013. Complete and easy bidirectional typechecking for higher-rank polymorphism. In ICFP. ACM, 429–442.
V. Forejt, D. Kroening, G. Narayanswamy, and S. Sharma. 2014. Precise predictive analysis for discovering communication deadlocks in MPI programs. In FM (LNCS), Vol. 8442. Springer, 263–278.
Simon Fowler, Sam Lindley, J. Garrett Morris, and Sára Decova. 2019. Exceptional asynchronous session types: Session types without tiers. Proc. ACM Program. Lang. 3, POPL, Article 28 (2019), 29 pages.
X. Fu, Z. Chen, Y. Zhang, C. Huang, W. Dong, and J. Wang. 2015. MPISE: Symbolic execution of MPI programs. In 2015 IEEE 16th International Symposium on High Assurance Systems Engineering. IEEE, 181–188.
Saverio Giallorenzo, Fabrizio Montesi, Marco Peressotti, David Richter, Guido Salvaneschi, and Pascal Weisenburger. 2021. Multiparty languages: The choreographic and multitier cases (pearl). In ECOOP (LIPIcs), Vol. 194. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 22:1–22:27.
Ganesh Gopalakrishnan, Robert M. Kirby, Stephen F. Siegel, Rajeev Thakur, William Gropp, Ewing L. Lusk, Bronis R. de Supinski, Martin Schulz, and Greg Bronevetsky. 2011. Formal analysis of MPI-based parallel programs. Commun. ACM 54, 12 (2011), 82–91.
A. D. Gordon and C. Fournet. 2010. Principles and applications of refinement types. In International Summer School Logics and Languages for Reliability and Security. IOS Press, 73–104.
Per Brinch Hansen. 1991. The N-Body Pipeline. Technical Report. Electrical Engineering and Computer Science Technical Reports Paper 120, College of Engineering and Computer Science, Syracuse University.
T. Hilbrich, J. Protze, M. Schulz, B. R. de Supinski, and M. S. Müller. 2012. MPI runtime error detection with MUST: Advances in deadlock detection. In SC. IEEE/ACM, 30:1–30:11.
T. Hilbrich, M. Schulz, B. R. de Supinski, and M. S. Müller. 2009. MUST: A scalable approach to runtime error detection in MPI programs. In Parallel Tools Workshop. Springer, 53–66.
K. Honda, E. R. B. Marques, F. Martins, N. Ng, V. T. Vasconcelos, and N. Yoshida. 2012. Verification of MPI programs using session types. In Recent Advances in the Message Passing Interface, Lecture Notes in Computer Science (LNCS), vol. 7490. Springer, 291–293.
K. Honda, A. Mukhamedov, G. Brown, T. Chen, and N. Yoshida. 2011. Scribbling interactions with a formal foundation. In ICDCIT, Lecture Notes in Computer Science (LNCS), vol. 6536. Springer, 55–75.
K. Honda, V. T. Vasconcelos, and M. Kubo. 1998. Language primitives and type discipline for structured communication-based programming. In ESOP, Lecture Notes in Computer Science (LNCS), vol. 1381. Springer, 122–138.
B. Krammer, T. Hilbrich, V. Himmler, B. Czink, K. Dichev, and M. S. Müller. 2008. MPI correctness checking with Marmot. In Parallel Tools Workshop. Springer, 61–78.
Hugo A. López and Kai Heussen. 2017. Choreographing cyber-physical distributed control systems for the energy sector. In Proceedings of the Symposium on Applied Computing. ACM, 437–443.
Hugo A. López, Flemming Nielson, and Hanne Riis Nielson. 2016. Enforcing availability in failure-aware communicating systems. In FORTE, Lecture Notes in Computer Science (LNCS), vol. 9688. Springer, 195–211.
Hugo A. López, Eduardo R. B. Marques, Francisco Martins, Nicholas Ng, César Santos, Vasco Thudichum Vasconcelos, and Nobuko Yoshida. 2015. Protocol-based verification of message-passing parallel programs. In OOPSLA. ACM, 280–298.
E. R. B. Marques, F. Martins, V. T. Vasconcelos, N. Ng, and N. Martins. 2013. Towards deductive verification of MPI programs against session types. In PLACES (EPTCS), Vol. 137. 103–113.
M. Moskal. 2011. Verifying functional correctness of C programs with VCC. In NASA Formal Methods, Lecture Notes in Computer Science (LNCS), vol. 6617. Springer, 56–57.
Vijay Nagarajan, Daniel J. Sorin, Mark D. Hill, and David A. Wood. 2020. A Primer on Memory Consistency and Cache Coherence, Second Edition. Morgan & Claypool Publishers.
Nicholas Ng, José Gabriel de Figueiredo Coutinho, and Nobuko Yoshida. 2015. Protocols by default - safe MPI code generation based on session types. In CC, Lecture Notes in Computer Science (LNCS), vol. 9031. Springer, 212–232.
Nicholas Ng and Nobuko Yoshida. 2017. Behavioural Types: From Theory to Tools. River Publishers, Chapter Protocol-Driven MPI Program Generation, 329–352.
N. Ng, N. Yoshida, and K. Honda. 2012. Multiparty session C: Safe parallel programming with message optimisation. In TOOLS Europe, Lecture Notes in Computer Science (LNCS), vol. 7304. Springer, 202–218.
S. Pervez, G. Gopalakrishnan, R. M. Kirby, R. Palmer, R. Thakur, and W. Gropp. 2007. Practical model-checking method for verifying correctness of MPI programs. In PVM/MPI, Lecture Notes in Computer Science (LNCS), vol. 4757. Springer, 344–353.
S. F. Siegel and G. Gopalakrishnan. 2011. Formal analysis of message passing. In VMCAI, Lecture Notes in Computer Science (LNCS), vol. 6538. Springer, 2–18.
S. F. Siegel and L. F. Rossi. 2008. Analyzing BlobFlow: A case study using model checking to verify parallel scientific software. In EuroPVM/MPI, Lecture Notes in Computer Science (LNCS), vol. 5205. Springer, 274–282.
Stephen F. Siegel, Manchun Zheng, Ziqing Luo, Timothy K. Zirkel, Andre V. Marianiello, John G. Edenhofner, Matthew B. Dwyer, and Michael S. Rogers. 2015. CIVL: The concurrency intermediate verification language. In SC. ACM, 61:1–61:12.
S. F. Siegel and T. K. Zirkel. 2011. FEVS: A functional equivalence verification suite for high performance scientific computing. Mathematics in Computer Science 5, 4 (2011), 427–435.
S. F. Siegel and T. K. Zirkel. 2012. Loop invariant symbolic execution for parallel programs. In VMCAI, Lecture Notes in Computer Science (LNCS), vol. 7148. Springer, 412–427.
Nikhil Swamy, Catalin Hritcu, Chantal Keller, Aseem Rastogi, Antoine Delignat-Lavaud, Simon Forest, Karthikeyan Bhargavan, Cédric Fournet, Pierre-Yves Strub, Markulf Kohlweiss, Jean Karim Zinzindohoue, and Santiago Zanella Béguelin. 2016. Dependent types and multi-monadic effects in F*. In POPL. ACM, 256–270.
A. Vo, S. Aananthakrishnan, G. Gopalakrishnan, B. R. de Supinski, M. Schulz, and G. Bronevetsky. 2010. A scalable and distributed dynamic formal verifier for MPI programs. In SC. IEEE, 1–10.