
MPI Programming — Part 1

Objectives

• Basic structure of MPI code

• MPI communicators

• Sample programs

1
Introduction to MPI

The Message Passing Interface (MPI) is a library of subroutines (in Fortran) or function calls (in C) that can be used to implement a message-passing program.

MPI allows the coordination of a program running as multiple processes in a distributed-memory environment, yet it is flexible enough to also be used in a shared-memory environment.

MPI programs can be compiled and run on a wide variety of single platforms or on (homogeneous or heterogeneous) clusters of computers connected by a network.

The MPI library is standardized, so working code containing MPI subroutines and function calls should work (without further changes!) on any machine on which the MPI library is installed.

2
A very brief history of MPI

MPI was developed over two years of discussions led by the MPI Forum, a group of roughly sixty people representing some forty organizations.

The MPI 1.0 standard was defined in Spring of 1994.

The MPI 2.0 standard was defined in Fall of 1997.

MPI 2.0 was such a complicated enhancement to the MPI standard that very few implementations exist!

Learning MPI can seem intimidating: MPI 1.1 has more than 125 different commands!

However, most programmers can accomplish what they want from their programs while sticking to a small subset of MPI commands (as few as 6).

In this course, we will stick to MPI 1.1.

3
Basic structure of MPI code

MPI programs have the following general structure:

• include the MPI header file

• declare variables

• initialize MPI environment

• < compute, communicate, etc. >

• finalize MPI environment

Notes:

• The MPI environment can only be initialized once per program, and it cannot be initialized before the MPI header file is included.

• Calls to MPI routines are not recognized before the MPI environment is initialized or after it is finalized.
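
A minimal C sketch of this overall structure (illustrative only, not taken from the original notes):

#include <mpi.h>    /* include the MPI header file */
#include <stdio.h>

int main(int argc, char* argv[]) {
   int rank;                              /* declare variables          */

   MPI_Init(&argc, &argv);                /* initialize MPI environment */

   MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* compute, communicate, etc. */
   printf("Process %d checking in\n", rank);

   MPI_Finalize();                        /* finalize MPI environment   */
   return 0;
}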

4
MPI header files

Header files are more commonly used in C than in older versions of Fortran, such as FORTRAN77.

In general, header files are usually source code for declarations of commonly used constructs.

Specifically, MPI header files contain the prototypes for MPI functions/subroutines, as well as definitions of macros, special constants, and datatypes used by MPI.

An include statement must appear in any source file that contains MPI function calls or constants.

In Fortran, we type

INCLUDE 'mpif.h'

In C, the equivalent is

#include <mpi.h>

5
MPI naming conventions

All MPI entities (subroutines, constants, types, etc.) begin with MPI_ to highlight them and avoid conflicts.

In Fortran, they have the general form

MPI_XXXXX(parameter,...,IERR)

For example,

MPI_INIT(IERR)

The difference in C is basically the absence of IERR:

MPI_Xxxxx(parameter,...)

MPI constants are always capitalized, e.g.,

MPI_COMM_WORLD, MPI_REAL, etc.

In Fortran, MPI entities are always associated with the INTEGER data type.

6
MPI subroutines and return values

An MPI subroutine returns an error code that can be checked for the successful operation of the subroutine. This error code is always returned as an INTEGER in the variable given by the last argument, e.g.,

INTEGER IERR
...
CALL MPI_INIT(IERR)
...

The error code returned is equal to the pre-defined integer constant MPI_SUCCESS if the routine ran successfully:

IF (IERR.EQ.MPI_SUCCESS) THEN
   ! do stuff given that routine ran correctly
END IF

If an error occurred, then IERR returns an implementation-dependent value indicating the specific error, which can then be handled appropriately.
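
In C, the corresponding check inspects the function's return value rather than a trailing IERR argument. A minimal sketch (note that with the default error handler most MPI implementations abort on error before such a check is reached):

#include <mpi.h>
#include <stdio.h>

int main(void) {
   int ierr = MPI_Init(NULL, NULL);  /* error code is the return value */
   if (ierr != MPI_SUCCESS) {
      fprintf(stderr, "MPI_Init returned error code %d\n", ierr);
      return 1;
   }
   /* ... do stuff given that the routine ran correctly ... */
   MPI_Finalize();
   return 0;
}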

7
MPI handles

MPI defines and maintains its own internal data structures related to communication, etc.

These data structures are accessed through handles.

Handles are returned by various MPI calls and may be used as arguments in other MPI calls.

In Fortran, handles are integers or arrays of integers; any arrays are indexed from 1.

For example,

• MPI_SUCCESS is an integer used to test error codes.

• MPI_COMM_WORLD is an integer representing a pre-defined communicator consisting of all processes.

Handles may be copied using the standard assignment operation.

8
MPI data types

MPI has its own reference data types corresponding to elementary data types in Fortran or C:

• Variables are normally declared as Fortran/C types.

• MPI type names are used as arguments to MPI routines when needed.

• Arbitrary data types may be built in MPI from the intrinsic Fortran/C data types.

MPI shifts the burden of details such as the floating-point representation to the implementation.

MPI allows for automatic translation between representations in heterogeneous environments.

In general, the MPI data type in a receive must match that specified in the send.

9
Basic MPI data types

MPI data type          Fortran data type

MPI_INTEGER            INTEGER
MPI_REAL               REAL
MPI_DOUBLE_PRECISION   DOUBLE PRECISION
MPI_COMPLEX            COMPLEX
MPI_LOGICAL            LOGICAL
MPI_CHARACTER          CHARACTER
MPI_PACKED             user-defined

MPI data type          C data type

MPI_INT                signed int
MPI_FLOAT              float
MPI_DOUBLE             double
MPI_CHAR               signed char
MPI_PACKED             user-defined

10
Communicators

A communicator is a handle representing a group of processes that can communicate with one another.

The communicator name is required as an argument to all point-to-point and collective operations.

• The communicator specified in the send and receive calls must agree for communication to take place.

• Processes can communicate only if they share a communicator.

There can be many communicators, and a process can be a member of a number of different communicators.

Within each communicator, processes are numbered consecutively (starting at 0).

This identifying number is known as the rank of the process in that communicator.

11
Communicators

The rank is also used to specify the source and destination in send and receive calls.

If a process belongs to more than one communicator, its rank in each can (and usually will) be different.

MPI automatically provides a basic communicator called MPI_COMM_WORLD that consists of all available processes.

Every process can communicate with every other process using the MPI_COMM_WORLD communicator.

Additional communicators consisting of subsets of the available processes can also be defined.

12
Getting communicator rank

A process can determine its rank in a communicator with a call to MPI_COMM_RANK.

In Fortran, this call looks like

CALL MPI_COMM_RANK(COMM, RANK, IERR)

where the arguments are all of type INTEGER.

COMM contains the name of the communicator, e.g., MPI_COMM_WORLD.

The rank of the process within the communicator COMM is returned in RANK.

The analogous syntax in C looks like

int MPI_Comm_rank(
      MPI_Comm comm       /* in  */,
      int*     my_rank_p  /* out */);

13
Getting communicator size

A process can also determine the size (number of processes) of any communicator to which it belongs with a call to MPI_COMM_SIZE.

In Fortran, this call looks like

CALL MPI_COMM_SIZE(COMM, SIZE, IERR)

where the arguments are all of type INTEGER.

Again, COMM contains the name of the communicator.

The number of processes associated with the communicator COMM is returned in SIZE.

The analogous syntax in C looks like

int MPI_Comm_size(
      MPI_Comm comm        /* in  */,
      int*     comm_sz_p   /* out */);
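
A small C sketch combining the two calls (this anticipates the hello, world example below; the variable names are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(void) {
   int my_rank, comm_sz;

   MPI_Init(NULL, NULL);
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);  /* rank of this process      */
   MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);  /* total number of processes */

   printf("I am process %d of %d\n", my_rank, comm_sz);

   MPI_Finalize();
   return 0;
}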

14
Finalizing MPI

The last call to an MPI routine in any MPI program should be to MPI_FINALIZE.

MPI_FINALIZE cleans up all MPI data structures, cancels incomplete operations, etc.

MPI_FINALIZE must be called by all processes!

If any processes do not call MPI_FINALIZE, the program will hang.

Once MPI_FINALIZE has been called, no other MPI routines (including MPI_INIT!) may be called.

In Fortran, the call looks like

CALL MPI_FINALIZE(IERR)

In C, the analogous syntax is

int MPI_Finalize(void);

15
hello, world — serial Fortran

Here is a simple serial Fortran90 program called helloWorld.f90 to print the message "hello, world":

PROGRAM helloWorld

PRINT *, "hello, world"

END PROGRAM helloWorld

Compiled and run with the commands

gfortran helloWorld.f90 -o helloWorld
./helloWorld

produces the unsurprising output

hello, world

16
hello, world with MPI

hello, world is the classic first program of anyone using a new computer language.

Here is a simple MPI version using Fortran90.

PROGRAM helloWorldMPI
  INCLUDE 'mpif.h'
  INTEGER IERR

  ! Initialize MPI environment
  CALL MPI_INIT(IERR)

  PRINT *, "hello, world"

  ! Finalize MPI environment
  CALL MPI_FINALIZE(IERR)
END PROGRAM helloWorldMPI

17
hello, world with MPI

To compile and link this program to an executable called helloWorldMPI, we could do something like

>> mpif90 -c helloWorldMPI.f90
>> mpif90 -o helloWorldMPI helloWorldMPI.o

or all at once using

>> mpif90 helloWorldMPI.f90 -o helloWorldMPI

Without a queuing system, we could use

>> mpirun -np 4 ./helloWorldMPI

Thus, when run on four processes, the output of this program is

hello, world
hello, world
hello, world
hello, world

18
hello, world, the sequel

We now modify the hello, world program so that each process prints its rank as well as the total number of processes in the communicator MPI_COMM_WORLD.
PROGRAM helloWorld2MPI
  INCLUDE 'mpif.h'

  ! Declare variables
  INTEGER RANK, SIZE, IERR

  ! Initialize MPI environment
  CALL MPI_INIT(IERR)
  ! Not checking for errors :)

  ! Get the rank
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)

  ! Get the size
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, SIZE, IERR)

  ! Display the result
  PRINT *, "Process", RANK, "of ", SIZE, &
           "says, 'hello, world'"

  ! Finalize MPI environment
  CALL MPI_FINALIZE(IERR)
END PROGRAM helloWorld2MPI

19
Running this code on 6 processes could produce
something like:
spiteri@robson:~/test> mpirun -np 6 ./helloWorld2MPI
Process 0 of 6 says, 'hello, world'
Process 2 of 6 says, 'hello, world'
Process 1 of 6 says, 'hello, world'
Process 3 of 6 says, 'hello, world'
Process 4 of 6 says, 'hello, world'
Process 5 of 6 says, 'hello, world'

20
hello, world, the sequel

Let’s take a look at a variant of this program in C:


#include <stdio.h>
#include <string.h> /* For strlen             */
#include <mpi.h>    /* For MPI functions, etc */

const int MAX_STRING = 100;

int main(void) {
   char greeting[MAX_STRING];  /* String storing message */
   int  comm_sz;               /* Number of processes    */
   int  my_rank;               /* My process rank        */

   /* Start up MPI */
   MPI_Init(NULL, NULL);

   /* Get the number of processes */
   MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

   /* Get my rank among all the processes */
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

21
   if (my_rank != 0) {
      /* Create message */
      sprintf(greeting, "Greetings from process %d of %d!",
            my_rank, comm_sz);
      /* Send message to process 0 */
      MPI_Send(greeting, strlen(greeting)+1, MPI_CHAR, 0, 0,
            MPI_COMM_WORLD);
   } else {
      /* Print my message */
      printf("Greetings from process %d of %d!\n", my_rank, comm_sz);
      for (int q = 1; q < comm_sz; q++) {
         /* Receive message from process q */
         MPI_Recv(greeting, MAX_STRING, MPI_CHAR, q,
               0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
         /* Print message from process q */
         printf("%s\n", greeting);
      }
   }

   /* Shut down MPI */
   MPI_Finalize();

   return 0;
}  /* main */

22
hello, world, the sequel

Recall we compile this code via

mpicc -g -Wall -o mpi_hello mpi_hello.c

The code is run with (basically) an identical call as for the Fortran program:

mpirun -n <number of processes> ./mpi_hello

So the output from the command

mpirun -n 4 ./mpi_hello

will be

Greetings from process 0 of 4!
Greetings from process 1 of 4!
Greetings from process 2 of 4!
Greetings from process 3 of 4!

23
hello, world, the sequel

The main difference between the Fortran and C programs is that the latter sends the greeting as a message for process 0 to receive (and print).

The syntax of MPI_Send is

int MPI_Send(
      void*        msg_buf_p    /* in */,
      int          msg_size     /* in */,
      MPI_Datatype msg_type     /* in */,
      int          dest         /* in */,
      int          tag          /* in */,
      MPI_Comm     communicator /* in */);

The first three arguments determine the contents of the message; the last three determine the destination.

24
hello, world, the sequel

msg_buf_p is a pointer to the block of memory containing the contents of the message, in this case the string greeting.

msg_size is the number of characters in the string plus 1 for the '\0' string termination character in C.

msg_type is MPI_CHAR; so together we see that the message contains strlen(greeting)+1 chars.

Note: the size of the string greeting is not necessarily the same as that specified by msg_size and msg_type.

dest gives the rank of the destination process.

tag is a non-negative int that can be used to distinguish messages that might otherwise be identical; e.g., when sending a number of floats, a tag can indicate whether a given float should be printed or used in a computation.

Finally, communicator (of type MPI_Comm) is the communicator; dest is defined relative to it.
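
As a concrete sketch (the buffer name, count, and tag value here are illustrative, not from the notes), sending 10 doubles to process 0 with tag 1 could look like the following fragment:

double data[10];           /* contents of the message               */
/* ... fill data ... */
MPI_Send(data,             /* msg_buf_p: pointer to the send buffer */
         10,               /* msg_size: number of elements          */
         MPI_DOUBLE,       /* msg_type: type of each element        */
         0,                /* dest: rank of the destination process */
         1,                /* tag                                   */
         MPI_COMM_WORLD);  /* communicator                          */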

25
hello, world, the sequel

MPI_Recv has a similar (complementary) syntax to MPI_Send, but with a few important differences.

int MPI_Recv(
      void*        msg_buf_p    /* out */,
      int          buf_size     /* in  */,
      MPI_Datatype buf_type     /* in  */,
      int          source       /* in  */,
      int          tag          /* in  */,
      MPI_Comm     communicator /* in  */,
      MPI_Status*  status_p     /* out */);

Again, the first three arguments specify the memory available (buffer) for receiving the message.

The next three identify the message.

Both tag and communicator must match those sent by the sending process.

We discuss status_p shortly.

It is not often used, so in the example the MPI constant MPI_STATUS_IGNORE was passed.
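
Putting the two calls together, here is a minimal sketch (not from the notes) in which process 1 sends an array of 10 doubles to process 0; it assumes the program is run with at least two processes:

#include <mpi.h>
#include <stdio.h>

int main(void) {
   int my_rank;
   double x[10];

   MPI_Init(NULL, NULL);
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

   if (my_rank == 1) {
      for (int i = 0; i < 10; i++) x[i] = i;   /* assemble the message */
      MPI_Send(x, 10, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
   } else if (my_rank == 0) {
      /* source = 1 and tag = 1 match the send above */
      MPI_Recv(x, 10, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      printf("Process 0 received x[9] = %f\n", x[9]);
   }

   MPI_Finalize();
   return 0;
}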

26
hello, world, the sequel

In order for a message to be received by MPI_Recv, it must be matched to a send.

Necessary (but not sufficient) conditions are that the tags and communicators match and that the source and destination processes are consistent; e.g., process A is expecting a message from process B.

The only other caveat is that the sending and receiving buffers must be compatible.

We will assume this means they are of the same type and that the receiving buffer size is at least as large as the sending buffer size.

27
hello, world, the sequel

One needs to beware that receiving processes are generally receiving messages from many sending processes, and it is generally impossible to predict the order in which messages are sent (or received).

To allow for flexibility in handling received messages, MPI provides a special constant MPI_ANY_SOURCE as a wildcard that can be used in MPI_Recv.

Similarly, the wildcard constant MPI_ANY_TAG can be used in MPI_Recv to handle messages with any tag.

Two final notes:

1. Only receiving processes can use wildcards; sending processes must specify a destination process rank and a non-negative tag.

2. There is no wildcard that can be used for communicators; both sending and receiving processes must specify communicators.
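
For instance, the receive loop in the earlier greetings program could be rewritten with wildcards so that process 0 prints the messages in whatever order they arrive (a sketch reusing greeting, MAX_STRING, and comm_sz from that program):

for (int q = 1; q < comm_sz; q++) {
   /* Accept the next matching message from any sender, with any tag */
   MPI_Recv(greeting, MAX_STRING, MPI_CHAR, MPI_ANY_SOURCE,
            MPI_ANY_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
   printf("%s\n", greeting);
}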

28
hello, world, the sequel

It is possible for a receiving process to successfully receive a message without knowing the sender, the tag, or the size of the message.

All this information is of course known and can be accessed via the status_p argument.

status_p has type MPI_Status*, which points to a struct having at least the members MPI_SOURCE, MPI_TAG, and MPI_ERROR.

If we pass the address of a variable status of type MPI_Status as the last argument to MPI_Recv, we can determine the sender and tag of the received message by accessing

status.MPI_SOURCE
status.MPI_TAG

The amount of data received is not directly accessible but can be retrieved with a call to MPI_Get_count:

MPI_Get_count(&status, recv_type, &count)

Of course, retrieving this information is not always necessary, and doing so should potentially be avoided in the name of efficiency.
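
A short sketch putting this together (reusing greeting and MAX_STRING from the earlier program; count is a local int):

MPI_Status status;
int count;

MPI_Recv(greeting, MAX_STRING, MPI_CHAR, MPI_ANY_SOURCE,
         MPI_ANY_TAG, MPI_COMM_WORLD, &status);

/* Who sent it, with what tag, and how many chars actually arrived? */
printf("source = %d, tag = %d\n", status.MPI_SOURCE, status.MPI_TAG);
MPI_Get_count(&status, MPI_CHAR, &count);
printf("received %d characters\n", count);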

29
hello, world, the sequel

What exactly happens when a message is sent from one process to another depends on the particular system, but the general procedure is the following.

The sending process assembles the message, i.e., the "address" information plus the data themselves.

The sending process then either buffers the message and returns, or blocks; i.e., it does not return until it can begin transmission.

(Other functions are available if it is important that we know when a message is actually received, etc.)

Specific details depend on the implementation, but typically messages that are below a certain threshold in size are automatically buffered by MPI_Send.

30
hello, world, the sequel

In contrast, MPI_Recv always blocks until a matching message has been received.

So when a call to MPI_Recv returns, we know (barring errors!) that the message has been safely stored in the receive buffer.

(There is a non-blocking version of this function that only checks whether a matching message is available and returns whether one is or not.)

MPI also requires that messages be non-overtaking, i.e., the order in which messages are sent by one process to another must be reflected in the order in which they can be received.

However, the order in which messages are received (let alone sent!) by other processes cannot be controlled.

As usual, it is important for programmers to maintain the mindset that the processes run autonomously.

31
hello, world, the sequel

A potential pitfall of blocking communication is that if a process tries to receive a message for which there is no matching send, it will block forever, or deadlock.

In other words, the program will hang.

It is therefore critical when programming to ensure every receive has a matching send.

This includes ensuring that tags match and that the source and destination are never the same process, both so that the program does not hang and so that messages are not inadvertently received!

Similarly, unmatched calls to MPI_Send will typically hang the sending process; in the "best" case, the message is buffered so the process can continue, but the message will be lost.
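
For instance, here is a sketch (not from the notes) of an exchange between two processes that deadlocks because both processes post their blocking receive before either send is reached:

/* my_rank, x, and y are assumed to be declared as before.
   Both processes block in MPI_Recv; neither ever reaches MPI_Send. */
if (my_rank == 0) {
   MPI_Recv(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
   MPI_Send(&y, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
} else if (my_rank == 1) {
   MPI_Recv(&y, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
   MPI_Send(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
}

Reordering so that one of the processes sends before it receives removes the deadlock.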

32
Sample Program: Trapezoidal Rule

Historically in mathematics, quadrature refers to the act of trying to find a square with the same area as a given circle.

In mathematical computing, quadrature refers to the numerical approximation of definite integrals.

Let f(x) be a real-valued function of a real variable, defined on a finite interval a ≤ x ≤ b.

We seek to compute the value of the definite integral

$$\int_a^b f(x)\,dx.$$

Note: This is a real number.

In line with its historical meaning, "quadrature" might conjure up the vision of plotting the function on graph paper and counting the little squares that lie underneath the curve.

33
Sample Program: Trapezoidal Rule

Let ∆x = b − a be the length of the integration interval.

The trapezoidal rule T approximates the integral by the area of a trapezoid with base ∆x and sides equal to the values of f(x) at the two end points:

$$T = \Delta x \, \frac{f(a) + f(b)}{2}.$$

Put differently, the trapezoidal rule forms a linear interpolant between (a, f(a)) and (b, f(b)) and integrates the interpolant exactly to define the rule.

Naturally the error in using the trapezoidal rule depends on ∆x: in general, as ∆x increases, so does the error.
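
As a quick worked example (using the integrand and interval that are hard-wired in the sample program below, f(x) = x² on [0, 3]), a single trapezoid gives

$$T = 3 \cdot \frac{0 + 9}{2} = 13.5, \qquad \int_0^3 x^2\,dx = 9,$$

so the single-trapezoid estimate overestimates the exact value considerably; the composite rule on the next slide reduces this error.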

34
Sample Program: Trapezoidal Rule

This leads to the composite trapezoidal rule:

Assume the domain [a, b] is partitioned into n (equal) subintervals so that

$$\Delta x = \frac{b - a}{n}.$$

Let $x_i = a + i\,\Delta x$, $i = 0, 1, \ldots, n$.

Then the composite trapezoidal rule is

$$T_n = \frac{\Delta x}{2} \sum_{i=1}^{n} \bigl( f(x_{i-1}) + f(x_i) \bigr)
     = \Delta x \left[ \frac{f(a) + f(b)}{2} + \sum_{i=1}^{n-1} f(x_i) \right].$$
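
Continuing the worked example with f(x) = x² on [0, 3] and n = 3 (so ∆x = 1 and x_i = 0, 1, 2, 3):

$$T_3 = 1 \cdot \left[ \frac{0 + 9}{2} + 1 + 4 \right] = 9.5,$$

which is already much closer to the exact value 9 than the single-trapezoid estimate 13.5.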

35
Sample Program: Trapezoidal Rule

Serial code for this method might look like

/* input a, b, n */
dx = (b-a)/n;
approx = (f(a)+f(b))/2;
for (i = 1; i <= n-1; i++) {
   x_i = a + i*dx;
   approx += f(x_i);
}
approx = dx*approx;

36
Sample Program: Trapezoidal Rule

Following Foster’s methodology, we

1. partition the problem (find the area of a single trapezoid, add them up)

2. identify communication (information about a single trapezoid scattered, single areas gathered)

3. aggregate tasks (there are probably more trapezoids than cores, so split [a, b] into comm_sz subintervals)

4. map tasks to cores (subintervals to cores; results back to process 0)

37
Sample Program: Trapezoidal Rule

/* File: mpi_trap1.c
* Purpose: Use MPI to implement a parallel version of the trapezoidal
* rule. In this version the endpoints of the interval and
* the number of trapezoids are hardwired.
*
* Input: None.
* Output: Estimate of the integral from a to b of f(x)
* using the trapezoidal rule and n trapezoids.
*
* Compile: mpicc -g -Wall -o mpi_trap1 mpi_trap1.c
* Run: mpiexec -n <number of processes> ./mpi_trap1
*
* Algorithm:
* 1. Each process calculates "its" interval of
* integration.
* 2. Each process estimates the integral of f(x)
* over its interval using the trapezoidal rule.
* 3a. Each process != 0 sends its integral to 0.
* 3b. Process 0 sums the calculations received from
* the individual processes and prints the result.
*
* Note: f(x), a, b, and n are all hardwired.
*
* IPP: Section 3.2.2 (pp. 96 and ff.)
*/
#include <stdio.h>

/* We'll be using MPI routines, definitions, etc. */
#include <mpi.h>

/* Calculate local integral */
double Trap(double left_endpt, double right_endpt, int trap_count,
      double base_len);

38
/* Function we’re integrating */
double f(double x);

int main(void) {
   int my_rank, comm_sz, n = 1024, local_n;
   double a = 0.0, b = 3.0, dx, local_a, local_b;
   double local_int, total_int;
   int source;

   /* Let the system do what it needs to start up MPI */
   MPI_Init(NULL, NULL);

   /* Get my process rank */
   MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

   /* Find out how many processes are being used */
   MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

   dx = (b-a)/n;          /* dx is the same for all processes */
   local_n = n/comm_sz;   /* So is the number of trapezoids   */

   /* Length of each process' interval of
    * integration = local_n*dx. So my interval
    * starts at: */
   local_a = a + my_rank*local_n*dx;
   local_b = local_a + local_n*dx;
   local_int = Trap(local_a, local_b, local_n, dx);

   /* Add up the integrals calculated by each process */
   if (my_rank != 0) {
      MPI_Send(&local_int, 1, MPI_DOUBLE, 0, 0,
            MPI_COMM_WORLD);
   } else {
      total_int = local_int;
      for (source = 1; source < comm_sz; source++) {
         MPI_Recv(&local_int, 1, MPI_DOUBLE, source, 0,
               MPI_COMM_WORLD, MPI_STATUS_IGNORE);

39
         total_int += local_int;
      }
   }

   /* Print the result */
   if (my_rank == 0) {
      printf("With n = %d trapezoids, our estimate\n", n);
      printf("of the integral from %f to %f = %.15e\n",
            a, b, total_int);
   }

   /* Shut down MPI */
   MPI_Finalize();

   return 0;
} /* main */

/*------------------------------------------------------------------
* Function: Trap
* Purpose: Serial function for estimating a definite integral
* using the trapezoidal rule
* Input args: left_endpt
* right_endpt
* trap_count
* base_len
* Return val: Trapezoidal rule estimate of integral from
* left_endpt to right_endpt using trap_count
* trapezoids
*/
double Trap(
      double left_endpt  /* in */,
      double right_endpt /* in */,
      int    trap_count  /* in */,
      double base_len    /* in */) {
   double estimate, x;
   int i;
40
   estimate = (f(left_endpt) + f(right_endpt))/2.0;
   for (i = 1; i <= trap_count-1; i++) {
      x = left_endpt + i*base_len;
      estimate += f(x);
   }
   estimate = estimate*base_len;

   return estimate;
} /* Trap */

/*------------------------------------------------------------------
* Function: f
* Purpose: Compute value of function to be integrated
* Input args: x
*/
double f(double x) {
   return x*x;
} /* f */

41
Summary

• MPI header file, initialize / finalize MPI environment

• MPI entities (naming, error codes, handles, data types)

• hello, world, and hello, world, the sequel

• Basic trapezoidal rule

42
