Introduction, or What's Wrong With Sockets?: Remote Procedure Calls
and receiving messages over a network. The following sequence of operations takes
place (from p. 693 of W. Richard Stevens’s UNIX Network Programming):
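[Figure: the ten numbered steps of a remote procedure call, passing through the client stub, kernel, and network routines on one machine and the network routines, kernel, and server stub on the other.]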
3. Network messages are transferred by the kernel to the remote system via
some protocol (either connectionless or connection-oriented).
5. The server stub executes a local procedure call to the actual server function,
passing it the arguments that it received from the client.
6. When the server is finished, it returns to the server stub with its return values.
7. The server stub converts the return values (if necessary) and marshals them
into one or more network messages to send to the client stub.
8. Messages get sent back across the network to the client stub.
9. The client stub reads the messages from the local kernel.
10. It then returns the results to the client function (possibly converting them
first).
The client code then continues its execution…
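The steps listed above can be sketched in a single process. This is only an illustration under stated assumptions: the "network" is replaced by a direct function call, the wire format is a pair of big-endian 32-bit integers, and the names add, add_impl, and server_stub are invented for this example rather than taken from any RPC library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Big-endian helpers standing in for a real data representation layer. */
static void put_u32(unsigned char *b, uint32_t v)
{
    b[0] = (unsigned char)(v >> 24);
    b[1] = (unsigned char)(v >> 16);
    b[2] = (unsigned char)(v >> 8);
    b[3] = (unsigned char)v;
}

static uint32_t get_u32(const unsigned char *b)
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* The actual server function: it knows nothing about messages. */
static uint32_t add_impl(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Server stub: unmarshal the request, call the server function (step 5),
 * then marshal the return value into a reply (steps 6 and 7).
 * Returns the length of the reply in bytes. */
static size_t server_stub(const unsigned char *req, size_t req_len,
                          unsigned char *reply)
{
    assert(req_len == 8);
    uint32_t result = add_impl(get_u32(req), get_u32(req + 4));
    put_u32(reply, result);
    return 4;
}

/* Client stub: looks like a local function to its caller. It marshals the
 * arguments, hands the request over, and unmarshals the reply (steps 9-10). */
uint32_t add(uint32_t a, uint32_t b)
{
    unsigned char req[8], reply[4];

    put_u32(req, a);
    put_u32(req + 4, b);
    /* A direct call stands in for the network transfers of steps 3 and 8. */
    size_t n = server_stub(req, sizeof req, reply);
    assert(n == 4);
    return get_u32(reply);
}
```

A caller simply writes add(2, 3); nothing at the call site reveals that the arguments were packed into a message and unpacked on the other side.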
The major benefits of RPC are twofold: the programmer can now use procedure call
semantics, and writing distributed applications is simplified because RPC hides all of
the network code in stub functions. Application programs don’t have to worry about
details such as sockets, port numbers, and byte ordering.
Using the OSI reference model, RPC is a presentation layer service.
Several issues arise when we think about implementing such a facility:
1 Big endian storage stores the most significant byte(s) in low memory. Little endian storage stores the most
significant byte(s) of a word in high memory. Machines such as Sun Sparcs and 680x0s use big endian
storage. Machines such as Intel x86/Pentium, Vaxen, and PDP-11s use little endian.
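To make the byte-ordering footnote concrete, here is a minimal C sketch that detects the host's byte order and writes a 32-bit value in a fixed, big-endian layout regardless of the host. The function names are illustrative and are not part of any RPC library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if this machine stores the least significant byte first. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    /* Inspect the byte at the lowest address of the word. */
    return *(const unsigned char *)&probe == 1;
}

/* Store v at buf[0..3], most significant byte first (big endian). */
void put_u32_be(unsigned char *buf, uint32_t v)
{
    assert(buf != NULL);
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Read a big-endian 32-bit value back from buf[0..3]. */
uint32_t get_u32_be(const unsigned char *buf)
{
    assert(buf != NULL);
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

Because the shifts operate on the value rather than on its in-memory bytes, put_u32_be and get_u32_be produce the same wire bytes on both kinds of machine; only host_is_little_endian depends on the hardware.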
Rutgers University – CS 417: Distributed Systems ©2000-2002 Paul Krzyzanowski 3
Remote Procedure Call
Figure 2. Compilation steps for Remote Procedure Calls
about a remote procedure call?
Think of the extra steps involved. Just calling the client stub function and getting a
return from it incurs the overhead of a procedure call. On top of that, we need to
execute the code to marshal parameters, call the network routines in the OS (incurring
a context switch), deal with network latency, have the server receive the message and
switch to the server process, unmarshal parameters, call the server function, and do it
all over again on the return trip. Without a doubt a remote procedure call will be much
slower.
Advantages of RPC
You don’t have to worry about getting a unique transport address (a socket on a
machine). The server can bind to any port and register the port with its RPC
name server. The client will contact this name server and request the port
number that corresponds to the program it needs.
The system is transport independent. This makes code more portable to
environments that may have different transport providers in use. It also allows
processes on a server to make themselves available over every transport
provider on a system.
Applications on the client only need to know one transport address—that of the
rpcbind (or portmap) process.
The function-call model can be used instead of the send/receive (read/write)
interface provided by sockets.
All the programmer has to write is a client procedure, the server functions, and the
RPC specification. When the RPC specification (a file suffixed with .x, for example a
file named date.x) is compiled with rpcgen, three or four files are created. These are
(for date.x):
2 This restriction has now been removed and can be disabled with a command-line option to rpcgen.
the RPC definition file converted to lower case and suffixed by an underscore followed
by a version number. For example, BIN_DATE becomes a reference to the function
bin_date_1. Your server must implement bin_date_1 and the client code should issue
calls to bin_date_1.
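The naming rule can be sketched as a small C helper. rpc_stub_name is a hypothetical function written for this illustration; rpcgen performs the mapping itself when it generates the header and stub files.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Build the C function name for an RPC procedure: lower-case the
 * procedure name and append "_<version>".
 * Returns 0 on success, -1 if out is too small to hold the result. */
int rpc_stub_name(const char *proc, unsigned version, char *out, size_t out_len)
{
    assert(proc != NULL && out != NULL);

    size_t i = 0;
    for (; proc[i] != '\0'; i++) {
        if (i + 1 >= out_len)       /* keep room for the terminating NUL */
            return -1;
        out[i] = (char)tolower((unsigned char)proc[i]);
    }

    /* Append the underscore and version number. */
    int n = snprintf(out + i, out_len - i, "_%u", version);
    if (n < 0 || (size_t)n >= out_len - i)
        return -1;
    return 0;
}
```

Calling rpc_stub_name("BIN_DATE", 1, buf, sizeof buf) with a large enough buffer leaves the string bin_date_1 in buf, matching the example in the text.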
The server
When we start the server, the server stub runs and puts the process in the background
(don’t forget to run ps to find it and kill it when you no longer need it)3. It creates a
socket and binds any local port to the socket. It then calls a function in the RPC library,
svc_register, to register the program number and version. This contacts the port
mapper. The port mapper is a separate process that is usually started at system boot
time. It keeps track of the port number, version number, and program number. On
UNIX System V release 4, this process is rpcbind. On earlier systems, it was known as
portmap.
The server then waits for a client request (i.e., it does a listen).
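The bookkeeping done by the port mapper can be pictured as a small lookup table. The sketch below is an in-memory model only, not the real rpcbind or portmap code: pm_register and pm_lookup are hypothetical names, the table size is arbitrary, and returning 0 for an unknown service is a convention chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MAPPINGS 64

/* One registration: which port serves a (program, version, protocol). */
struct mapping {
    unsigned long  prog;
    unsigned long  vers;
    int            proto;   /* e.g. 6 for TCP, 17 for UDP */
    unsigned short port;
};

static struct mapping table[MAX_MAPPINGS];
static size_t n_mappings;

/* Called on behalf of a server at startup. Returns 0 on success,
 * -1 if the table is full or the triple is already registered. */
int pm_register(unsigned long prog, unsigned long vers, int proto,
                unsigned short port)
{
    assert(port != 0);      /* 0 is reserved to mean "not registered" */
    for (size_t i = 0; i < n_mappings; i++)
        if (table[i].prog == prog && table[i].vers == vers &&
            table[i].proto == proto)
            return -1;
    if (n_mappings == MAX_MAPPINGS)
        return -1;
    table[n_mappings++] = (struct mapping){ prog, vers, proto, port };
    return 0;
}

/* Called on behalf of a client. Returns the port, or 0 if unknown. */
unsigned short pm_lookup(unsigned long prog, unsigned long vers, int proto)
{
    for (size_t i = 0; i < n_mappings; i++)
        if (table[i].prog == prog && table[i].vers == vers &&
            table[i].proto == proto)
            return table[i].port;
    return 0;
}
```

The server side corresponds to pm_register, called once after the socket is bound; the client side corresponds to pm_lookup, which is the only question a client needs to ask before it can address the server directly.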
The client
When we start the client program, it first calls clnt_create with the name of the remote
system, program number, version number, and protocol. It contacts the port mapper on
the remote system to find the appropriate port on that system.
The client then calls the RPC stub function (bin_date_1 in this example). This
function sends a message (e.g., a datagram) to the server (using the port number found
earlier) and waits for a response. For datagram service, it will retransmit the request a
fixed number of times if the response is not received.
The message is then received by the remote system, which calls the server function
(bin_date_1) and returns the return value back to the client stub. The client stub then
returns to the client code that issued the call.
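The retransmission behavior for datagram service can be sketched as a bounded retry loop. This is a simplified model, not the ONC RPC library's code: send_once_fn is a hypothetical hook that sends the request once and reports whether a reply arrived within one timeout interval, and lossy_send is a stand-in transport that drops a configurable number of requests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transport hook: sends the request once and waits up to
 * one timeout interval. Returns 1 if a reply arrived, 0 on timeout. */
typedef int (*send_once_fn)(void *ctx);

/* Try the call up to max_tries times. Returns the 1-based number of the
 * attempt that succeeded, or 0 if every attempt timed out. */
int call_with_retransmit(send_once_fn send_once, void *ctx, int max_tries)
{
    assert(send_once != NULL && max_tries > 0);
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (send_once(ctx))
            return attempt;
    }
    return 0;
}

/* Stand-in transport for illustration: ctx points to an int holding the
 * number of requests that will still be dropped. */
int lossy_send(void *ctx)
{
    int *drops_left = ctx;
    if (*drops_left > 0) {
        (*drops_left)--;
        return 0;
    }
    return 1;
}
```

Because a request that was delivered but whose reply was lost will be sent again, a scheme like this may run the server function more than once for a single call.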
Microsoft DCOM
In April 1992, Microsoft released Windows 3.1, which included a mechanism called
OLE (object linking and embedding). This allowed a program to dynamically link other
libraries, enabling facilities such as embedding a spreadsheet into a Word document (this
wasn’t a Microsoft innovation – they were just trying to catch up to Apple). OLE
evolved into something called COM (Component Object Model). A COM object is a
binary file. Programs that use COM services have access to a standardized interface for
the COM object (but not its internal structures). COM objects are named with globally
unique identifiers (GUIDs) and classes of objects are identified with class IDs. Several
methods exist to create a COM object (e.g., CoGetInstanceFromFile). The COM
libraries look up the appropriate binary code (a DLL or executable) in the system
registry, create the object, and return an interface pointer to the caller.
4 Polymorphism is the ability to create different functions with the same name. The appropriate function is invoked based on
its parameters.
DCOM (Distributed COM) was introduced with Windows NT 4.0 in 1996 and is an
extension of the Component Object Model to allow objects to communicate between
machines. Since DCOM is meant to support access to remote COM objects, a process
that needs to create an object would need to supply the network name of the server as
well as the class ID. Microsoft provides a couple of mechanisms for accomplishing this.
The most transparent is to have the remote machine’s name fixed in the registry (or
DCOM class store), associated with the particular class ID. This way, the application is
unaware that it is accessing a remote object and can use the same interface pointer that
it would for a local COM object. Alternatively, an application may specify a machine
name as a parameter.
A DCOM server is capable of serving objects at runtime. A service known as the
Service Control Manager (SCM), part of the DCOM library, is responsible for
connecting to the server-side SCM and requesting the creation of the object on the
server. On the server, a surrogate process is responsible for loading components and
running them. This differs from RPC models such as ONC and DCE in that a service
for a specific interface was not started a priori. This surrogate process is capable of
handling multiple clients simultaneously.
To support the identification of specific instances of a class (individual objects),
DCOM provides an object naming capability called a moniker. Each instance of an
object can create its own moniker and pass it back to the client. The client will then be
able to refer to it later or pass the moniker to other processes. A moniker itself is an
object. Its IMoniker interface can be used to locate, activate, and access the bound
object without having any information about where the object is located.
Several types of monikers are supported:
File moniker: This moniker uses the file type (e.g., “.doc”) to determine the
appropriate object (e.g., Microsoft Word). Microsoft provides support for
persistence - storing an object’s data in a file. If a file represents a stored object,
DCOM will use the class ID in the file to identify the object.
URL moniker: This abstracts access to URLs via Internet protocols (e.g. http, https,
ftp) in a COM interface. Binding a URL moniker allows the remote data to be
retrieved. Internally, the URL moniker uses the WinInet API to retrieve data.
Class moniker: This is used together with other monikers to override the class
ID lookup mechanism.
DCOM also provides some support for persistent objects via monikers.
DCOM summary
Microsoft DCOM is a significant improvement over earlier RPC systems. The Object
RPC layer is an incremental improvement over DCE RPC and allows for object
references. The DCOM layer builds on top of COM’s access to objects (via function
tables) and provides transparency in accessing remote objects. Remote reference
management is still somewhat problematic in that it has to be done explicitly but at least
there is a mechanism for supporting this. The moniker mechanism provides a COM
interface to support naming objects, whether they are remote references, stored objects
in the file system, or URLs. The biggest downside is that DCOM is a Microsoft-only
solution. It also doesn’t work well across firewalls (a problem with most RPC systems)
since the firewalls must allow traffic to flow between certain ports used by ORPC and
DCOM.
CORBA
Even with DCE fixing some of the shortcomings in Sun’s RPC, certain deficiencies still
remain. For example, if a server is not running, a client cannot connect to it to call a
remote procedure. It is an administrator’s responsibility to ensure that the needed
servers are started before any clients attempt to connect to them. If a new service or
interface is added to the system, there is no means by which a client can discover this.
In some environments, it might be helpful for a client to be able to find out about services
and their interfaces at run-time. Finally, object oriented languages expect
polymorphism in function calls (the function may behave differently for different types
of data). Traditional RPC has no support for this.
CORBA (Common Object Request Broker Architecture) was created to address
these, and other, issues. It is an architecture created by an industry consortium of over
500 companies called the Object Management Group (OMG). The specification for this
architecture has been evolving since 1989. The goal is to provide support for
distributed heterogeneous object-oriented applications. Objects may be hosted across a
network of computers (a single object is not distributed). The specification is
independent of any programming language, operating system, or network to enable
interoperability across these platforms.
Under CORBA, when a client wishes to invoke an operation (method) on an object,
it makes a request and gets a response. Both the request and response pass through the
object request broker (ORB). The ORB represents the entire set of interface libraries,
stub functions, and servers that hide the mechanisms for communication, activation, and
storage of server objects from the client. It lets objects discover each other at run time
and invoke services.
When a client makes a request, the ORB:
marshals arguments (at the client).
locates a server for the object. If necessary, it creates a process on the server
end to handle the request.
if the server is remote, transmits the request (using RPC or sockets).
unmarshals arguments into server format (at the server).
module StudentObject {
    struct StudentInfo {
        string name;
        long id;
        float gpa;
    };
    exception Unknown {};
    interface Student {
        StudentInfo getinfo(in string name)
            raises(Unknown);
        void putinfo(in StudentInfo data);
    };
};
Behind the scenes, this code results in a stub function being called, parameters
marshaled, and sent to the server. The client and server stubs can be used only if the
name of the class and method is known at compile time. Otherwise, CORBA supports
dynamic binding – assembling a method invocation at run time via the Dynamic
Invocation Interface (DII). This interface provides calls to set the class, build the
argument list, and make the call. The server counterpart (for creating a server interface
dynamically) is called the Dynamic Skeleton Interface (DSI)5. A client can discover
names of classes and methods at run time via the interface repository. This is a name
server that can be queried to discover what classes a server supports and which objects
are instantiated.
CORBA standardized the functional interfaces and capabilities but left the actual
implementation and data representation formats to individual ORB vendors. This led to
the situation where one CORBA implementation might not necessarily be able to
communicate with another. Applications generally needed some reworking to move
from one vendor’s CORBA product to another.
In 1996, CORBA 2.0 added interoperability as a goal in the specification. The
standard defined a network protocol called IIOP (the Internet Inter-ORB Protocol)
which would work across any TCP/IP based CORBA implementations. In fact, since
there was finally a standardized, documented protocol, IIOP itself could be used in
systems that do not even provide a CORBA API. For example, it could be used as a
transport for an implementation of Java RMI (RMI over IIOP; RMI will be covered
further on).
The hope in providing a well-documented network protocol such as IIOP along
with the full-featured set of capabilities of CORBA was to usher in a wide spectrum of
diverse Internet services. Organizations can host CORBA-aware services. Clients
throughout the Internet will be able to query these services, find out their interfaces
dynamically, and invoke functions. The pervasiveness of these services could be as
ubiquitous as HTML web access.
5 Some systems use the term stub to refer to the client stub and skeleton to refer to the server stub. CORBA is one of them.
Microsoft uses the term proxy for the client stub and stub for the server stub.
CORBA summary
Basically, CORBA builds on top of earlier RPC systems and offers the following
additional capabilities:
Static or dynamic method invocations (RPC only supports static binding).
Every ORB supports run time metadata for describing every server interface
known to the system.
An ORB can broker calls within a single process, multiple processes on the
same machine, or distributed processes.
Polymorphic messaging – an ORB invokes a function on a target object. The
same function may have different effects depending on the type of the object.
Automatically instantiate objects that are not running.
Communicate with other ORBs.
CORBA also provides a comprehensive set of services (known as COS, for COrba
Services) for managing objects:
Life-Cycle Services: provides operations for creating, copying, moving, and
deleting components.
Java RMI
CORBA aims at providing a comprehensive set of services for managing objects in a
heterogeneous environment (different languages, operating systems, networks). Java, in
its initial inception, supported the downloading of code from a remote site but its only
support for distributed communication was via sockets. In 1995, Sun (the creator of
Java) began creating an extension to Java called Java RMI (Remote Method
Invocation). Java RMI enables a programmer to create distributed applications where
methods of remote objects can be invoked from other Java virtual machines.
A remote call can be made once the application (client) has a reference to the
remote object. This is done by looking up the remote object in the naming service (the
RMI registry) provided by RMI and receiving a reference as a return value. Java RMI is
conceptually similar to RPC but supports the semantics of object invocation in
different address spaces.
One area in which the design of Java RMI differs from CORBA and most RPC systems
is that RMI is built for Java only. Sun RPC, DCE RPC, Microsoft’s DCOM and ORPC,
and CORBA are designed to be language, architecture, and (except for Microsoft)
operating system independent. While those capabilities are lost, the gain is that RMI fits
cleanly into the language and has no need for standardized data representations (Java
uses the same byte ordering everywhere). The design goals for Java RMI are:
it should fit the language, be integrated into the language, and be simple to
use
support seamless remote invocation of objects
support callbacks from servers to applets
preserve safety of the Java object environment
support distributed garbage collection
support multiple transports
The distributed object model is similar to the local Java object model in the
following ways:
1. A reference to an object can be passed as an argument or returned as a
result.
2. A remote object can be cast to any of the set of remote interfaces supported
by the implementation using the Java syntax for casting.
3. The built-in Java instanceof operator can be used to test the remote
interfaces supported by a remote object.
The object model differs from the local Java object model in the following ways:
1. Classes of remote objects interact with remote interfaces, never with the
implementation class of those interfaces.
2. Non-remote arguments to (and results from) a remote method invocation
are passed by copy, not by reference.
3. A remote object is passed by reference, not by copying the actual remote
implementation.
4. Clients must deal with additional exceptions.
Stubs
Java RMI works by creating stub functions. The stubs are generated with the rmic
compiler.
Locating objects
A bootstrap name server is provided for storing named references to remote objects.
A remote object reference can be stored using the URL-based methods of the class
java.rmi.Naming. For example,
BankAccount acct = new BankAcctImpl();
String url = "rmi://java.sun.com/account";
// bind url to remote object
java.rmi.Naming.bind(url, acct);
RMI architecture
RMI is a three-layer architecture (Figure 4). The top layer is the stub/skeleton layer. It
transmits data to the remote reference layer via marshal streams. Marshal streams
XML RPC
SOAP
Microsoft .NET
Program identification
Every program (a collection of RPC procedures) is identified by some value. This value
is a 32-bit integer that you have to select. The restrictions imposed on this number are:
0x00000000—0x1fffffff defined by Sun for standard services
The identifier names the program. The version list is a list of definitions, each
of the form:
version identifier {
procedure_list
} = value;
The value is an unsigned integer. Usually you only have one version in a program
definition. The identifier is a string that names the version of the program. The
rpcgen compiler will generate a #define for it and the program identifier in the header
file so that these values can be passed as arguments to clnt_create.
Every version definition contains a list of procedures. This procedure list contains a
sequence of definitions, each of the form:
data_type procedure_name ( data_type ) = value;
Data types
Constants may be used in place of an integer value. Constant definitions are converted
to a #define statement by rpcgen:
const MAXSIZE = 512;
Structures are like C structures. rpcgen transfers the structure definition and adds a
typedef for the name of that structure. For example,
struct intpair { int a; int b; };
is translated into:
struct intpair { int a; int b; };
typedef struct intpair intpair;
Enumeration types are also similar to C:
Unions are not like C. A union is a specification of data types based on some
criteria:
union identifier switch ( declaration ) {
    case_list
};
For example:
const MAXBUF=30;
union time_results switch (int status) {
    case 0: char timeval[MAXBUF];
    case 1: void;
    case 2: int reason;
};
If you set status to 0, you must assign data to time_results_u.timeval.
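On the C side, a discriminated union of this kind becomes a structure holding the discriminant plus an ordinary C union. The declaration below is a hand-written approximation of what rpcgen emits for time_results, not copied from generated output; the two setter functions are hypothetical helpers added to show how the discriminant and the union member must be kept in step.

```c
#include <assert.h>
#include <string.h>

#define MAXBUF 30

/* Hand-written approximation of the generated C type for time_results. */
struct time_results {
    int status;                     /* the discriminant */
    union {
        char timeval[MAXBUF];       /* valid when status == 0 */
        int  reason;                /* valid when status == 2 */
    } time_results_u;               /* status == 1 carries no data */
};

/* Fill in a successful result. text must fit in MAXBUF including the NUL. */
void set_time_ok(struct time_results *r, const char *text)
{
    assert(r != NULL && text != NULL && strlen(text) < MAXBUF);
    r->status = 0;
    strcpy(r->time_results_u.timeval, text);
}

/* Fill in a failure that carries a reason code. */
void set_time_failed(struct time_results *r, int reason)
{
    assert(r != NULL);
    r->status = 2;
    r->time_results_u.reason = reason;
}
```

The value of status tells the encoding routine which arm of the union to send, so setting one without the other would put the wrong data on the wire.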
defines a variable size array with a maximum size of 50 longs. The number may be
omitted if there is no bound on the size. This declaration is translated to:
typedef struct {
u_int x_vals_len;
long *x_vals_val;
} x_coords;
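A structure of this shape is filled in by setting the length and pointing the value field at storage for that many elements. The sketch below repeats the structure (spelling u_int as unsigned int so the example does not depend on system headers) and adds two hypothetical helpers, x_coords_set and x_coords_clear; the constant X_COORDS_MAX is likewise an invented name for the bound of 50 mentioned in the text.

```c
#include <assert.h>
#include <stdlib.h>

#define X_COORDS_MAX 50   /* the bound from the declaration */

typedef struct {
    unsigned int x_vals_len;   /* u_int in the generated header */
    long        *x_vals_val;
} x_coords;

/* Copy n values into freshly allocated storage owned by xc.
 * Returns 0 on success, -1 if n exceeds the bound or allocation fails. */
int x_coords_set(x_coords *xc, const long *src, unsigned int n)
{
    assert(xc != NULL && (src != NULL || n == 0));
    if (n > X_COORDS_MAX)
        return -1;
    long *vals = malloc((n ? n : 1) * sizeof *vals);
    if (vals == NULL)
        return -1;
    for (unsigned int i = 0; i < n; i++)
        vals[i] = src[i];
    xc->x_vals_len = n;
    xc->x_vals_val = vals;
    return 0;
}

/* Release the storage and reset the array to empty. */
void x_coords_clear(x_coords *xc)
{
    assert(xc != NULL);
    free(xc->x_vals_val);
    xc->x_vals_val = NULL;
    xc->x_vals_len = 0;
}
```

Only x_vals_len elements are meaningful, so the length field must always be updated together with the pointer.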
Pointers are like C. However, a pointer is not sent over the network (the value
would be meaningless on the other machine). What is sent is a boolean value (true for
pointer, false for null) followed by the data to which the pointer points.
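The flag-then-data encoding of a pointer can be illustrated with an optional 32-bit integer. This is a sketch under stated assumptions rather than the library's own routine: the flag is written as a 4-byte big-endian integer, and encode_opt_u32 and decode_opt_u32 are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void put_u32(unsigned char *b, uint32_t v)
{
    b[0] = (unsigned char)(v >> 24);
    b[1] = (unsigned char)(v >> 16);
    b[2] = (unsigned char)(v >> 8);
    b[3] = (unsigned char)v;
}

static uint32_t get_u32(const unsigned char *b)
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Encode an optional 32-bit value. Writes 4 bytes for a null pointer
 * (flag = 0) or 8 bytes otherwise (flag = 1, then the value).
 * Returns the number of bytes written; buf must hold at least 8 bytes. */
size_t encode_opt_u32(unsigned char *buf, const uint32_t *p)
{
    assert(buf != NULL);
    if (p == NULL) {
        put_u32(buf, 0);
        return 4;
    }
    put_u32(buf, 1);
    put_u32(buf + 4, *p);
    return 8;
}

/* Decode into *storage. Returns storage if a value was present, or NULL
 * if the sender's pointer was null. The address itself never travels. */
uint32_t *decode_opt_u32(const unsigned char *buf, uint32_t *storage)
{
    assert(buf != NULL && storage != NULL);
    if (get_u32(buf) == 0)
        return NULL;
    *storage = get_u32(buf + 4);
    return storage;
}
```

The receiver ends up with a pointer into its own memory that refers to a copy of the data, which is why the sender's address is never needed.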
Strings are declared as if they were variable length arrays:
string name<50>;
Opaque data is untyped data that contains an arbitrary sequence of bytes. It may be
of a fixed or variable length:
opaque extra_bytes[512];
opaque more<512>;
and head is a pointer to the start of the list (you allocated memory for it). After the
XDR routines translate the data, you can use xdr_free(xdr_list, head) to free
the data.
(the RPC definition), server.c (the server function), and client.c (the client
program). The program is a simple one: the client program (client) accepts a machine
name as an argument. It is assumed that before client is run the server program (server)
is running on that machine. The client first requests the time from the server, which is
returned as a 32-bit value. It then sends that result back to the server to get an ASCII
string containing the date/time represented by that value.
Makefile
all: client server

client: client.o date_clnt.o date.h
	cc -o client client.o date_clnt.o -lnsl

server: server.o date_svc.o date.h
	cc -o server server.o date_svc.o -lnsl

date_svc.o:
	$(CC) $(CFLAGS) -c date_svc.c

date_clnt.o:
	$(CC) $(CFLAGS) -c date_clnt.c

client.o: date.h
server.o: date.h

date.h: date.x
	rpcgen date.x

clean:
	rm -f client client.o server server.o date_clnt.* date_svc.* date.h

tar:
	tar cvf rpcdemo.tar date.x client.c server.c Makefile
date.x
/* date.x - description of remote date service */
server.c
#include <rpc/rpc.h>
#include "date.h"
client.c
/* client code */
#include <stdio.h>
#include <stdlib.h>     /* exit */
#include <rpc/rpc.h>
#include <netconfig.h>
#include "date.h"

int
main(int argc, char **argv)
{
    CLIENT *cl;         /* rpc handle */
    char *server;
    long *lresult;      /* return from bin_date_1 */
    char **sresult;     /* return from str_date_1 */

    if (argc != 2) {
        fprintf(stderr, "usage: %s hostname\n", argv[0]);
        exit(1);
    }
    server = argv[1];   /* get the name of the server */

    /* create the client handle */
    if ((cl = clnt_create(server, DATE_PROG, DATE_VERS, "netpath")) == NULL) {
        /* failed! */
        clnt_pcreateerror(server);
        exit(1);
    }

    /* call the procedure bin_date */
    if ((lresult = bin_date_1(NULL, cl)) == NULL) {
        /* failed! */
        clnt_perror(cl, server);
        exit(1);
    }
    printf("time on %s is %ld\n", server, *lresult);

    /* have the server convert the result to a date string */
    if ((sresult = str_date_1(lresult, cl)) == NULL) {
        /* failed! */
        clnt_perror(cl, server);
        exit(1);
    }
    printf("date is %s\n", *sresult);

    clnt_destroy(cl);   /* get rid of the handle */
    exit(0);
}