
Network Programming (NP) Unit Wise Materials


KATRAGADDA INNOVATIVE TRUST FOR EDUCATION

NETWORK
PROGRAMMING

Notes prepared by D. Teja Santosh, Assistant Professor, KPES, Shabad, R.R. District.

UNIT-I

Introduction and TCP/IP


INTRODUCTION

When writing programs that communicate across a computer network, one must first invent a
protocol, an agreement on how those programs will communicate. Before delving into the
design details of a protocol, high-level decisions must be made about which program is
expected to initiate communication and when responses are expected. For example, a Web
server is typically thought of as a long-running program (or daemon) that sends network
messages only in response to requests coming in from the network. The other side of the
protocol is a Web client, such as a browser, which always initiates communication with the
server. This organization into client and server is used by most network-aware applications.


OSI Model


A common way to describe the layers in a network is to use the International Organization for
Standardization (ISO) open systems interconnection (OSI) model for computer
communications. This is a seven-layer model, and its layers map approximately onto those of
the Internet protocol suite.

The sockets programming interfaces described are interfaces from the upper three layers (the
"application") into the transport layer. Why do sockets provide the interface from the upper
three layers of the OSI model into the transport layer? There are two reasons for this design:
First, the upper three layers handle all the details of the application (FTP, Telnet, or HTTP,
for example) and know little about the communication details. The lower four layers know
little about the application, but handle all the communication details: sending data, waiting
for acknowledgments, sequencing data that arrives out of order, calculating and verifying
checksums, and so on. The second reason is that the upper three layers often form what is
called a user process while the lower four layers are normally provided as part of the
operating system (OS) kernel. Unix provides this separation between the user process and the
kernel, as do many other contemporary operating systems. Therefore, the interface between
layers 4 and 5 is the natural place to build the API.

APPLICATION LEVEL VIEW OF A SOCKET


KERNEL LEVEL VIEW OF A SOCKET (IPv4)

(Figure legend: the marked symbol represents a SOCKET.)

The Big Picture


IPv4 Internet Protocol version 4. IPv4, which we often denote as just IP, has been the
workhorse protocol of the IP suite since the early 1980s. It uses 32-bit addresses. IPv4
provides packet delivery service for TCP, UDP, SCTP, ICMP, and IGMP.

IPv6 Internet Protocol version 6. IPv6 was designed in the mid-1990s as a replacement for
IPv4. The major change is a larger address comprising 128 bits, to deal with the explosive
growth of the Internet in the 1990s. IPv6 provides packet delivery service for TCP, UDP,
SCTP, and ICMPv6. We often use the word "IP" as an adjective, as in IP layer and IP
address, when the distinction between IPv4 and IPv6 is not needed.

TCP Transmission Control Protocol. TCP is a connection-oriented protocol that provides a
reliable, full-duplex byte stream to its users. TCP sockets are an example of stream sockets.
TCP takes care of details such as acknowledgments, timeouts, retransmissions, and the like.
Most Internet application programs use TCP. Notice that TCP can use either IPv4 or IPv6.

UDP User Datagram Protocol. UDP is a connectionless protocol, and UDP sockets are an
example of datagram sockets. There is no guarantee that UDP datagrams ever reach their
intended destination. As with TCP, UDP can use either IPv4 or IPv6.

SCTP Stream Control Transmission Protocol. SCTP is a connection-oriented protocol that
provides a reliable full-duplex association. The word "association" is used when referring to
a connection in SCTP because SCTP is multihomed, involving a set of IP addresses and a
single port for each side of an association. SCTP provides a message service, which
maintains record boundaries. As with TCP and UDP, SCTP can use either IPv4 or IPv6, but it
can also use both IPv4 and IPv6 simultaneously on the same association.

ICMP Internet Control Message Protocol. ICMP handles error and control information
between routers and hosts. These messages are normally generated by and processed by the
TCP/IP networking software itself, not user processes, although we show the ping and
traceroute programs, which use ICMP. We sometimes refer to this protocol as ICMPv4 to
distinguish it from ICMPv6.


IGMP Internet Group Management Protocol. IGMP is used with multicasting, which is
optional with IPv4.

ARP Address Resolution Protocol. ARP maps an IPv4 address into a hardware address
(such as an Ethernet address). ARP is normally used on broadcast networks such as Ethernet,
token ring, and FDDI, and is not needed on point-to-point networks.

RARP Reverse Address Resolution Protocol. RARP maps a hardware address into an IPv4
address. It is sometimes used when a diskless node is booting.

ICMPv6 Internet Control Message Protocol version 6. ICMPv6 combines the functionality
of ICMPv4, IGMP, and ARP.

BPF BSD packet filter. This interface provides access to the datalink layer. It is normally
found on Berkeley-derived kernels.

DLPI Datalink provider interface. This interface also provides access to the datalink layer. It
is normally provided with SVR4.

We use the terms "IPv4/IPv6 host" and "dual-stack host" to denote hosts that
support both IPv4 and IPv6.

USER DATAGRAM PROTOCOL [UDP]:-

The User Datagram Protocol (UDP) provides a connectionless, unreliable transport service.
Connectionless means that a communication session between hosts is not established before
exchanging data. UDP is often used for communications that use broadcast or multicast
Internet Protocol (IP) packets. The UDP connectionless packet delivery service is unreliable
because it does not guarantee data packet delivery or send a notification if a packet is not
delivered.

Because delivery of UDP packets is not guaranteed, applications that use this protocol must
supply their own mechanisms for reliability if necessary. Although UDP appears to have
some limitations, it is useful in certain situations.


Each UDP datagram has a length. The length of a datagram is passed to the receiving
application along with the data.

TRANSMISSION CONTROL PROTOCOL [TCP]:-

• Connection oriented: An application requests a "connection" to a destination and uses
that connection to transfer data.
• Point-to-point: A TCP connection has exactly two endpoints (no broadcast/multicast).
• Reliability: TCP delivers data without loss, duplication, or transmission errors. If
delivery becomes impossible, the connection fails and the application is notified.
• Full duplex: Endpoints can exchange data in both directions simultaneously.
• Delivering TCP: TCP segments travel in IP datagrams. Internet routers look only at the IP
header to forward datagrams. Each segment contains a sequence number.
• Flow control: Flow control is necessary when a computer in the network transmits
data too fast for another computer to receive it. Flow control requires some form of
feedback from the receiving peer. TCP provides this feedback through the window that the
receiver advertises, which reflects the space available in its receive buffer.
• RTT estimation: TCP contains algorithms to estimate the round-trip time (RTT) between a
client and server dynamically so that it knows how long to wait for an acknowledgment. For
example, the RTT on a LAN can be milliseconds while across a WAN, it can be
seconds. Furthermore, TCP continuously estimates the RTT of a given connection,
because the RTT is affected by variations in the network traffic.

TCP Connection Establishment

Three-Way Handshake

The following scenario occurs when a TCP connection is established:

1. The server must be prepared to accept an incoming connection. This is normally done
by calling socket, bind, and listen and is called a passive open.
2. The client issues an active open by calling connect. This causes the client TCP to send
a "synchronize" (SYN) segment, which tells the server the client's initial sequence
number for the data that the client will send on the connection. Normally, there is no
data sent with the SYN; it just contains an IP header, a TCP header, and possible TCP
options.

3. The server must acknowledge (ACK) the client's SYN and the server must also send
its own SYN containing the initial sequence number for the data that the server will
send on the connection. The server sends its SYN and the ACK of the client's SYN in
a single segment.
4. The client must acknowledge the server's SYN.

TCP Connection Termination

1. One application calls close first, and we say that this end performs the active close.
This end's TCP sends a FIN segment, which means it is finished sending data.

2. The other end that receives the FIN performs the passive close. The received FIN is
acknowledged by TCP. The receipt of the FIN is also passed to the application as an
end-of-file (after any data that may have already been queued for the application to
receive), since the receipt of the FIN means the application will not receive any
additional data on the connection.


3. Sometime later, the application that received the end-of-file will close its socket. This
causes its TCP to send a FIN.

4. The TCP on the system that receives this final FIN (the end that did the active close)
acknowledges the FIN.

Since a FIN and an ACK are required in each direction, four segments are normally required.
We use the qualifier "normally" because in some scenarios, the FIN in Step 1 is sent with
data. Also, the segments in Steps 2 and 3 are both from the end performing the passive close
and could be combined into one segment.

Importance of TIME_WAIT State:

Undoubtedly, one of the most misunderstood aspects of TCP with regard to network
programming is its TIME_WAIT state. The end that performs the active close goes through
this state. The duration that this endpoint remains in this state is twice the maximum segment
lifetime (MSL), sometimes called 2MSL.


Every implementation of TCP must choose a value for the MSL. The recommended value in
RFC 1122 [Braden 1989] is 2 minutes, although Berkeley-derived implementations have
traditionally used a value of 30 seconds instead. This means the duration of the TIME_WAIT
state is between 1 and 4 minutes. The MSL is the maximum amount of time that any given IP
datagram can live in a network. We know this time is bounded because every datagram
contains an 8-bit hop limit with a maximum value of 255. Although this is a hop limit and not
a true time limit, the assumption is made that a packet with the maximum hop limit of 255
cannot exist in a network for more than MSL seconds.

The way in which a packet gets "lost" in a network is usually the result of routing anomalies.
A router crashes or a link between two routers goes down and it takes the routing protocols
seconds or minutes to stabilize and find an alternate path. During that time period, routing
loops can occur (router A sends packets to router B, and B sends them back to A) and packets
can get caught in these loops. In the meantime, assuming the lost packet is a TCP segment,
the sending TCP times out and retransmits the packet, and the retransmitted packet gets to the
final destination by some alternate path. But sometime later (up to MSL seconds after the lost
packet started on its journey), the routing loop is corrected and the packet that was lost in the
loop is sent to the final destination. This original packet is called a lost duplicate or a
wandering duplicate. TCP must handle these duplicates.

THE FOLLOWING INFORMATION HAS BEEN TAKEN FROM:


http://sit.iitkgp.ernet.in/archive/teaching/internetTech/tcp/www.scit.wlv.ac.uk/%257Ejphb/comms/tcp.html

It should be noted that the exchange is really two independent exchanges and it is possible to
close the connection in one direction but not the other. This is known as a half close. The
following example (due to Stevens) demonstrates the use of the half-close.

Consider the Unix command

rsh remote sort < datafile


The effect of this is that the local file datafile is sorted on the remote host and the results
transferred back to the local host. The data flow is shown in the following diagram.


The problem here is that the sort program on the remote host cannot start sorting until it has
read all the data. It learns that the input is complete only when the local host closes its
sending direction of the connection, which sort sees as an EOF indication. However, the "back"
direction must remain open for the return of the sorted data.

Stevens suggests that the library call shutdown() be used with sockets programming to
achieve a half close.

Once the final ACK has been sent on an active close, the port/connection cannot be released
and re-used for the time period 2MSL. This is twice the maximum segment lifetime, and the
constraint is imposed in case the final ACK is lost. If the final ACK is lost, the
passive closing host will time out awaiting an ACK in response to its closing FIN and will
resend the FIN. If the resent FIN arrives before the 2MSL time has expired, there is no problem:
the active closer still holds the connection state and acknowledges it again. After this time,
the FIN would no longer appear to belong to any connection that exists between the two hosts.


RFC 793 defines MSL (Maximum Segment Lifetime) as 120 seconds, but some
implementations use 30 or 60 seconds. It is, basically, the maximum time for which it is
reasonable to wait for a segment: if a segment doesn't reach its destination within one MSL, it
probably won't get there at all, and it can be assumed to have been lost.


There are two reasons for the TIME_WAIT state:

1. To implement TCP's full-duplex connection termination reliably

2. To allow old duplicate segments to expire in the network

The first reason can be explained by assuming that the final ACK is lost. The server will
resend its final FIN, so the client must maintain state information, allowing it to resend the
final ACK. If it did not maintain this information, it would respond with an RST (a different
type of TCP segment), which would be interpreted by the server as an error. If TCP is
performing all the work necessary to terminate both directions of data flow cleanly for a
connection (its full-duplex close), then it must correctly handle the loss of any of these four
segments. This example also shows why the end that performs the active close is the end that
remains in the TIME_WAIT state: because that end is the one that might have to retransmit
the final ACK.

To understand the second reason for the TIME_WAIT state, assume we have a TCP
connection between 12.106.32.254 port 1500 and 206.168.112.219 port 21. This connection
is closed and then sometime later, we establish another connection between the same IP
addresses and ports: 12.106.32.254 port 1500 and 206.168.112.219 port 21. This latter
connection is called an incarnation of the previous connection since the IP addresses and
ports are the same. TCP must prevent old duplicates from a connection from reappearing at
some later time and being misinterpreted as belonging to a new incarnation of the same
connection. To do this, TCP will not initiate a new incarnation of a connection that is
currently in the TIME_WAIT state. Since the duration of the TIME_WAIT state is twice the
MSL, this allows MSL seconds for a packet in one direction to be lost, and another MSL
seconds for the reply to be lost. By enforcing this rule, we are guaranteed that when we
successfully establish a TCP connection, all old duplicates from previous incarnations of the
connection have expired in the network.

USEFUL LINKS FOR TIME_WAIT IMPORTANCE:

• http://support.citrix.com/article/CTX117910
• http://www.pcvr.nl/tcpip/tcp_time.htm


Port Numbers

ALLOCATION OF PORT NUMBERS

INTRODUCTION TO CONCURRENT SERVERS:

SOCKETPAIR:
The socket pair for a TCP connection is the four-tuple that defines the two endpoints of the
connection: the local IP address, local port, foreign IP address, and foreign port. A socket pair
uniquely identifies every TCP connection on a network.


NOTE: FOR MORE INFORMATION ABOUT THE FIRST 6 UNITS, PLEASE GO THROUGH THE FOLLOWING LINK:
http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html


UNIT-II

Socket Address Structure


Most socket functions require a pointer to a socket address structure as an argument. Each
supported protocol suite defines its own socket address structure.

IPv4 Socket Address Structure(SAS)

An IPv4 socket address structure, commonly called an "Internet socket address structure," is
named sockaddr_in and is defined by including the <netinet/in.h> header. The POSIX
definition of IPV4 SAS is shown below:

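struct in_addr {
    in_addr_t s_addr;           /* 32-bit IPv4 address, network byte order */
};

struct sockaddr_in {
    uint8_t sin_len;            /* length of structure (16); not present on all systems */
    sa_family_t sin_family;     /* AF_INET */
    in_port_t sin_port;         /* 16-bit TCP or UDP port number, network byte order */
    struct in_addr sin_addr;    /* 32-bit IPv4 address, network byte order */
    char sin_zero[8];           /* unused */
};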

The diagrammatical representation of IPV4 SAS is:


Datatype, Description and Header File of IPV4 SAS Members

IMP NOTE: The 32-bit IPv4 address can be accessed in two different ways. For example, if
serv is defined as an Internet socket address structure, then serv.sin_addr references the
32-bit IPv4 address as an in_addr structure, while serv.sin_addr.s_addr references the same
32-bit IPv4 address as an in_addr_t (typically an unsigned 32-bit integer). We must be certain
that we are referencing the IPv4 address correctly, especially when it is used as an argument
to a function, because compilers often pass structures differently from integers.

Socket address structures are used only on a given host: The structure itself is not
communicated between different hosts, although certain fields (e.g., the IP address and port)
are used for communication.


Value-Result Arguments

Three functions, bind, connect, and sendto, pass a socket address structure from the process
to the kernel. One argument to these three functions is the pointer to the socket address
structure and another argument is the integer size of the structure. Since the kernel is passed
both the pointer and the size of what the pointer points to, it knows exactly how much data to
copy from the process into the kernel.


Four functions, accept, recvfrom, getsockname, and getpeername, pass a socket address
structure from the kernel to the process, the reverse direction from the previous scenario. Two
of the arguments to these four functions are the pointer to the socket address structure along
with a pointer to an integer containing the size of the structure.

The size argument changes from an integer to a pointer to an integer because the
size is both a value when the function is called (it tells the kernel the size of the structure so
that the kernel does not write past the end of the structure when filling it in) and a result when
the function returns (it tells the process how much information the kernel actually stored in
the structure). This type of argument is called a value-result argument.


Byte Ordering Functions


Consider a 16-bit integer that is made up of 2 bytes. There are two ways to store the two
bytes in memory: with the low-order byte at the starting address, known as little-endian byte
order, or with the high-order byte at the starting address, known as big-endian byte order.

Network Byte Order – Big Endian Byte Order


Host Byte Order – Big Endian or Little Endian Byte Order

We must deal with these byte ordering differences as network programmers because
networking protocols must specify a network byte order. For example, in a TCP segment,
there is a 16-bit port number and a 32-bit IPv4 address. The sending protocol stack and the
receiving protocol stack must agree on the order in which the bytes of these multibyte fields
will be transmitted. The Internet protocols use big-endian byte ordering for these multibyte
integers.

In theory, an implementation could store the fields in a socket address structure in host byte
order and then convert to and from the network byte order when moving the fields to and
from the protocol headers, saving us from having to worry about this detail. But, both history
and the POSIX specification say that certain fields in the socket address structures must be
maintained in network byte order. Our concern is therefore converting between host byte
order and network byte order. We use the following four functions to convert between these
two byte orders: htons, htonl, ntohs, and ntohl.

In the names of these functions, h stands for host, n stands for network, s stands for short, and
l stands for long. The terms "short" and "long" are historical artifacts from the Digital VAX
implementation of 4.2BSD. We should instead think of s as a 16-bit value (such as a TCP or
UDP port number) and l as a 32-bit value (such as an IPv4 address). Indeed, on the 64-bit
Digital Alpha, a long integer occupies 64 bits, yet the htonl and ntohl functions operate on
32-bit values.
NOTE: These functions are used exclusively for data functionality between sockets
(storage).

Byte Manipulation Functions


There are two groups of functions that operate on multibyte fields, without interpreting the
data, and without assuming that the data is a null-terminated C string. We need these types of
functions when dealing with socket address structures because we need to manipulate fields
such as IP addresses, which can contain bytes of 0, but are not C character strings.

The first group of functions, whose names begin with b (for byte), are from 4.2BSD and are
still provided by almost any system that supports the socket functions. The second group of
functions, whose names begin with mem (for memory), are from the ANSI C standard and
are provided with any system that supports an ANSI C library.


src might represent application space and dest might represent socket send buffer space
(socket receive buffer space).

inet_aton, inet_addr, and inet_ntoa Functions


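The following functions convert IPv4 addresses between a dotted-decimal string (e.g.,
"206.168.112.219") and the 32-bit binary value, in network byte order, that is stored in a
socket address structure and carried in protocol headers. These functions work only with IPv4.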

inet_pton and inet_ntop Functions


The following functions perform the same conversions for IPv6 addresses. They also work
with IPv4 addresses; the 'family' argument (AF_INET or AF_INET6) selects the address type.
The letters p and n in the names stand for presentation (the text form) and numeric (the
binary form).


sock_ntop Function
A basic problem with inet_ntop is that it requires the caller to pass a pointer to a binary
address. This address is normally contained in a socket address structure, requiring the caller
to know the format of the structure and the address family.

To solve this problem, sock_ntop() takes a pointer to a socket address structure as its
argument, looks inside the structure to find the address family, calls the appropriate
conversion function, and returns the presentation form of the address.

readn, writen, and readline Functions


Stream sockets (e.g., TCP sockets) exhibit a behavior with the read and write functions that
differs from normal file I/O. A read or write on a stream socket might input or output fewer
bytes than requested, but this is not an error condition. The reason is that buffer limits might
be reached for the socket in the kernel. All that is required to input or output the remaining
bytes is for the caller to invoke the read or write function again. Some versions of Unix also
exhibit this behavior when writing more than 4,096 bytes to a pipe. This scenario is always a
possibility on a stream socket with read, but is normally seen with write only if the socket is
nonblocking. Nevertheless, we always call our writen function instead of write, in case the
implementation returns a short count.


The following functions overcome this problem.


Elementary TCP Sockets

Socket functions for elementary TCP client/server


Socket:
socket (af, type, protocol);
Creates a socket on demand (placing it in an unconnected state) and returns an integer
descriptor that identifies the socket. The arguments specify:
Address Family (af) - the address (protocol) family to use, e.g., AF_INET for IPv4.
Type - the type of communication the socket provides:
SOCK_STREAM - connection-oriented
SOCK_DGRAM - connection-less
SOCK_RAW - access to low-level protocols or network interfaces.
Protocol - selects one protocol when a family offers several for the given type (0 selects
the default).

Bind:
bind (socket, localaddr, addrlen);
A socket is created without any association to local or destination addresses, so a program uses
bind to establish a local address for it.
Socket - integer descriptor of the socket.
Localaddr - structure that specifies the local address to be bound.
Addrlen - integer length of the address (in bytes).

Listen:
listen (socket, qlength);
A server creates a socket, binds it to a well-known port, and waits for requests. So that
connection requests arriving while the server is busy are not rejected, listen creates a queue
for them and places the socket in passive mode, ready to accept incoming connections.
Listen only works with sockets using a reliable stream service.
Socket - integer descriptor.
Qlength - length of the request queue for that socket (historically at most 5 on BSD systems;
current systems allow larger values).

Connect:
connect (socket, destaddr, addrlen);
Binds a permanent destination to a socket, placing it in a connected state. Sockets using
connection-less service do not have to use connect (they can specify the address in every
datagram), but may.
Socket - socket descriptor.
Destaddr - sockaddr structure (which also includes the protocol port number) specifying the
destination address.
Addrlen - length of destination address (in bytes).


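Accept:
accept (socket, addr, addrlen);
Bind associates a socket with a port, but that socket is not connected to a foreign destination.
When a connection request comes in, accept returns a new socket descriptor for that
connection; the original socket keeps listening. Accept blocks until a connection request arrives.
Socket - descriptor of the listening socket.
Addr - pointer to the sockaddr structure that receives the client's address.
Addrlen - pointer to an integer holding the size of the address (a value-result argument).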

Close: (A system call from traditional UNIX Environment)

close (socket descriptor);
When a client or server finishes with a socket, it calls close to deallocate its resources.
Close decrements the socket's reference count. If several processes share the same socket,
the connection stays open until the count reaches 0; otherwise the connection is terminated
immediately.

Order of Socket System Calls:

Client Side (depends on connection type):
Socket
Connect
Write (may be repeated)
Read (may be repeated)
Close

Server Side (depends on connection type):
Socket
Bind
Listen
Accept
Read (may be repeated)
Write (may be repeated)
Close (go back to Accept)

Shutdown:
shutdown (socket, direction);
The shutdown function applies to full-duplex sockets (such as connected TCP sockets) and is
used to partially close the connection.
Socket - socket descriptor of a connected socket.
Direction - direction in which shutdown is desired
0 (SHUT_RD) = terminate further input.
1 (SHUT_WR) = terminate further output.
2 (SHUT_RDWR) = terminate input / output.

IMPORTANT NOTES:
File and Socket Descriptors:
A socket is a generalization of the UNIX file access mechanism that provides an endpoint for
communication. Descriptors (maintained in the descriptor tables) are kept per process by the
operating system to point to internal data structures for files and sockets. Descriptors are
small integer values.
File Descriptor:
Bound to a file when open is called.
Socket Descriptor:
Created by the socket call, which does not bind it to a destination.
Unbound - UDP specifies the destination every time data is sent.
Bound - TCP specifies the destination once, when connect is called.


After a socket has been created (using socket), additional system calls are required to specify
the details of its use.
Passive Socket - used by a server to wait for calls.
Active Socket - used by a client to initiate a connection.

Basic I/O Functions in UNIX:


UNIX and other operating systems provide a basic set of system functions used for I/O
operations on files and other devices. Most operating systems provide similar variations to
the five standard I/O operations that BSD UNIX uses.

I/O Functions:
Open - prepare for input / output.
Close - terminate the use of a device.
Write - transfer data from memory to an output device.
Read - transfer data from an input device to memory.
Lseek - move the current read/write position within a file or device (e.g., to a specific place on a disk).

The Socket Interface:


 The Berkeley socket interface provides generalized functions that support network
communication using many possible protocols.
 Socket calls refer to all TCP/IP protocols as a single protocol family (protocol suite).
The calls allow a programmer to specify the type of service required, rather than the
name of a specific protocol.
 The socket interface was created since an API (application program interface) for
network connections is not standardized; its design lies outside the scope of a
protocol suite.

Concurrent Servers

getsockname and getpeername Functions

These two functions return either the local protocol address associated with a socket
(getsockname) or the foreign protocol address associated with a socket (getpeername).

#include <sys/socket.h>
int getsockname(int sockfd, struct sockaddr *localaddr, socklen_t *addrlen);
int getpeername(int sockfd, struct sockaddr *peeraddr, socklen_t *addrlen);
Both return: 0 if OK, -1 on error

Notice that the final argument for both functions is a value-result argument: the caller sets
*addrlen to the size of the socket address structure, and on return it holds the number of
bytes actually stored. Both functions fill in the socket address structure pointed to by
localaddr or peeraddr. We mentioned in our discussion of bind that the term "name" is
misleading. These two functions return the protocol address associated with one of the two
ends of a network connection, which for IPv4 and IPv6 is the combination of an IP address and
port number. These functions have nothing to do with domain names.
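A minimal sketch of the value-result argument in use, assuming an IPv4 socket bound to the loopback address with port 0 (the function name kernel_chosen_port is hypothetical):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Binds a TCP socket to 127.0.0.1 with port 0 and returns the port number
 * the kernel chose (host byte order), or -1 on error. */
int kernel_chosen_port(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;    /* value: size of our buffer */
    int port = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);       /* 0 asks the kernel to pick a port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
        port = ntohs(addr.sin_port);    /* result: len = bytes stored */

    close(fd);
    return port;
}
```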

These two functions are required for the following reasons:

 After connect successfully returns in a TCP client that does not call bind,
getsockname returns the local IP address and local port number assigned to the
connection by the kernel.

 After calling bind with a port number of 0 (telling the kernel to choose the local port
number), getsockname returns the local port number that was assigned. getsockname
can be called to obtain the address family of a socket.

 In a TCP server that binds the wildcard IP address, once a connection is established
with a client (accept returns successfully), the server can call getsockname to obtain
the local IP address assigned to the connection. The socket descriptor argument in this
call must be that of the connected socket, and not the listening socket.

 When a server is execed by the process that calls accept, the only way the server can
obtain the identity of the client is to call getpeername.
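The first and last cases can be checked together over the loopback interface. In this sketch (the function name peer_matches_local is hypothetical) the client does not call bind, and the address that getpeername reports on the server's connected socket should equal the address that getsockname reports on the client's socket:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the server's view of the peer equals the client's local
 * address, 0 otherwise (including on any error). */
int peer_matches_local(void)
{
    struct sockaddr_in srv, local, peer;
    socklen_t len = sizeof srv;
    int lfd, cfd = -1, conn = -1, same = 0;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd == -1)
        return 0;
    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks */
    if (bind(lfd, (struct sockaddr *)&srv, sizeof srv) == -1 ||
        listen(lfd, 1) == -1 ||
        getsockname(lfd, (struct sockaddr *)&srv, &len) == -1)
        goto done;

    /* Client: no bind, so the kernel assigns the local address and port. */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd == -1 || connect(cfd, (struct sockaddr *)&srv, sizeof srv) == -1)
        goto done;
    conn = accept(lfd, NULL, NULL);   /* use the connected socket below */
    if (conn == -1)
        goto done;

    len = sizeof local;
    if (getsockname(cfd, (struct sockaddr *)&local, &len) == -1)
        goto done;
    len = sizeof peer;
    if (getpeername(conn, (struct sockaddr *)&peer, &len) == -1)
        goto done;

    same = local.sin_port == peer.sin_port &&
           local.sin_addr.s_addr == peer.sin_addr.s_addr;
done:
    if (conn != -1)
        close(conn);
    if (cfd != -1)
        close(cfd);
    close(lfd);
    return same;
}
```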

UNIT-III

TCP Client/Server Example

Introduction
Our simple example is an echo server that performs the following steps:
1. The client reads a line of text from its standard input and writes the line to the server.
2. The server reads the line from its network input and echoes the line back to the client.
3. The client reads the echoed line and prints it on its standard output.

Normal Startup (with respect to the socket pair)


To start communication between the client and the server, we start the server first. The
socket pair at the server is written as
SP = (IPs:Ps , IPc:Pc)
where
IPc – IP address of Client
IPs – IP address of Server
Pc – Port Number of Client
Ps – Port Number of Server
The wildcard character '*' stands for any part that is not yet known.

At the server the status is "Passive Open", and each call fills in more of the socket pair:
Server
socket() - SP = (*:* , *:*)
bind() - SP = (localhost:33600 , *:*)
listen() - SP = (localhost:33600 , *:*), or (*:33600 , *:*) if the wildcard address was bound

Now the client requests a connection with the server. Its socket pair is written from the
client's point of view:
SP = (IPc:Pc , IPs:Ps)
At the client the status is "Active Open". connect() starts TCP's three-way handshake, and
accept() returns at the server once the handshake completes:
Client
socket() - SP = (*:* , *:*)
connect() - SP = (localhost:33597 , x.y.z.w:33600)
Server
accept() - SP = (localhost:33600 , a.b.c.d:33597)
Here x.y.z.w is the server's IP address as seen by the client, and a.b.c.d is the client's IP
address as seen by the server. This is the ordinary client/server open; a TCP "simultaneous
open" is the different, rare case in which both ends perform an active open at the same time.
At this point, normal startup of the client and server is complete.

The following steps take place with our Client/Server example:


1. The client calls str_cli, which will block in the call to fgets, because we have not
typed a line of input yet.
2. When accept returns in the server, it calls fork and the child calls str_echo. This
function calls readline, which calls read, which blocks while waiting for a line to be
sent from the client.
3. The server parent, on the other hand, calls accept again, and blocks while waiting for
the next client connection.
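The steps above can be sketched in a single function that plays both roles over the loopback interface. This is a simplified stand-in for the str_cli and str_echo functions, not their actual code: the names echo_until_eof, run_server_once and echo_roundtrip are hypothetical, and the server handles one client instead of looping.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Server child: echo every byte back until the client closes (read == 0). */
static void echo_until_eof(int fd)
{
    char buf[256];
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        if (write(fd, buf, (size_t)n) != n)
            break;
}

/* Server process: accept one client, fork a child to serve it, then exit. */
static void run_server_once(int lfd)
{
    int conn = accept(lfd, NULL, NULL);

    if (conn != -1 && fork() == 0) {      /* server child */
        close(lfd);
        echo_until_eof(conn);
        _exit(0);
    }
    /* Server parent: a long-running server would call accept again here. */
    if (conn != -1)
        close(conn);
    while (wait(NULL) > 0)
        ;
    _exit(0);
}

/* Client side: sends line and stores the echoed reply in out.
 * Returns the number of bytes echoed, or -1 on error. */
int echo_roundtrip(const char *line, char *out, size_t outsz)
{
    struct sockaddr_in srv;
    socklen_t len = sizeof srv;
    size_t want = strlen(line), got = 0;
    pid_t pid;
    int lfd, cfd;

    if (outsz == 0 || want >= outsz)
        return -1;
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd == -1)
        return -1;
    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks */
    if (bind(lfd, (struct sockaddr *)&srv, sizeof srv) == -1 ||
        listen(lfd, 5) == -1 ||
        getsockname(lfd, (struct sockaddr *)&srv, &len) == -1 ||
        (pid = fork()) == -1) {
        close(lfd);
        return -1;
    }
    if (pid == 0)
        run_server_once(lfd);              /* never returns */
    close(lfd);

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd != -1 &&
        connect(cfd, (struct sockaddr *)&srv, sizeof srv) == 0 &&
        write(cfd, line, want) == (ssize_t)want) {
        while (got < want) {               /* TCP may deliver in pieces */
            ssize_t n = read(cfd, out + got, want - got);
            if (n <= 0)
                break;
            got += (size_t)n;
        }
    }
    out[got] = '\0';
    if (cfd != -1)
        close(cfd);                        /* FIN: the child's read returns 0 */
    if (got != want)
        kill(pid, SIGKILL);                /* do not wait on a stuck server */
    waitpid(pid, NULL, 0);
    return got == want ? (int)got : -1;
}
```

Calling echo_roundtrip("hello\n", out, sizeof out) with a 32-byte buffer should leave the same line in out.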

Normal Termination
We can follow through the steps involved in the normal termination of our client and server:
1. When we type our EOF character, fgets returns a null pointer and the function str_cli
returns.
2. When str_cli returns to the client main function, the latter terminates by calling exit.
3. Part of process termination is the closing of all open descriptors, so the client socket is
closed by the kernel. This sends a FIN to the server, to which the server TCP responds
with an ACK. This is the first half of the TCP connection termination sequence. At
this point, the server socket is in the CLOSE_WAIT state and the client socket is in
the FIN_WAIT_2 state.
4. When the server TCP receives the FIN, the server child is blocked in a call to
readline, and readline then returns 0. This causes the str_echo function to return to the
server child main.
5. The server child terminates by calling exit.
6. All open descriptors in the server child are closed. The closing of the connected
socket by the child causes the final two segments of the TCP connection termination
to take place: a FIN from the server to the client, and an ACK from the client. At this
point, the connection is completely terminated. The client socket enters the
TIME_WAIT state.
7. Finally, the SIGCHLD signal is sent to the parent when the server child terminates.
This occurs in this example, but we do not catch the signal in our code, and the
default action of the signal is to be ignored. Thus, the child enters the zombie state.
We can verify this with the ps command.

wait and waitpid Functions

We call the wait or waitpid function to handle the terminated child.

#include <sys/wait.h>
pid_t wait (int *statloc);
pid_t waitpid (pid_t pid, int *statloc, int options);
Both return: process ID if OK, -1 on error (waitpid returns 0 if WNOHANG is set and no child
has terminated yet)

wait and waitpid both return two values: the return value of the function is the process ID of
the terminated child, and the termination status of the child (an integer) is returned through
the statloc pointer. There are three macros that we can call that examine the termination
status and tell us if the child terminated normally, was killed by a signal, or was just stopped
by job control. Additional macros let us then fetch the exit status of the child, or the value of
the signal that killed the child, or the value of the job-control signal that stopped the child.
We will use the WIFEXITED and WEXITSTATUS macros for this purpose. If there are no
terminated children for the process calling wait, but the process has one or more children that
are still executing, then wait blocks until the first of the existing children terminates.

waitpid gives us more control over which process to wait for and whether or not to block.
First, the pid argument lets us specify the process ID that we want to wait for. A value of -1
says to wait for the first of our children to terminate. (There are other options, dealing with
process group IDs, but we do not need them in this text.) The options argument lets us
specify additional options. The most common option is WNOHANG. This option tells the
kernel not to block if there are no terminated children.
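As a minimal sketch of these calls (the function names child_exit_status and nohang_sees_running_child are hypothetical), the first function fetches a child's exit status with WIFEXITED and WEXITSTATUS, and the second shows WNOHANG returning 0 while the child is still running:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with code; returns the exit status the parent
 * collects, or -1 on error or abnormal termination. */
int child_exit_status(int code)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);                         /* child */
    if (waitpid(pid, &status, 0) != pid)     /* blocks until it terminates */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Returns 1 if waitpid with WNOHANG returns 0 for a running child. */
int nohang_sees_running_child(void)
{
    int pfd[2];
    char c;
    pid_t pid, early;

    if (pipe(pfd) == -1)
        return -1;
    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {              /* child: block until the pipe is closed */
        close(pfd[1]);
        if (read(pfd[0], &c, 1) < 0)
            _exit(1);
        _exit(0);
    }
    close(pfd[0]);
    early = waitpid(pid, NULL, WNOHANG);   /* child alive: returns 0 */
    close(pfd[1]);                         /* child's read returns 0 */
    waitpid(pid, NULL, 0);                 /* now reap it */
    return early == 0;
}
```

A SIGCHLD handler typically calls waitpid(-1, NULL, WNOHANG) in a loop until it stops returning a positive process ID, so that several children terminating at about the same time are all reaped.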

Termination of Server Process


We will now start our client/server and then kill the server child process. This simulates the
crashing of the server process, so we can see what happens to the client. The following steps
take place:
1. We start the server and client and type one line to the client to verify that all is okay.
That line is echoed normally by the server child.
2. We find the process ID of the server child and kill it. As part of process termination,
all open descriptors in the child are closed. This causes a FIN to be sent to the client,
and the client TCP responds with an ACK. This is the first half of the TCP connection
termination.
3. The SIGCHLD signal is sent to the server parent and handled correctly.
4. Nothing happens at the client. The client TCP receives the FIN from the server TCP
and responds with an ACK, but the problem is that the client process is blocked in the
call to fgets waiting for a line from the terminal.
5. Running netstat at this point shows the state of the sockets.

linux % netstat -a | grep 9877


tcp 0 0 *:9877 *:* LISTEN
tcp 0 0 localhost:9877 localhost:43604 FIN_WAIT2
tcp 1 0 localhost:43604 localhost:9877 CLOSE_WAIT

6. We can still type a line of input to the client. Here is what happens at the client
starting from Step 1:
linux % tcpcli01 127.0.0.1        start client
hello                             the first line that we type
hello                             the line is echoed correctly
                                  here we kill the server child on the server host
another line                      we then type a second line to the client
str_cli: server terminated prematurely

When we type "another line," str_cli calls writen and the client TCP sends the data to
the server. This is allowed by TCP because the receipt of the FIN by the client TCP
only indicates that the server process has closed its end of the connection and will not
be sending any more data. The receipt of the FIN does not tell the client TCP that the
server process has terminated (which in this case, it has).
When the server TCP receives the data from the client, it responds with an RST since
the process that had that socket open has terminated. We can verify that the RST was
sent by watching the packets with tcpdump.

7. The client process will not see the RST because it calls readline immediately after the
call to writen and readline returns 0 (EOF) immediately because of the FIN that was
received in Step 2. Our client is not expecting to receive an EOF at this point so it
quits with the error message "server terminated prematurely."

8. When the client terminates, all its open descriptors are closed.

Crashing of Server Host


The following steps take place:

1. When the server host crashes, nothing is sent out on the existing network connections.
That is, we are assuming the host crashes and is not shut down by an operator.
2. We type a line of input to the client. It is written by writen and sent by the client
TCP as a data segment. The client then blocks in the call to readline, waiting for the
echoed reply.
3. If we watch the network with tcpdump, we will see the client TCP continually
retransmitting the data segment, trying to receive an ACK from the server. Section
25.11 of TCPv2 shows a typical pattern for TCP retransmissions: Berkeley-derived
implementations retransmit the data segment 12 times, waiting for around 9 minutes
before giving up. When the client TCP finally gives up (assuming the server host has
not been rebooted during this time, or if the server host has not crashed but was
unreachable on the network, assuming the host was still unreachable), an error is
returned to the client process. Since the client is blocked in the call to readline, it
returns an error. Assuming the server host crashed and there were no responses at all
to the client's data segments, the error is ETIMEDOUT. But if some intermediate
router determined that the server host was unreachable and responded with an ICMP
"destination unreachable" message, the error is either EHOSTUNREACH or
ENETUNREACH.

Crashing and Rebooting of Server Host


The following steps take place:
1. We start the server and then the client. We type a line to verify that the connection is
established.
2. The server host crashes and reboots. We type a line of input to the client, which is
sent as a TCP data segment to the server host.
3. When the server host reboots after crashing, its TCP loses all information about
connections that existed before the crash. Therefore, the server TCP responds to the
received data segment from the client with an RST.
4. Our client is blocked in the call to readline when the RST is received, causing readline
to return the error ECONNRESET.

Shutdown of Server Host


The previous two sections discussed the crashing of the server host, or the server host being
unreachable across the network. We now consider what happens if the server host is shut
down by an operator while our server process is running on that host.
When a Unix system is shut down, the init process normally sends the SIGTERM signal to all
processes (we can catch this signal), waits some fixed amount of time (often between 5 and
20 seconds), and then sends the SIGKILL signal (which we cannot catch) to any processes
still running. This gives all running processes a short amount of time to clean up and
terminate. If we do not catch SIGTERM and terminate, our server will be terminated by the
SIGKILL signal.
When the process terminates, all open descriptors are closed, and we then follow the same
sequence of steps discussed in TERMINATION OF SERVER PROCESS. To have the client detect
the termination of the server process as soon as it occurs, the client must use the select or
poll function.

UNIT-IV

I/O Multiplexing: The select and poll Functions

Introduction
We saw our TCP client handling two inputs at the same time: standard input and a TCP
socket. We encountered a problem when the client was blocked in a call to fgets (on standard
input) and the server process was killed. The server TCP correctly sent a FIN to the client
TCP, but since the client process was blocked reading from standard input, it never saw the
EOF until it read from the socket (possibly much later). What we need is the capability to tell
the kernel that we want to be notified if one or more I/O conditions are ready (i.e., input is
ready to be read, or the descriptor is capable of taking more output). This capability is called
I/O multiplexing and is provided by the select and poll functions. We will also cover a newer
POSIX variation of the former, called pselect.
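The idea can be sketched with select as follows. Here a pipe stands in for standard input and a socketpair stands in for the TCP connection; the function names wait_for_either and eof_noticed_while_input_idle are hypothetical.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Blocks until input_fd or sock_fd is readable. Returns the readable
 * descriptor (sock_fd if both are ready), or -1 on error. */
int wait_for_either(int input_fd, int sock_fd)
{
    fd_set rset;
    int maxfd = input_fd > sock_fd ? input_fd : sock_fd;

    FD_ZERO(&rset);
    FD_SET(input_fd, &rset);
    FD_SET(sock_fd, &rset);
    if (select(maxfd + 1, &rset, NULL, NULL, NULL) == -1)
        return -1;
    return FD_ISSET(sock_fd, &rset) ? sock_fd : input_fd;
}

/* Returns 1 if the "client" notices the server's EOF even though nothing
 * was typed on its input descriptor. */
int eof_noticed_while_input_idle(void)
{
    int in[2], sv[2], ready, ok;
    char c;

    if (pipe(in) == -1)
        return 0;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        close(in[0]);
        close(in[1]);
        return 0;
    }
    close(sv[1]);                          /* the "server" end goes away */
    ready = wait_for_either(in[0], sv[0]); /* does not wait for input    */
    ok = (ready == sv[0] && read(sv[0], &c, 1) == 0);   /* 0 means EOF   */
    close(sv[0]);
    close(in[0]);
    close(in[1]);
    return ok;
}
```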

I/O multiplexing is typically used in networking applications in the following scenarios:

 When a client is handling multiple descriptors (normally interactive input and a
network socket), I/O multiplexing should be used.
 It is possible, but rare, for a client to handle multiple sockets at the same time.
 If a TCP server handles both a listening socket and its connected sockets, I/O
multiplexing is normally used.
 If a server handles TCP and UDP, I/O multiplexing is normally used.
 If a server handles multiple services and perhaps multiple protocols, I/O multiplexing
is normally used.

There are normally two distinct phases for an input operation:

1. Waiting for the data to be ready
2. Copying the data from the kernel to the process

For an input operation on a socket, the first step normally involves waiting for data to arrive
on the network. When the packet arrives, it is copied into a buffer within the kernel. The
second step is copying this data from the kernel's buffer into our application buffer.

I/O Models
The five I/O models that are available to us under UNIX are:
 blocking I/O
 nonblocking I/O
 I/O multiplexing (select and poll)
 signal driven I/O (SIGIO)
 asynchronous I/O (the POSIX aio_functions)

BLOCKING I/O MODEL:

NONBLOCKING I/O MODEL:
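As a minimal sketch of the nonblocking model, a descriptor is switched to nonblocking mode with fcntl, after which a read that would otherwise sleep fails at once with EAGAIN or EWOULDBLOCK. The names set_nonblocking and nonblocking_read_demo are hypothetical, and a socketpair stands in for a network socket.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Puts fd into nonblocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Returns 1 if reading an empty nonblocking socket fails immediately with
 * EAGAIN/EWOULDBLOCK instead of putting the process to sleep. */
int nonblocking_read_demo(void)
{
    int sv[2], err, ok = 0;
    char c;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;
    if (set_nonblocking(sv[0]) == 0) {
        n = read(sv[0], &c, 1);      /* no data has been sent yet */
        err = errno;
        ok = (n == -1 && (err == EAGAIN || err == EWOULDBLOCK));
    }
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

An application that loops on such a read is polling, which wastes CPU time compared with letting select or poll wait for readiness.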

I/O MULTIPLEXING

SIGNAL-DRIVEN I/O

ASYNCHRONOUS I/O MODEL

SELECT FUNCTION

select()—Synchronous I/O Multiplexing


This function is somewhat strange, but it's very useful. Take the following situation: you are a
server and you want to listen for incoming connections as well as keep reading from the
connections you already have.

No problem, you say, just an accept() and a couple of recv()s. Not so fast, buster! What
if you're blocking on an accept() call? How are you going to recv() data at the same
time? "Use non-blocking sockets!" No way! You don't want to be a CPU hog. What, then?

select() gives you the power to monitor several sockets at the same time. It'll tell you
which ones are ready for reading, which are ready for writing, and which sockets have raised
exceptions, if you really want to know that.

This being said, in modern times select(), though very portable, is one of the slowest
methods for monitoring sockets. One possible alternative is libevent, or something similar,
that encapsulates all the system-dependent stuff involved with getting socket notifications.

Without any further ado, I'll offer the synopsis of select():

#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

int select(int numfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

The function monitors "sets" of file descriptors; in particular readfds, writefds,
and exceptfds. If you want to see if you can read from standard input and some socket
descriptor, sockfd, just add the file descriptors 0 and sockfd to the set readfds. The
parameter numfds should be set to the value of the highest file descriptor plus one. In this
example, it should be set to sockfd+1, since it is assuredly higher than standard input (0).

When select() returns, readfds will be modified to reflect which of the file descriptors
you selected are ready for reading. You can test them with the macro FD_ISSET(),
below.

Before progressing much further, I'll talk about how to manipulate these sets. Each set is of
the type fd_set. The following macros operate on this type:

FD_SET(int fd, fd_set *set);    Add fd to the set.
FD_CLR(int fd, fd_set *set);    Remove fd from the set.
FD_ISSET(int fd, fd_set *set);  Return true if fd is in the set.
FD_ZERO(fd_set *set);           Clear all entries from the set.
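A small sketch of the four macros at work, with no I/O involved (the function names count_members and fd_set_demo are hypothetical):

```c
#include <sys/select.h>

/* Counts how many descriptors in the range [0, maxfd] are in the set. */
int count_members(fd_set *set, int maxfd)
{
    int fd, count = 0;

    for (fd = 0; fd <= maxfd; fd++)
        if (FD_ISSET(fd, set))
            count++;
    return count;
}

/* Returns 1 if the macros behave as described above. */
int fd_set_demo(void)
{
    fd_set set;

    FD_ZERO(&set);        /* always clear a set before first use */
    FD_SET(0, &set);      /* add descriptors 0 and 5 */
    FD_SET(5, &set);
    FD_CLR(0, &set);      /* take 0 out again */

    return !FD_ISSET(0, &set) && FD_ISSET(5, &set) &&
           count_members(&set, 5) == 1;
}
```

Descriptors passed to these macros must be smaller than FD_SETSIZE, which is one reason select does not scale to very large descriptor numbers.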

Finally, what is this weirded out struct timeval? Well, sometimes you don't want to wait
forever for someone to send you some data. Maybe every 96 seconds you want to print "Still
Going..." to the terminal even though nothing has happened. This time structure allows you to
specify a timeout period. If the time is exceeded and select() still hasn't found any ready
file descriptors, it'll return so you can continue processing.

The struct timeval has the following fields (on POSIX systems the types are time_t and
suseconds_t, both integer types):

struct timeval {
    time_t      tv_sec;   // seconds
    suseconds_t tv_usec;  // microseconds
};

Just set tv_sec to the number of seconds to wait, and set tv_usec to the number of
microseconds to wait. Yes, that's microseconds, not milliseconds. There are 1,000
microseconds in a millisecond, and 1,000 milliseconds in a second. Thus, there are 1,000,000
microseconds in a second. Why is it "usec"? The "u" is supposed to look like the Greek letter
μ (Mu) that we use for "micro". Also, when the function returns, timeout might be updated
to show the time still remaining. This depends on what flavor of Unix you're running.

Yay! We have a microsecond resolution timer! Well, don't count on it. You'll probably have
to wait some part of your standard Unix timeslice no matter how small you set your struct
timeval.

Other things of interest: If you set the fields in your struct timeval to 0, select() will
timeout immediately, effectively polling all the file descriptors in your sets. If you set the
parameter timeout to NULL, it will never timeout, and will wait until the first file descriptor
is ready. Finally, if you don't care about waiting for a certain set, you can just set it to NULL
in the call to select().
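The zero-timeout case can be sketched like this. The function names readable_now and readable_now_demo are hypothetical, and a socketpair stands in for a network connection.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if fd is readable right now, 0 if not, -1 on error.
 * Never blocks, because both timeout fields are 0. */
int readable_now(int fd)
{
    fd_set rset;
    struct timeval tv = {0, 0};

    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    return select(fd + 1, &rset, NULL, NULL, &tv);
}

/* Returns 1 if the descriptor is reported idle before any data is sent
 * and readable afterwards. */
int readable_now_demo(void)
{
    int sv[2], before, after;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;
    before = readable_now(sv[0]);          /* nothing sent yet: 0 */
    if (write(sv[1], "x", 1) != 1)
        before = -1;
    after = readable_now(sv[0]);           /* one byte waiting: 1 */
    close(sv[0]);
    close(sv[1]);
    return before == 0 && after == 1;
}
```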

The following code snippet waits 2.5 seconds for something to appear on standard input:

/*
** select.c -- a select() demo
*/

#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define STDIN 0  // file descriptor for standard input

int main(void)
{
    struct timeval tv;
    fd_set readfds;
    int rv;

    tv.tv_sec = 2;
    tv.tv_usec = 500000;

    FD_ZERO(&readfds);
    FD_SET(STDIN, &readfds);

    // don't care about writefds and exceptfds:
    rv = select(STDIN+1, &readfds, NULL, NULL, &tv);

    if (rv == -1)
        perror("select");  // the sets are not meaningful after an error
    else if (rv > 0 && FD_ISSET(STDIN, &readfds))
        printf("A key was pressed!\n");
    else
        printf("Timed out.\n");

    return 0;
}

If you're on a line buffered terminal, the key you hit should be RETURN or it will time out
anyway.

Now, some of you might think this is a great way to wait for data on a datagram socket—and
you are right: it might be. Some Unices can use select in this manner, and some can't. You
should see what your local man page says on the matter if you want to attempt it.

Some Unices update the time in your struct timeval to reflect the amount of time still
remaining before a timeout. But others do not. Don't rely on that occurring if you want to be
portable. (Use gettimeofday() if you need to track time elapsed. It's a bummer, I know,
but that's the way it is.)

What happens if a socket in the read set closes the connection? Well, in that
case, select() returns with that socket descriptor set as "ready to read". When you actually
do recv() from it, recv() will return 0. That's how you know the client has closed the
connection.
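Here is a minimal sketch of that behavior, assuming a socketpair in place of a real client connection (the function name peer_close_demo is hypothetical). Data that was queued before the close is still delivered first; only then does recv() return 0.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if, after the peer sends 2 bytes and closes, select() reports
 * the socket readable, recv() returns the 2 bytes, and then returns 0. */
int peer_close_demo(void)
{
    int sv[2], ready, ok = 0;
    char buf[8];
    fd_set rset;
    struct timeval tv = {1, 0};      /* 1 second safety timeout */

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;
    if (write(sv[1], "hi", 2) == 2) {
        close(sv[1]);                /* the peer hangs up */
        FD_ZERO(&rset);
        FD_SET(sv[0], &rset);
        ready = select(sv[0] + 1, &rset, NULL, NULL, &tv);
        ok = ready == 1 && FD_ISSET(sv[0], &rset) &&
             recv(sv[0], buf, sizeof buf, 0) == 2 &&   /* queued data */
             recv(sv[0], buf, sizeof buf, 0) == 0;     /* then EOF    */
    } else {
        close(sv[1]);
    }
    close(sv[0]);
    return ok;
}
```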

One more note of interest about select(): if you have a socket that is listen()ing, you
can check to see if there is a new connection by putting that socket's file descriptor in
the readfds set.

And that, my friends, is a quick overview of the almighty select() function.

But, by popular demand, here is an in-depth example. Unfortunately, the difference between
the dirt-simple example, above, and this one here is significant. But have a look, then read the
description that follows it.

This program acts like a simple multi-user chat server. Start it running in one window,
then telnet to it ("telnet hostname 9034") from multiple other windows. When you type
something in one telnet session, it should appear in all the others.

/*
** selectserver.c -- a cheezy multiperson chat server
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

#define PORT "9034"   // port we're listening on

// get sockaddr, IPv4 or IPv6:
void *get_in_addr(struct sockaddr *sa)
{
    if (sa->sa_family == AF_INET) {
        return &(((struct sockaddr_in*)sa)->sin_addr);
    }

    return &(((struct sockaddr_in6*)sa)->sin6_addr);
}

int main(void)
{
    fd_set master;    // master file descriptor list
    fd_set read_fds;  // temp file descriptor list for select()
    int fdmax;        // maximum file descriptor number

    int listener;     // listening socket descriptor
    int newfd;        // newly accept()ed socket descriptor
    struct sockaddr_storage remoteaddr; // client address
    socklen_t addrlen;

    char buf[256];    // buffer for client data
    int nbytes;

    char remoteIP[INET6_ADDRSTRLEN];

    int yes=1;        // for setsockopt() SO_REUSEADDR, below
    int i, j, rv;

    struct addrinfo hints, *ai, *p;

    FD_ZERO(&master);    // clear the master and temp sets
    FD_ZERO(&read_fds);

    // get us a socket and bind it
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;
    if ((rv = getaddrinfo(NULL, PORT, &hints, &ai)) != 0) {
        fprintf(stderr, "selectserver: %s\n", gai_strerror(rv));
        exit(1);
    }

    for(p = ai; p != NULL; p = p->ai_next) {
        listener = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (listener < 0) {
            continue;
        }

        // lose the pesky "address already in use" error message
        setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int));

        if (bind(listener, p->ai_addr, p->ai_addrlen) < 0) {
            close(listener);
            continue;
        }

        break;
    }

    // if we got here, it means we didn't get bound
    if (p == NULL) {
        fprintf(stderr, "selectserver: failed to bind\n");
        exit(2);
    }

    freeaddrinfo(ai); // all done with this

    // listen
    if (listen(listener, 10) == -1) {
        perror("listen");
        exit(3);
    }

    // add the listener to the master set
    FD_SET(listener, &master);

    // keep track of the biggest file descriptor
    fdmax = listener; // so far, it's this one

    // main loop
    for(;;) {
        read_fds = master; // copy it
        if (select(fdmax+1, &read_fds, NULL, NULL, NULL) == -1) {
            perror("select");
            exit(4);
        }

        // run through the existing connections looking for data to read
        for(i = 0; i <= fdmax; i++) {
            if (FD_ISSET(i, &read_fds)) { // we got one!!
                if (i == listener) {
                    // handle new connections
                    addrlen = sizeof remoteaddr;
                    newfd = accept(listener,
                        (struct sockaddr *)&remoteaddr,
                        &addrlen);

                    if (newfd == -1) {
                        perror("accept");
                    } else {
                        FD_SET(newfd, &master); // add to master set
                        if (newfd > fdmax) {    // keep track of the max
                            fdmax = newfd;
                        }
                        printf("selectserver: new connection from %s on "
                            "socket %d\n",
                            inet_ntop(remoteaddr.ss_family,
                                get_in_addr((struct sockaddr*)&remoteaddr),
                                remoteIP, INET6_ADDRSTRLEN),
                            newfd);
                    }
                } else {
                    // handle data from a client
                    if ((nbytes = recv(i, buf, sizeof buf, 0)) <= 0) {
                        // got error or connection closed by client
                        if (nbytes == 0) {
                            // connection closed
                            printf("selectserver: socket %d hung up\n", i);
                        } else {
                            perror("recv");
                        }
                        close(i); // bye!
                        FD_CLR(i, &master); // remove from master set
                    } else {
                        // we got some data from a client
                        for(j = 0; j <= fdmax; j++) {
                            // send to everyone!
                            if (FD_ISSET(j, &master)) {
                                // except the listener and ourselves
                                if (j != listener && j != i) {
                                    if (send(j, buf, nbytes, 0) == -1) {
                                        perror("send");
                                    }
                                }
                            }
                        }
                    }
                } // END handle data from client
            } // END got new incoming connection
        } // END looping through file descriptors
    } // END for(;;)--and you thought it would never end!

    return 0;
}

Notice I have two file descriptor sets in the code: master and read_fds. The first, master,
holds all the socket descriptors that are currently connected, as well as the socket descriptor
that is listening for new connections.

The reason I have the master set is that select() actually changes the set you pass into it
to reflect which sockets are ready to read. Since I have to keep track of the connections from
one call of select() to the next, I must store these safely away somewhere. At the last
minute, I copy the master into the read_fds, and then call select().

But doesn't this mean that every time I get a new connection, I have to add it to
the master set? Yup! And every time a connection closes, I have to remove it from
the master set? Yes, it does.

Notice I check to see when the listener socket is ready to read. When it is, it means I have
a new connection pending, and I accept() it and add it to the master set. Similarly, when
a client connection is ready to read, and recv() returns 0, I know the client has closed the
connection, and I must remove it from the master set.

If the client recv() returns non-zero, though, I know some data has been received. So I get
it, and then go through the master list and send that data to all the rest of the connected
clients.

And that, my friends, is a less-than-simple overview of the almighty select() function.

In addition, here is a bonus afterthought: there is another function called poll() which
behaves much the same way select() does, but with a different system for managing the
file descriptor sets.

http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html#select

POLL FUNCTION
poll()

Test for events on multiple sockets simultaneously

Prototypes
#include <poll.h>

int poll(struct pollfd *ufds, nfds_t nfds, int timeout);

Description

This function is very similar to select() in that they both watch sets of file descriptors for
events, such as incoming data ready to recv(), socket ready to send() data to, out-of-band
data ready to recv(), errors, etc.

The basic idea is that you pass an array of nfds struct pollfds in ufds, along with a
timeout in milliseconds (1000 milliseconds in a second.) The timeout can be negative if you
want to wait forever. If no event happens on any of the socket descriptors by the
timeout, poll() will return.

Each element in the array of struct pollfds represents one socket descriptor, and
contains the following fields:

struct pollfd {
int fd; // the socket descriptor
short events; // bitmap of events we're interested in
short revents; // when poll() returns, bitmap of events that occurred
};

Before calling poll(), load fd with the socket descriptor (if you set fd to a negative
number, this struct pollfd is ignored and its revents field is set to zero) and then
construct the events field by bitwise-ORing the following macros:

POLLIN   Alert me when data is ready to recv() on this socket.
POLLOUT  Alert me when I can send() data to this socket without blocking.
POLLPRI  Alert me when out-of-band data is ready to recv() on this socket.

Once the poll() call returns, the revents field will be constructed as a bitwise-OR of the
above flags, telling you which events actually occurred on that descriptor. Additionally,
these other flags might be present:

POLLERR An error has occurred on this socket.
POLLHUP The remote side of the connection hung up.
POLLNVAL Something was wrong with the socket descriptor fd; maybe it's
uninitialized?

Return Value

Returns the number of elements in the ufds array that have had an event occur on them; this can
be zero if the timeout occurred. Also returns -1 on error (and errno will be set accordingly.)

Example
int s1, s2;
int rv;
char buf1[256], buf2[256];
struct pollfd ufds[2];

s1 = socket(PF_INET, SOCK_STREAM, 0);
s2 = socket(PF_INET, SOCK_STREAM, 0);

// pretend we've connected both to a server at this point
//connect(s1, ...)...
//connect(s2, ...)...

// set up the array of file descriptors.
//
// in this example, we want to know when there's normal or out-of-band
// data ready to be recv()'d...

ufds[0].fd = s1;
ufds[0].events = POLLIN | POLLPRI; // check for normal or out-of-band

ufds[1].fd = s2;
ufds[1].events = POLLIN; // check for just normal data

// wait for events on the sockets, 3.5 second timeout
rv = poll(ufds, 2, 3500);

if (rv == -1) {
    perror("poll"); // error occurred in poll()
} else if (rv == 0) {
    printf("Timeout occurred! No data after 3.5 seconds.\n");
} else {
    // check for events on s1:
    if (ufds[0].revents & POLLIN) {
        recv(s1, buf1, sizeof buf1, 0); // receive normal data
    }
    if (ufds[0].revents & POLLPRI) {
        recv(s1, buf1, sizeof buf1, MSG_OOB); // out-of-band data
    }

    // check for events on s2:
    if (ufds[1].revents & POLLIN) {
        recv(s2, buf2, sizeof buf2, 0); // receive normal data from s2
    }
}
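
The poll() pattern described in this section can also be wrapped in a small helper. The following is only a sketch: the name read_with_timeout and the READ_TIMED_OUT sentinel are made up for illustration and are not part of any system API. It waits on a single descriptor and reads from it only once poll() reports that the read will not block.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

#define READ_TIMED_OUT (-2)   /* hypothetical sentinel, not a system value */

/* Wait up to timeout_ms for fd to become readable, then read once.
 * Returns the number of bytes read (0 means end of file or peer closed),
 * -1 on error, or READ_TIMED_OUT if nothing arrived in time. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };

    int rv = poll(&pfd, 1, timeout_ms);
    if (rv == -1)
        return -1;                  /* poll() itself failed */
    if (rv == 0)
        return READ_TIMED_OUT;      /* no event before the timeout */
    if (pfd.revents & POLLNVAL) {
        errno = EBADF;              /* fd was not an open descriptor */
        return -1;
    }
    /* POLLIN, POLLHUP and POLLERR all mean read() will not block now. */
    return read(fd, buf, len);
}
```

A caller would treat READ_TIMED_OUT like the rv == 0 branch of the example above, and any other result like an ordinary read() or recv() return value.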

Socket Options

There are various ways to get and set the options that affect a socket:
• The getsockopt and setsockopt functions
• The fcntl function
• The ioctl function
This chapter starts by covering the setsockopt and getsockopt functions, followed by an
example that prints the default value of all the options, and then a detailed description of all
the socket options. We divide the detailed descriptions into the following categories: generic,
IPv4, IPv6, TCP, and SCTP. This detailed coverage can be skipped during a first reading of
this chapter, and the individual sections referred to when needed. A few options are discussed
in detail in a later chapter, such as the IPv4 and IPv6 multicasting options.


setsockopt(), getsockopt()

Set various options for a socket

Prototypes
#include <sys/types.h>
#include <sys/socket.h>

int getsockopt(int s, int level, int optname, void *optval,
               socklen_t *optlen);
int setsockopt(int s, int level, int optname, const void *optval,
               socklen_t optlen);

Description

Sockets are fairly configurable beasts. In fact, they are so configurable, I'm not even going to
cover it all here. It's probably system-dependent anyway. But I will talk about the basics.

Obviously, these functions get and set certain options on a socket. On a Linux box, all the
socket information is in the man page for socket in section 7. (Type: "man 7 socket" to get
all these goodies.)

As for parameters, s is the socket you're talking about, level should be set to SOL_SOCKET.
Then you set the optname to the name you're interested in. Again, see your man page for all
the options, but here are some of the most fun ones:

SO_BINDTODEVICE Bind this socket to a symbolic device name like eth0 instead of
using bind() to bind it to an IP address. Type the command
ifconfig under Unix to see the device names.
SO_REUSEADDR Allows other sockets to bind() to this port, unless there is an
active listening socket bound to the port already. This enables
you to get around those "Address already in use" error messages
when you try to restart your server after a crash.
SO_BROADCAST Allows UDP datagram (SOCK_DGRAM) sockets to send and
receive packets sent to and from the broadcast address. Does
nothing—NOTHING!!—to TCP stream sockets! Hahaha!

As for the parameter optval, it's usually a pointer to an int indicating the value in question.
For booleans, zero is false, and non-zero is true. And that's an absolute fact, unless it's
different on your system. If there is no parameter to be passed, optval can be NULL.

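The final parameter, optlen, is a value-result argument for getsockopt(): set it to the size of
the optval buffer before the call, and getsockopt() overwrites it with the size of the value it
stored. For setsockopt() you pass the size directly, where it will probably be sizeof(int).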

Warning: on some systems (notably Sun and Windows), the option can be a char instead of
an int, and is set to, for example, a character value of '1' instead of an int value of 1.
Again, check your own man pages for more info with "man setsockopt" and "man 7
socket"!

Return Value

Returns zero on success, or -1 on error (and errno will be set accordingly.)

Example
int optval;
socklen_t optlen;
char *optval2;

// set SO_REUSEADDR on a socket to true (1):
optval = 1;
setsockopt(s1, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof optval);

// bind a socket to a device name (might not work on all systems):
optval2 = "eth1"; // 4 bytes long, so 4, below:
setsockopt(s2, SOL_SOCKET, SO_BINDTODEVICE, optval2, 4);

// see if the SO_BROADCAST flag is set:
optlen = sizeof optval; // tell getsockopt() how big the buffer is
getsockopt(s3, SOL_SOCKET, SO_BROADCAST, &optval, &optlen);
if (optval != 0) {
    printf("SO_BROADCAST enabled on s3!\n");
}

The following options are supported for setsockopt():

SO_DEBUG

Provides the ability to turn on recording of debugging information. This option takes an int value in
the optval argument. This is a BOOL option.

SO_BROADCAST

Permits sending of broadcast messages, if this is supported by the protocol. This option takes
an int value in the optval argument. This is a BOOL option.


SO_REUSEADDR

Specifies that the rules used in validating addresses supplied to bind() should allow reuse of local
addresses, if this is supported by the protocol. This option takes an int value in the optval argument.
This is a BOOL option.

SO_KEEPALIVE

Keeps connections active by enabling periodic transmission of messages, if this is supported by the
protocol.

If the connected socket fails to respond to these messages, the connection is broken and processes
writing to that socket are notified with an ENETRESET errno. This option takes an int value in
the optval argument. This is a BOOL option.

SO_LINGER

Specifies whether the socket lingers on close() if data is present. If SO_LINGER is set, the system
blocks the process during close() until it can transmit the data or until the end of the interval
indicated by the l_linger member, whichever comes first. If SO_LINGER is not specified,
and close() is issued, the system handles the call in a way that allows the process to continue as
quickly as possible. This option takes a linger structure in the optval argument.

SO_OOBINLINE

Specifies whether the socket leaves received out-of-band data (data marked urgent) in line. This option
takes an int value in optval argument. This is a BOOL option.

SO_SNDBUF

Sets send buffer size information. This option takes an int value in the optval argument.

SO_RCVBUF

Sets receive buffer size information. This option takes an int value in the optval argument.

SO_DONTROUTE

Specifies whether outgoing messages bypass the standard routing facilities. The destination must be on
a directly-connected network, and messages are directed to the appropriate network interface according
to the destination address. The effect, if any, of this option depends on what protocol is in use. This
option takes an int value in the optval argument. This is a BOOL option.

TCP_NODELAY

Specifies whether the Nagle algorithm used by TCP for send coalescing is to be disabled. This option
takes an int value in the optval argument. This is a BOOL option.

For boolean options, a zero value indicates that the option is disabled and a non-zero value indicates that the
option is enabled.

The following options are supported for getsockopt():

SO_DEBUG
Reports whether debugging information is being recorded. This option stores an int value in
the optval argument. This is a BOOL option.

SO_ACCEPTCONN
Reports whether socket listening is enabled. This option stores an int value in the optval argument.
This is a BOOL option.

SO_BROADCAST
Reports whether transmission of broadcast messages is supported, if this is supported by the protocol.
This option stores an int value in the optval argument. This is a BOOL option.

SO_REUSEADDR
Reports whether the rules used in validating addresses supplied to bind() should allow reuse of local
addresses, if this is supported by the protocol. This option stores an int value in the optval argument.
This is a BOOL option.

SO_KEEPALIVE
Reports whether connections are kept active with periodic transmission of messages, if this is supported
by the protocol.

If the connected socket fails to respond to these messages, the connection is broken and processes
writing to that socket are notified with an ENETRESET errno. This option stores an int value in
the optval argument. This is a BOOL option.

SO_LINGER
Reports whether the socket lingers on close() if data is present. If SO_LINGER is set, the system
blocks the process during close() until it can transmit the data or until the end of the interval
indicated by the l_linger member, whichever comes first. If SO_LINGER is not specified,
and close() is issued, the system handles the call in a way that allows the process to continue as
quickly as possible. This option stores a linger structure in the optval argument.


SO_OOBINLINE
Reports whether the socket leaves received out-of-band data (data marked urgent) in line. This option
stores an int value in optval argument. This is a BOOL option.

SO_SNDBUF
Reports send buffer size information. This option stores an int value in the optval argument.

SO_RCVBUF
Reports receive buffer size information. This option stores an int value in the optval argument.

SO_ERROR
Reports information about error status and clears it. This option stores an int value in
the optval argument.

SO_TYPE
Reports the socket type. This option stores an int value in the optval argument.

SO_DONTROUTE
Reports whether outgoing messages bypass the standard routing facilities. The destination must be on
a directly-connected network, and messages are directed to the appropriate network interface according
to the destination address. The effect, if any, of this option depends on what protocol is in use. This
option stores an int value in the optval argument. This is a BOOL option.

SO_MAX_MSG_SIZE
Maximum size of a message for message-oriented socket types (for example, SOCK_DGRAM). Has no
meaning for stream-oriented sockets. This option stores an int value in the optval argument.

TCP_NODELAY
Specifies whether the Nagle algorithm used by TCP for send coalescing is disabled. This option stores
an int value in the optval argument. This is a BOOL option.

For boolean options, a zero value indicates that the option is disabled and a non-zero value indicates that the
option is enabled.
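
Most of the options listed above take a plain int, but SO_LINGER takes a struct linger, which is easy to get wrong. The following sketch shows one way to set it and read it back; the helper names set_linger and get_linger are invented for this illustration. Note how the length argument of getsockopt() is initialized before the call, because it is a value-result argument.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Turn SO_LINGER on with the given interval in seconds. Returns 0 or -1. */
int set_linger(int sock, int seconds)
{
    struct linger lg;

    lg.l_onoff = 1;          /* non-zero: close() lingers if data is unsent */
    lg.l_linger = seconds;   /* upper bound on how long close() may block */
    return setsockopt(sock, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}

/* Read the current SO_LINGER setting into *lg. Returns 0 or -1. */
int get_linger(int sock, struct linger *lg)
{
    socklen_t len = sizeof *lg;   /* in: buffer size, out: bytes stored */

    return getsockopt(sock, SOL_SOCKET, SO_LINGER, lg, &len);
}
```

Both helpers return -1 with errno set on failure, exactly like the underlying calls.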

http://www.mkssoftware.com/docs/man3/setsockopt.3.asp

http://www.mkssoftware.com/docs/man3/getsockopt.3.asp


fcntl()
Control socket descriptors

Prototypes
#include <unistd.h>
#include <fcntl.h>

int fcntl(int s, int cmd, long arg);

Description

This function is typically used to do file locking and other file-oriented stuff, but it also has a
couple socket-related functions that you might see or use from time to time.

Parameter s is the socket descriptor you wish to operate on, cmd should be set to F_SETFL,
and arg can be one of the following flags. (Like I said, there's more to fcntl() than
I'm letting on here, but I'm trying to stay socket-oriented.)

O_NONBLOCK Set the socket to be non-blocking. See the section on blocking for more
details.
O_ASYNC Set the socket to do asynchronous I/O. When data is ready to
be recv()'d on the socket, the signal SIGIO will be raised. This is rare
to see, and beyond the scope of the guide. And I think it's only available
on certain systems.

Return Value

Returns zero on success, or -1 on error (and errno will be set accordingly.)

Different uses of the fcntl() system call actually have different return values, but I haven't
covered them here because they're not socket-related. See your local fcntl() man page for
more information.

Example
int s = socket(PF_INET, SOCK_STREAM, 0);

fcntl(s, F_SETFL, O_NONBLOCK); // set to non-blocking


fcntl(s, F_SETFL, O_ASYNC); // set to asynchronous I/O
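
One caveat about the example above: F_SETFL replaces the whole set of file status flags, so the second call would clear the O_NONBLOCK flag set by the first. A common idiom is to read the current flags with F_GETFL and OR the new flag in. The helper below is a minimal sketch of that idiom; the name set_nonblocking is an assumption made for this illustration.

```c
#include <fcntl.h>

/* Add O_NONBLOCK to the descriptor's status flags without clearing the
 * flags that are already set. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);   /* read the current flags */

    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}
```

The same read-modify-write pattern works for any other status flag, and for clearing a flag with flags & ~O_NONBLOCK.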


UNIT-V

Elementary UDP Sockets

Introduction

There are some fundamental differences between applications written using TCP versus those
that use UDP. These are because of the differences in the two transport layers: UDP is a
connectionless, unreliable, datagram protocol, quite unlike the connection-oriented, reliable
byte stream provided by TCP. Nevertheless, there are instances when it makes sense to use
UDP instead of TCP. Some popular applications are built using UDP: DNS, NFS, and
SNMP, for example.

The figure below shows the function calls for a typical UDP client/server. The client does
not establish a connection with the server. Instead, the client just sends a datagram to the
server using the sendto function, which requires the address of the destination (the server) as
a parameter. Similarly, the server does not accept a connection from a client. Instead, the
server just calls the recvfrom function, which waits until data arrives from some client.
recvfrom returns the protocol address of the client, along with the datagram, so the server can
send a response to the correct client.

The figure also shows a timeline of the typical scenario that takes place for a UDP
client/server exchange. We can compare this to the typical TCP exchange. We will also
describe the new functions that we use with UDP sockets, recvfrom and sendto, and redo our
echo client/server to use UDP. We will also describe the use of the connect function with a
UDP socket, and the concept of asynchronous errors.
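
The exchange described above (the client calls sendto, the server calls recvfrom and replies to the address it was handed) can be sketched in a single process over the loopback interface. This is an illustration only: the function name udp_echo_once is made up, both sockets live in one process, and there is no timeout, so it relies on the loopback interface not dropping the datagram.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send msg from a "client" socket to a "server" socket on 127.0.0.1 and echo
 * it back. Returns the number of bytes the client got back, or -1 on error. */
ssize_t udp_echo_once(const char *msg, char *reply, size_t replylen)
{
    struct sockaddr_in srv, peer;
    socklen_t len = sizeof srv;
    char buf[512];
    ssize_t n = -1;
    int server = socket(AF_INET, SOCK_DGRAM, 0);
    int client = socket(AF_INET, SOCK_DGRAM, 0);

    if (server == -1 || client == -1)
        goto done;

    /* Server: bind to loopback; port 0 lets the kernel pick a free port. */
    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(server, (struct sockaddr *)&srv, sizeof srv) == -1 ||
        getsockname(server, (struct sockaddr *)&srv, &len) == -1)
        goto done;

    /* Client: no connect(); the destination is passed to sendto(). */
    if (sendto(client, msg, strlen(msg), 0,
               (struct sockaddr *)&srv, sizeof srv) == -1)
        goto done;

    /* Server: recvfrom() returns the datagram and the sender's address. */
    len = sizeof peer;
    n = recvfrom(server, buf, sizeof buf, 0, (struct sockaddr *)&peer, &len);
    if (n == -1)
        goto done;

    /* Server replies to that address; the client then reads the echo. */
    if (sendto(server, buf, (size_t)n, 0, (struct sockaddr *)&peer, len) == -1) {
        n = -1;
        goto done;
    }
    n = recvfrom(client, reply, replylen, 0, NULL, NULL);

done:
    if (server != -1) close(server);
    if (client != -1) close(client);
    return n;
}
```

Notice that neither side calls connect, listen, or accept; the only address bookkeeping is the pair of socket address arguments to sendto and recvfrom.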


send(), sendto()

Send data out over a socket

Prototypes
#include <sys/types.h>
#include <sys/socket.h>

ssize_t send(int s, const void *buf, size_t len, int flags);
ssize_t sendto(int s, const void *buf, size_t len,
               int flags, const struct sockaddr *to,
               socklen_t tolen);


Description

These functions send data to a socket. Generally speaking, send() is used for
TCP SOCK_STREAM connected sockets, and sendto() is used for
UDP SOCK_DGRAM unconnected datagram sockets. With the unconnected sockets, you must
specify the destination of a packet each time you send one, and that's why the last parameters
of sendto() define where the packet is going.

With both send() and sendto(), the parameter s is the socket, buf is a pointer to the data
you want to send, len is the number of bytes you want to send, and flags allows you to
specify more information about how the data is to be sent. Set flags to zero if you want it to
be "normal" data. Here are some of the commonly used flags, but check your
local send() man pages for more details:

MSG_OOB Send as "out of band" data. TCP supports this, and it's a way to
tell the receiving system that this data has a higher priority than
the normal data. The receiver will receive the
signal SIGURG and it can then receive this data without first
receiving all the rest of the normal data in the queue.
MSG_DONTROUTE Don't send this data over a router, just keep it local.
MSG_DONTWAIT If send() would block because outbound traffic is clogged,
have it return EAGAIN. This is like an "enable non-blocking just
for this send." See the section on blocking for more details.
MSG_NOSIGNAL If you send() to a remote host which is no longer recv()ing,
you'll typically get the signal SIGPIPE. Adding this flag
prevents that signal from being raised.

Return Value

Returns the number of bytes actually sent, or -1 on error (and errno will be set
accordingly.) Note that the number of bytes actually sent might be less than the number you
asked it to send! See the section on handling partial send()s for a helper function to get
around this.

Also, if the socket has been closed by either side, the process calling send() will get the
signal SIGPIPE. (Unless send() was called with the MSG_NOSIGNAL flag.)


Example
int spatula_count = 3490;
char *secret_message = "The Cheese is in The Toaster";

int stream_socket, dgram_socket;
struct sockaddr_in dest;
int temp;

// first with TCP stream sockets:

// assume sockets are made and connected
//stream_socket = socket(...
//connect(stream_socket, ...

// convert to network byte order
temp = htonl(spatula_count);
// send data normally:
send(stream_socket, &temp, sizeof temp, 0);

// send secret message out of band:
send(stream_socket, secret_message, strlen(secret_message)+1, MSG_OOB);

// now with UDP datagram sockets:
//getaddrinfo(...
//dest = ... // assume "dest" holds the address of the destination
//dgram_socket = socket(...

// send secret message normally:
sendto(dgram_socket, secret_message, strlen(secret_message)+1, 0,
       (struct sockaddr*)&dest, sizeof dest);

recv(), recvfrom()

Receive data on a socket

Prototypes

#include <sys/types.h>
#include <sys/socket.h>

ssize_t recv(int s, void *buf, size_t len, int flags);
ssize_t recvfrom(int s, void *buf, size_t len, int flags,
                 struct sockaddr *from, socklen_t *fromlen);


Description

Once you have a socket up and connected, you can read incoming data from the remote side
using the recv() (for TCP SOCK_STREAM sockets) and recvfrom() (for
UDP SOCK_DGRAM sockets).

Both functions take the socket descriptor s, a pointer to the buffer buf, the size (in bytes) of
the buffer len, and a set of flags that control how the functions work.

Additionally, recvfrom() takes a struct sockaddr *from, which it fills in to tell you where
the data came from, and it sets fromlen to the size of the address it stored. (You
must also initialize fromlen to the size of from, or of struct sockaddr, before the call.)

So what wondrous flags can you pass into this function? Here are some of them, but you
should check your local man pages for more information and what is actually supported on
your system. You bitwise-or these together, or just set flags to 0 if you want it to be a
regular vanilla recv().

MSG_OOB Receive Out of Band data. This is how to get data that has been
sent to you with the MSG_OOB flag in send(). As the receiving
side, you will have had signal SIGURG raised telling you there is
urgent data. In your handler for that signal, you could
call recv() with this MSG_OOB flag.
MSG_PEEK If you want to call recv() "just for pretend", you can call it
with this flag. This will tell you what's waiting in the buffer for
when you call recv() "for real"
(i.e. without the MSG_PEEK flag). It's like a sneak preview into
the next recv() call.
MSG_WAITALL Tell recv() to not return until all the data you specified in
the len parameter has arrived. It will ignore your wishes in extreme
circumstances, however, like if a signal interrupts the call or if
some error occurs or if the remote side closes the connection,
etc. Don't be mad with it.

When you call recv(), it will block until there is some data to read. If you want to not
block, set the socket to non-blocking or check with select() or poll() to see if there is
incoming data before calling recv() or recvfrom().


Return Value

Returns the number of bytes actually received (which might be less than you requested in
the len parameter), or -1 on error (and errno will be set accordingly.)

If the remote side has closed the connection, recv() will return 0. This is the normal method
for determining if the remote side has closed the connection. Normality is good, rebel!

Example
// stream sockets and recv()

struct addrinfo hints, *res;
int sockfd;
char buf[512];
int byte_count;

// get host info, make socket, and connect it
memset(&hints, 0, sizeof hints);
hints.ai_family = AF_UNSPEC; // use IPv4 or IPv6, whichever
hints.ai_socktype = SOCK_STREAM;
getaddrinfo("www.example.com", "3490", &hints, &res);
sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
connect(sockfd, res->ai_addr, res->ai_addrlen);

// all right! now that we're connected, we can receive some data!
byte_count = recv(sockfd, buf, sizeof buf, 0);
printf("recv()'d %d bytes of data in buf\n", byte_count);

// datagram sockets and recvfrom()

struct addrinfo hints, *res;
int sockfd;
int byte_count;
socklen_t fromlen;
struct sockaddr_storage addr;
char buf[512];
char ipstr[INET6_ADDRSTRLEN];

// get host info, make socket, bind it to port 4950
memset(&hints, 0, sizeof hints);
hints.ai_family = AF_UNSPEC; // use IPv4 or IPv6, whichever
hints.ai_socktype = SOCK_DGRAM;
hints.ai_flags = AI_PASSIVE;
getaddrinfo(NULL, "4950", &hints, &res);
sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
bind(sockfd, res->ai_addr, res->ai_addrlen);

// no need to accept(), just recvfrom():
fromlen = sizeof addr;
byte_count = recvfrom(sockfd, buf, sizeof buf, 0,
                      (struct sockaddr *)&addr, &fromlen);

printf("recv()'d %d bytes of data in buf\n", byte_count);
printf("from IP address %s\n",
    inet_ntop(addr.ss_family,
        addr.ss_family == AF_INET ?
            (void *)&((struct sockaddr_in *)&addr)->sin_addr :
            (void *)&((struct sockaddr_in6 *)&addr)->sin6_addr,
        ipstr, sizeof ipstr));

Lost Datagrams

Our UDP client/server example is not reliable. If a client datagram is lost (say it is discarded
by some router between the client and server), the client will block forever in its call to
recvfrom in the function dg_cli, waiting for a server reply that will never arrive. Similarly, if
the client datagram arrives at the server but the server's reply is lost, the client will again
block forever in its call to recvfrom. A typical way to prevent this is to place a timeout on the
client's call to recvfrom.

Just placing a timeout on the recvfrom is not the entire solution. For example, if we do time
out, we cannot tell whether our datagram never made it to the server, or if the server's reply
never made it back. If the client's request was something like "transfer a certain amount of
money from account A to account B" (instead of our simple echo server), it would make a big
difference as to whether the request was lost or the reply was lost.
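
One way to place the timeout mentioned above on recvfrom is the SO_RCVTIMEO socket option, which takes a struct timeval. The sketch below assumes the platform supports this option on UDP sockets (most current Unix systems do); the helper names set_recv_timeout and recv_timed_out are invented for this illustration. As the text points out, a timeout only stops the client from blocking forever. It does not say whether the request or the reply was lost.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Make blocking receives on sock give up after timeout_ms milliseconds.
 * A value of 0 means "no timeout". Returns 0 on success, -1 on error. */
int set_recv_timeout(int sock, long timeout_ms)
{
    struct timeval tv;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Call right after recvfrom() returned -1: true if the failure was the
 * timeout expiring rather than a real error. */
int recv_timed_out(void)
{
    return errno == EAGAIN || errno == EWOULDBLOCK;
}
```

A client would call set_recv_timeout once after creating the socket, then retransmit or report failure whenever recvfrom returns -1 and recv_timed_out() is true.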


connect Function with UDP

An asynchronous error is not returned on a UDP socket unless the socket has been connected.
Indeed, we are able to call connect for a UDP socket. But this does not result in anything like
a TCP connection: There is no three-way handshake. Instead, the kernel just checks for any
immediate errors (e.g., an obviously unreachable destination), records the IP address and port
number of the peer (from the socket address structure passed to connect), and returns
immediately to the calling process.

Overloading the connect function with this capability for UDP sockets is confusing. If
the convention that sockname is the local protocol address and peername is the foreign
protocol address is used, then a better name would have been setpeername. Similarly, a better
name for the bind function would be setsockname. With this capability, we must now
distinguish between

• An unconnected UDP socket, the default when we create a UDP socket
• A connected UDP socket, the result of calling connect on a UDP socket


With a connected UDP socket, three things change, compared to the default unconnected
UDP socket:

1. We can no longer specify the destination IP address and port for an output operation.
That is, we do not use sendto, but write or send instead. Anything written to a
connected UDP socket is automatically sent to the protocol address (e.g., IP address
and port) specified by connect.
2. We do not need to use recvfrom to learn the sender of a datagram, but read, recv, or
recvmsg instead. The only datagrams returned by the kernel for an input operation on
a connected UDP socket are those arriving from the protocol address specified in
connect. Datagrams destined to the connected UDP socket's local protocol address
(e.g., IP address and port) but arriving from a protocol address other than the one to
which the socket was connected are not passed to the connected socket. This limits a
connected UDP socket to exchanging datagrams with one and only one peer.
3. Asynchronous errors are returned to the process for connected UDP sockets. The
corollary, as we previously described, is that unconnected UDP sockets do not receive
asynchronous errors.
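
The three changes listed above follow from one call. The sketch below creates a connected UDP socket for an IPv4 peer; the function name udp_connected_socket is an assumption made for this illustration. After it returns, the caller uses send and recv (or write and read) with no address arguments.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a UDP socket connected to the IPv4 peer described by *peer.
 * Returns the descriptor, or -1 on error. No packets are exchanged:
 * the kernel only records the peer's IP address and port. */
int udp_connected_socket(const struct sockaddr_in *peer)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    if (s == -1)
        return -1;
    if (connect(s, (const struct sockaddr *)peer, sizeof *peer) == -1) {
        close(s);
        return -1;
    }
    return s;
}
```

On the returned descriptor, send(s, buf, len, 0) goes to the recorded peer, and getpeername reports the address that was passed to connect.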


Lack of Flow Control with UDP

We observe two cases:

CASE 1: SLOW CLIENT, FAST SERVER

CASE 2: FAST CLIENT, SLOW SERVER

FROM THE TCP CONCEPT, WE KNOW THE STATEMENT "AT ANY MOMENT OF TIME, THE SENDER WILL
NOT OVERFLOW THE RECEIVER BUFFER".

Based on this statement, we explain the concept like this:

With respect to the client:
SLOW: BIT RATE IS LOW
FAST: BIT RATE IS HIGH
With respect to the server:
SLOW: RECEIVER BUFFER (WINDOW) SIZE IS SMALL
FAST: RECEIVER BUFFER (WINDOW) SIZE IS LARGE

In CASE 2, datagrams are lost to the maximum extent, because UDP does nothing to slow the
sender down. This is the normal situation in UDP communication.

In CASE 1, the datagrams are delivered to the receiver. This is not because UDP provides
flow control, but because a slow sender never fills the buffer of a fast receiver.
Consider the following example for CASE 2:
The client sent 2,000 datagrams, but the server application received only 30 of these, for a
98% loss rate. There is no indication whatsoever to the server application or to the client
application that these datagrams were lost. As we have said, UDP has no flow control and it is
unreliable. It is trivial, as we have shown, for a UDP sender to overrun the receiver.
If we look at the netstat output, the total number of datagrams received by the server host (not
the server application) is 2,000 (73,208 - 71,208). The counter "dropped due to full socket
buffers" indicates how many datagrams were received by UDP but were discarded because
the receiving socket's receive queue was full (p. 775 of TCPv2). This value is 1,970 (3,491 -
1,521), which when added to the counter output by the application (30), equals the 2,000
datagrams received by the host.

The following Output specifies this:

THE FIRST SET OF LINES IS WHEN THE DATAGRAMS ARE NOT YET OBTAINED
AT THE CLIENT SIDE (BEFORE THIS COMMUNICATION).

THE SECOND SET OF LINES IS WHEN DATAGRAMS ARE COMMUNICATED IN
THIS (CURRENT) COMMUNICATION.

This specifies clearly that there is lack of flow control with the UDP Service.


Determining Outgoing Interface with UDP


A connected UDP socket can also be used to determine the outgoing interface that will be
used to reach a particular destination. This is because of a side effect of the connect function when
applied to a UDP socket: The kernel chooses the local IP address (assuming the process has
not already called bind to explicitly assign this). This local IP address is chosen by searching
the routing table for the destination IP address, and then using the primary IP address for the
resulting interface.

In the above figure, UDP Client connects with the UDP Server using bind(). But, in order for
the datagrams to move from UDP Client to UDP Server, they should move through
intermediate routers. So, PEER System now becomes R1 but not UDP Server. This is
because we are using connect() within the UDP communication.
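
The side effect of connect described in this section can be observed with getsockname, which reports the local address the kernel chose. The following is a minimal IPv4 sketch; the function name local_addr_for and the use of port 9 are assumptions made for this illustration. No datagram is sent at any point.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write into out the local IPv4 address the kernel would use to reach
 * dest_ip (dotted-decimal). Returns 0 on success, -1 on error. */
int local_addr_for(const char *dest_ip, char *out, socklen_t outlen)
{
    struct sockaddr_in dest, local;
    socklen_t len = sizeof local;
    int rc = -1;
    int s;

    memset(&dest, 0, sizeof dest);
    dest.sin_family = AF_INET;
    dest.sin_port = htons(9);   /* arbitrary port; nothing is sent */
    if (inet_pton(AF_INET, dest_ip, &dest.sin_addr) != 1)
        return -1;              /* not a valid IPv4 address string */

    s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s == -1)
        return -1;

    /* connect() makes the kernel consult the routing table and pick the
     * local address; getsockname() then reads that choice back. */
    if (connect(s, (struct sockaddr *)&dest, sizeof dest) == 0 &&
        getsockname(s, (struct sockaddr *)&local, &len) == 0 &&
        inet_ntop(AF_INET, &local.sin_addr, out, outlen) != NULL)
        rc = 0;

    close(s);
    return rc;
}
```

On a multihomed host, calling this helper with destinations on different networks would typically report different local addresses, one per outgoing interface.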


UNIT-VI

Elementary Name and Address Conversions


All the examples so far in this text have used numeric addresses for the hosts (e.g.,
206.6.226.33) and numeric port numbers to identify the servers (e.g., port 13 for the standard
daytime server and port 9877 for our echo server). We should, however, use names instead of
numbers for numerous reasons: Names are easier to remember; the numeric address can
change but the name can remain the same; and with the move to IPv6, numeric addresses
become much longer, making it much more error-prone to enter an address by hand. This
chapter describes the functions that convert between names and numeric values:
gethostbyname and gethostbyaddr to convert between hostnames and IPv4 addresses, and
getservbyname and getservbyport to convert between service names and port numbers.
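
As a first taste of these functions, the sketch below uses gethostbyname to turn a hostname into the text form of its first IPv4 address. The helper name first_ipv4 is an assumption made for this illustration. gethostbyname handles IPv4 only and returns a pointer to static storage, so newer code often uses getaddrinfo instead.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Look up host and write its first IPv4 address, in dotted-decimal form,
 * into out. Returns 0 on success, -1 on failure (h_errno has the reason
 * when the lookup itself failed). */
int first_ipv4(const char *host, char *out, socklen_t outlen)
{
    struct hostent *he = gethostbyname(host);
    struct in_addr addr;

    if (he == NULL || he->h_addrtype != AF_INET || he->h_addr_list[0] == NULL)
        return -1;

    /* h_addr_list[0] points at a 4-byte address in network byte order. */
    memcpy(&addr, he->h_addr_list[0], sizeof addr);
    return inet_ntop(AF_INET, &addr, out, outlen) != NULL ? 0 : -1;
}
```

Passing a name such as "localhost" triggers a real lookup through the resolver. Passing a dotted-decimal string is converted directly without a lookup on common implementations.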

Domain Name System (DNS)


The DNS is used primarily to map between hostnames and IP addresses. A hostname can be
either a simple name, such as solaris or freebsd, or a fully qualified domain name (FQDN),
such as solaris.unpbook.com. Technically, an FQDN is also called an absolute name and
must end with a period, but users often omit the ending period. The trailing period tells the
resolver that this name is fully qualified and it doesn't need to search its list of possible
domains.

Resource Records

Entries in the DNS are known as resource records (RRs). There are only a few types of RRs
that we are interested in.

A An A record maps a hostname into a 32-bit IPv4 address.

AAAA A AAAA record, called a "quad A" record, maps a hostname into a 128-bit
IPv6 address. The term "quad A" was chosen because a 128-bit address is four
times larger than a 32-bit address.

PTR PTR records (called "pointer records") map IP addresses into hostnames. For
an IPv4 address, the 4 bytes of the 32-bit address are reversed, each byte is
converted to its decimal ASCII value (0–255), and in-addr.arpa is
appended. The resulting string is used in the PTR query. For an IPv6 address,
the 32 4-bit nibbles of the 128-bit address are reversed, each nibble is
converted to its corresponding hexadecimal ASCII value (0–9, a–f), and
ip6.arpa is appended.

MX An MX record specifies a host to act as a "mail exchanger" for the specified
host. In the example for the host freebsd above, two MX records are provided:
The first has a preference value of 5 and the second has a preference value of
10. When multiple MX records exist, they are used in order of preference,
starting with the smallest value.

CNAME CNAME stands for "canonical name." A common use is to assign CNAME
records for common services, such as ftp and www. If people use these service
names instead of the actual hostnames, it is transparent when a service is
moved to another host. For example, the following could be CNAMEs for our
host linux:

ftp IN CNAME linux.unpbook.com.
www IN CNAME linux.unpbook.com.
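
The PTR rule for IPv4 described above can be sketched in a few lines of C. This is only an illustration: the function name ipv4_ptr_name is our own and is not part of any resolver library. It parses a dotted-decimal address, reverses the four bytes, and appends in-addr.arpa.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/socket.h>

/* Build the name used in a PTR query for a dotted-decimal IPv4 address.
 * (Hypothetical helper.)  Returns 0 on success, -1 if the address is not
 * valid or the output buffer is too small. */
int ipv4_ptr_name(const char *dotted, char *out, size_t outlen)
{
    unsigned char b[4];                 /* the 4 address bytes, network order */
    int n;

    if (inet_pton(AF_INET, dotted, b) != 1)
        return -1;                      /* not a dotted-decimal IPv4 address */

    /* Reverse the bytes, print each one in decimal, append the suffix. */
    n = snprintf(out, outlen, "%u.%u.%u.%u.in-addr.arpa",
                 (unsigned)b[3], (unsigned)b[2],
                 (unsigned)b[1], (unsigned)b[0]);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For the address 206.6.226.33 used earlier in these notes, the function produces the query name 33.226.6.206.in-addr.arpa.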


Resolvers and Name Servers


Organizations run one or more name servers, often the program known as BIND (Berkeley
Internet Name Domain). Applications such as the clients and servers that we are writing in
this text contact a DNS server by calling functions in a library known as the resolver. The
common resolver functions are gethostbyname and gethostbyaddr, both of which are
described in this chapter. The former maps a hostname into its IPv4 addresses, and the latter
does the reverse mapping.
The figure below shows a typical arrangement of applications, resolvers, and name servers.
We now write the application code. On some systems, the resolver code is contained in a
system library and is link-edited into the application when the application is built. On others,
there is a centralized resolver daemon that all applications share, and the system library code
performs RPCs to this daemon. In either case, application code calls the resolver code using
normal function calls, typically calling the functions gethostbyname and gethostbyaddr.

The resolver code reads its system-dependent configuration files to determine the location of
the organization's name servers. (We use the plural "name servers" because most
organizations run multiple name servers, even though we show only one local server in the
figure. Multiple name servers are absolutely required for reliability and redundancy.) The file
/etc/resolv.conf normally contains the IP addresses of the local name servers.

It might be nice to use the names of the name servers in the /etc/resolv.conf file, since the
names are easier to remember and configure, but this introduces a chicken-and-egg problem
of where to go to do the name-to-address conversion for the server that will do the name and
address conversion! The resolver sends the query to the local name server using UDP. If the
local name server does not know the answer, it will normally query other name servers across
the Internet, also using UDP. If the answers are too large to fit in a UDP packet, the resolver
will automatically switch to TCP.

gethostbyname Function (Returns: IPv4 Address)


Host computers are normally known by human-readable names. All the examples that we
have shown so far in this book have intentionally used IP addresses instead of names, so we
know exactly what goes into the socket address structures for functions such as connect and
sendto, and what is returned by functions such as accept and recvfrom. But, most applications
should deal with names, not addresses. This is especially true as we move to IPv6, since IPv6
addresses (hex strings) are much longer than IPv4 dotted-decimal numbers. (The example
AAAA record and ip6.arpa PTR record in the previous section should make this obvious.)
The most basic function that looks up a hostname is gethostbyname. If successful, it returns a
pointer to a hostent structure that contains all the IPv4 addresses for the host. However, it is
limited in that it can only return IPv4 addresses.
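
The function is declared in <netdb.h> as struct hostent *gethostbyname(const char *hostname). A minimal sketch of its use follows; the helper name first_ipv4 is our own, and only the first address in h_addr_list is examined, whereas a real client would normally try each address in turn.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>

/* Look up `host` and store its first IPv4 address, in dotted-decimal form,
 * in `out`.  (Hypothetical helper.)  Returns 0 on success, -1 on failure. */
int first_ipv4(const char *host, char *out, size_t outlen)
{
    struct hostent *hp = gethostbyname(host);

    if (hp == NULL)
        return -1;                      /* the resolver has set h_errno */
    if (hp->h_addrtype != AF_INET || hp->h_addr_list[0] == NULL)
        return -1;                      /* no IPv4 address in the answer */

    /* Each h_addr_list entry points to a binary struct in_addr. */
    if (inet_ntop(AF_INET, hp->h_addr_list[0], out, (socklen_t)outlen) == NULL)
        return -1;                      /* output buffer too small */
    return 0;
}
```

A call such as first_ipv4("solaris.unpbook.com", buf, sizeof buf) would trigger a DNS query for an A record. On common implementations, passing a dotted-decimal string instead of a name returns that address without any query.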


gethostbyname differs from the other socket functions that we have described in that it does
not set errno when an error occurs. Instead, it sets the global integer h_errno to one of the
following constants defined by including <netdb.h>:

• HOST_NOT_FOUND
• TRY_AGAIN
• NO_RECOVERY
• NO_DATA (identical to NO_ADDRESS)
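
Because errno is not used, perror cannot report these failures. One way to print a readable message is a small translation function such as the sketch below; the name resolver_error and the message texts are our own choices, not part of the resolver library.

```c
#include <netdb.h>

/* Translate a value of h_errno into a short message.
 * (Hypothetical helper; the wording of the messages is ours.) */
const char *resolver_error(int err)
{
    switch (err) {
    case HOST_NOT_FOUND: return "host not found";
    case TRY_AGAIN:      return "temporary failure, try again";
    case NO_RECOVERY:    return "unrecoverable name server error";
    case NO_DATA:        return "name is valid but has no address record";
    default:             return "unknown resolver error";
    }
}
```

After gethostbyname returns a null pointer, a caller could write fprintf(stderr, "%s: %s\n", name, resolver_error(h_errno)).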

gethostbyaddr Function (Returns: Hostname)


The function gethostbyaddr takes a binary IPv4 address and tries to find the hostname
corresponding to that address. This is the reverse of gethostbyname.

This function returns a pointer to the same hostent structure that we described with
gethostbyname. The field of interest in this structure is normally h_name, the canonical
hostname.


The addr argument is not a char*, but is really a pointer to an in_addr structure containing the
IPv4 address. len is the size of this structure: 4 for an IPv4 address. The family argument is
AF_INET. In terms of the DNS, gethostbyaddr queries a name server for a PTR record in the
in-addr.arpa domain.
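
A reverse lookup might be sketched as follows. The helper name reverse_lookup and its return codes are our own; the cast reflects the point that the first argument is really a pointer to an in_addr structure.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/socket.h>

/* Map a dotted-decimal IPv4 address to its canonical hostname.
 * (Hypothetical helper.)  Returns 0 on success, -1 if `dotted` is not a
 * valid address, or -2 if the lookup fails (h_errno then holds the reason). */
int reverse_lookup(const char *dotted, char *name, size_t namelen)
{
    struct in_addr addr;
    struct hostent *hp;

    if (inet_pton(AF_INET, dotted, &addr) != 1)
        return -1;

    /* Pass the binary address, its size (4), and the address family. */
    hp = gethostbyaddr((const char *)&addr, sizeof(addr), AF_INET);
    if (hp == NULL)
        return -2;

    snprintf(name, namelen, "%s", hp->h_name);  /* truncates if too long */
    return 0;
}
```

Whether a given address has a PTR record depends on how the owner of that address range has configured the DNS, so callers should expect the lookup to fail for many addresses.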

getservbyname and getservbyport Functions (Returns: Port Number and Service Name)

Services, like hosts, are often known by names, too. If we refer to a service by its name in our
code, instead of by its port number, and if the mapping from the name to port number is
contained in a file (normally /etc/services), then if the port number changes, all we need to
modify is one line in the /etc/services file instead of having to recompile the applications. The
next function, getservbyname, looks up a service given its name.

The service name servname must be specified. If a protocol is also specified (protoname is a
non-null pointer), then the entry must also have a matching protocol. Some Internet services
are provided using either TCP or UDP, while others support only a single protocol (e.g., FTP
requires TCP). If protoname is not specified and the service supports multiple protocols, it is
implementation-dependent as to which port number is returned. Normally this does not
matter, because services that support multiple protocols often use the same TCP and UDP
port number, but this is not guaranteed.

The main field of interest in the servent structure is the port number. Since the port number is
returned in network byte order, we must not call htons when storing this into a socket address
structure.

The next function, getservbyport, looks up a service given its port number and an optional
protocol.
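
Both functions are declared in <netdb.h> and return a pointer to a servent structure, or a null pointer if no entry matches. The two wrappers below are a sketch under our own naming (service_port and service_name); they convert to and from host byte order only because they exchange plain integers with the caller.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <stddef.h>
#include <stdint.h>

/* Return the port number (host byte order) of a named service, or -1 if
 * the service is unknown.  (Hypothetical helper.) */
int service_port(const char *servname, const char *protoname)
{
    struct servent *sp = getservbyname(servname, protoname);

    /* s_port is already in network byte order; ntohs is needed only to
     * print or compare it, not to store it in a socket address structure. */
    return sp ? (int)ntohs((uint16_t)sp->s_port) : -1;
}

/* Return the name of the service on `port` (host byte order), or NULL.
 * The string lives in static storage owned by the library.
 * (Hypothetical helper.) */
const char *service_name(int port, const char *protoname)
{
    /* getservbyport expects its port argument in network byte order. */
    struct servent *sp = getservbyport(htons((uint16_t)port), protoname);

    return sp ? sp->s_name : NULL;
}
```

The result depends on the contents of the local services database, so a name that resolves on one host may be unknown on another.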


UNIT-VII

INTER PROCESS COMMUNICATION


In computing, Inter-process communication (IPC) is a set of methods for the exchange of
data among multiple threads in one or more processes. Processes may be running on one or
more computers connected by a network. IPC methods are divided into methods for message
passing, synchronization, shared memory, and remote procedure calls (RPC). The method of
IPC used may vary based on the bandwidth and latency of communication between the
threads, and the type of data being communicated.


File Locking
File locking provides a very simple yet incredibly useful mechanism for coordinating file
accesses. Before I begin to lay out the details, let me fill you in on some file locking secrets:

There are two types of locking mechanisms: mandatory and advisory. Mandatory systems
will actually prevent read()s and write()s to a file. Several Unix systems support them.
Nevertheless, I'm going to ignore them throughout this document, preferring instead to talk
solely about advisory locks. With an advisory lock system, processes can still read and write
from a file while it's locked. Useless? Not quite, since there is a way for a process to check
for the existence of a lock before a read or write. See, it's a kind of cooperative locking
system. This is easily sufficient for almost all cases where file locking is necessary.

Since that's out of the way, whenever I refer to a lock from now on in this document, I'm
referring to advisory locks. So there.


Now, let me break down the concept of a lock a little bit more. There are two types of
(advisory!) locks: read locks and write locks (also referred to as shared locks and exclusive
locks, respectively). The way read locks work is that they don't interfere with other read
locks. For instance, multiple processes can have a file locked for reading at the same time.
However, when a process has a write lock on a file, no other process can activate either a
read or write lock until it is relinquished. One easy way to think of this is that there can be
multiple readers simultaneously, but there can only be one writer at a time.

One last thing before beginning: there are many ways to lock files in Unix systems. System V
likes lockf(), which, personally, I think sucks. Better systems support flock() which offers
better control over the lock, but still lacks in certain ways. For portability and for
completeness, I'll be talking about how to lock files using fcntl(). I encourage you, though, to
use one of the higher-level flock()-style functions if it suits your needs, but I want to portably
demonstrate the full range of power you have at your fingertips. (If your System V Unix
doesn't support the POSIX-y fcntl(), you'll have to reconcile the following information with
your lockf() man page.)

Setting a lock

The fcntl() function does just about everything on the planet, but we'll just use it for file
locking. Setting the lock consists of filling out a struct flock (declared in fcntl.h) that
describes the type of lock needed, open()ing the file with the matching mode, and
calling fcntl() with the proper arguments.

struct flock fl;
int fd;

fl.l_type   = F_WRLCK;  /* F_RDLCK, F_WRLCK, F_UNLCK    */
fl.l_whence = SEEK_SET; /* SEEK_SET, SEEK_CUR, SEEK_END */
fl.l_start  = 0;        /* Offset from l_whence         */
fl.l_len    = 0;        /* length, 0 = to EOF           */
fl.l_pid    = getpid(); /* our PID                      */

fd = open("filename", O_WRONLY);

fcntl(fd, F_SETLKW, &fl);  /* F_GETLK, F_SETLK, F_SETLKW */


What just happened? Let's start with the struct flock since the fields in it are used
to describe the locking action taking place. Here are some field definitions:

l_type This is where you signify the type of lock you want to set. It's
either F_RDLCK, F_WRLCK, or F_UNLCK if you want to set a read lock,
write lock, or clear the lock, respectively.

l_whence This field determines where the l_start field starts from (it's like an
offset for the offset). It can be either SEEK_SET, SEEK_CUR,
or SEEK_END, for beginning of file, current file position, or end of file.

l_start This is the starting offset in bytes of the lock, relative to l_whence.

l_len This is the length of the lock region in bytes (which starts
from l_start, which is relative to l_whence). A length of 0 means the region
runs to the end of the file.

l_pid The process ID of the process dealing with the lock. Use getpid() to get
this.
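
To see how l_start and l_len pick out part of a file, here is a small sketch that locks an arbitrary byte range rather than the whole file. The helper name set_region_lock is made up for illustration, and it uses the non-blocking F_SETLK command.

```c
#include <fcntl.h>
#include <unistd.h>

/* Apply `type` (F_RDLCK, F_WRLCK, or F_UNLCK) to `len` bytes starting at
 * byte offset `start` from the beginning of the file, without blocking.
 * (Hypothetical helper.)  Returns 0 on success, -1 with errno set. */
int set_region_lock(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {0};

    fl.l_type   = type;
    fl.l_whence = SEEK_SET;   /* measure l_start from the start of the file */
    fl.l_start  = start;
    fl.l_len    = len;        /* 0 here would mean "through end of file" */

    return fcntl(fd, F_SETLK, &fl);
}
```

For example, set_region_lock(fd, F_WRLCK, 10, 20) asks for a write lock on bytes 10 through 29, and the same call with F_UNLCK releases it.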

The next step is to open() the file, since fcntl() needs a file descriptor of the file that's
being locked. Note that when you open the file, you need to open it in the same mode as you
have specified in the lock, as shown in the table, below. If you open the file in the wrong
mode for a given lock type, fcntl() will return -1 and errno will be set to EBADF.

l_type      mode
F_RDLCK     O_RDONLY or O_RDWR
F_WRLCK     O_WRONLY or O_RDWR

Finally, the call to fcntl() actually sets, clears, or gets the lock. See, the second argument
(the cmd) to fcntl() tells it what to do with the data passed to it in the struct flock.
The following list summarizes what each fcntl() cmd does:

F_SETLKW This argument tells fcntl() to attempt to obtain the lock requested in
the struct flock structure. If the lock cannot be obtained (since someone
else has it locked already), fcntl() will wait (block) until the lock has
cleared, then will set it itself. This is a very useful command. I use it all the
time.

F_SETLK This command is almost identical to F_SETLKW. The only difference is that this
one will not wait if it cannot obtain a lock. It will return immediately with -1.
This command can be used to clear a lock by setting the l_type field in
the struct flock to F_UNLCK.

F_GETLK If you want to only check to see if there is a lock, but don't want to set one, you
can use this command. It looks through all the file locks until it finds one that
conflicts with the lock you specified in the struct flock. It then copies the
conflicting lock's information into the struct and returns it to you. If it can't
find a conflicting lock, fcntl() returns the struct as you passed it, except it
sets the l_type field to F_UNLCK.

In our above example, we call fcntl() with F_SETLKW as the argument, so it blocks until it
can set the lock, then sets it and continues.
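
The two non-blocking commands can be sketched like this. Both helper names (try_write_lock and write_lock_holder) are invented for this example, and both operate on the whole file.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to write-lock the whole file without waiting.  (Hypothetical helper.)
 * Returns 0 if we got the lock, -1 otherwise; errno is EACCES or EAGAIN
 * when another process holds a conflicting lock. */
int try_write_lock(int fd)
{
    struct flock fl = {0};

    fl.l_type   = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;                    /* 0 = to EOF */

    return fcntl(fd, F_SETLK, &fl);
}

/* Ask whether a write lock on the whole file would be blocked.
 * (Hypothetical helper.)  Returns the PID of a process holding a
 * conflicting lock, 0 if nothing conflicts, or -1 on error. */
pid_t write_lock_holder(int fd)
{
    struct flock fl = {0};

    fl.l_type   = F_WRLCK;
    fl.l_whence = SEEK_SET;

    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;

    /* fcntl() overwrote fl: F_UNLCK means no conflicting lock was found. */
    return (fl.l_type == F_UNLCK) ? 0 : fl.l_pid;
}
```

Keep in mind that a process never conflicts with its own locks, so write_lock_holder() reports 0 even when the calling process itself holds the lock. Also, the answer can be stale by the time you act on it, so the usual approach is to just try F_SETLK and check the result.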

Clearing a lock

Whew! After all the locking stuff up there, it's time for something easy: unlocking! Actually,
this is a piece of cake in comparison. I'll just reuse that first example and add the code to
unlock it at the end:

struct flock fl;
int fd;

fl.l_type   = F_WRLCK;  /* F_RDLCK, F_WRLCK, F_UNLCK    */
fl.l_whence = SEEK_SET; /* SEEK_SET, SEEK_CUR, SEEK_END */
fl.l_start  = 0;        /* Offset from l_whence         */
fl.l_len    = 0;        /* length, 0 = to EOF           */
fl.l_pid    = getpid(); /* our PID                      */

fd = open("filename", O_WRONLY);  /* get the file descriptor */

fcntl(fd, F_SETLKW, &fl);  /* set the lock, waiting if necessary */
.
.
.
fl.l_type   = F_UNLCK;  /* tell it to unlock the region */

fcntl(fd, F_SETLK, &fl); /* set the region to unlocked */

Now, I left the old locking code in there for high contrast, but you can tell that I just changed
the l_type field to F_UNLCK (leaving the others completely unchanged!) and
called fcntl() with F_SETLK as the command. Easy!


FILE LOCKING-A DEMO PROGRAM:

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    struct flock fl;
    int fd;

    /* Lock the whole file: offset 0 from the start, length 0 = to EOF.
       Fields are set by name since their order in struct flock varies. */
    fl.l_type   = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;
    fl.l_pid    = getpid();

    if (argc > 1)                     /* any argument: ask for a read lock */
        fl.l_type = F_RDLCK;

    if ((fd = open("lockdemo.c", O_RDWR)) == -1) {
        perror("open");
        exit(1);
    }

    printf("Press <RETURN> to try to get lock: ");
    getchar();
    printf("Trying to get lock...");
    fflush(stdout);                   /* show the message before blocking */

    if (fcntl(fd, F_SETLKW, &fl) == -1) {
        perror("fcntl");
        exit(1);
    }

    printf("got lock\n");
    printf("Press <RETURN> to release lock: ");
    getchar();

    fl.l_type = F_UNLCK;              /* set to unlock same region */

    if (fcntl(fd, F_SETLK, &fl) == -1) {
        perror("fcntl");
        exit(1);
    }

    printf("Unlocked.\n");

    close(fd);

    return 0;
}

Compile that puppy up and start messing with it in a couple windows. Notice that when
one lockdemo has a read lock, other instances of the program can get their own read locks
with no problem. It's only when a write lock is obtained that other processes can't get a lock
of any kind.

Another thing to notice is that you can't get a write lock if there are any read locks on the
same region of the file. The process waiting to get the write lock will wait until all the read
locks are cleared. One upshot of this is that you can keep piling on read locks (because a read
lock doesn't stop other processes from getting read locks) and any processes waiting for a
write lock will sit there and starve. There isn't a rule anywhere that keeps you from adding
more read locks if there is a process waiting for a write lock. You must be careful.

Practically, though, you will probably mostly be using write locks to guarantee exclusive
access to a file for a short amount of time while it's being updated; that is the most common
use of locks.


Shared Memory

Introduction

Shared memory is the fastest form of IPC available. Once the memory is mapped into the
address space of the processes that are sharing the memory region, no kernel involvement
occurs in passing data between the processes. What is normally required, however, is some
form of synchronization between the processes that are storing and fetching information to
and from the shared memory region.

Flow of file data from client to server


Consider the normal steps involved in the client–server file copying program that we used as
an example for the various types of message passing.

• The server reads from the input file. The file data is read by the kernel into its memory and
then copied from the kernel to the process.
• The server writes this data in a message, using a pipe, FIFO, or message queue. These forms
of IPC normally require the data to be copied from the process to the kernel.
• The client reads the data from the IPC channel, normally requiring the data to be copied
from the kernel to the process.
• Finally, the data is copied from the client's buffer, the second argument to the write
function, to the output file.

A total of four copies of the data are normally required. Additionally, these four copies are
done between the kernel and a process, often an expensive copy (more expensive than
copying data within the kernel, or copying data within a single process).


The problem with these forms of IPC—pipes, FIFOs, and message queues—is that for two
processes to exchange information, the information has to go through the kernel.

Shared memory provides a way around this by letting two or more processes share a region of
memory. The processes must, of course, coordinate or synchronize the use of the shared
memory among themselves. (Sharing a common piece of memory is similar to sharing a disk
file, such as the sequence number file used in all the file locking examples.)


Shared Memory System Calls

The sections that follow will explore the shared memory system calls and discuss how they
were applied to this utility program. The discussion covers the following areas:

• Creating and Accessing Shared Memory
• Obtaining Information about Shared Memory
• Changing Shared Memory Attributes
• Attaching Shared Memory
• Detaching Shared Memory
• Destroying Shared Memory

Shared memory must be created, or it must be located if another process has already created
it. The program is given an IPC ID to refer to when it has been created or located. Once you
have this IPC ID, it is possible to inquire about the shared memory region attributes and
change some of them, such as the ownership and permissions. Before shared memory can be
read from or written to, it must be attached to the memory space of your current process. This
involves the selection of a starting address for your shared memory region.

When a process is finished with a shared memory region, it is able to detach it from its
memory space. Once all processes have finished with the shared memory region and detached
it, the region can be destroyed to give the memory back to the kernel.

Creating and Accessing Shared Memory

Shared memory is created, or accessed if it already exists, using the shmget(2) function. Its
function synopsis is as follows:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
int shmget(key_t key, size_t size, int flag);


The argument key is the value of the IPC key to use, or the value IPC_PRIVATE. The size
argument specifies the minimum size of the shared memory region required. The actual size
created will be rounded up to a platform-specific multiple of a virtual memory page size. The
flag option must contain the permission bits if shared memory is being created. Additional
flags that may be used include IPC_CREAT and IPC_EXCL, when shared memory is being
created.

The return value is the IPC ID of the shared memory region when the call is successful (this
includes the value zero). The value -1 is returned if the call fails, with errno set.
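
As a minimal sketch, the function below creates a new region with the key IPC_PRIVATE. The name create_region and the permission bits 0600 (read and write for the owner only) are our own choices for illustration.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a new shared memory region of at least `size` bytes.
 * (Hypothetical helper.)  Returns the IPC ID, or -1 with errno set. */
int create_region(size_t size)
{
    /* IPC_PRIVATE always yields a new region; IPC_CREAT | IPC_EXCL asks
     * for creation and the low nine bits give the access permissions. */
    return shmget(IPC_PRIVATE, size, IPC_CREAT | IPC_EXCL | 0600);
}
```

A region created this way has no well-known key, so other processes can reach it only if they are given the IPC ID, for example by inheriting it across fork. The region also persists until it is explicitly destroyed, even after the creating process exits.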

Obtaining Information About Shared Memory

Attributes of the shared memory, including its permissions and actual size, are obtained using
the shmctl(2) system call. Its function synopsis is as follows:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
int shmctl(int shmid, int cmd, struct shmid_ds *buf);

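The argument shmid specifies the shared memory IPC ID, which is obtained from shmget(2).
The cmd is a shmctl(2) command value, while buf is an argument used with certain
commands. The standard commands for shmctl(2) are:

• IPC_STAT copies the attributes of the region into the structure that buf points to.
• IPC_SET changes the owner, group, and permission bits to the values given in buf.
• IPC_RMID destroys the region; buf is not used.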
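
As one illustration of shmctl(2), the sketch below uses the IPC_STAT command to read the size recorded for a region. The helper name region_size is our own; shm_segsz is the size member of struct shmid_ds.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Return the size in bytes recorded for region `shmid`, or -1 on error.
 * (Hypothetical helper.) */
long region_size(int shmid)
{
    struct shmid_ds ds;

    /* IPC_STAT copies the kernel's attributes for the region into ds. */
    if (shmctl(shmid, IPC_STAT, &ds) == -1)
        return -1;

    return (long)ds.shm_segsz;
}
```

The same structure also carries the ownership and permission information in its shm_perm member and the number of current attaches in shm_nattch.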


Attaching Shared Memory


Shared memory must be attached to your process memory space before you can use it
as memory. This is performed by calling upon shmat(2):

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
void *shmat(int shmid, const void *addr, int flag);

The argument shmid specifies the IPC ID of the shared memory that you want to attach to
your process. The argument addr indicates the address that you want to use for this. A null
pointer for addr specifies that the UNIX kernel should pick the address instead. The flag
argument permits the option flag SHM_RND to be specified. Specify 0 for flag if no options
apply.

When shmat(2) succeeds, a (void *) address is returned that represents the starting address of
the shared memory region. If the function fails, the value (void *)(-1) is returned instead.

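The combination of the addr and the flag option SHM_RND allows three possible ways for
the memory region to be attached:

• addr is a null pointer: the kernel chooses the attach address.
• addr is not null and SHM_RND is not set: the region is attached at exactly addr.
• addr is not null and SHM_RND is set: the region is attached at addr rounded down to a
multiple of SHMLBA.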

Detaching Shared Memory


Detaching shared memory is automatically performed when your process terminates.
However, if you need to detach it before it terminates, you accomplish that with the shmdt(2)
function:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
int shmdt(const void *addr);

The shmdt(2) function simply accepts the address of the shared memory, as it was attached
by shmat(2), in argument addr. The return value is 0 when successful. Otherwise, -1 is
returned and errno holds the error code.
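
Attaching and detaching can be seen together in the following sketch, which attaches a region, uses it as ordinary memory, and detaches it again. The function name round_trip is our own, and the code assumes the region is large enough to hold the message and its terminating null byte.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Attach region `shmid`, copy `msg` into it, read it back into `out`,
 * and detach.  (Hypothetical helper.)  Returns 0 on success, -1 on failure. */
int round_trip(int shmid, const char *msg, char *out, size_t outlen)
{
    /* A null address lets the kernel pick where to attach the region. */
    char *p = shmat(shmid, NULL, 0);

    if (p == (void *)-1)
        return -1;

    strcpy(p, msg);                     /* plain memory access, no system call */
    snprintf(out, outlen, "%s", p);     /* read the data back */

    return shmdt(p);                    /* pass the address shmat(2) returned */
}
```

A second process that attaches the same IPC ID would see the bytes written by strcpy, which is why some form of synchronization between the processes is normally needed.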

Destroying Shared Memory


Shared memory is destroyed using the IPC_RMID command of shmctl(2). The critical lines
of code are repeated here for your convenience:

/*
 * Destroy shared memory :
 */
z = shmctl(shmid,IPC_RMID,NULL);

if ( z == -1 ) {
    fprintf(stderr,"%s: shmctl(%d,IPC_RMID)\n",
        strerror(errno),shmid);
    exit(1);
}

Notice that argument three (buf) is not required by the IPC_RMID command for shmctl(2).
This code is exercised by the -r option of the globvar utility.


UNIT – VIII
REMOTE LOGIN
http://book.chinaunix.net/special/ebook/addisonWesley/APUE2/0201433079/ch18lev1sec1.html

Introduction

The handling of terminal I/O is a messy area, regardless of the operating system. The
UNIX System is no exception. The manual page for terminal I/O is usually one of the
longest in most editions of the programmer's manuals.

With the UNIX System, a schism formed in the late 1970s when System III developed a
different set of terminal routines from those of Version 7. The System III style of
terminal I/O continued through System V, and the Version 7 style became the standard
for the BSD-derived systems. As with signals, this difference between the two worlds has
been conquered by POSIX.1. In this chapter, we look at all the POSIX.1 terminal
functions and some of the platform-specific additions.

Part of the complexity of the terminal I/O system occurs because people use terminal
I/O for so many different things: terminals, hardwired lines between computers,
modems, printers, and so on.

Terminal Line Discipline and Its Modes


Overview
Terminal I/O has two modes:

1. Canonical mode input processing. In this mode, terminal input is processed as
lines. The terminal driver returns at most one line per read request.
2. Noncanonical mode input processing. The input characters are not assembled into
lines.

If we don't do anything special, canonical mode is the default. For example, if the shell
redirects standard input to the terminal and we use read and write to copy standard input
to standard output, the terminal is in canonical mode, and each read returns at most one
line. Programs that manipulate the entire screen, such as the vi editor, use noncanonical
mode, since the commands may be single characters and are not terminated by
newlines. Also, this editor doesn't want processing by the system of the special
characters, since they may overlap with the editor commands. For example, the Control-
D character is often the end-of-file character for the terminal, but it's also a vi command
to scroll down one-half screen.

The Version 7 and older BSD-style terminal drivers supported three modes for terminal
input: (a) cooked mode (the input is collected into lines, and the special characters are
processed), (b) raw mode (the input is not assembled into lines, and there is no
processing of special characters), and (c) cbreak mode (the input is not assembled into
lines, but some of the special characters are processed). Figure 18.20 shows a POSIX.1
function that places a terminal in cbreak or raw mode.
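
Figure 18.20 is not reproduced in these notes. As a rough sketch of the same idea (not the
figure's exact code), the functions below switch a terminal to cbreak or raw mode with the
POSIX.1 calls tcgetattr and tcsetattr. The names make_cbreak, make_raw, tty_set_mode, and
tty_restore are hypothetical, and the particular set of flags cleared for raw mode is one
common choice rather than the only valid one:

```c
#include <assert.h>
#include <termios.h>
#include <unistd.h>

/* cbreak: no echo, no line assembly; signal-generating characters stay active. */
void make_cbreak(struct termios *t)
{
    assert(t != NULL);
    t->c_lflag &= ~(ECHO | ICANON);
    t->c_cc[VMIN] = 1;    /* read returns after 1 byte ... */
    t->c_cc[VTIME] = 0;   /* ... with no timer */
}

/* raw: additionally turn off signals, input translation, and output processing. */
void make_raw(struct termios *t)
{
    assert(t != NULL);
    t->c_lflag &= ~(ECHO | ICANON | IEXTEN | ISIG);
    t->c_iflag &= ~(BRKINT | ICRNL | INPCK | ISTRIP | IXON);
    t->c_cflag &= ~(CSIZE | PARENB);
    t->c_cflag |= CS8;    /* 8 data bits, no parity */
    t->c_oflag &= ~OPOST;
    t->c_cc[VMIN] = 1;
    t->c_cc[VTIME] = 0;
}

/* Save the current settings in *saved, then apply cbreak (raw == 0) or raw mode. */
int tty_set_mode(int fd, int raw, struct termios *saved)
{
    struct termios t;

    assert(saved != NULL);
    if (tcgetattr(fd, saved) == -1)
        return -1;
    t = *saved;
    if (raw)
        make_raw(&t);
    else
        make_cbreak(&t);
    return tcsetattr(fd, TCSAFLUSH, &t);
}

/* Put back the settings captured by tty_set_mode. */
int tty_restore(int fd, const struct termios *saved)
{
    assert(saved != NULL);
    return tcsetattr(fd, TCSAFLUSH, saved);
}
```

A caller would typically invoke tty_set_mode(STDIN_FILENO, 0, &saved) before reading single
keystrokes and tty_restore(STDIN_FILENO, &saved) before exiting. Because tcsetattr reports
success if any of the requested changes could be made, careful code calls tcgetattr again
afterward to confirm that every change took effect.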

POSIX.1 defines 11 special input characters, 9 of which we can change. We've been
using some of these throughout the text: the end-of-file character (usually Control-D)
and the suspend character (usually Control-Z), for example. Section 18.3 describes each
of these characters.
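
To illustrate what changing a special character looks like in code, here is a small sketch.
The helper names are hypothetical, and it assumes the platform defines _POSIX_VDISABLE in
<unistd.h> (the value that disables a special character). Each character occupies a slot of
the c_cc array, indexed by constants such as VEOF and VSUSP:

```c
#include <assert.h>
#include <termios.h>
#include <unistd.h>

/* Store ch in slot `index` (for example VEOF or VSUSP) of a termios copy. */
int set_special_char(struct termios *t, int index, cc_t ch)
{
    assert(t != NULL);
    if (index < 0 || index >= NCCS)
        return -1;
    t->c_cc[index] = ch;
    return 0;
}

/* Disable the special character in slot `index`. */
int disable_special_char(struct termios *t, int index)
{
    return set_special_char(t, index, _POSIX_VDISABLE);
}

/* Fetch the attributes of fd, change one special character, and write them back. */
int tty_set_special_char(int fd, int index, cc_t ch)
{
    struct termios t;

    if (tcgetattr(fd, &t) == -1)
        return -1;
    if (set_special_char(&t, index, ch) == -1)
        return -1;
    return tcsetattr(fd, TCSANOW, &t);
}
```

For example, tty_set_special_char(STDIN_FILENO, VEOF, 2) would make Control-B the end-of-file
character for the terminal on standard input.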

We can think of a terminal device as being controlled by a terminal driver, usually within
the kernel. Each terminal device has an input queue and an output queue, shown
in Figure 18.1.

Figure 18.1. Logical picture of input and output queues for a terminal device

There are several points to consider from this picture.

• If echoing is enabled, there is an implied link between the input queue and the
output queue.
• The size of the input queue, MAX_INPUT (see Figure 2.11), is finite. When the input
queue for a particular device fills, the system behavior is implementation
dependent. Most UNIX systems echo the bell character when this happens.
• There is another input limit, MAX_CANON, that we don't show here. This limit is the
maximum number of bytes in a canonical input line.
• Although the size of the output queue is finite, no constants defining that size are
accessible to the program, because when the output queue starts to fill up, the
kernel simply puts the writing process to sleep until room is available.
• The tcflush function allows us to flush either the input queue or the output queue.
Similarly, the tcsetattr function lets us tell the system to change the attributes of
a terminal device only after the output queue is empty. (We want to do this, for
example, if we're changing the output attributes.) We can also tell the system to
discard everything in the input queue when changing the terminal attributes. (We
want to do this if we're changing the input attributes or changing between
canonical and noncanonical modes, so that previously entered characters aren't
interpreted in the wrong mode.)
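
The last point can be sketched in code. The helper names below are hypothetical; the
constants TCSANOW, TCSADRAIN, TCSAFLUSH, TCIFLUSH, TCOFLUSH, and TCIOFLUSH are the standard
POSIX.1 arguments to tcsetattr and tcflush:

```c
#include <assert.h>
#include <termios.h>
#include <unistd.h>

/*
 * Pick the "when" argument for tcsetattr:
 *   TCSANOW   - the change takes effect immediately
 *   TCSADRAIN - the change waits until all queued output has been written
 *   TCSAFLUSH - as TCSADRAIN, and unread input is discarded as well
 */
int choose_action(int output_changed, int input_changed)
{
    if (input_changed)
        return TCSAFLUSH;
    if (output_changed)
        return TCSADRAIN;
    return TCSANOW;
}

/* Apply new attributes at the moment appropriate for what was changed. */
int tty_apply(int fd, const struct termios *t,
              int output_changed, int input_changed)
{
    assert(t != NULL);
    return tcsetattr(fd, choose_action(output_changed, input_changed), t);
}

/* Discard queued data: unread input, unwritten output, or both. */
int tty_discard(int fd, int input, int output)
{
    if (input && output)
        return tcflush(fd, TCIOFLUSH);
    if (input)
        return tcflush(fd, TCIFLUSH);
    if (output)
        return tcflush(fd, TCOFLUSH);
    return 0;   /* nothing requested */
}
```

Treating a switch between canonical and noncanonical mode as an input change leads to
TCSAFLUSH, so characters typed under the old mode are not interpreted under the new one.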

Most UNIX systems implement all the canonical processing in a module called
the terminal line discipline. We can think of this module as a box that sits between the
kernel's generic read and write functions and the actual device driver (see Figure 18.2).

Figure 18.2. Terminal line discipline

All the terminal device characteristics that we can examine and change are contained in
a termios structure. This structure is defined in the header <termios.h>, which we use
throughout this chapter:

struct termios {
    tcflag_t c_iflag;    /* input flags */
    tcflag_t c_oflag;    /* output flags */
    tcflag_t c_cflag;    /* control flags */
    tcflag_t c_lflag;    /* local flags */
    cc_t     c_cc[NCCS]; /* control characters */
};

Roughly speaking, the input flags control the input of characters by the terminal device
driver (strip eighth bit on input, enable input parity checking, etc.), the output flags
control the driver output (perform output processing, map newline to CR/LF, etc.), the
control flags affect the RS-232 serial lines (ignore modem status lines, one or two stop
bits per character, etc.), and the local flags affect the interface between the driver and
the user (echo on or off, visually erase characters, enable terminal-generated signals,
job control stop signal for background output, etc.).

The type tcflag_t is big enough to hold each of the flag values and is often defined as
an unsigned int or an unsigned long. The c_cc array contains all the special characters that we
can change. NCCS is the number of elements in this array and is typically between 15 and
20 (since most implementations of the UNIX System support more than the 11 POSIX-
defined special characters). The cc_t type is large enough to hold each special character
and is typically an unsigned char.

Versions of System V that predated the POSIX standard had a header
named <termio.h> and a structure named termio. POSIX.1 added an s to the names, to
differentiate them from their predecessors.

Pseudo Terminal

Overview
The term pseudo terminal implies that it looks like a terminal to an application program,
but it's not a real terminal. The diagram shows the typical arrangement of the processes
involved when a pseudo terminal is being used. The key points in this figure are the
following.

Typical arrangement of processes using a pseudo terminal

• Normally, a process opens the pseudo-terminal master and then calls fork. The child
establishes a new session, opens the corresponding pseudo-terminal slave, duplicates
the file descriptor to the standard input, standard output, and standard error, and then
calls exec. The pseudo-terminal slave becomes the controlling terminal for the child
process.
• It appears to the user process above the slave that its standard input, standard output,
and standard error are a terminal device. The process can issue all the terminal I/O
functions on these descriptors. But since there is not a real terminal device beneath the
slave, functions that don't make sense (change the baud rate, send a break character,
set odd parity, etc.) are just ignored.
• Anything written to the master appears as input to the slave and vice versa. Indeed, all
the input to the slave comes from the user process above the pseudo-terminal master.
This behaves like a bidirectional pipe, but with the terminal line discipline module
above the slave, we have additional capabilities over a plain pipe.
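
The sequence in the first point can be sketched as follows. This is a minimal illustration
under stated assumptions, not a complete implementation: the names pty_open_master and
pty_spawn are hypothetical, the system is assumed to provide the POSIX functions
posix_openpt, grantpt, unlockpt, and ptsname, and opening the slave in a new session is
assumed to make it the controlling terminal (true on Linux; BSD-derived systems also need
an ioctl with TIOCSCTTY):

```c
#define _XOPEN_SOURCE 600   /* expose posix_openpt, grantpt, unlockpt, ptsname */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Open and unlock a pseudo-terminal master. Returns its descriptor or -1. */
int pty_open_master(void)
{
    int master = posix_openpt(O_RDWR | O_NOCTTY);

    if (master == -1)
        return -1;
    if (grantpt(master) == -1 || unlockpt(master) == -1) {
        close(master);
        return -1;
    }
    return master;
}

/*
 * Run argv[0] with the slave side of a new pseudo terminal as its standard
 * input, output, and error. Returns the master descriptor to the parent
 * (or -1 on error) and stores the child's process ID in *pid.
 */
int pty_spawn(char *const argv[], pid_t *pid)
{
    assert(argv != NULL && argv[0] != NULL && pid != NULL);

    int master = pty_open_master();
    if (master == -1)
        return -1;

    const char *slave_name = ptsname(master);
    if (slave_name == NULL) {
        close(master);
        return -1;
    }

    *pid = fork();
    if (*pid == -1) {
        close(master);
        return -1;
    }
    if (*pid == 0) {                          /* child */
        close(master);
        if (setsid() == -1)                   /* new session, no controlling tty */
            _exit(127);
        int slave = open(slave_name, O_RDWR); /* assumed to become the controlling tty */
        if (slave == -1)
            _exit(127);
        if (dup2(slave, STDIN_FILENO) == -1 ||
            dup2(slave, STDOUT_FILENO) == -1 ||
            dup2(slave, STDERR_FILENO) == -1)
            _exit(127);
        if (slave > STDERR_FILENO)
            close(slave);
        execvp(argv[0], argv);
        _exit(127);                           /* exec failed */
    }
    return master;                            /* parent */
}
```

The parent then reads the child's output from the returned master descriptor and writes the
child's input to it. Because the line discipline sits above the slave, a line written to the
master is delivered to a canonical-mode reader on the slave only once its newline arrives.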

http://book.chinaunix.net/special/ebook/addisonWesley/APUE2/0201433079/main.html
