Raw Sockets

Raw sockets bypass the transport layer (TCP and UDP) and let a process read and
write ICMPv4 (Internet Control Message Protocol), IGMPv4 (Internet Group
Management Protocol, used with multicasting), and ICMPv6 packets directly.
This allows applications that use ICMP and IGMP to be built entirely as user
processes instead of putting more code into the kernel. For example, the router
discovery daemon, which processes router advertisements and router
solicitations, is built this way.
With raw sockets, a process can also read and write IPv4 datagrams whose IPv4
protocol field (an 8-bit field in the IPv4 header) is not processed by the
kernel. Most kernels process only datagrams containing values of 1 (ICMP),
2 (IGMP), 6 (TCP), and 17 (UDP). Other values are defined, however: the OSPF
routing protocol, for example, does not use TCP or UDP but uses IP directly,
setting the protocol field to 89.
With raw sockets, a process can also build its own IPv4 header by using the
IP_HDRINCL socket option.
1. To create a raw socket, the second argument to the socket function is
SOCK_RAW and the third argument (the protocol) is normally nonzero, as shown
below:
int sockfd;
sockfd = socket(AF_INET, SOCK_RAW, protocol);
Here protocol is one of the IPPROTO_xxx constants, which are defined by
including the <netinet/in.h> header, for example IPPROTO_ICMP. Only the
superuser can create a raw socket.
2. The IP_HDRINCL socket option can be set as follows:
const int on = 1;
if (setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on)) < 0)
    /* handle the error */;
3. bind can be called on a raw socket, but this is rare. If called, it sets only
the local IP address, not a port number, as there is no concept of a port number
with raw sockets. With regard to output, calling bind sets the source IP address
that will be used for datagrams sent on the raw socket (only if the IP_HDRINCL
socket option is not set). If bind is not called, the kernel sets the source IP
address to the primary IP address of the outgoing interface.
4. connect can be called on a raw socket, but this is also rare. This function
sets only the foreign address; again, there is no concept of a port number. With
regard to output, calling connect lets us call write or send instead of sendto,
since the destination IP address is already specified.
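The four steps above can be pulled together in one helper. This is only a sketch: the names open_raw_socket and make_ipv4_addr are hypothetical, error handling is reduced to returning -1 with errno set, and creating the socket is expected to fail without superuser privileges.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill an IPv4 socket address from dotted-decimal text. The port stays 0
 * because raw sockets have no concept of a port. Returns 0, or -1 on bad input. */
int make_ipv4_addr(const char *ip, struct sockaddr_in *sa)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}

/* Hypothetical helper: create a raw IPv4 socket for `protocol` (an IPPROTO_xxx
 * constant), optionally enable IP_HDRINCL, optionally bind a local address and
 * connect a foreign address (pass NULL to skip either).
 * Returns the descriptor, or -1 with errno set. */
int open_raw_socket(int protocol, int hdrincl,
                    const char *local_ip, const char *peer_ip)
{
    struct sockaddr_in sa;
    const int on = 1;
    int saved;
    int fd = socket(AF_INET, SOCK_RAW, protocol);   /* step 1: superuser only */

    if (fd < 0)
        return -1;
    if (hdrincl &&                                  /* step 2 */
        setsockopt(fd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on)) < 0)
        goto fail;
    if (local_ip != NULL) {                         /* step 3: rare */
        if (make_ipv4_addr(local_ip, &sa) < 0) { errno = EINVAL; goto fail; }
        if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
            goto fail;
    }
    if (peer_ip != NULL) {                          /* step 4: also rare */
        if (make_ipv4_addr(peer_ip, &sa) < 0) { errno = EINVAL; goto fail; }
        if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
            goto fail;
    }
    return fd;

fail:
    saved = errno;      /* close() may overwrite errno */
    close(fd);
    errno = saved;
    return -1;
}
```

A typical call for a ping-style program would be open_raw_socket(IPPROTO_ICMP, 0, NULL, NULL), leaving the source address and the IP header to the kernel.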
Raw Socket Output:
The output of a raw socket is governed by the following rules:
•         Normal output is performed by calling sendto or sendmsg and specifying
the destination IP address. If the socket has been
connected, write and send can also be used.
•         If the IP_HDRINCL option is not set, the kernel builds the IP header
and prepends it to the data supplied by the process.
•         If IP_HDRINCL is set, the process builds the entire IP header itself,
with two exceptions: the IPv4 identification field can be set to 0, which
tells the kernel to choose the value, and the kernel always calculates and
stores the IPv4 header checksum.
•         The kernel fragments raw packets that exceed the outgoing interface
MTU.
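As an illustration of the IP_HDRINCL case, the following sketch fills in a 20-byte IPv4 header (no options) byte by byte. The function name build_ipv4_header is hypothetical, and the addresses are assumed to be already in network byte order (as stored in struct in_addr). Storing the total length in network byte order is an assumption that matches Linux; some BSD-derived systems have historically expected this field in host byte order.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define IPV4_HDR_LEN 20     /* IPv4 header without options */

/* Hypothetical helper: write an IPv4 header into buf (at least 20 bytes).
 * src and dst are in network byte order. Returns the header length. */
size_t build_ipv4_header(uint8_t *buf, uint32_t src, uint32_t dst,
                         uint8_t protocol, uint8_t ttl, uint16_t payload_len)
{
    uint16_t total = htons((uint16_t)(IPV4_HDR_LEN + payload_len));

    memset(buf, 0, IPV4_HDR_LEN);
    buf[0] = 0x45;                  /* version 4, header length 5 words */
    memcpy(buf + 2, &total, 2);     /* total length: header plus payload */
    /* bytes 4-5: identification left as 0 so the kernel picks a value */
    buf[8] = ttl;
    buf[9] = protocol;              /* the 8-bit protocol field, e.g. 1 = ICMP */
    /* bytes 10-11: header checksum left as 0; the kernel always fills it in */
    memcpy(buf + 12, &src, 4);
    memcpy(buf + 16, &dst, 4);
    return IPV4_HDR_LEN;
}
```

The payload (for example an ICMP message) is placed immediately after the returned header length, and the whole buffer is passed to sendto.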
IPv6 Differences:
•         All fields in the protocol headers sent or received on a raw IPv6
socket are in network byte order.
•         The IPv6 header has no options field. Almost all fields in an IPv6
header and all extension headers (optional headers that follow the IPv6 header;
most carry their own length field, and fragmentation has a separate header) are
available to the application through socket options.
•         Checksums are handled differently.
IPV6_CHECKSUM socket option:
•         With ICMPv4, the checksum is calculated by the application, whereas
with ICMPv6 it is calculated by the kernel. For other raw IPv6 sockets, the
IPV6_CHECKSUM socket option tells the kernel the byte offset of the checksum
field so that the kernel can calculate and verify it.
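Setting the IPV6_CHECKSUM option might look like the sketch below. The wrapper name set_ipv6_checksum_offset is hypothetical; the option value is an int giving the byte offset of the 16-bit checksum field within the data the process sends, and -1 disables kernel checksumming. It is meant for raw IPv6 sockets other than ICMPv6, where the kernel handles the checksum on its own.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical wrapper: tell the kernel where the checksum field lives in the
 * payload of a raw IPv6 socket (offset in bytes, or -1 to disable).
 * Returns 0 on success, -1 with errno set on failure. */
int set_ipv6_checksum_offset(int fd, int offset)
{
    return setsockopt(fd, IPPROTO_IPV6, IPV6_CHECKSUM, &offset, sizeof(offset));
}
```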
Raw Socket Input:
The question to be answered here is: which received IP datagrams does the
kernel pass to raw sockets? The following rules apply:
•         Received TCP and UDP packets are never passed to a raw socket.
•         Most ICMP packets are passed to a raw socket after the kernel has
finished processing the ICMP message. BSD-derived implementations pass all
received ICMP messages to raw sockets other than echo request, timestamp
request, and address mask request. These three ICMP messages are processed
entirely by the kernel.
•         All IGMP packets are passed to a raw socket after the kernel has
finished processing the IGMP message.
•         All IP datagrams with a protocol field that the kernel does not
understand are passed to a raw socket. The only kernel processing done on these
packets is the minimal verification of some IP header fields: the IP version,
IPv4 header checksum, header length, and destination IP address.
•         If the datagram arrives in fragments, nothing is passed to a raw socket
until all fragments have arrived and have been reassembled.
When the kernel has an IP datagram to pass to the raw sockets, the datagram is
delivered to a given raw socket only if all three of the following tests are
satisfied:
•         If a nonzero protocol is specified when the raw socket is created (the
third argument to socket), then the received datagram's protocol field must
match this value or the datagram is not delivered.
•         If a local IP address is bound to the raw socket by bind, then the
destination IP address of the received datagram must match this bound
address or the datagram is not delivered.
•         If a foreign IP address was specified for the raw socket by connect,
then the source IP address of the received datagram must match this connected
address or the datagram is not delivered.
If a raw socket is created with a protocol of 0, and neither bind nor connect is
called, then that socket receives a copy of every raw datagram that the kernel
passes to raw sockets.
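The three delivery tests can be expressed as a small predicate. This is a model of the rules for illustration only, not kernel code; the structure and function names (raw_sock_filter, dgram_info, raw_socket_matches) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of what the kernel remembers about one raw socket. */
struct raw_sock_filter {
    int      protocol;      /* third argument to socket(); 0 matches any */
    bool     bound;         /* was bind called? */
    uint32_t local_addr;    /* address given to bind */
    bool     connected;     /* was connect called? */
    uint32_t foreign_addr;  /* address given to connect */
};

/* Hypothetical summary of a received IP datagram. */
struct dgram_info {
    int      protocol;      /* IP protocol field */
    uint32_t src_addr;
    uint32_t dst_addr;
};

/* Returns true only if all three tests pass for this socket. */
bool raw_socket_matches(const struct raw_sock_filter *s,
                        const struct dgram_info *d)
{
    if (s->protocol != 0 && s->protocol != d->protocol)
        return false;       /* test 1: protocol */
    if (s->bound && s->local_addr != d->dst_addr)
        return false;       /* test 2: bound local address */
    if (s->connected && s->foreign_addr != d->src_addr)
        return false;       /* test 3: connected foreign address */
    return true;
}
```

A socket with protocol 0 that is neither bound nor connected passes all three tests for every datagram, which is why it receives a copy of everything.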
When a received datagram is passed to a raw IPv4 socket, the entire datagram,
including the IP header, is passed to the process.
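Because the IPv4 header arrives with the data, a reader has to skip over it before looking at the ICMP (or other) message. The sketch below finds the payload offset from the header length field; the function name ipv4_payload_offset is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given a datagram read from a raw IPv4 socket, return
 * the byte offset at which the payload starts, or -1 if the buffer is too
 * short or does not look like an IPv4 datagram. */
long ipv4_payload_offset(const uint8_t *buf, size_t len)
{
    size_t hlen;

    if (len < 20)
        return -1;                  /* shorter than a minimal IPv4 header */
    if ((buf[0] >> 4) != 4)
        return -1;                  /* version field is not 4 */
    hlen = (size_t)(buf[0] & 0x0f) * 4;     /* header length in bytes */
    if (hlen < 20 || hlen > len)
        return -1;
    return (long)hlen;
}
```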
ICMPv6 Type Filtering:
ICMPv6 is a superset of ICMPv4 in functionality, taking over the roles of ARP
and IGMP as well, and hence a raw ICMPv6 socket can receive many more packets
than a raw ICMPv4 socket. To reduce the number of packets passed from the kernel
to the application, an application-specified filter is provided. A filter is
declared with a data type of struct icmp6_filter, which is defined by including
the <netinet/icmp6.h> header. The current filter for a raw socket is set and
fetched using setsockopt and getsockopt with a level of IPPROTO_ICMPV6 and an
optname of ICMP6_FILTER.
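A filter is usually built with the macros that <netinet/icmp6.h> provides and then installed with setsockopt. The sketch below blocks every ICMPv6 type except router advertisements; the function names router_advert_filter and install_icmp6_filter are hypothetical, and the choice of router advertisements is only an example.

```c
#include <netinet/icmp6.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Build a filter that blocks all ICMPv6 types except router advertisements. */
void router_advert_filter(struct icmp6_filter *filt)
{
    ICMP6_FILTER_SETBLOCKALL(filt);
    ICMP6_FILTER_SETPASS(ND_ROUTER_ADVERT, filt);
}

/* Install the filter on a raw ICMPv6 socket.
 * Returns 0 on success, -1 with errno set on failure. */
int install_icmp6_filter(int fd, const struct icmp6_filter *filt)
{
    return setsockopt(fd, IPPROTO_ICMPV6, ICMP6_FILTER, filt, sizeof(*filt));
}
```

The companion macros ICMP6_FILTER_WILLPASS and ICMP6_FILTER_WILLBLOCK can be used to check what a filter would do with a given message type.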
Ping Program:
The ping program sends an ICMP echo request to some IP address, and that node
responds with an ICMP echo reply. These two ICMP messages are supported under
both IPv4 and IPv6. The following figure shows the format of the ICMP messages.

Checksum is the standard Internet checksum.
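For reference, a minimal sketch of the Internet checksum (the one's-complement sum of 16-bit words, as described in RFC 1071) is given below. The function name inet_checksum is hypothetical. It reads the data as big-endian words and returns the result as a number, so the caller stores the high byte first in the packet; this differs in form from implementations that sum native 16-bit words in place.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over len bytes. The checksum field inside the data must
 * be zero while computing. A 32-bit accumulator is enough for any single
 * IP datagram (up to 65535 bytes). */
uint16_t inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {                       /* add 16-bit big-endian words */
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)                           /* odd byte: pad with a zero byte */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A receiver can verify a message by running the same function over it with the checksum field in place; a result of 0 means the checksum is correct.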

Identifier is set to the process ID of the ping process, and the sequence number
is incremented by one for each packet that we send. An 8-byte timestamp of when
the packet is sent is stored as the optional data. The rules of ICMP require
that the identifier, sequence number, and any optional data be returned in the
echo reply. Storing the timestamp in the packet lets us calculate the RTT when
the reply is received.
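The layout just described can be sketched as follows for ICMPv4. The function names are hypothetical, and encoding the 8-byte timestamp as two 32-bit big-endian values (seconds and microseconds) is an assumption made here for portability; a program could equally copy a struct timeval into the data area. The checksum field is left as zero for the caller to fill in with the Internet checksum over the whole message.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ECHO_REQUEST_TYPE 8     /* ICMPv4 echo request */
#define ECHO_MSG_LEN      16    /* 8-byte ICMP header + 8-byte timestamp */

static void put_be16(uint8_t *p, uint16_t v) { p[0] = v >> 8; p[1] = v & 0xff; }
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = v >> 24; p[1] = (v >> 16) & 0xff; p[2] = (v >> 8) & 0xff; p[3] = v & 0xff;
}
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Build an echo request in buf (at least 16 bytes). id is normally the low
 * 16 bits of the process ID; seq is incremented per packet. Bytes 2-3 (the
 * checksum) are left zero. Returns the message length. */
size_t build_echo_request(uint8_t *buf, uint16_t id, uint16_t seq,
                          uint32_t sec, uint32_t usec)
{
    memset(buf, 0, ECHO_MSG_LEN);
    buf[0] = ECHO_REQUEST_TYPE;     /* type; code (byte 1) stays 0 */
    put_be16(buf + 4, id);
    put_be16(buf + 6, seq);
    put_be32(buf + 8, sec);         /* send time, echoed back in the reply */
    put_be32(buf + 12, usec);
    return ECHO_MSG_LEN;
}

/* RTT in milliseconds from the timestamp echoed in an ICMP message that
 * starts at `icmp` (the IP header already skipped). */
double echo_rtt_ms(const uint8_t *icmp, uint32_t now_sec, uint32_t now_usec)
{
    uint32_t sec = get_be32(icmp + 8), usec = get_be32(icmp + 12);

    return ((double)now_sec - sec) * 1000.0 + ((double)now_usec - usec) / 1000.0;
}
```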
Traceroute Program:
Traceroute lets us determine the path that IP datagrams follow from our host
to some other destination. Its operation is simple and Chapter 8 of TCPv1
covers it in detail with numerous examples of its usage. traceroute uses the
IPv4 TTL field or the IPv6 hop limit field and two ICMP messages. It starts by
sending a UDP datagram to the destination with a TTL (or hop limit) of 1. This
datagram causes the first-hop router to return an ICMP "time exceeded in
transit" error. The TTL is then increased by one and another UDP datagram is
sent, which locates the next router in the path. When the UDP datagram
reaches the final destination, the goal is to have that host return an ICMP "port
unreachable" error. This is done by sending the UDP datagram to a
random port that is (hopefully) not in use on that host.
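Two small pieces of this procedure are sketched below for IPv4: setting the TTL on the sending UDP socket before each probe, and deciding from the ICMP type and code of a reply whether the probe reached an intermediate router or the destination. The function and enum names are hypothetical; the numeric values are the ICMPv4 ones (type 11 code 0 for "time exceeded in transit", type 3 code 3 for "port unreachable").

```c
#include <netinet/in.h>
#include <sys/socket.h>

enum probe_result {
    PROBE_ROUTER,   /* an intermediate router answered */
    PROBE_DEST,     /* the destination host answered */
    PROBE_OTHER     /* some other ICMP message */
};

/* Set the TTL used for datagrams sent on an IPv4 UDP socket.
 * Returns 0 on success, -1 with errno set on failure. */
int set_probe_ttl(int udp_fd, int ttl)
{
    return setsockopt(udp_fd, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl));
}

/* Classify an ICMPv4 reply by its type and code fields. */
enum probe_result classify_icmp_reply(int type, int code)
{
    if (type == 11 && code == 0)
        return PROBE_ROUTER;    /* time exceeded in transit */
    if (type == 3 && code == 3)
        return PROBE_DEST;      /* port unreachable */
    return PROBE_OTHER;
}
```

The main loop would call set_probe_ttl with 1, 2, 3, and so on, send a probe after each call, and stop once a reply classifies as PROBE_DEST.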
The figure shows our trace.h header, which all our program files include.
1–11 We include the standard IPv4 headers that define the IPv4, ICMPv4, and
UDP structures and constants. The rec structure defines the data portion of the
UDP datagram that we send, but we will see that we never need to examine
this data. It is sent mainly for debugging purposes.
Define proto structure
32–43 As with our ping program in the previous section, we handle the
protocol differences between IPv4 and IPv6 by defining a proto structure that
contains function pointers, pointers to socket address structures, and other
constants that differ between the two IP versions. The global pr will be set to
point to one of these structures that is initialized for either IPv4 or IPv6, after
the destination address is processed by the main function (since the destination
address is what specifies whether we use IPv4 or IPv6).
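The idea of keeping one table per IP version and selecting it from the address family can be sketched as below. This is a trimmed-down, hypothetical version (proto_consts, select_proto) that holds only the three constants; the real proto structure in trace.h also carries function pointers and socket address pointers.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical, reduced version of the per-protocol table. */
struct proto_consts {
    int icmpproto;      /* IPPROTO_xxx value for ICMP */
    int ttllevel;       /* setsockopt() level to set the TTL */
    int ttloptname;     /* setsockopt() name to set the TTL */
};

static const struct proto_consts proto_v4 = {
    IPPROTO_ICMP, IPPROTO_IP, IP_TTL
};
static const struct proto_consts proto_v6 = {
    IPPROTO_ICMPV6, IPPROTO_IPV6, IPV6_UNICAST_HOPS
};

/* Pick the table that matches the destination's address family.
 * Returns NULL for a family that is not supported. */
const struct proto_consts *select_proto(int family)
{
    switch (family) {
    case AF_INET:
        return &proto_v4;
    case AF_INET6:
        return &proto_v6;
    default:
        return NULL;
    }
}
```

Code that sets the TTL can then call setsockopt with the ttllevel and ttloptname members and never needs to test the IP version itself.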
Include IPv6 headers
44–47 We include the headers that define the IPv6 and ICMPv6 structures and
constants.
Figure: trace.h header.
 1  #include    "unp.h"
 2  #include    <netinet/in_systm.h>
 3  #include    <netinet/ip.h>
 4  #include    <netinet/ip_icmp.h>
 5  #include    <netinet/udp.h>
 6  #define BUFSIZE     1500
 7  struct rec {                    /* format of outgoing UDP data */
 8      u_short rec_seq;            /* sequence number */
 9      u_short rec_ttl;            /* TTL packet left with */
10      struct timeval rec_tv;      /* time packet left */
11  };
12  /* globals */
13  char    recvbuf[BUFSIZE];
14  char    sendbuf[BUFSIZE];
15  int     datalen;                /* # bytes of data following ICMP header */
16  char   *host;
17  u_short sport, dport;
18  int     nsent;                  /* add 1 for each sendto() */
19  pid_t   pid;                    /* our PID */
20  int     probe, nprobes;
21  int     sendfd, recvfd;         /* send on UDP sock, read on raw ICMP sock */
22  int     ttl, max_ttl;
23  int     verbose;
24  /* function prototypes */
25  const char *icmpcode_v4(int);
26  const char *icmpcode_v6(int);
27  int     recv_v4(int, struct timeval *);
28  int     recv_v6(int, struct timeval *);
29  void    sig_alrm(int);
30  void    traceloop(void);
31  void    tv_sub(struct timeval *, struct timeval *);
32  struct proto {
33      const char *(*icmpcode)(int);
34      int     (*recv)(int, struct timeval *);
35      struct sockaddr *sasend;    /* sockaddr{} for send, from getaddrinfo */
36      struct sockaddr *sarecv;    /* sockaddr{} for receiving */
37      struct sockaddr *salast;    /* last sockaddr{} for receiving */
38      struct sockaddr *sabind;    /* sockaddr{} for binding source port */
39      socklen_t salen;            /* length of sockaddr{}s */
40      int     icmpproto;          /* IPPROTO_xxx value for ICMP */
41      int     ttllevel;           /* setsockopt() level to set TTL */
42      int     ttloptname;         /* setsockopt() name to set TTL */
43  } *pr;
44  #ifdef IPV6
45  #include    <netinet/ip6.h>
46  #include    <netinet/icmp6.h>
47  #endif
The main function is shown in Figure 28.18 (p. 759). It processes the command-
line arguments, initializes the pr pointer for either IPv4 or IPv6, and calls our
traceloop function.
Define proto structures
2–9 We define the two proto structures, one for IPv4 and one for IPv6, although
the pointers to the socket address structures are not allocated until the end of
this function.
Set defaults
10–13 The maximum TTL or hop limit that the program uses defaults to 30,
although we provide the -m command-line option to let the user change this. For
each TTL, we send three probe packets, but this could be changed with another
command-line option. The initial destination port is 32768+666, which will be
incremented by one each time we send a UDP datagram. We hope that these ports
are not in use on the destination host when the datagrams finally reach the
destination, but there is no guarantee.
Process command-line arguments
19–37 The -v command-line option causes most received ICMP messages to be
printed.
Process hostname or IP address argument and finish initialization
38–58 The destination hostname or IP address is processed by our host_serv
function, returning a pointer to an addrinfo structure. Depending on the type of
returned address, IPv4 or IPv6, we finish initializing the proto structure, store the
pointer in the pr global, and allocate additional socket address structures of the
correct size.
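The host_serv function is a wrapper from the book's own library; the underlying step of resolving the destination and checking its address family can be sketched with the standard getaddrinfo call. The function name destination_family and its numeric_only flag are hypothetical, added here so that the lookup can be restricted to literal addresses.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Resolve host and return the address family of the first result
 * (AF_INET or AF_INET6), or -1 if it cannot be resolved. When numeric_only
 * is nonzero, only literal IP addresses are accepted and no name lookup
 * is performed. */
int destination_family(const char *host, int numeric_only)
{
    struct addrinfo hints, *res;
    int family;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;        /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_DGRAM;     /* probes are UDP datagrams */
    if (numeric_only)
        hints.ai_flags = AI_NUMERICHOST;

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;
    family = res->ai_family;
    freeaddrinfo(res);
    return family;
}
```

The returned family is what decides which of the two proto structures the pr global should point to.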
59 The function traceloop, shown in Figure 28.19, sends the datagrams and reads
the returned ICMP messages. This is the main loop of the program.

