Unit II-final

DEPARTMENT OF COMPUTER SCIENCE

ELECTIVE: WEB TECHNOLOGY

COURSE MATERIAL- UNIT II

DNS – E-mail – FTP – TFTP – History of WWW – Basics of WWW and Browsing -
Local information on the internet – HTML – Web Browser Architecture – Web Pages
and Multimedia – Remote Login (TELNET).

DNS (Domain Name System)


• The DNS Name Space

• DNS Server

• Introduction

• How the DNS Server works

Introduction
• A domain name is a name given to a network for ease of reference by humans.

• The term domain refers to a group of computers that are known by a single
common name.

• Ultimately, domain names are translated into IP addresses, as shown in Fig. 5.1.

• A computer can be referred to by a name, and Internet naming conventions are used to make that name unique.
• The full name of a computer consists of its local name followed by a period and
the organization’s suffix.

• For example, if a person X works in an organization called IBM, their computer name would be X.IBM.

• Because two organizations could have the same name, this method is not good enough.

• A country-specific domain can also be added, as in www.bbc.co.uk, where uk indicates that the site is registered under the United Kingdom’s country domain.
• Thus humans use domain names to refer to computers on the internet,
whereas computers use only IP address.

• When a domain name is typed on the computer, it is translated into the corresponding IP address.

• How is a domain name translated into the corresponding IP address? By using the Domain Name System (DNS).

• In the initial days, domain names (also known as host names) and their associated IP addresses were recorded in a single file called hosts.txt.

• The Network Information Center (NIC) in the US maintained this file.

• A portion of the hypothetical hosts.txt file is shown in Table 5.3

• Every night, all the hosts attached to the Internet would obtain a copy of this file
to refresh their domain name entries.

• As the Internet grew at a breathtaking pace, so did the size of this file.

• By the mid-1980s, this file had become extremely large: too large to copy to all systems, and almost impossible to keep up to date.

• The problems of maintaining hosts.txt on a single server are summarized in Table 5.4.

The DNS Namespace


• The Internet is theoretically divided into hundreds of top-level domains.
• Each of these domains, in turn, has several hosts underneath.
• Also, each domain can be further sub-divided into sub-domains, which can
be further classified into sub-sub-domains, and so on.

• For instance, if we want to register a domain called Honda under the category auto, which is within in (for India), the full path for this domain would be Honda.auto.in.

• Similarly, from Fig. 5.4, it can be seen that Atul.maths.oxford.edu identifies the complete path for the computer Atul, which is under the domain maths, which is under the domain oxford, which is finally under the domain edu.



• This creates a tree-like structure, as shown in the figure.

• Note that a leaf represents a lowest-level domain that cannot be classified further (but contains hosts).

• The topmost domains are classified into two main categories: General (historically, the domains registered in the US) and countries.

• The General domains are sub-classified into categories, such as


• com (commercial),
• gov (the US federal government),

• edu (educational),

• org (non-profit organizations)

• mil (the US military) and

• net (network providers).

• The country domains specify one entry for each country, i.e., uk (United Kingdom), jp (Japan), in (India), and so on. Each domain is fully qualified by the path upward from it to the topmost (unnamed) root. The names within a full path are separated by dots.
• A full pathname can be up to 255 characters long including the dots, and
each component within it can be up to a maximum of 63 characters.

• Within these limits, a domain name can contain as many dot-separated components as needed.
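The length rules above can be checked mechanically. The Python sketch below uses hypothetical function names chosen for illustration; it tests only the 255-character and 63-character limits stated above, and lists a name's components from the root downward to mirror the tree. It deliberately ignores other rules, such as which characters a component may contain.

```python
def is_valid_domain_name(name: str) -> bool:
    """Check only the two length rules: at most 255 characters overall
    (dots included) and 1 to 63 characters per dot-separated component."""
    if not name or len(name) > 255:
        return False
    # A single trailing dot stands for the unnamed root, so drop it first.
    components = name[:-1].split(".") if name.endswith(".") else name.split(".")
    return all(1 <= len(part) <= 63 for part in components)


def path_from_root(name: str) -> list[str]:
    """List the components from the top-level domain down to the host."""
    return list(reversed(name.rstrip(".").split(".")))


print(is_valid_domain_name("Atul.maths.oxford.edu"))  # True
print(path_from_root("Atul.maths.oxford.edu"))        # ['edu', 'oxford', 'maths', 'Atul']
print(is_valid_domain_name("a" * 64 + ".com"))        # False: component too long
```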

DNS Server
• A domain name server is simply a computer that contains the database
and the software for mapping between domain names and IP
addresses.

• Every domain has a domain name server. It handles requests coming to the computers it owns and also maintains the various domain entries.

• For example, IBM has hundreds of thousands of IP addresses and domain names. IBM would like to maintain its own Domain Name System Server (DNS Server), also simply called a Domain Name Server, for the IBM.com domain.

• IBM is totally responsible for maintaining the name server for IBM.com.

How does the DNS server work?


When a request comes in, a DNS server has the following options.

1. It can supply the IP address because it already knows the IP address for the
domain.

2. It can contact another DNS server and try to locate the IP address for the name requested. It may have to do this more than once. Every DNS server has an entry called the alternate DNS server, which is the DNS server it should get in touch with for unresolved domains.

3. It can suggest the name of another DNS server.

4. It can return an error message because the requested domain name is invalid
or does not exist. This is shown in Fig. 5.5
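The four options can be modelled with a toy, in-memory hierarchy. In this Python sketch, NameServer and resolve are hypothetical names, and the servers and the documentation-only address 192.0.2.10 are invented for illustration. Each server either knows the address (option 1), suggests another server (option 3) or reports an error (option 4), while resolve plays the part of a server that keeps contacting other servers until it gets an answer (option 2).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NameServer:
    """A hypothetical in-memory DNS server, used only to illustrate the options."""
    zone: str
    records: dict[str, str] = field(default_factory=dict)             # name -> IP address
    delegations: dict[str, NameServer] = field(default_factory=dict)  # sub-domain -> server

    def answer(self, name: str) -> tuple[str, object]:
        # Option 1: the server already knows the IP address.
        if name in self.records:
            return ("address", self.records[name])
        # Option 3: suggest another server that is responsible for the name.
        for subdomain, server in self.delegations.items():
            if name == subdomain or name.endswith("." + subdomain):
                return ("referral", server)
        # Option 4: the name does not exist in this part of the tree.
        return ("error", f"{name}: no such domain")


def resolve(name: str, start: NameServer, max_referrals: int = 10) -> str:
    """Option 2: keep contacting other servers until one supplies the address."""
    server = start
    for _ in range(max_referrals + 1):
        kind, value = server.answer(name)
        if kind == "address":
            return value
        if kind == "error":
            raise LookupError(value)
        server = value  # follow the referral to the suggested server
    raise LookupError(f"{name}: too many referrals")


# A tiny hypothetical hierarchy; 192.0.2.10 is a documentation-only address.
oxford = NameServer("oxford.edu", records={"Atul.maths.oxford.edu": "192.0.2.10"})
edu = NameServer("edu", delegations={"oxford.edu": oxford})
root = NameServer("", delegations={"edu": edu})

print(resolve("Atul.maths.oxford.edu", root))  # 192.0.2.10
```

Real DNS servers exchange standardized messages over the network and cache answers; the dictionaries here only stand in for that machinery.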



For using DNS, an application program performs the following operations.

1. The application program interested in obtaining the IP address of another host on the Internet calls a library procedure called the resolver, sending it the domain name whose IP address is to be located. The resolver runs on the same host as the application.

2. The resolver sends a UDP packet to the nearest DNS server.

3. The local DNS server looks up the domain name and returns the IP address to the resolver.

4. The resolver returns the IP address to the calling application.
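In Python, step 1 (calling the resolver library) is typically done through socket.getaddrinfo, which hands the name to the operating system's resolver; that resolver then usually sends the UDP query of step 2. The sketch below shows only the application's side, and lookup_ipv4 is a hypothetical helper name.

```python
import socket


def lookup_ipv4(hostname: str) -> list[str]:
    """Ask the host's resolver library for the IPv4 addresses of hostname."""
    results = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                                 type=socket.SOCK_STREAM)
    addresses = []
    # Each result is (family, type, proto, canonname, sockaddr),
    # and for IPv4 the sockaddr is an (ip, port) pair.
    for _family, _type, _proto, _canonname, sockaddr in results:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

For example, lookup_ipv4("localhost") usually returns ['127.0.0.1'], although the exact answer depends on the host's configuration.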

Electronic Mail (EMAIL)


• Introduction

• The mailbox

• Sending and receiving an email

• Email Anatomy

• The email transfer protocols

• POP server

• SMTP server

• Differences between SMTP and POP

• The complete journey of an email

• IMAP protocol

• Browser based emails

• Multipurpose Internet Mail Extensions(MIME)

• Email Privacy

• Pretty Good Privacy (PGP)

• Privacy Enhanced Mail (PEM)

EMAIL – Introduction
The best features of email are given as follows.

(a) Email is almost as fast as a telephone conversation.

(b) Like the postal system, email keeps a written record of each message (an advantage over the telephone system). Thus, email combines the best features of the telephone system and the postal system, and is yet very cheap.

From the viewpoint of users, email performs the following five functions:

• Composition
• Transfer

• Reporting

• Displaying

• Disposition

EMAIL - The Mailbox



• The email service provided by the Internet differs from other communication
mechanisms in one more respect.

• This feature, called spooling, allows a user to compose and send an email message even if his network is currently disconnected or the recipient is not currently connected to his end of the network.
• When an email message is sent, a copy of the email is placed in a storage area on the server’s disk, called the spool.

• A spool is a queue of messages. The messages in a spool are sent on a first-come, first-served basis.

• An email is sent to a person using the person’s email address.
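The spool described above behaves like a first-in, first-out queue. The class below is a hypothetical, in-memory model: Spool, enqueue and flush are illustrative names, and deliver stands in for the real hand-off to the next mail server. Messages wait in arrival order and stay queued while the next hop is unreachable.

```python
from collections import deque


class Spool:
    """A hypothetical model of a mail spool: a first-in, first-out queue."""

    def __init__(self):
        self._queue = deque()

    def enqueue(self, recipient: str, message: str) -> None:
        self._queue.append((recipient, message))

    def flush(self, deliver) -> list:
        """Deliver queued messages in first-come, first-served order.
        deliver(recipient, message) returns True on success; on failure the
        message, and everything queued behind it, stays in the spool."""
        delivered = []
        while self._queue:
            recipient, message = self._queue[0]
            if not deliver(recipient, message):
                break  # the next hop is unreachable; try again later
            self._queue.popleft()
            delivered.append((recipient, message))
        return delivered

    def __len__(self) -> int:
        return len(self._queue)
```

Real mail servers retry each queued message independently; this sketch stops at the first failure only to keep the first-come, first-served order easy to see.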

EMAIL - Sending and Receiving an Email


• The software that enables the email system to run smoothly, i.e., the email
software, has two parts.

• One part, which runs on the client (user’s) PC, is called the email client software; the other part, which runs on the email server, is called the email server software.

• The mail client software is a program that allows the user to compose an
email and specify the intended recipient’s email address.

• The composing part is very similar to simple word processing. It allows features
such as simple text to be typed in, adjusting the spacing, paragraphs,
margins, fonts and different ways of displaying characters (e.g., bold, italics,
underlining, etc.).

• Using the recipient’s email address, the email travels from the sender’s computer to the sender’s email server, and then to the recipient’s email server.

• The underlying protocol used is again TCP/IP.

• That means that the bits in the contents of the email (text, image, etc.) are
broken down into packets as per TCP/IP format and re-assembled at the
recipient’s end.

• In-between the nodes, the error/flow control and routing functions are
performed as per the different protocols of different networks.

• The email software itself is divided into two parts, client portion and server
portion.

• The client portion allows you to compose a message, forward it, reply to a
message, and also display a received message.

• The server portion essentially manages the mailbox to store the messages
temporarily and deliver them when directed.

EMAIL - Email Anatomy


Each electronic mailbox on the server has a unique email address. This consists of two parts: the name of the user and the name of the domain. The @ symbol joins them to form the email address, as shown in Fig. 5.9.
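Splitting an address at the @ symbol recovers its two parts. This is a minimal Python sketch with a hypothetical function name; it checks only that both parts are present and does not validate the full address syntax.

```python
def split_address(address: str) -> tuple[str, str]:
    """Split an email address into its user name and domain name parts."""
    # Use the last '@' so that everything after it is treated as the domain.
    user, at, domain = address.rpartition("@")
    if not at or not user or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return user, domain


print(split_address("X@IBM.com"))  # ('X', 'IBM.com')
```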

The components of the email architecture are as follows.


User Agent (UA)

The user agent is the client email software with which the user interacts (such as Microsoft Outlook Express, Lotus Notes, Netscape Mail, etc.). It provides the user facilities for reading an email message by retrieving it from the server, composing an email message in a word-processor-like format, etc.

Mailbox

There is one mailbox per user, which acts as the email storage system for that user.

Spool

The spool stores email messages sent by the user until they can be sent to the intended recipient.

Mail Transfer Agent (MTA)

The mail transfer agent is the interface between the email system and the local email server.

EMAIL - The email transfer protocols

• The email transfer protocols are

• POP server
• SMTP server

POP server

• The Post Office Protocol (POP) provides a standard method of retrieving emails from the remote server.

• In effect, SMTP transfers emails from the sender’s computer to the sender’s
email server and from there to the receiver’s email server. POP then allows
the receiver to remotely or locally log on to the receiver’s email server and
retrieve those waiting emails.

• In other words, POP (like IMAP) works only at the receiver’s end, and has no
role to play at the sender’s side.

POP has two parts,

• 1. a client POP (i.e., the receiver’s POP) and

• 2. a server POP (which runs on the receiver’s email server).

• The client (i.e., the receiver) opens a TCP connection with the receiver’s POP
server on well-known port 110.

• The client then sends the user name and password needed to access the mailbox.

• Provided these are correct, the receiver user can list and receive emails from
the mailbox.

POP supports

• delete mode (i.e., delete emails from the mailbox on the email server once they are downloaded to the receiver’s computer) and

• keep mode (i.e., keep emails in the mailbox on the email server even after
they are downloaded to the receiver’s computer).

• The default option is delete.

• A POP session between a client and a server has three states, one after the
other, as given below.

• Authorization state

• Here, the server does a passive open and the client authenticates itself.

• Transaction state

• Here, the client is allowed to perform mailbox operations (view/retrieve/delete/... mails).

• Update state

• Here, the server deletes messages marked for deletion, session is closed,
and TCP connection is terminated.
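The three states can be modelled as a small state machine. The Python class below is a hypothetical toy model, not a network client: retrieve with delete_mode=True only marks a message, mimicking delete mode (in real POP3 the client sends a separate DELE command), and marked messages disappear only when the session enters the Update state on quit. Python's standard poplib module provides a real POP3 client.

```python
class Pop3Session:
    """A toy model of the three POP3 session states; not a network client."""

    def __init__(self, mailbox: list[str], user: str, password: str):
        self.mailbox = mailbox              # messages stored on the server
        self._credentials = (user, password)
        self._marked: set[int] = set()      # indices marked for deletion
        self.state = "AUTHORIZATION"

    def login(self, user: str, password: str) -> bool:
        if self.state != "AUTHORIZATION":
            raise RuntimeError("already authenticated")
        if (user, password) == self._credentials:
            self.state = "TRANSACTION"
            return True
        return False

    def retrieve(self, index: int, delete_mode: bool = True) -> str:
        if self.state != "TRANSACTION":
            raise RuntimeError("mailbox operations need the Transaction state")
        if delete_mode:
            self._marked.add(index)         # only marked here, not yet deleted
        return self.mailbox[index]

    def quit(self) -> None:
        # Update state: delete marked messages, then close the session.
        if self.state == "TRANSACTION":
            self.state = "UPDATE"
            self.mailbox[:] = [m for i, m in enumerate(self.mailbox)
                               if i not in self._marked]
        self.state = "CLOSED"
```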

SMTP Server

• Simple Mail Transfer Protocol (SMTP) is at the heart of the email system.

• In SMTP, the server keeps waiting on well-known port 25. SMTP involves two components, the UA and the MTA.

• SMTP actually performs two transfers,

• (a) from the sender’s computer to the sender’s SMTP server, and

• (b) from the sender’s SMTP server to the receiver’s SMTP server.

• The last leg, transferring emails from the receiver’s SMTP server to the receiver’s computer, is done by one of the two other email protocols, called POP or IMAP.

• The steps involved in communication between the client and the server using SMTP are described below.

• In SMTP, the client sends one or more commands to the server. The server returns responses.


Phase 1: Connection establishment

• Here, the following steps happen.

• 1. Client makes active TCP connection with the server on server’s well-known
port number 25.

• 2. Server sends code 220 (service ready), else 421 (service not available).

• 3. Client sends HELO message to identify itself using its domain name.

• 4. Server responds with code 250 (request command completed) or an error.


Phase 2: Mail transfer
This phase is the most important one, as it actually involves the transfer of email contents from the sender to the receiver. It typically consists of the following steps.

1. Client sends the MAIL FROM command, identifying the sender.

2. Server responds with 250 (ok).

3. Client sends the RCPT TO command, identifying the receiver.

4. Server responds with 250 (ok).

5. Client sends DATA to indicate the start of message transfer.

6. Server responds with 354 (start mail input).

7. Client sends the email header and body in consecutive lines.

8. The message is terminated with a line containing just a period.

9. The server responds with 250 (ok).

Phase 3: Connection termination

This phase is very simple.

Here, the client sends a QUIT command, which the server acknowledges, as
mentioned below.

1. Client sends the QUIT message.

2. Server responds with 221 (service closed) message.

3. TCP connection is closed.
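The client's side of the three phases can be written out as the sequence of lines it sends. This Python sketch uses a hypothetical function name and example addresses; the expected server replies are noted in comments, and each line would be sent terminated by CRLF. It also applies dot-stuffing: a message line that starts with "." gets an extra "." so that it cannot be mistaken for the end-of-message line.

```python
def smtp_dialogue(client_domain: str, sender: str, recipients: list[str],
                  message_lines: list[str]) -> list[str]:
    """Return the lines an SMTP client sends, in order, for one message."""
    commands = [f"HELO {client_domain}"]          # Phase 1; server replies 250
    commands.append(f"MAIL FROM:<{sender}>")      # Phase 2; server replies 250
    commands += [f"RCPT TO:<{r}>" for r in recipients]  # 250 for each recipient
    commands.append("DATA")                       # server replies 354
    for line in message_lines:
        # Dot-stuffing keeps a leading "." from ending the message early.
        commands.append("." + line if line.startswith(".") else line)
    commands.append(".")                          # end of message; server replies 250
    commands.append("QUIT")                       # Phase 3; server replies 221
    return commands
```

In practice, Python's smtplib module (for example, smtplib.SMTP.sendmail) carries out this exchange, including dot-stuffing, over a real TCP connection.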

IMAP PROTOCOL
• POP is very popular but works offline (mail is retrieved from the server and deleted from there).

• POP can also be used in a disconnected mode (i.e., retrieve mail onto the client computer, but do not delete it from the server; synchronize changes, if any).

• Even so, POP’s download-first model is not always desired. Hence, a different email access and retrieval protocol is necessary.

• That protocol is the Internet Message Access Protocol (IMAP).

• IMAP is more powerful and also more complex than POP.


• It allows creating folders on the server, reading mail before retrieving it, searching email contents on the server, etc.

• Here, work is focused on email server rather than downloading emails on the
client before doing anything else (unlike what happens in the case of POP).

• In this protocol, the server does a passive open on well-known port number
143.

• TCP three-way handshake happens and client and server can use IMAP over a
new session that gets created.

Multipurpose Internet Mail Extensions (MIME)

• The SMTP protocol can be used to send only NVT 7-bit ASCII text.
• It cannot work with languages whose characters fall outside 7-bit ASCII (French, German, etc.).
• Furthermore, it cannot be used to send multimedia data (binary files, video,
audio,etc.).

• This is where the Multipurpose Internet Mail Extensions (MIME) protocol comes in: it extends SMTP to allow non-ASCII data to be sent.
• We should note that it is not an email transfer/access/retrieval protocol, unlike
SMTP, POP, and IMAP.


• The way MIME works is quite simple from a conceptual viewpoint. MIME
transforms non-ASCII data at the sender’s end into NVT ASCII and delivers it
to the client SMTP for transmission.

• At the receiver’s end, it receives NVT ASCII data from the SMTP server and
transforms it back into the original (possibly non-ASCII)data.

• This is shown in Fig. 5.23



• For performing all these operations, the concept of MIME headers is used.
• MIME defines five headers that can be added to the original SMTP header
section to define the transformation parameters.

• These five headers are given below.

• MIME-Version

• Content-Type

• Content-Transfer-Encoding

• Content-Id

• Content-Description
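Python's standard email package can show MIME at work. In this sketch, the function name, addresses and file name are hypothetical. Non-ASCII text is sent with the quoted-printable transfer encoding and a binary attachment with base64, so the finished message contains only 7-bit ASCII characters, as SMTP requires. The package adds the MIME-Version, Content-Type and Content-Transfer-Encoding headers itself.

```python
from email.message import EmailMessage


def build_mime_message(text: str, attachment: bytes, filename: str) -> EmailMessage:
    """Build a two-part MIME message: non-ASCII text plus a binary file."""
    msg = EmailMessage()
    msg["From"] = "x@example.org"      # hypothetical addresses
    msg["To"] = "y@example.com"
    msg["Subject"] = "Report"
    # Quoted-printable keeps mostly-ASCII text readable and 7-bit safe.
    msg.set_content(text, cte="quoted-printable")
    # Base64 turns arbitrary bytes into ASCII; this makes the message multipart/mixed.
    msg.add_attachment(attachment, maintype="application",
                       subtype="octet-stream", filename=filename)
    return msg


message = build_mime_message("Grüße aus München", bytes(range(256)), "data.bin")
print(message.as_string())
```

At the receiving end, get_content() on each part reverses the transformation. Content-Id and Content-Description can be set on a part like any other header, if needed.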
Email Privacy

What if the email message gets trapped on its way and is read by an
unintended recipient?

To resolve this issue, Pretty Good Privacy (PGP) is widely used. A slightly older protocol called Privacy Enhanced Mail (PEM) also exists.

Pretty Good Privacy (PGP)


• Phil Zimmermann is the father of the Pretty Good Privacy (PGP) protocol.

• The most significant aspects of PGP are that it supports the basic
requirements of cryptography, is quite simple to use, and is completely free,
including its source code and documentation.

• Moreover, for organizations that require support, a low-cost commercial version of PGP is available from an organization called ViaCrypt (now Network Associates).

• PGP has become extremely popular and is far more widely used than PEM.

Privacy Enhanced Mail (PEM)

• PEM, unlike PGP, was the effort of a working group, and not of an individual.

• Messages sent with PEM are first translated into a canonical (common) form
so that the same conventions about white spaces, tabs, carriage returns and
linefeeds are used.

• This transformation prevents MTAs, which sometimes modify characters they do not understand, from altering the message.

• Next, the same principles of public key encryption are used.

• Unlike PGP, PEM does not support compression.

• Also, PGP performs encryption using 128-bit keys, whereas PEM uses only 56-bit keys.
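The canonicalization step can be illustrated for line endings alone. The PEM specifications define the full canonical form; this hypothetical Python helper only rewrites every line ending as CRLF (the convention SMTP uses) and leaves spaces and tabs untouched. The idea is that sender and receiver compute the cryptographic operations over the same bytes, whatever their local line-ending convention.

```python
import re


def to_canonical_line_endings(text: str) -> str:
    """Rewrite every line ending (LF, CR or CRLF) as CRLF."""
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


print(repr(to_canonical_line_endings("line 1\nline 2\rline 3\r\n")))
# 'line 1\r\nline 2\r\nline 3\r\n'
```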

FTP - TFTP
FTP

• Introduction

• The issues with File transfers

• FTP Basics
• FTP Connections

• Control Connection

• Data Transfer Connection

• Client Server Communication using FTP

• Control connection

• Data transfer connection

FTP – Introduction

• FTP is used when we want to receive a file from, or send a file to, a remote computer.

• Special software and a set of rules called the File Transfer Protocol (FTP) exist for this purpose.

• FTP is a high-level (application layer) protocol that is aimed at providing a very simple interface for any user of the Internet to transfer files.

• At a high level, a user (the client) requests the FTP software to either retrieve
from or upload a file to a remote server.

• Figure 5.30 shows, at a broad level, how an FTP client can obtain a file ABC from an FTP server.

FTP - The Issues with File Transfers

• Emails are meant for short message transfers. FTP is meant for file transfers.

• When a user wants to download a file from a remote server, several issues
must be dealt with.

1. The client must have the necessary authorizations to download that file.

2. The client and server computers could be different in terms of their hardware
and/or operating systems.

3. An end user should not need to be concerned with these issues, as long as he has the necessary access rights.

• FTP provides a simple file transfer mechanism for the end user, and internally
handles these complications.

FTP - FTP Basics

• FTP presents the user with a prompt and allows entering various commands
for accessing files on a remote computer.
• After invoking an FTP application, the user identifies a remote computer and
instructs FTP to establish a connection with it.

• FTP contacts the remote computer using the TCP/IP software.

• Once the connection is established, the user can choose to download a file from, or upload a file to, the remote computer.

• FTP uses two connections between a client and a server.


1. One connection is used for the actual file’s data transfer

2. The other is used for control information (commands and responses).

• This separation of data transfer and commands makes FTP more efficient.
• Internally, this means that FTP uses two TCP connections between the client and the server.

• If multiple files are to be transferred in a single FTP session, then the control connection between the client and the server must remain active throughout the entire FTP session.

• The data transfer connection is opened and closed for each file that is to be
transferred.

• The data transfer connection opens every time the commands for transferring
files are used, and it gets closed when the file transfer is complete.

FTP - FTP Connections

The control and data transfer connections are opened and closed by the client
and the server during an FTP session.

Control connection

The process of the creation of a control connection between a client and a server is
similar to the creation of other TCP connections between a client and a server.

Specifically, two steps are involved here,

1. The server passively waits for a client (passive open). In other words, the server
waits endlessly for accepting a TCP connection from one or more clients.
2. The client actively sends an open request to the server (active open). That is, the
client always initiates the dialog with the server by sending a TCP connection
request.

This is shown in Fig. 5.32.

The opening of a control connection internally consists of the following steps.


1. The user on the client computer opens the FTP client software. The FTP client
software is a program that prompts the user for the domain name/IP address of the
server.
2. When the user enters these details, the FTP software on the client issues a TCP
connection request to the underlying TCP software on the client. Of course, it
provides the IP address of the server with which the connection is to be established.

3. The TCP software on the client computer then establishes a TCP connection
between the client and the server using a three-way handshake.

4. When a successful TCP connection is established between the client and the
server, an FTP server program is ready to serve the client’s requests for file transfer.

Data transfer connection

• The connection for data transfer, in turn, uses the control connection
established previously.

• The data transfer connection is always first requested by the client. Let us understand how the data transfer connection is opened.
• The client issues a passive open command for the data transfer connection.

• This means that the client has opened a data transfer connection on a
particular port number, say X , from its side.
• The client uses the control connection established earlier, to send this port
number to the server.

• The server receives the port number ( X ) from the client over the control
connection, and invokes an open request for the data transfer connection on
its side.

• This means that the server has also now opened a data transfer connection.

• In this arrangement, the server’s end of the connection uses port 20, the standard FTP data port.
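The procedure above matches what RFC 959 calls active mode: the client listens on port X and announces it over the control connection with the PORT command, whose argument packs the IPv4 address and the port into six decimal numbers h1,h2,h3,h4,p1,p2, with port = p1 * 256 + p2. The Python helpers below use hypothetical names and a documentation-only address; they only build and parse that argument and open no connections.

```python
def encode_port_argument(ip: str, port: int) -> str:
    """Build the PORT argument h1,h2,h3,h4,p1,p2 for an IPv4 address and port."""
    octets = ip.split(".")
    if len(octets) != 4 or not 0 <= port <= 65535:
        raise ValueError("expected an IPv4 address and a 16-bit port number")
    p1, p2 = divmod(port, 256)  # high and low byte of the port number
    return ",".join(octets + [str(p1), str(p2)])


def decode_port_argument(argument: str) -> tuple[str, int]:
    """Recover the (IP address, port) pair the server should connect to."""
    *octets, p1, p2 = argument.split(",")
    return ".".join(octets), int(p1) * 256 + int(p2)


print(encode_port_argument("192.0.2.7", 5001))  # 192,0,2,7,19,137
```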

Client-server Communication using FTP


• Once the control and data transfer connections are opened, the client and the
server are now ready for transferring files.

• Note that the client and the server can use different operating systems, file
formats, character sets and file structures.

• FTP must resolve all these incompatibility issues.

• Let us now study how FTP achieves this, using the control connection and the
data transfer connection.


Control connection

• The control connection is pretty simple.

• Over the control connection, each FTP exchange consists of one request and one response.
• This request-response model is sufficient for FTP, since the user sends one command to the FTP server at a time.


• The requests sent over the control connection are short (mostly four-character) commands, such as

• QUIT (to log out of the system),

• ABOR (to abort the previous command),

• DELE (to delete a file),

• LIST (to view the directory structure),

• RETR (to retrieve a file from the server to the client),

• STOR (to upload a file from the client to the server), etc.
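Each request on the control connection is a line of text ending in CRLF, and each reply starts with a three-digit code. The Python helpers below are a minimal sketch with hypothetical names; they handle only single-line replies (multi-line replies, which use a hyphen after the code, are not covered). Python's standard ftplib module provides a complete client.

```python
def format_command(verb: str, argument: str = "") -> bytes:
    """Encode one control-connection request; every line ends with CRLF."""
    line = f"{verb.upper()} {argument}" if argument else verb.upper()
    return (line + "\r\n").encode("ascii")


def parse_reply(line: str) -> tuple[int, str]:
    """Split a single-line reply such as '226 Transfer complete' into code and text."""
    code, _, text = line.rstrip("\r\n").partition(" ")
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"unexpected reply: {line!r}")
    return int(code), text


def is_positive(code: int) -> bool:
    """Reply codes starting with 1, 2 or 3 are positive; 4 and 5 signal errors."""
    return 100 <= code < 400


print(format_command("retr", "ABC"))  # b'RETR ABC\r\n'
```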


Data transfer connection

• The data transfer connection is used to transfer files from the server to the
client or from the client to the server, as shown in Fig. 5.35.

• As we have noted before, this is decided based on the commands that travel over the control connection.


The sender must specify the following attributes of the file

1. Type of the file to be transferred

2. The structure of the data

3. The transmission mode

Type of the file to be transferred

• The file to be transferred can be an ASCII, EBCDIC or Image file.

• If the file has to be transferred as ASCII or EBCDIC, the destination must be ready to accept it in that mode.
• If the file is to be transferred without any regard to its content, the third and last type, Image (binary), is used.


The structure of the data

FTP can transfer a file across a data transfer connection by interpreting its structure
in the following ways.

1. Byte-oriented structure

The file can be transmitted as a continuous stream of data (byte-oriented structure), wherein no structure for the file is assumed.

2. Record-oriented structure

The other option for the structure of the file being transferred is the record-oriented structure, where the file is divided into records and these records are then sent one by one.


The transmission mode

FTP can transfer a file by using one of the three transmission modes as described
below,

1. Stream mode

• The default mode

• Data is delivered from FTP to TCP as a continuous stream of data

2. Block mode
• Data can be delivered in terms of blocks.

• Each data block is preceded by a three-byte header.

The first byte of the header is called the block descriptor, whereas the remaining two bytes define the size of the block.
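The three-byte header can be packed with Python's struct module. RFC 959 defines descriptor values such as 128 (end of record) and 64 (end of file); the byte count is assumed here to be in network (big-endian) byte order, and the function names are hypothetical. This is a sketch of the framing only, not a full block-mode implementation.

```python
import struct

# Descriptor values defined for block mode in RFC 959 (only two are used here).
END_OF_RECORD = 0x80
END_OF_FILE = 0x40


def make_block(data: bytes, descriptor: int = 0) -> bytes:
    """Prefix data with 1 descriptor byte and a 2-byte byte count."""
    if len(data) > 0xFFFF:
        raise ValueError("a block can carry at most 65535 bytes")
    return struct.pack("!BH", descriptor, len(data)) + data


def read_block(buffer: bytes) -> tuple[int, bytes, bytes]:
    """Return (descriptor, block data, bytes remaining after the block)."""
    descriptor, count = struct.unpack("!BH", buffer[:3])
    return descriptor, buffer[3:3 + count], buffer[3 + count:]


print(make_block(b"abc", END_OF_FILE))  # b'@\x00\x03abc'
```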


3. Compressed mode

• If the file to be transferred is big, it can be compressed before it is sent.

• Normally, the Run Length Encoding (RLE) compression method is used for
compressing a file.
• This method replaces a run of repeated data with a single occurrence, stored along with a count of how many times it repeats.
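The idea of run length encoding fits in a few lines. This Python sketch, with hypothetical function names, represents each run as a (byte value, count) pair; RFC 959 defines its own byte layout for compressed mode, which this sketch does not reproduce.

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Replace each run of identical bytes with a (byte value, run length) pair."""
    runs: list[tuple[int, int]] = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((value, 1))              # start a new run
    return runs


def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand each (byte value, run length) pair back into the original bytes."""
    return b"".join(bytes([value]) * count for value, count in runs)


print(rle_encode(b"AAAAB"))  # [(65, 4), (66, 1)]
```

RLE pays off only when the data really contains long runs; on data without repetition, each byte becomes a pair and the output grows.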

Trivial File Transfer Protocol (TFTP)

• The Trivial File Transfer Protocol (TFTP) is a protocol used for transferring
files between two computers, similar to what FTP is used for.

• FTP uses the reliable TCP as the underlying transport layer protocol.

• TFTP uses the unreliable UDP protocol for data transport.


• Other minor differences are that, while FTP allows changing the directory on the remote computer and obtaining a list of the files in that directory, TFTP does not allow this.

• Also, there is no interactivity in TFTP. It is a protocol designed purely for transferring files.

• TFTP does not allow for user authentication unlike FTP. Therefore, TFTP
must not be used on computers where sensitive/confidential information is
stored.

• TFTP provides only minimal error checking.

• TFTP transfers data in fixed-size blocks of 512 bytes each.

• The recipient must acknowledge each such data block before the sender
sends the next block.

• Also, unlike FTP, there is no provision for resuming an aborted file transfer
from its last point.
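The 512-byte blocks and per-block acknowledgements can be seen in the packet layout defined by RFC 1350. The Python sketch below (function names are hypothetical) builds a read request and splits a file into numbered DATA packets; a packet carrying fewer than 512 bytes marks the end of the transfer, so a file whose size is an exact multiple of 512 ends with an empty DATA packet. It does not handle block-number wraparound for files larger than about 32 MB.

```python
import struct

BLOCK_SIZE = 512
OP_RRQ, OP_WRQ, OP_DATA, OP_ACK, OP_ERROR = 1, 2, 3, 4, 5  # opcodes from RFC 1350


def read_request(filename: str, mode: str = "octet") -> bytes:
    """Read request: 2-byte opcode, file name, zero byte, mode, zero byte."""
    return (struct.pack("!H", OP_RRQ) + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def data_packets(content: bytes) -> list[bytes]:
    """Split a file into DATA packets: opcode, block number, up to 512 bytes."""
    packets = []
    block_number, offset = 1, 0
    while True:
        chunk = content[offset:offset + BLOCK_SIZE]
        packets.append(struct.pack("!HH", OP_DATA, block_number) + chunk)
        if len(chunk) < BLOCK_SIZE:
            return packets  # a short (possibly empty) block ends the transfer
        offset += BLOCK_SIZE
        block_number += 1


def ack_packet(block_number: int) -> bytes:
    """The receiver acknowledges each block before the next one is sent."""
    return struct.pack("!HH", OP_ACK, block_number)
```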

History of WWW

• World Wide Web

• Hypertext

• Tim Berners-Lee

• Hypertext server

• Marc Andreessen

• Mosaic

• Web browser

• Netscape Navigator
• Internet Explorer

• World Wide Web Consortium (W3C)

World Wide Web

• The most popular application running on the Internet.

• Refers to a set of Internet protocols and software, which together present information to a user in a format called hypertext.

• WWW became quite popular in the mid-1990s.

• Tim Berners-Lee did the primary work in the development of the WWW at the European Laboratory for Particle Physics (CERN).

• The motivation for developing the WWW was to improve CERN’s research-document handling and sharing mechanisms.

Hypertext server program

• In a couple of years’ time, Berners-Lee developed the necessary software application for a hypertext server program, and made it available as a free download on the Internet.

• A hypertext server stores documents in a hypertext format, and makes them available over the Internet to anyone interested. This paved the way for the popularity of the Web.

• Berners-Lee called his system of hypertext documents the World Wide Web (WWW).

• The Web became very popular among the scientific community in a short
span of time.

• Disadvantage: the general public lacked software to read documents created in the hypertext format.

Web browser

• In 1993, Marc Andreessen and his team at the University of Illinois wrote a program called Mosaic.

• Mosaic could read a document created using the hypertext format and interpret its contents, so that they could be displayed on the user’s screen.

• This program became known as the first popular graphical Web browser.


• It opened the gates of the Web for the general public.

• Mosaic was a free piece of software, too.

• The worldwide network of computers was now accessible to anybody who had a PC, an Internet connection, and a Web browser.

• So, business interests in the Web started developing fast.

• In 1994, Andreessen and his colleagues from the University of Illinois joined hands with James Clark of Silicon Graphics to form a new venture, named Netscape Communications. Their first product was Netscape Navigator, a Web browser based on Mosaic.

• Netscape Navigator was an instant hit.

• It became extensively popular in a very short time period, before Microsoft realized the potential of the Web and came up with its own browser, Internet Explorer.

• The World Wide Web Consortium (W3C) oversees the standards related to
the WWW.

The basics of WWW and Browsing

• Introduction – Basics of WWW and browsing

• How does a web server work?

• How does a web browser work?

• HTTP Commands

• Example of an HTTP interaction

• Proxy Server

Introduction
• Most companies and organizations have Web sites, each consisting of a number of pages.
• In addition, there are many portals, which can be used to do multiple
activities.

• Yahoo, for instance, can be used to send/receive emails, sell/buy goods, or carry out auctions, etc.

• In order to attract more customers to their sites, they create large Web pages (“content”) that give different news, information and entertainment items.
• For instance, a site for buying/selling train or plane tickets can also give information about hotels, tourist places, etc. This is called content.

• Thus, WWW consists of thousands of such Web sites for thousands of individuals and companies, giving a tremendous amount of information about people, companies, events, history, news, etc.

• WWW is a huge, online repository of information that users can view using a program called a Web browser.

• Modern browsers provide a graphical user interface. So, a user can use the mouse to make selections, navigate through the pages, etc.

• The concepts of client-server communication and the use of TCP/IP software apply here.

• Whatever is sent from the client to the server (request for a Web page), and from the server to the client (actual Web page), is sent using TCP/IP as an underlying protocol.

The message is broken into IP packets and routed through various routers and networks within the Internet until they reach the final destination, where they are checked for errors and reassembled.

How Does a Web Server Work?

• A Web server is a program running on a server computer.
• The server computer also holds the Web site, consisting of a number of Web pages.
• A Web page is simply a special type of computer file written in a specially designed language called Hyper Text Markup Language (HTML).

• Each Web page can contain text, graphics, sound, video and animation that
people want to see or hear.

• The Web server constantly and passively waits for a request for a Web page
from a browser program running on the client.

• When any such request is received, it locates the corresponding page and sends it to the requesting client computer.

• To do this, every Web site has a server process that constantly listens for TCP connection requests coming from different clients.
• After a TCP connection is established, the client sends one request and the
server sends one response.

• Then the server releases the connection.

• This request-response model is governed by a protocol called Hyper Text Transfer Protocol (HTTP).

• For instance, HTTP software on the client prepares the request for a Web page,
whereas the HTTP software on the server interprets such a request and
prepares a response to be sent back to the client.

• Thus, both client and server computers need to have HTTP software running
on them.

• A Web browser acts as the client in the WWW interaction.

• Using this program, a user requests for a Web page stored on a Web server.

• The Web server locates this Web page and sends it back to the client
computer.

• The Web browser then interprets the Web page, written in HTML, and displays it on the client computer’s screen.
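The server-side job described above (wait for a request, locate the page, send it back) can be sketched in Python without the networking part. This is a minimal sketch, assuming the Web site is a hypothetical in-memory dictionary `PAGES` instead of files on disk and that only the GET command is supported; the status codes used (200, 400, 404, 501) are the standard HTTP ones.

```python
# Hypothetical in-memory "Web site": file path -> HTML text.
PAGES = {
    "/index.htm": "<HTML><BODY>Welcome</BODY></HTML>",
}


def locate_page(request_line: str, pages: dict[str, str]) -> tuple[int, str]:
    """Return (status code, body) for a request line such as 'GET /index.htm HTTP/1.0'."""
    parts = request_line.split()
    if len(parts) != 3:
        return 400, "Bad Request"        # not of the form METHOD PATH VERSION
    method, path, _version = parts
    if method != "GET":
        return 501, "Not Implemented"    # this sketch only understands GET
    if path not in pages:
        return 404, "Not Found"          # the requested file does not exist
    return 200, pages[path]              # found: send the page back


print(locate_page("GET /index.htm HTTP/1.0", PAGES))
print(locate_page("GET /missing.htm HTTP/1.0", PAGES))
```

A real server would wrap this lookup in a loop that accepts TCP connections and writes the result back as an HTTP response.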

How Does a Web Browser Work?


STEP 1: The user on the client computer types the full file name including the
domain name of the Web server that hosts the Web page that he is interested in.

• This name is typed on a screen provided by the Web browser program running on his computer.

• The full file name is called the Uniform Resource Locator (URL). For instance, a URL could be

http://www.yahoo.com/index or only www.yahoo.com/index

• The shorter form works because specifying http is optional; the browser assumes it.

• Here, http indicates the protocol.


• index is the name of the file. It is stored on the Web server whose domain name is yahoo.com.

• Because this is a WWW application, the host name also has a www prefix.

• The forward slash (/) character indicates that the file is one of the many files
stored in the domain yahoo.com.

If the user wants another file called newsoftheday from this site, he would type http://www.yahoo.com/newsoftheday.
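The way a browser splits a URL into protocol, host name and file name can be sketched with Python's standard `urllib.parse.urlsplit`. The function name `split_url` is hypothetical, and adding `http://` when the user leaves it out is a simplification of what real browsers do.

```python
from urllib.parse import urlsplit


def split_url(text: str) -> tuple[str, str, str]:
    """Split a typed URL into (protocol, host name, file path)."""
    if "://" not in text:
        text = "http://" + text          # http is assumed when the user omits it
    parts = urlsplit(text)
    return parts.scheme, parts.hostname or "", parts.path or "/"


print(split_url("http://www.yahoo.com/index"))   # ('http', 'www.yahoo.com', '/index')
print(split_url("www.yahoo.com/newsoftheday"))   # ('http', 'www.yahoo.com', '/newsoftheday')
```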

STEP 2: The browser asks DNS for the IP address corresponding to www.yahoo.com.

STEP 3: DNS replies with the IP address for www.yahoo.com (let us say it is 120.10.23.21).

STEP 4: The browser makes a TCP connection with the computer 120.10.23.21.
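Steps 2 to 4 map onto two calls in Python's standard `socket` module: `gethostbyname` asks the system resolver (and thus DNS) for an IPv4 address, and `create_connection` opens the TCP connection. This is a minimal sketch; the helper name `connect_to_web_server` is hypothetical, and real browsers use more general lookups (for example `getaddrinfo`, which can also return IPv6 addresses).

```python
import socket


def connect_to_web_server(host: str, port: int = 80) -> tuple[str, socket.socket]:
    """Resolve the host name (Steps 2-3), then open a TCP connection to it (Step 4)."""
    ip_address = socket.gethostbyname(host)                         # DNS lookup, IPv4 only
    conn = socket.create_connection((ip_address, port), timeout=5)  # TCP connection setup
    return ip_address, conn
```

For example, `connect_to_web_server("www.yahoo.com")` would return the address supplied by DNS together with an open socket, which the caller should close when finished; this needs a working network connection.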

STEP 5: The client makes an explicit request for the Web page to the Web server
using HTTP request.

• The HTTP request is a series of lines, which, among other things, contains two important statements, GET and Host, as shown with our current example:

• GET /index.htm and Host: yahoo.com

• The GET statement indicates that the index.htm file needs to be retrieved.

• The Host parameter indicates that the index file needs to be retrieved from
the domain yahoo.com.
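The text of such a request can be produced with a few lines of Python. This sketch (the function name `build_get_request` is hypothetical) uses HTTP/1.0 and follows the HTTP rules that each line ends with a carriage return and line feed (CR LF) and that an empty line ends the request headers.

```python
def build_get_request(host: str, path: str) -> bytes:
    """Build the text of a minimal HTTP/1.0 GET request (Step 5)."""
    lines = [
        f"GET {path} HTTP/1.0",   # request line: command, file, protocol version
        f"Host: {host}",          # the domain the file should come from
        "",                       # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_get_request("yahoo.com", "/index.htm"))
# b'GET /index.htm HTTP/1.0\r\nHost: yahoo.com\r\n\r\n'
```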
STEP 6: The request is handed over to the HTTP software running on the client machine to be transmitted to the server.

STEP 7: The HTTP software on the client now hands over the HTTP request to the TCP/IP software running on the client.

STEP 8: The TCP/IP software running on the client breaks the HTTP request into packets and sends them over TCP to the Web server (in this case, yahoo.com).

STEP 9: The TCP/IP software running on the Web server reassembles the HTTP request using the packets thus received and gives it to the HTTP software running on the Web server.

STEP 10: The HTTP software running on the Web server interprets the HTTP request. It realizes that the browser has asked for the file index.htm on the server. Therefore, it requests that file from the operating system running on the server.

STEP 11: The operating system on the Web server locates the index.htm file and gives it to the HTTP software running on the Web server.

STEP 12: The HTTP software running on the Web server adds some headers to
the file to form an HTTP response.

• The HTTP response is a series of lines that contains this header information
(such as date and time when the response is being sent, etc.) and the HTML
text corresponding to the requested file (in this case, index.htm ).
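Step 12 can be sketched as a function that puts a status line and a few header lines in front of the HTML text. This is a minimal sketch, assuming HTTP/1.0 and an HTML body; the name `build_response` is hypothetical, and `email.utils.formatdate(usegmt=True)` from the Python standard library produces a date in the format HTTP uses.

```python
from email.utils import formatdate


def build_response(html: str, status: int = 200, reason: str = "OK") -> bytes:
    """Put header lines in front of the file's HTML text to form an HTTP response."""
    body = html.encode("utf-8")
    headers = [
        f"HTTP/1.0 {status} {reason}",        # status line
        f"Date: {formatdate(usegmt=True)}",   # date and time the response is sent
        "Content-Type: text/html",            # kind of data in the body
        f"Content-Length: {len(body)}",       # size of the body in bytes
    ]
    # An empty line separates the headers from the HTML text.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + body


print(build_response("<HTML></HTML>").decode("utf-8"))
```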

STEP 13: The HTTP software on the Web server now hands over this HTTP
response to the TCP/IP software running on the Web server.

STEP 14: The TCP/IP software running on the Web server breaks the HTTP
response into packets and sends it over the TCP connection to the client.

Once all packets have been transmitted correctly to the client, the TCP/IP
software on the Web server informs the HTTP software on the Web
server.

STEP 15: The TCP/IP software on the client computer checks the packets for
correctness and reassembles them to form the original Web page in the
HTML format.

It informs the HTTP software on the server that the Web page was received
correctly.
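Steps 8, 9 and 15 (breaking a message into packets and reassembling it) rest on numbering each piece so that the receiver can restore the original order. The following is a deliberately simplified model in Python, not real TCP: each piece carries its byte offset, much like a TCP sequence number, while acknowledgements, retransmission and checksums are left out. The function names are hypothetical.

```python
import random


def split_into_segments(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Cut data into numbered pieces; each number is the byte offset of its piece."""
    return [(offset, data[offset:offset + size]) for offset in range(0, len(data), size)]


def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Put the pieces back in order by offset, whatever order they arrived in."""
    return b"".join(chunk for _offset, chunk in sorted(segments))


message = b"<HTML><BODY>Hello</BODY></HTML>"
pieces = split_into_segments(message, 8)
random.shuffle(pieces)      # packets may take different routes and arrive out of order
assert reassemble(pieces) == message
```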

STEP 16: The HTTP software on the Web server terminates the TCP connection between itself and the client. Because nothing is kept from one request to the next, HTTP is called a stateless protocol.

• A new TCP connection between the client and the server is established for every page, even if all the pages requested by the client reside on the same server.
• HTTP does not remember anything about the previous request. It does not
maintain any information about the state—and hence the term stateless.

Keeping HTTP stateless was aimed at keeping the Web simple.

STEP 17: The TCP/IP software on the client now hands over the Web page to the Web browser for interpretation.

Only the browser understands the HTML code, which tells it which elements (text, photo, video) should be displayed where and how.

HTTP Commands
• GET is the most common command sent by a client browser as a part of the
HTTP request to a Web server.

• This is because few Web servers allow a client to delete, add, link or unlink files; such commands could be disastrous if misused.

• When a browser sends such an HTTP request command to a Web server, the server sends back a status line (indicating the success or failure of executing that command) and additional information (which can be the Web page itself).

The status line contains a status code. For example, a status code of 200 means success (OK), 403 means access is forbidden, etc.
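Reading a status line is simple string splitting. This sketch uses a hypothetical helper `parse_status_line` together with Python's standard `http.HTTPStatus` enumeration, which maps each code to its standard reason phrase.

```python
from http import HTTPStatus


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a status line such as 'HTTP/1.0 200 OK' into version, code and reason."""
    version, code, reason = line.split(" ", 2)   # the reason phrase may contain spaces
    return version, int(code), reason


for code in (200, 403, 404):
    print(code, HTTPStatus(code).phrase)   # 200 OK, 403 Forbidden, 404 Not Found
```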
• Consider a GET command that requests the file information.html from the Web server at www.mysite.com.
• The HTTP/1.0 portion of the command indicates that the browser uses version 1.0 of the HTTP protocol.

• In response, the server might send back an HTTP response whose first line is a status line such as HTTP/1.0 200 OK.

• This first line indicates to the browser that the server is also using HTTP 1.0 as its protocol version.

• Also, the return code of 200 means that the server processed the browser’s HTTP request successfully.

• After that, there would be a few other header parameters, which are not shown.

• After these parameters (and a blank line), the following lines start:
<HTML>
<HEAD>
<TITLE></TITLE>
</HEAD>
</HTML>

• This is a Web page codified in HTML format.

• Actual contents of the Web page are sent by the Web server to the browser
with the help of these tags.

• A tag is an HTML keyword, usually enclosed between less-than (<) and greater-than (>) symbols.

• For instance, the <HTML> statement (i.e., tag) indicates that the HTML contents of the Web page start now.
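On the browser side, the header parameters and the HTML text have to be separated before the HTML can be interpreted. In HTTP, an empty line (CR LF CR LF) marks the end of the headers. This is a minimal sketch: the sample response and the name `split_response` are hypothetical, and it assumes every header line has the form `Name: value`.

```python
def split_response(raw: bytes) -> tuple[str, dict[str, str], str]:
    """Separate an HTTP response into status line, header parameters and HTML body."""
    head, _, body = raw.partition(b"\r\n\r\n")   # the blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return status_line, headers, body.decode("utf-8", errors="replace")


# Hypothetical response from www.mysite.com.
sample = (b"HTTP/1.0 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"\r\n"
          b"<HTML><HEAD><TITLE>Info</TITLE></HEAD></HTML>")
print(split_response(sample))
```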

Example of an HTTP Interaction

• In this example, the browser (i.e., the client) retrieves a file (here, an image) from the Web server.

• We shall assume that the TCP connection between the client and the server is already established.

• The client sends a GET command to retrieve an image with the path /files/new/image1.

That is, the name of the file is image1, and it is stored in the /files/new directory of the Web server.

• Instead, the Web browser could, of course, have requested an HTML page (i.e., a file with an html extension).

• In response, the Web server sends an appropriate return code of 200, which
means that the request was successfully processed, and also the image data,
as requested.

• The browser sends a request with the GET command, as discussed. It also sends two more parameters using two Accept commands.

• These parameters specify that the browser is capable of handling images in the GIF and JPEG formats.

• Therefore, the server should send the image file only if it is in one of these formats.

• In response, the server sends a return code of 200 (OK).

• It also sends information about the date and time when this response was sent back to the browser.
• It also gives the server’s name, which here is the same as the domain name.

• Finally, the server indicates that it is sending 3010 bytes of data (i.e., the image file is 3010 bytes long). This is followed by the actual data of the image file (not shown in the figure).
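The role of the two Accept parameters can be sketched as follows: the browser lists the image formats it handles, and the server checks the stored file's format against that list. The function names are hypothetical, and the check follows the simplified rule described above. Real HTTP servers also understand wildcards such as `image/*` and preference weights, which this sketch ignores.

```python
def build_image_request(path: str, accepted: list[str]) -> str:
    """Browser side: a GET request with one Accept line per supported format."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"Accept: {media_type}" for media_type in accepted]
    return "\r\n".join(lines) + "\r\n\r\n"


def server_can_send(request: str, file_media_type: str) -> bool:
    """Server side: is the stored file in a format the browser said it accepts?"""
    accepted = [line.split(":", 1)[1].strip()
                for line in request.split("\r\n")
                if line.lower().startswith("accept:")]
    return file_media_type in accepted


req = build_image_request("/files/new/image1", ["image/gif", "image/jpeg"])
print(server_can_send(req, "image/gif"))   # True
print(server_can_send(req, "image/png"))   # False
```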

Proxy Server

• Disadvantage of web servers: not every server understands the HTTP protocol.

• To solve this issue, a proxy server is used as an interpreter between the web browser and the web server, transforming a non-HTTP protocol to HTTP and vice versa.

• Usually, the proxy server is installed on a dedicated computer in an organization, and the organization’s connection to the internet is directed via the proxy.

• Every user’s internet connection passes through the proxy.

• The proxy also helps with authentication and access control.

• It also performs caching.

• Caching: keeping copies (in main memory or on disk) of all the web pages that have been requested by all the users, so that the cached copy can be sent when different users request the same page again.
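The caching idea can be sketched as a dictionary from URL to page that is consulted before contacting the Web server. This is a minimal sketch: `CachingProxy` and `fake_origin` are hypothetical names (the latter stands in for a real HTTP fetch), and a real proxy would also expire old pages and limit the cache size.

```python
from typing import Callable


class CachingProxy:
    """A toy proxy that remembers pages it has already fetched for any user."""

    def __init__(self, fetch_from_origin: Callable[[str], str]):
        self.fetch_from_origin = fetch_from_origin   # really contacts the Web server
        self.cache: dict[str, str] = {}

    def get(self, url: str) -> str:
        if url not in self.cache:                    # cache miss: ask the Web server once
            self.cache[url] = self.fetch_from_origin(url)
        return self.cache[url]                       # cache hit: answer from the stored copy


origin_calls = []


def fake_origin(url: str) -> str:
    """Hypothetical stand-in for a real HTTP fetch; records each call."""
    origin_calls.append(url)
    return f"<HTML>page for {url}</HTML>"


proxy = CachingProxy(fake_origin)
proxy.get("http://www.yahoo.com/index")   # user A: fetched from the Web server
proxy.get("http://www.yahoo.com/index")   # user B: served from the cache
print(len(origin_calls))                  # 1
```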
Local information on the internet

• The amount of information available is mind-boggling.

• Finding the right kind of information is very cumbersome.

• To search for the required information easily, the concept of a search engine came into being.

• A search engine is also referred to as a crawler, worm or knowbot (knowledge robot).

• Google, AltaVista, Yahoo and Infoseek are some of the most popular search engines.

HTML – Hyper Text Markup Language

Introduction

• To create web pages, a special language called Hyper Text Markup Language (HTML) is used.

• Initially, HTML files contained only text.

• Later, HTML allowed graphics, sound and video.

• This makes web pages interesting and interactive.

• Basically, HTML is used to specify where and how to display headings, where
new paragraph starts, which text to display in which font and color etc.

• HTML uses tags to define the contents of the web pages.

• Tags are enclosed in angle brackets.

• For example, the <p> tag indicates the beginning of a new paragraph.

• Most tags end with a corresponding closing tag, which has a slash (/) after the <. The <p> tag would end with a </p> tag.

• As another example, the tag pair <B> and </B> can be used to change the text font to boldface. This is shown in Figure 6.8.

• When a browser comes across this portion of an HTML document, it realizes that the text embedded within the <B> and </B> tags needs to be displayed in boldface. Therefore, it displays the text in boldface, as shown in Figure 6.9.
HTML Example
• The HTML code in the page is as follows:

• The <HTML> tag indicates the beginning of an HTML page, and the </HTML> tag indicates the end of the HTML page.

• The <BODY> tag indicates the beginning of the contents of the page, and the </BODY> tag indicates the end of the contents of the page.

• The <H1> tag indicates that the text starting here is a heading and should be displayed in a different font until the closing tag </H1> is found.

HTML Tags Summary


Hyperlinks
• The anchor tag can be used to create a link to another document. This is called a hyperlink; the address of the target document is given as a Uniform Resource Locator (URL).

• By default, the browser displays the link text underlined.

• If we click on that text in the Web browser, our browser opens the site/page that the hyperlink refers to.

• The tag used is <a>. The general syntax is as follows:
<a href="url">Text to be displayed</a>
Here,

a = creates an anchor

href = target URL

Text = text to be displayed in place of the URL

For example, if we code the following in our HTML page:

<a href="http://www.yahoo.com/">Visit Yahoo!</a>

The result is the underlined link text Visit Yahoo!
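A browser has to find the `href` target of each anchor before it can follow a link. This can be sketched with Python's standard `html.parser.HTMLParser`; the class name `LinkCollector` is hypothetical. `HTMLParser` reports tag and attribute names in lowercase, so `<A HREF=...>` and `<a href=...>` are treated alike.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (target URL, displayed text) for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href = None                  # target of the anchor currently open, if any
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:         # only keep text that sits inside an anchor
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


collector = LinkCollector()
collector.feed('<p><a href="http://www.yahoo.com/">Visit Yahoo!</a></p>')
print(collector.links)   # [('http://www.yahoo.com/', 'Visit Yahoo!')]
```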

Web Browser Architecture

• Web browsers have a more complex structure than the Web servers.

• It is the responsibility of the browser to display the document on the user’s screen when it receives it from the server.

• As a result, a browser consists of several large software components that work together to provide an abstracted view of a seamless service.
• A browser contains some pieces of software that are mandatory and some
that are optional depending upon the usage.

• The HTTP client program, shown in the above figure as (2), and the HTML interpreter program (3) are mandatory.

• Other interpreter programs, as in (4), the Java interpreter program (5) and other optional interpreter programs (6) are optional.

• The browser also has a controller, shown as (1), which manages all of them.
• The controller is like the control unit in a computer’s CPU.
• It interprets both mouse clicks/selections and keyboard inputs.

• Based on these inputs, it calls the rest of the browser’s components to perform specific tasks.
• For instance, when a user types a URL, the controller calls the HTTP client
program to fetch the requested Web page from a remote Web server whose
address is given by the URL.

When the Web page is received, the controller calls the HTML interpreter to interpret
the tags and display the Web page on the screen.

• The HTML interpreter takes an HTML document as input and produces a formatted version of it for display on the screen.

• For this, it interprets the various HTML tags and translates them into display
commands based on the display hardware in the user’s computer.

For instance, when the interpreter sees a tag to make the text bold, it instructs the
display hardware to display the text in the bold format.
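The HTML interpreter's job of turning tags into display instructions can be sketched with Python's standard `html.parser.HTMLParser`. Plain text has no boldface, so this sketch (the names `TinyInterpreter` and `render` are hypothetical) prints text inside `<b>` in capital letters as a stand-in for the bold display command, and it starts each new `<p>` paragraph on a new line. A real interpreter handles far more tags and layout rules.

```python
from html.parser import HTMLParser


class TinyInterpreter(HTMLParser):
    """Turn a little HTML into display text; capitals stand in for boldface."""

    def __init__(self):
        super().__init__()
        self.bold = False
        self.output: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.bold = True                 # start of bold text
        elif tag == "p" and self.output:
            self.output.append("\n")         # a new paragraph starts on a new line

    def handle_endtag(self, tag):
        if tag == "b":
            self.bold = False                # end of bold text

    def handle_data(self, data):
        self.output.append(data.upper() if self.bold else data)


def render(html: str) -> str:
    interpreter = TinyInterpreter()
    interpreter.feed(html)
    interpreter.close()                      # flush any buffered text
    return "".join(interpreter.output)


print(render("<p>This is <b>important</b>.</p><p>Next paragraph</p>"))
```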

Optional Clients

• Apart from the HTTP client and an HTML interpreter, a browser can contain
additional clients.

• For supporting FTP and Email applications, a browser contains FTP and
email client programs.

• These enable the browser to perform FTP and email services.
• The browser invokes them automatically on behalf of the user.
• Example: when a user selects a link for email or FTP, the controller of the browser automatically invokes the email client program or the FTP client program, respectively.
Invoking email Client from within a Browser

• The mailto protocol allows the invocation of an email client program through
the browser.

• For this, the HTML page must contain a mailto link: the anchor tag is used along with the mailto: protocol indicator and the email address.

• For example, if John has created his personal website, he might include a link in the HTML page to send an email to the id John@hotmail.com like this:

<A HREF=mailto:John@hotmail.com>Send me an email!</A>

It will be displayed as Send me an email!

• Clicking the link invokes the mailto protocol, starts the email client program and inserts the address in the To field.
• The user can enter the subject and email text and send it.
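How a browser could pass a mailto link to the email client can be sketched with Python's `urllib.parse`. The optional `subject` and `body` fields in the query part are standard for mailto links; the function name `mailto_to_draft` is hypothetical, and the returned dictionary stands in for the email client's To, Subject and text fields.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def mailto_to_draft(link: str) -> dict[str, str]:
    """Turn a mailto: link into the fields an email client would pre-fill."""
    parts = urlsplit(link)
    if parts.scheme != "mailto":
        raise ValueError("not a mailto link")
    query = parse_qs(parts.query)              # optional ?subject=...&body=...
    return {
        "to": unquote(parts.path),             # the address after 'mailto:'
        "subject": query.get("subject", [""])[0],
        "body": query.get("body", [""])[0],
    }


print(mailto_to_draft("mailto:John@hotmail.com"))
```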

Invoking the FTP Application from within a Browser

• The FTP protocol is used here.

• The file that has to be downloaded from the server has to be specified.

• The controller in the browser uses the first field of the URL to determine which client application (e.g., HTTP, FTP) has to be invoked.

• If the first field of the URL is HTTP, it passes control to the HTTP client, which in turn sends a request for a web page to the web server.

• If this field is FTP, the controller calls the FTP client program.
• For example, consider the following portion of a web page:

Our website contains many excellent documents. For Example, <BR> <A HREF=ftp://ftp.yankee.com/books/Internet.doc> A document on the internet</A>

• As a result, the browser displays the following, with the last part shown as a link:

Our website contains many excellent documents. For Example,
A document on the internet.
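The controller's decision can be sketched as a lookup keyed by the first field (the scheme) of the URL. The `CLIENTS` table and its labels are hypothetical placeholders for the browser's client programs.

```python
from urllib.parse import urlsplit

# Hypothetical labels for the browser's client programs, keyed by URL scheme.
CLIENTS = {
    "http": "HTTP client",
    "ftp": "FTP client",
    "mailto": "email client",
}


def choose_client(url: str) -> str:
    """Mimic the controller: pick a client program from the URL's first field."""
    scheme = urlsplit(url).scheme            # returned in lowercase, so FTP:// matches ftp
    try:
        return CLIENTS[scheme]
    except KeyError:
        raise ValueError(f"no client program for protocol {scheme!r}") from None


for link in ("http://www.yahoo.com/index",
             "ftp://ftp.yankee.com/books/Internet.doc",
             "mailto:John@hotmail.com"):
    print(link, "->", choose_client(link))
```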

Web Pages and Multimedia

• A web page can also contain graphics.

• Non-textual information such as a graphics image or a digitized picture is not stored as part of the HTML document.

• Such items reside as separate files at separate locations on the web server.

• The original document contains references to them, thus establishing a link between them.

• When a browser retrieves a web page and encounters such a reference to an image or a picture, it follows the link specified in the HTML document and goes to the location on the web server where the image is actually stored.

• It uses the HTTP protocol separately for each such file, establishing and releasing a connection between the client and the web server each time.

• The browser then obtains a copy of the image and inserts it in place of the link in the displayed document on the user’s computer screen.
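Before fetching an image, the browser must turn the `SRC` value, which is often relative, into a full URL based on the page's own address. Python's standard `urllib.parse.urljoin` performs this resolution; the helper name `image_urls` and the page and image URLs below are hypothetical examples. Each resulting URL means one more HTTP request.

```python
from urllib.parse import urljoin


def image_urls(page_url: str, sources: list[str]) -> list[str]:
    """Resolve each IMG SRC value against the page's URL, giving the address to fetch."""
    return [urljoin(page_url, src) for src in sources]


print(image_urls("http://www.example.com/pages/home.html",
                 ["ANA.GIF", "/logos/top.gif", "http://images.example.org/x.jpg"]))
```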

• For linking an image with an HTML document, the IMG tag is used.
• Example: <IMG SRC="ANA.GIF">
• The file ANA.GIF is not stored as a normal text file.
• Such a file contains binary data that corresponds to the pixels in an image.

• It is transmitted in compressed form and is decompressed by the browser.

• Disadvantage: the browser needs to understand multiple compression algorithms and the corresponding formats.

• To solve this problem, the idea of plug-in was developed.

• A plug-in is a program that resides on the server along with the compressed
file.

• This program understands the format of the file and therefore has a method of
decompressing it.

• This program is also sent from the server along with the required file.

• The browser on the client uses this program to interpret the file, decompress it and display it.

• Once a plug-in program has been sent, it is stored on the client and need not be sent again.

• If a web page contains an image, audio and video, there will be three hyperlinks, and the client will make three connections to the server using the HTTP protocol.

• It also uses the corresponding plug-ins.

• That is why web pages with a lot of heavy multimedia content tend to be slow.

• To improve performance, good web page design recommends avoiding very heavy multimedia content.
• Sample HTML code for embedding an image file in a web page, and the corresponding result, are shown in fig 6.18.

<html>
<head>
<title>IMAGE EXAMPLE</title>
</head>
<body>
<p>An image is shown below</p>
<p>For this, the <b>IMG SRC</b> is used.</p>
<p>Here is the output</p>
<p><img src="images/gift" alt="Example image"></p>
</body>
</html>
Fig 6.18 HTML Code for Displaying an Image

Remote Login (TELNET)

• A remote login facility permits a user who is using one computer to interact with a
program on another computer. The service extends the login concept used by
conventional timesharing computer systems to permit access to a remote
timesharing system.

• Remote login is a process in which a user can log in to a remote site, i.e., a remote computer, and use the services that are available on it. The results of processing on the remote computer are transferred back to the local computer.

It is implemented using Telnet.

Procedure of Remote Login:

1. When the user types something on the local computer, the local operating system accepts the characters.
2. The local operating system does not interpret the characters; it sends them to the TELNET client.
3. The TELNET client transforms these characters into a universal character set called Network Virtual Terminal (NVT) characters and passes them to the local TCP/IP protocol stack.
4. The commands or text, now in NVT form, travel through the Internet and arrive at the TCP/IP stack of the remote computer.
5. The characters are then delivered to the remote operating system, which passes them on to the TELNET server.
6. The TELNET server converts the characters into characters that the remote computer can understand.
7. The remote operating system receives the characters from a pseudo-terminal driver, a piece of software that pretends the characters are coming from a terminal.
8. The operating system then passes the characters to the appropriate application program.
NVT Character Set:

• With the NVT character set, the TELNET client translates characters into NVT form and delivers them to the network.
• The TELNET server translates data and commands from NVT form into the form that the remote computer understands.
• NVT uses 2 sets of characters, one for data and the other for control. Both sets use 8-bit bytes.
• For data, NVT is an 8-bit character set in which the 7 lowest bits are the same as ASCII and the highest-order bit is 0.
• For control characters, NVT uses an 8-bit character set in which the highest bit is set to 1.
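The client-side translation into NVT data and the high-bit rule for control characters can be sketched as follows. This is a simplified sketch: the helper names are hypothetical, and it covers only two details of the real protocol (RFC 854), namely that NVT ends each line with CR LF and that Telnet commands are introduced by the byte 255, called IAC (Interpret As Command).

```python
IAC = 255  # "Interpret As Command": the byte that introduces a Telnet command


def to_nvt(text: str) -> bytes:
    """Client side: turn typed text into NVT data bytes (7-bit ASCII, high bit 0)."""
    data = text.replace("\r\n", "\n").replace("\n", "\r\n")  # NVT ends each line with CR LF
    return data.encode("ascii")   # ASCII bytes are all below 128; non-ASCII raises an error


def is_control(byte: int) -> bool:
    """In the NVT model above, a byte with its highest bit set is a control character."""
    return (byte & 0x80) != 0


print(to_nvt("ls -l\n"))      # b'ls -l\r\n'
print(is_control(IAC))        # True
print(is_control(ord("A")))   # False
```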
