
World Wide Web: Uniform Resource Locators (URL)


World Wide Web

Uniform Resource Locators (URL)

HTTP: Hypertext Transfer Protocol

RFC 1945 (HTTP 1.0), RFC 2616 (HTTP 1.1)

The Web consists of a large set of documents, called Web pages, that are accessible over the Internet. Each Web page is classified as a hypermedia document.
The suffix media is used to indicate that a document can contain items other than text (e.g., graphics images); the prefix hyper is used because a document can contain selectable links that refer to other, related documents.

Two main building blocks are used to implement the Web on top of the global Internet: the Web browser and the Web server. Pages that contain a mixture of text and other items are represented using HyperText Markup Language (HTML). An HTML document consists of a file that contains text along with embedded commands, called tags, that give guidelines for display.

Each Web page is assigned a unique name that is used to identify it. The name, which is called a Uniform Resource Locator (URL), begins with a specification of the scheme used to access the item. For HTTP, a URL has the general form:
http://hostname[:port]/path[;parameters][?query]
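As an illustration, the general form can be taken apart with Python's standard urllib.parse module. This is a minimal sketch: `describe_url` is a hypothetical helper, and falling back to port 80 when no :port is given is an assumption based on HTTP's well-known port.

```python
from urllib.parse import urlparse

def describe_url(url):
    """Break an http URL into the parts of the general form (hypothetical helper)."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or 80,     # assumed default: HTTP's well-known port 80
        "path": parts.path or "/",
        "parameters": parts.params,   # text after ';' in the last path segment
        "query": parts.query,         # text after '?'
    }

for url in ("http://www.bitspilani.ac.in:12349/Default.aspx",
            "http://bitsaa.bitspilani.ac.in/bitsaa.bits?l=campusnews"):
    print(describe_url(url))
```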

URLs

http://discovery.bits-pilani.ac.in/index.html
http://bitsaa.bitspilani.ac.in/bitsaa.bits?l=campusnews
http://www.bitspilani.ac.in:12349/Default.aspx

Relative URLs:
/campusnews.bits
/arcd/arc_nucleus.htm

HTTP

HTTP is the protocol that supports communication between web browsers and web servers. HTTP is an application-level protocol with the lightness and speed necessary for distributed, hypermedia information systems. The RFC states that HTTP communication generally takes place over a TCP connection, but the protocol itself is not dependent on a specific transport layer.

HTTP characteristics

Application Level.
Request/Response.
Stateless.


Each HTTP request is self-contained; the server does not keep a history of previous requests or previous sessions.

Bi-Directional Transfer.
Capability Negotiation.
Support For Caching.


To improve response time, a browser caches a copy of each Web page it retrieves. If a user requests a page again, HTTP allows the browser to interrogate the server to determine whether the contents of the page have changed since the copy was cached.
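From the server's side, this check can be sketched as a comparison between the If-Modified-Since date sent by the browser and the time the page last changed; if the page is unchanged, HTTP lets the server answer 304 Not Modified instead of resending it. A minimal Python sketch: `should_send_304` and the dates are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def should_send_304(if_modified_since, last_modified):
    """Return True if the browser's cached copy is still current (hypothetical helper)."""
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # An unparsable date means the condition is ignored: send the full page.
        return False
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    return last_modified.replace(microsecond=0) <= since

page_changed_at = datetime(1999, 12, 31, 23, 0, 0, tzinfo=timezone.utc)
print(should_send_304("Sat, 01 Jan 2000 05:00:01 GMT", page_changed_at))  # True
# A browser would express its cache timestamp in this GMT date format.
print(format_datetime(page_changed_at, usegmt=True))
```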

Support For Intermediaries.


HTTP allows a machine along the path between a browser and a server to act as a proxy server that caches Web pages and answers a browser's request from its cache.

Request - Response

HTTP has a simple structure:
client sends a request;
server returns a reply.
HTTP can support multiple request-reply exchanges over a single TCP connection.

Well Known Address

The well-known TCP port for HTTP servers is port 80. Other ports can be used as well.

HTTP Versions

The original version now goes by the name HTTP Version 0.9.
HTTP 0.9 was used for many years.
Starting with HTTP 1.0, the version number is part of every request. It tells the server which version the client can talk (what options are supported, etc.).

HTTP 1.0+ Request

Lines of text (ASCII).
Lines end with CRLF (\r\n).
The first line is called the Request-Line.

Request Line

Method URI HTTP-Version\r\n

The request line contains 3 tokens (words). Space characters separate the tokens. A bare newline (\n) seems to work by itself, but the protocol requires CRLF.

Request Method

The Request Method can be:
GET HEAD POST DELETE OPTIONS PUT TRACE
Future expansion is supported.
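The request-line rules can be checked with a few lines of Python. This is a sketch under simplifying assumptions: `parse_request_line` is a hypothetical helper, and rejecting any method outside the seven listed is a simplification, since the method set is open to future expansion.

```python
KNOWN_METHODS = {"GET", "HEAD", "POST", "DELETE", "OPTIONS", "PUT", "TRACE"}

def parse_request_line(line):
    """Split a CRLF-terminated request line into (method, uri, version) (hypothetical helper)."""
    if not line.endswith(b"\r\n"):
        raise ValueError("request line must end with CRLF")
    tokens = line[:-2].decode("ascii").split(" ")   # single spaces separate the tokens
    if len(tokens) != 3:
        raise ValueError(f"expected 3 tokens, got {len(tokens)}")
    method, uri, version = tokens
    if method not in KNOWN_METHODS:
        raise ValueError(f"unsupported method: {method}")
    return method, uri, version

print(parse_request_line(b"GET /about.html HTTP/1.1\r\n"))
```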

Methods

GET: retrieve information identified by the URI.
HEAD: retrieve meta-information about the URI.
POST: send information to a URI and retrieve the result.

Methods (cont.)

PUT: store information in the location named by the URI.
DELETE: remove the entity identified by the URI.

More Methods

TRACE: used to trace HTTP forwarding through proxies, tunnels, etc.
OPTIONS: used to determine the capabilities of the server, or characteristics of a named resource.

HTTP Version Number


The version token is HTTP/1.0 or HTTP/1.1.
HTTP 0.9 did not include a version number in a request line.
If a server gets a request line with no HTTP version number, it assumes 0.9.
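The version rule can be sketched as a small Python function. `request_version` is a hypothetical name, and the sketch assumes an HTTP 0.9 request line carries exactly two tokens (method and URI).

```python
def request_version(request_line):
    """Return the HTTP version a server should assume for a request line (hypothetical helper)."""
    tokens = request_line.strip().split()
    if len(tokens) == 2:
        # HTTP 0.9 style: "GET /path" with no version token.
        return "HTTP/0.9"
    if len(tokens) == 3 and tokens[2].startswith("HTTP/"):
        return tokens[2]
    raise ValueError(f"malformed request line: {request_line!r}")

for line in ("GET /index.html", "GET /index.html HTTP/1.0", "GET /index.html HTTP/1.1"):
    print(line, "->", request_version(line))
```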

The Header Lines


After the Request-Line comes a number (possibly zero) of HTTP header lines. Each header line contains an attribute name, followed by a colon (:), a space, and the attribute value.
The Name and Value are just text.
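Because name and value are plain text, splitting a header line at its first colon recovers both. A minimal sketch; `parse_header_line` is a hypothetical helper, and it lower-cases names on the assumption that lookups should be case-insensitive, as header names are in HTTP.

```python
def parse_header_line(line):
    """Return (name, value) for one 'Name: value' header line (hypothetical helper)."""
    # Only the first colon separates name from value; values may contain colons.
    name, sep, value = line.partition(":")
    if not sep or not name.strip():
        raise ValueError(f"not a header line: {line!r}")
    # Lower-case the name so later lookups do not depend on its spelling.
    return name.strip().lower(), value.strip()

print(parse_header_line("Host: www.bits-pilani.ac.in"))
print(parse_header_line("Date: Sat, 30 Jan 2010 12:48:17 IST"))
```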

Headers
Request headers provide information to the server about the client:
what kind of client,
what kind of content will be accepted,
who is making the request.

There can be 0 headers (HTTP 1.0).
HTTP 1.1 requires a Host: header.

Example HTTP Headers


GET /about.html HTTP/1.1
Host: www.bits-pilani.ac.in          // required in HTTP 1.1
Connection: Keep-Alive
User-Agent: Mozilla/4.06 [en] (X11; U; Linux 2.1.121 i686)
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,utf-8
<blank line>
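A request like this can be assembled programmatically. The essential points are CRLF after every line, the Host: header for HTTP 1.1, and the blank line that ends the headers. A minimal Python sketch: `build_request` is a hypothetical helper, the header set is trimmed from the example, and Content-Length is only added when a body is supplied.

```python
def build_request(method, path, host, headers=None, body=b""):
    """Assemble an HTTP/1.1 request as raw bytes (hypothetical helper)."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]  # Host is mandatory in HTTP 1.1
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # Every line ends with CRLF, and one extra CRLF (the blank line) ends the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

request = build_request("GET", "/about.html", "www.bits-pilani.ac.in",
                        {"Connection": "Keep-Alive", "Accept-Language": "en"})
print(request.decode("ascii"))
```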

End of the Headers

Each header ends with a CRLF (\r\n).
The end of the header section is marked with a blank line (just CRLF).
For GET and HEAD requests, the end of the headers is the end of the request!

POST

A POST request includes some content (some data) after the headers (after the blank line). There is no format for the data (just raw bytes). A POST request must include a Content-Length line in the headers:
Content-Length: 267
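Since a GET or HEAD request ends at the blank line while a POST carries Content-Length bytes after it, a server can split a raw request as sketched below. Simplifying assumptions: the whole request is already in memory, `split_request` is a hypothetical name, and the body reuses the form data idno=2007A1PS001&item=test1&name=Krishna from the POST example.

```python
def split_request(raw):
    """Return (request_line, headers, body) from a complete raw request (hypothetical helper)."""
    head, sep, rest = raw.partition(b"\r\n\r\n")   # the blank line ends the headers
    if not sep:
        raise ValueError("headers not terminated by a blank line")
    lines = head.decode("ascii").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # No Content-Length (e.g. GET/HEAD): the request ends at the blank line.
    length = int(headers.get("content-length", "0"))
    return lines[0], headers, rest[:length]

form = b"idno=2007A1PS001&item=test1&name=Krishna"
raw = (b"POST /about.html HTTP/1.1\r\n"
       b"Host: www.bits-pilani.ac.in\r\n"
       b"Content-Length: " + str(len(form)).encode() + b"\r\n\r\n" + form)
line, headers, body = split_request(raw)
print(line, headers["content-length"], body)
```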

Example POST Request

POST /about.html HTTP/1.1
Host: www.bits-pilani.ac.in          // required in HTTP 1.1
Connection: Keep-Alive
User-Agent: Mozilla/4.06 [en] (X11; U; Linux 2.1.121 i686)
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,utf-8
Content-Length: 40
<blank line>
idno=2007A1PS001&item=test1&name=Krishna

Typical Method Usage

GET is used to retrieve an HTML document.
HEAD is used to find out if a document has changed.
POST is used to submit a form.

HTTP Response

ASCII Status Line
Headers Section
Content can be anything (not just text), typically an HTML document or some kind of image.

Status-Line
Headers
. . .
blank line
Content...

Response Status Line

HTTP-Version Status-Code Message

Status Code is a 3-digit number (for computers).
Message is text (for humans).

Status Codes

1xx Informational
2xx Success
3xx Redirection
4xx Client Error
5xx Server Error

Example Status Lines

HTTP/1.0 200 OK
HTTP/1.0 301 Moved Permanently
HTTP/1.0 400 Bad Request
HTTP/1.0 500 Internal Server Error
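A client can split a status line into its three fields and use the first digit of the code to find its class. A minimal Python sketch; `parse_status_line` and the `CLASSES` table are hypothetical names, and the table simply restates the five classes listed above.

```python
CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client Error", 5: "Server Error"}

def parse_status_line(line):
    """Split a status line into (version, code, message, class) (hypothetical helper)."""
    # Split at most twice: the message itself may contain spaces.
    version, code, message = line.rstrip("\r\n").split(" ", 2)
    status = int(code)
    if len(code) != 3 or not 100 <= status <= 599:
        raise ValueError(f"bad status code: {code!r}")
    return version, status, message, CLASSES[status // 100]

for line in ("HTTP/1.0 200 OK", "HTTP/1.0 301 Moved Permanently",
             "HTTP/1.0 500 Internal Server Error"):
    print(parse_status_line(line))
```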

Response Headers

Response headers provide the client with information about the returned entity (document):
what kind of document,
how big the document is,
how the document is encoded,
when the document was last modified.
Response headers end with a blank line.

Response Header Examples

Date: Sat, 30 Jan 2010 12:48:17 IST
Server: Apache/1.17
Content-Type: text/html
Content-Length: 1756          // length of the content that arrives after the headers
Content-Encoding: gzip

Content

Content can be anything (a sequence of raw bytes).
A Content-Length header is required for any response that includes content.
A Content-Type header is also required.
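A header such as Content-Encoding: gzip tells the client that the raw content bytes must be decompressed before use. A small sketch with Python's standard gzip module; `decode_content` is a hypothetical helper, it expects lower-cased header names, and it handles only gzip and unencoded content.

```python
import gzip

def decode_content(headers, body):
    """Undo the Content-Encoding named in the response headers (hypothetical helper)."""
    encoding = headers.get("content-encoding", "identity").lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "identity":
        return body
    raise ValueError(f"unsupported Content-Encoding: {encoding}")

page = b"<html><body>BITS Pilani</body></html>"
wire = gzip.compress(page)   # the bytes the server sends after the headers
# Content-Length counts the compressed bytes actually transmitted.
headers = {"content-type": "text/html", "content-length": str(len(wire)),
           "content-encoding": "gzip"}
print(decode_content(headers, wire))
```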

Single Request/Reply
The client sends a complete request. The server sends back the entire reply. The server closes its socket.

Persistent Connections
HTTP 1.1 supports persistent connections (this is the default). Multiple requests can be handled over a single TCP connection. The Connection: header is used to exchange information about persistence (HTTP/1.1). HTTP 1.0 clients used a Keep-Alive: header.
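The persistence rule can be summarised as a small decision function: an HTTP/1.1 connection stays open unless the Connection: header says close, while an HTTP/1.0 connection closes unless keep-alive was requested. A sketch with a hypothetical name; it assumes header names are already lower-cased and that an HTTP/1.0 keep-alive request arrives as the token Keep-Alive in the Connection: header, as in the example request.

```python
def keeps_connection_open(version, headers):
    """Decide whether the TCP connection should persist after this message (hypothetical helper)."""
    tokens = {t.strip().lower() for t in headers.get("connection", "").split(",")}
    if version == "HTTP/1.1":
        return "close" not in tokens      # persistent by default
    return "keep-alive" in tokens         # HTTP/1.0 and earlier: closed unless keep-alive

print(keeps_connection_open("HTTP/1.1", {}))
print(keeps_connection_open("HTTP/1.0", {"connection": "Keep-Alive"}))
```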

Persistent Connections And Lengths


In HTTP 1.0, a client opens a TCP connection and sends a GET request. The server transmits a copy of the requested item and then closes the TCP connection. The client reads data from the TCP connection until it encounters an end-of-file condition. Finally, the client closes its end of the connection.

If the client needs another document it must open a new connection.

This was the default for HTTP 1.0
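This one-item-per-connection exchange can be simulated without a network. In the sketch below, socket.socketpair supplies two connected sockets: one plays a server that replies and closes, the other a client that reads until end of file. `serve_once` and `fetch_until_eof` are hypothetical names; a real client would connect to the server's port 80 instead.

```python
import socket
import threading

RESPONSE = (b"HTTP/1.0 200 OK\r\n"
            b"Content-Type: text/html\r\n"
            b"\r\n"
            b"<html>hello</html>")

def serve_once(conn):
    """Server side: read the request, send one reply, then close the connection."""
    conn.recv(4096)          # assumes the small request arrives in one recv
    conn.sendall(RESPONSE)
    conn.close()             # closing the connection marks the end of the item

def fetch_until_eof(conn, request):
    """Client side: send the request, then read until the server closes (EOF)."""
    conn.sendall(request)
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:         # b"" means the peer closed: end of file
            break
        chunks.append(data)
    conn.close()             # finally, the client closes its end
    return b"".join(chunks)

client, server = socket.socketpair()
worker = threading.Thread(target=serve_once, args=(server,))
worker.start()
reply = fetch_until_eof(client, b"GET /index.html HTTP/1.0\r\n\r\n")
worker.join()
print(reply.decode())
```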

The chief advantage of persistent connections lies in reduced overhead. A browser using a persistent connection can further optimize by pipelining requests (i.e., sending requests back-to-back without waiting for a response). The chief disadvantage of using a persistent connection lies in the need to identify the beginning and end of each item sent over the connection. There are two possible techniques that handle the situation: either send a length followed by the item, or send a sentinel value after the item to mark the end.

To avoid ambiguity between sentinel values and data, HTTP uses the approach of sending a length followed by an item of that size.

Data Length And Program Output

It may not be convenient or even possible for a server to know the length of an item before sending. Servers use the Common Gateway Interface (CGI) mechanism to create dynamic documents. To provide for dynamic Web pages, the HTTP standard specifies that if the server does not know the length of an item a priori, the server can inform the browser that it will close the connection after transmitting the item.

HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Fri, 08 Oct 2010 05:08:14 GMT
Connection: close
Content-Type: text/html
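Length-prefixed framing, with connection close as the fallback when no length is known, is what lets a client take several responses off one persistent connection. A simplified Python sketch: it assumes all responses are already buffered in one byte string, and that a response without Content-Length is the last item before the server closes. `split_responses` is a hypothetical helper.

```python
def split_responses(stream):
    """Cut a buffer of back-to-back responses into (status_line, body) pairs (hypothetical helper)."""
    items = []
    while stream:
        head, sep, rest = stream.partition(b"\r\n\r\n")
        if not sep:
            raise ValueError("incomplete headers")
        lines = head.decode("ascii").split("\r\n")
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        if "content-length" in headers:
            n = int(headers["content-length"])
            if len(rest) < n:
                raise ValueError("truncated body")
            body, stream = rest[:n], rest[n:]   # the length marks exactly where this item ends
        else:
            body, stream = rest, b""            # no length: the item runs until the connection closes
        items.append((lines[0], body))
    return items

stream = (b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nfirst"
          b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 6\r\n\r\nsecond"
          b"HTTP/1.1 200 OK\r\nConnection: close\r\nContent-Type: text/html\r\n\r\n<html>dynamic</html>")
for status, body in split_responses(stream):
    print(status, body)
```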

Conditional Requests

HTTP allows a sender to make a request conditional. For example:
If-Modified-Since: Sat, 01 Jan 2000 05:00:01 GMT

HTTP Proxy Server

Proxy Server
Security by filtering
Performance by caching

[Figure: Browser → Proxy → HTTP Server]
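The caching role of a proxy can be sketched as a lookup table keyed by URL that is consulted before a request is forwarded to the origin server. This is a sketch only: `CachingProxy` and the `fetch_from_origin` callback are hypothetical, and expiry, revalidation, and filtering are left out.

```python
from typing import Callable, Dict

class CachingProxy:
    """Answer repeated GETs from a cache; forward misses to the origin server (hypothetical)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self.fetch_from_origin = fetch_from_origin
        self.cache: Dict[str, bytes] = {}
        self.origin_requests = 0

    def get(self, url: str) -> bytes:
        if url in self.cache:
            return self.cache[url]          # hit: no traffic to the origin server
        self.origin_requests += 1
        page = self.fetch_from_origin(url)  # miss: forward the request
        self.cache[url] = page
        return page

def fake_origin(url: str) -> bytes:
    # Stand-in for a real HTTP server so the sketch runs without a network.
    return f"<html>{url}</html>".encode()

proxy = CachingProxy(fake_origin)
proxy.get("http://discovery.bits-pilani.ac.in/index.html")
proxy.get("http://discovery.bits-pilani.ac.in/index.html")
print(proxy.origin_requests)  # 1
```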
