World Wide Web: Uniform Resource Locators (URL)
The Web consists of a large set of documents, called Web pages, that are accessible over the Internet. Each Web page is classified as a hypermedia document.
The suffix media is used to indicate that a document can contain items other than text (e.g., graphics images); the prefix hyper is used because a document can contain selectable links that refer to other, related documents.
Two main building blocks are used to implement the Web on top of the global Internet: the Web browser and the Web server. Pages that contain a mixture of text and other items are represented using HyperText Markup Language (HTML). An HTML document consists of a file that contains text along with embedded commands, called tags, that give guidelines for display.
Each Web page is assigned a unique name that is used to identify it. The name, which is called a Uniform Resource Locator (URL), begins with a specification of the scheme used to access the item. For the http scheme the general form is:
http://hostname[:port]/path[;parameters][?query]
URLs
HTTP
HTTP characteristics
http://discovery.bits-pilani.ac.in/index.html
http://bitsaa.bitspilani.ac.in/bitsaa.bits?l=campusnews /campusnews.bits
/arcd/arc_nucleus.htm
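The general URL form can be illustrated with a short Python sketch that splits a URL into the named pieces using the standard library's urllib.parse. The function name split_http_url and the second URL (www.example.com with a port, parameters, and a query) are assumptions made for illustration only; falling back to port 80 when no port is given is also a choice made here.

```python
from urllib.parse import urlparse


def split_http_url(url):
    """Split a URL into the pieces named in the general http URL form."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "hostname": parts.hostname,
        # The port is optional; fall back to 80, the well-known HTTP port.
        "port": parts.port or 80,
        "path": parts.path,
        "parameters": parts.params,
        "query": parts.query,
    }


# URL taken from the examples above.
example = split_http_url("http://discovery.bits-pilani.ac.in/index.html")

# Hypothetical URL that exercises every optional part.
full = split_http_url("http://www.example.com:8080/dir/page.html;v=1?item=test1")
```

Here example["path"] is "/index.html" with the default port, while full carries an explicit port, a parameters part, and a query part.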
HTTP is the protocol that supports communication between web browsers and web servers. HTTP is an application-level protocol with the lightness and speed necessary for distributed, hypermedia information systems. The RFC states that HTTP communication generally takes place over a TCP connection, but the protocol itself is not dependent on a specific transport layer.
Request - Response
HTTP Versions
The well-known TCP port for HTTP servers is port 80. Other ports can be used as well.
The original version now goes by the name HTTP Version 0.9
HTTP 0.9 was used for many years.
HTTP can support multiple request-reply exchanges over a single TCP connection.
Starting with HTTP 1.0, the version number is part of every request. It tells the server which version the client speaks (what options are supported, etc.).
Request Line
Method URI HTTP-Version\r\n
Request Method
A request consists of lines of text (ASCII). Lines end with CRLF (\r\n). The first line is called the Request-Line.
The request line contains 3 tokens (words). Space characters separate the tokens. A newline (\n) by itself seems to work in practice, but the protocol requires CRLF.
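The three-token structure of the Request-Line can be sketched in a few lines of Python. The function name parse_request_line is hypothetical, and accepting a bare \n as well as CRLF is a leniency chosen here to mirror the remark above, not something the protocol allows.

```python
def parse_request_line(line):
    """Split a Request-Line into (method, uri, version).

    Accepts CRLF as the protocol requires; a bare newline is
    tolerated as well (an assumption made for this sketch).
    """
    tokens = line.rstrip("\r\n").split(" ")
    if len(tokens) != 3:
        raise ValueError("Request-Line must contain exactly 3 tokens")
    method, uri, version = tokens
    return method, uri, version


request_line = parse_request_line("GET /index.html HTTP/1.0\r\n")
```

A line with a missing or extra token raises ValueError instead of being guessed at.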
Methods
Methods (cont.)
More Methods
TRACE: used to trace HTTP forwarding through proxies, tunnels, etc.
OPTIONS: used to determine the capabilities of the server, or characteristics of a named resource.
GET: retrieve information identified by the URI.
HEAD: retrieve meta-information about the URI.
POST: send information to a URI and retrieve the result.
Headers
Request headers provide information to the server about the client:
what kind of client
what kind of content will be accepted
who is making the request
There can be 0 headers (HTTP 1.0). HTTP 1.1 requires a Host: header.
POST
A POST request includes some content (some data) after the headers (after the blank line). There is no format for the data (just raw bytes). A POST request must include a Content-Length line in the headers:
Content-length: 267
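A minimal sketch of assembling a POST request in Python follows. It computes Content-Length from the body so that the header always matches the data. The function name build_post_request, the path /cgi-bin/register, and the host www.example.com are hypothetical; the body string is the sample form data that appears elsewhere in these notes.

```python
def build_post_request(path, host, body):
    """Return the raw bytes of an HTTP/1.1 POST request.

    body must be bytes; its length goes into Content-Length.
    """
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # blank line: end of the headers
    )
    return head.encode("ascii") + body


body = b"idno=2007A1PS001&item=test1&name=Krishna"
# Path and host are placeholders, not taken from a real server.
request = build_post_request("/cgi-bin/register", "www.example.com", body)
```

Because the data has no format of its own, the server relies only on the Content-Length value to know how many bytes to read after the blank line.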
Each header ends with a CRLF (\r\n). The end of the header section is marked with a blank line (just CRLF).
For GET and HEAD requests, the end of the headers is the end of the request!
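The blank-line rule can be shown with a small Python sketch that locates the end of the header section and collects the headers into a dictionary. The function name parse_head is hypothetical, and lower-casing header names is a convenience chosen for this sketch.

```python
def parse_head(raw):
    """Split raw request bytes into (request_line, headers, rest).

    The header section ends at the first blank line, i.e. at the
    first CRLF that directly follows another CRLF.
    """
    head, separator, rest = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("blank line not seen yet: request is incomplete")
    lines = head.decode("ascii").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, rest


raw_get = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: discovery.bits-pilani.ac.in\r\n"
    b"\r\n"
)
request_line, headers, rest = parse_head(raw_get)
```

For this GET request rest is empty, because the end of the headers is the end of the request. For a POST request, rest would hold the content.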
HTTP Response
Status-Line Headers . . .
blank line
Content...
idno=2007A1PS001&item=test1&name=Krishna
Status Codes
1xx Informational
2xx Success
3xx Redirection
4xx Client Error
5xx Server Error
Status-Code
Message
The Status-Code is a 3-digit number (for computers). The Message is text (for humans).
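As a sketch of how a client might use the Status-Line, the Python function below splits it into version, code, and message, and maps the first digit of the code to its class. The names parse_status_line and STATUS_CLASSES are hypothetical.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line):
    """Return (version, code, message, status_class) for a Status-Line."""
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2:
        raise ValueError("Status-Line needs a version and a status code")
    version = parts[0]
    code = int(parts[1])
    # The message is for humans and may itself contain spaces.
    message = parts[2] if len(parts) > 2 else ""
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    return version, code, message, status_class


status = parse_status_line("HTTP/1.1 404 Not Found\r\n")
```

Only the numeric code drives program logic; the message is carried along unchanged for display.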
Response Headers
Content
Response headers provide the client with information about the returned entity (document):
what kind of document
how big the document is
how the document is encoded
when the document was last modified
Single Request/Reply
The client sends a complete request. The server sends back the entire reply. The server closes its socket.
Persistent Connections
HTTP 1.1 supports persistent connections (this is the default). Multiple requests can be handled over a single TCP connection. In HTTP/1.1 the Connection: header is used to exchange information about persistence. HTTP 1.0 clients asked for persistence with Connection: Keep-Alive.
The chief advantage of persistent connections lies in reduced overhead. A browser using a persistent connection can further optimize by pipelining requests (i.e., sending requests back-to-back without waiting for a response). The chief disadvantage of using a persistent connection lies in the need to identify the beginning and end of each item sent over the connection. There are two possible techniques that handle the situation:
either send a length followed by the item, or send a sentinel value after the item to mark the end.
To avoid ambiguity between sentinel values and data, HTTP uses the approach of sending a length followed by an item of that size.
It may not be convenient or even possible for a server to know the length of an item before sending. For example, servers use the Common Gateway Interface (CGI) mechanism to create dynamic documents. To provide for dynamic Web pages, the HTTP standard specifies that if the server does not know the length of an item a priori, the server can inform the browser that it will close the connection after transmitting the item.
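The two cases above (a known length, or a connection that is closed after the item) can be sketched in Python. To keep the example self-contained, an io.BytesIO object stands in for the TCP connection, so end-of-stream plays the role of the server closing its socket. The function name read_response and the sample responses are hypothetical.

```python
import io


def read_response(stream):
    """Read one response from a byte stream.

    Returns (status_line, headers, body), or None at end of stream.
    """
    status_line = stream.readline()
    if not status_line:
        return None  # connection closed, nothing more to read
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break  # blank line: end of the headers
        name, _, value = line.decode("ascii").partition(":")
        headers[name.strip().lower()] = value.strip()
    if "content-length" in headers:
        # Length known: read exactly that many bytes, keep the connection.
        body = stream.read(int(headers["content-length"]))
    else:
        # Length unknown: the item runs until the server closes.
        body = stream.read()
    return status_line.decode("ascii").strip(), headers, body


# Two responses sent back-to-back over one simulated connection.
connection = io.BytesIO(
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
    b"HTTP/1.1 200 OK\r\nConnection: close\r\n\r\ndynamic page"
)
first = read_response(connection)
second = read_response(connection)
third = read_response(connection)
```

The first response is delimited by its length, so the second one can follow immediately on the same connection. The second has no length, so everything up to the close is its body, and no further response can follow it.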
Conditional Requests
Proxy Server
Security by filtering
Performance by caching
[Figure: Browser, Proxy, HTTP Server; the proxy sits between the browser and the HTTP server.]