Living Standard — Last Updated 1 November 2024
A string is a valid non-empty URL if it is a valid URL string but it is not the empty string.
A string is a valid URL potentially surrounded by spaces if, after stripping leading and trailing ASCII whitespace from it, it is a valid URL string.
A string is a valid non-empty URL potentially surrounded by spaces if, after stripping leading and trailing ASCII whitespace from it, it is a valid non-empty URL.
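The three definitions above compose in a simple way, which the following non-normative sketch illustrates. The predicate `is_valid_url_string` is a hypothetical stand-in supplied by the caller; this sketch does not implement the URL Standard's validity rules. ASCII whitespace is taken to be TAB, LF, FF, CR, and SPACE.

```python
from typing import Callable

# ASCII whitespace: TAB, LF, FF, CR, and SPACE.
ASCII_WHITESPACE = "\t\n\f\r "

# Hypothetical type for a caller-supplied "valid URL string" check.
Validator = Callable[[str], bool]


def strip_ascii_whitespace(s: str) -> str:
    """Strip leading and trailing ASCII whitespace only (not all Unicode spaces)."""
    return s.strip(ASCII_WHITESPACE)


def is_valid_non_empty_url(s: str, is_valid_url_string: Validator) -> bool:
    """A valid URL string that is not the empty string."""
    return s != "" and is_valid_url_string(s)


def is_valid_url_potentially_surrounded_by_spaces(
    s: str, is_valid_url_string: Validator
) -> bool:
    """Strip first, then apply the plain validity check."""
    return is_valid_url_string(strip_ascii_whitespace(s))


def is_valid_non_empty_url_potentially_surrounded_by_spaces(
    s: str, is_valid_url_string: Validator
) -> bool:
    """Strip first, then apply the non-empty validity check."""
    return is_valid_non_empty_url(strip_ascii_whitespace(s), is_valid_url_string)
```

Note that a string consisting only of ASCII whitespace strips to the empty string, so it can never be a valid non-empty URL potentially surrounded by spaces.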
This specification defines the URL about:legacy-compat as a reserved, though unresolvable, about: URL, for use in DOCTYPEs in HTML documents when needed for compatibility with XML tools. [ABOUT]
This specification defines the URL about:html-kind as a reserved, though unresolvable, about: URL, that is used as an identifier for kinds of media tracks. [ABOUT]
This specification defines the URL about:srcdoc as a reserved, though unresolvable, about: URL, that is used as the URL of iframe srcdoc documents. [ABOUT]
The fallback base URL of a Document object document is the URL record obtained by running these steps:
If document is an iframe srcdoc document, then:
Assert: document's about base URL is non-null.
Return document's about base URL.
If document's URL matches about:blank and document's about base URL is non-null, then return document's about base URL.
Return document's URL.
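The fallback base URL steps can be sketched as follows. This is a non-normative illustration: the `Document` class and its field names are hypothetical, URLs are modeled as plain strings rather than URL records, and `matches_about_blank` is a simplified string check that ignores the username, password, and host conditions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    # Hypothetical model: URLs are plain strings here, not URL records.
    url: str
    about_base_url: Optional[str] = None
    is_iframe_srcdoc: bool = False


def matches_about_blank(url: str) -> bool:
    """Simplified check: drop any fragment and query, then compare."""
    without_fragment = url.split("#", 1)[0]
    without_query = without_fragment.split("?", 1)[0]
    return without_query == "about:blank"


def fallback_base_url(document: Document) -> str:
    # Step 1: iframe srcdoc documents always use their about base URL.
    if document.is_iframe_srcdoc:
        assert document.about_base_url is not None
        return document.about_base_url
    # Step 2: about:blank documents use their about base URL when it exists.
    if matches_about_blank(document.url) and document.about_base_url is not None:
        return document.about_base_url
    # Step 3: everything else falls back to the document's own URL.
    return document.url
```

An about:blank document without an about base URL therefore falls through to the last step and uses about:blank itself as its fallback base URL.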
The document base URL of a Document object is the URL record obtained by running these steps:
If there is no base element that has an href attribute in the Document, then return the Document's fallback base URL.
Otherwise, return the frozen base URL of the first base element in the Document that has an href attribute, in tree order.
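A minimal, non-normative sketch of the document base URL steps follows. The `BaseElement` and `Document` classes are hypothetical models: the fallback base URL is stored as a precomputed field, and `base_elements` is assumed to already be in tree order.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BaseElement:
    # Hypothetical model of a base element.
    has_href: bool
    frozen_base_url: Optional[str] = None


@dataclass
class Document:
    # Assumed to be precomputed by the fallback base URL steps.
    fallback_base_url: str
    # Assumed to list the document's base elements in tree order.
    base_elements: List[BaseElement] = field(default_factory=list)


def document_base_url(document: Document) -> Optional[str]:
    # The first base element with an href attribute wins; later ones are ignored.
    for base in document.base_elements:
        if base.has_href:
            return base.frozen_base_url
    # No base element has an href attribute.
    return document.fallback_base_url
```

A base element without an href attribute (for example, one that only sets a target) is skipped, so it does not prevent a later base element from supplying the base URL.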
A URL matches about:blank if its scheme is "about", its path contains a single string "blank", its username and password are the empty string, and its host is null.
Such a URL's query and fragment can be non-null. For example, the URL record created by parsing "about:blank?foo#bar" matches about:blank.
A URL matches about:srcdoc if its scheme is "about", its path contains a single string "srcdoc", its query is null, its username and password are the empty string, and its host is null.
The definition of matches about:srcdoc requires the URL's query to be null because it is not possible to create an iframe srcdoc document whose URL has a non-null query, unlike Documents whose URL matches about:blank. In other words, the URLs that match about:srcdoc vary only in their fragment.
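The two matching predicates can be compared side by side in a short, non-normative sketch. The `URLRecord` class is a hypothetical, simplified stand-in for a URL record, with the path modeled as a list of strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class URLRecord:
    # Hypothetical, simplified stand-in for a URL record.
    scheme: str
    path: List[str]
    username: str = ""
    password: str = ""
    host: Optional[str] = None
    query: Optional[str] = None
    fragment: Optional[str] = None


def matches_about_blank(url: URLRecord) -> bool:
    # Query and fragment are deliberately not inspected.
    return (
        url.scheme == "about"
        and url.path == ["blank"]
        and url.username == ""
        and url.password == ""
        and url.host is None
    )


def matches_about_srcdoc(url: URLRecord) -> bool:
    # Unlike about:blank, the query has to be null; only the fragment is free.
    return (
        url.scheme == "about"
        and url.path == ["srcdoc"]
        and url.query is None
        and url.username == ""
        and url.password == ""
        and url.host is None
    )
```

Under this model, a record for "about:blank?foo#bar" (query "foo", fragment "bar") matches about:blank, whereas a record with path "srcdoc" and query "foo" does not match about:srcdoc.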
Parsing a URL is the process of taking a string and obtaining the URL record that it represents. While this process is defined in URL, the HTML standard defines several wrappers to abstract base URLs and encodings. [URL]
Most new APIs should use the parse a URL algorithm. Older APIs and HTML elements might have reason to use encoding-parse a URL. When a custom base URL is needed or no base URL is desired, the URL parser can be used directly as well.
To parse a URL, given a string url, relative to a Document object or environment settings object environment, run these steps. They return failure or a URL.
Let baseURL be environment's base URL, if environment is a Document object; otherwise environment's API base URL.
Return the result of applying the URL parser to url, with baseURL.
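The wrapper above mainly selects a base URL and then delegates to the URL parser. The following non-normative sketch shows that shape. The `Document` and `EnvironmentSettings` classes are hypothetical, and `url_parser` is a rough stand-in built on Python's `urllib.parse`, which does not follow the URL Standard; failure is modeled as `None`.

```python
from dataclasses import dataclass
from typing import Optional, Union
from urllib.parse import SplitResult, urljoin, urlsplit


@dataclass
class Document:
    base_url: str  # hypothetical: the document base URL as a string


@dataclass
class EnvironmentSettings:
    api_base_url: str  # hypothetical: the API base URL as a string


def url_parser(url: str, base: Optional[str] = None) -> Optional[SplitResult]:
    """Stand-in for the URL parser (NOT URL Standard conformant)."""
    try:
        joined = urljoin(base, url) if base is not None else url
        parts = urlsplit(joined)
    except ValueError:
        return None  # failure
    # Treat a result without a scheme as failure (no usable base URL).
    return parts if parts.scheme else None


def parse_url(
    url: str, environment: Union[Document, EnvironmentSettings]
) -> Optional[SplitResult]:
    # Step 1: choose the base URL depending on the kind of environment.
    if isinstance(environment, Document):
        base_url = environment.base_url
    else:
        base_url = environment.api_base_url
    # Step 2: apply the URL parser to url, with baseURL.
    return url_parser(url, base_url)
```

Calling `url_parser` directly without a base corresponds to the case mentioned earlier where no base URL is desired; in this sketch a relative input then yields failure.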
To encoding-parse a URL, given a string url, relative to a Document object or environment settings object environment, run these steps. They return failure or a URL.
Let encoding be UTF-8.
If environment is a Document object, then set encoding to environment's character encoding.
Otherwise, if environment's relevant global object is a Window object, set encoding to environment's relevant global object's associated Document's character encoding.
Let baseURL be environment's base URL, if environment is a Document object; otherwise environment's API base URL.
Return the result of applying the URL parser to url, with baseURL and encoding.
To encoding-parse-and-serialize a URL, given a string url, relative to a Document object or environment settings object environment, run these steps. They return failure or a string.
Let url be the result of encoding-parsing a URL given url, relative to environment.
If url is failure, then return failure.
Return the result of applying the URL serializer to url.
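The encoding-aware wrappers differ from parse a URL only in how they pick an encoding before delegating. The sketch below is non-normative. All class names are hypothetical, and the URL parser and URL serializer are passed in as caller-supplied callables (`url_parser`, `url_serializer`) rather than implemented; failure is modeled as `None`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union


@dataclass
class Document:
    base_url: str
    character_encoding: str = "UTF-8"


@dataclass
class Window:
    associated_document: Document


@dataclass
class EnvironmentSettings:
    api_base_url: str
    global_object: Any = None  # a Window, or some other global (e.g. a worker)


Environment = Union[Document, EnvironmentSettings]
# Hypothetical signatures: parser(url, base_url, encoding) and serializer(record).
Parser = Callable[[str, str, str], Optional[Any]]
Serializer = Callable[[Any], str]


def select_encoding(environment: Environment) -> str:
    encoding = "UTF-8"
    if isinstance(environment, Document):
        encoding = environment.character_encoding
    elif isinstance(environment.global_object, Window):
        encoding = environment.global_object.associated_document.character_encoding
    return encoding


def encoding_parse_url(url: str, environment: Environment, url_parser: Parser):
    encoding = select_encoding(environment)
    if isinstance(environment, Document):
        base_url = environment.base_url
    else:
        base_url = environment.api_base_url
    return url_parser(url, base_url, encoding)


def encoding_parse_and_serialize_url(
    url: str, environment: Environment, url_parser: Parser, url_serializer: Serializer
) -> Optional[str]:
    record = encoding_parse_url(url, environment, url_parser)
    if record is None:  # failure propagates unchanged
        return None
    return url_serializer(record)
```

In this model an environment whose global object is not a Window, such as a worker, always ends up with UTF-8.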
When a document's document base URL changes, all elements in that document are affected by a base URL change.
The following are base URL change steps, which run when an element is affected by a base URL change (as defined by DOM):
If the element creates a hyperlink
If the URL identified by the hyperlink is being shown to the user, or if any data derived from that URL is affecting the display, then the href attribute's value should be reparsed, relative to the element's node document, and the UI updated appropriately.
For example, the CSS :link/:visited pseudo-classes might have been affected.
If the hyperlink has a ping attribute and its URL(s) are being shown to the user, then the ping attribute's tokens should be reparsed, relative to the element's node document, and the UI updated appropriately.
If the element is a q, blockquote, ins, or del element with a cite attribute
If the URL identified by the cite attribute is being shown to the user, or if any data derived from that URL is affecting the display, then the cite attribute's value should be reparsed, relative to the element's node document, and the UI updated appropriately.
Otherwise
The element is not directly affected.
For instance, changing the base URL doesn't affect the image displayed by img elements, although subsequent accesses of the src IDL attribute from script will return a new absolute URL that might no longer correspond to the image being shown.
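The img example can be made concrete with a small, non-normative model. The `Document` and `ImgElement` classes are hypothetical, and `urllib.parse.urljoin` is used as a rough stand-in for URL parsing against the document base URL; it does not follow the URL Standard.

```python
from urllib.parse import urljoin


class Document:
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url  # hypothetical: document base URL as a string


class ImgElement:
    def __init__(self, document: Document, src_attribute: str) -> None:
        self.document = document
        self.src_attribute = src_attribute
        # The image is resolved once, when it is first loaded.
        self.displayed_url = urljoin(document.base_url, src_attribute)

    @property
    def src(self) -> str:
        # The IDL attribute is resolved against the *current* base URL
        # every time it is read.
        return urljoin(self.document.base_url, self.src_attribute)


doc = Document("https://example.com/a/")
img = ImgElement(doc, "logo.png")
doc.base_url = "https://example.com/b/"  # the base URL changes after loading
```

After the change, `img.src` resolves to "https://example.com/b/logo.png" while `img.displayed_url` is still "https://example.com/a/logo.png", which is the mismatch the paragraph above describes.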
A response whose type is "basic", "cors", or "default" is CORS-same-origin. [FETCH]
A response whose type is "opaque" or "opaqueredirect" is CORS-cross-origin.
A response's unsafe response is its internal response if it has one, and the response itself otherwise.
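These three definitions amount to a classification by response type plus one unwrapping step, as the following non-normative sketch shows. The `Response` class is a hypothetical model that carries only a type and an optional internal response.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    # Hypothetical, minimal model of a response.
    type: str
    internal_response: Optional["Response"] = None


def is_cors_same_origin(response: Response) -> bool:
    return response.type in {"basic", "cors", "default"}


def is_cors_cross_origin(response: Response) -> bool:
    return response.type in {"opaque", "opaqueredirect"}


def unsafe_response(response: Response) -> Response:
    # Use the internal response when there is one, else the response itself.
    if response.internal_response is not None:
        return response.internal_response
    return response
```

For example, an "opaque" response wrapping a "default" internal response is CORS-cross-origin, while its unsafe response is the wrapped "default" response.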
To create a potential-CORS request, given a url, destination, corsAttributeState, and an optional same-origin fallback flag, run these steps:
Let mode be "no-cors" if corsAttributeState is No CORS, and "cors" otherwise.
If same-origin fallback flag is set and mode is "no-cors", set mode to "same-origin".
Let credentialsMode be "include".
If corsAttributeState is Anonymous, set credentialsMode to "same-origin".
Let request be a new request whose URL is url, destination is destination, mode is mode, credentials mode is credentialsMode, and whose use-URL-credentials flag is set.
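The steps for creating a potential-CORS request can be sketched as follows. This is a non-normative illustration: the `Request` class and the `CorsAttributeState` enum are hypothetical models, the Use Credentials member is assumed from the states of CORS settings attributes, and returning the request is a convenience of the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class CorsAttributeState(Enum):
    NO_CORS = "No CORS"
    ANONYMOUS = "Anonymous"
    USE_CREDENTIALS = "Use Credentials"  # assumed third state


@dataclass
class Request:
    # Hypothetical, minimal model of a request.
    url: str
    destination: str
    mode: str
    credentials_mode: str
    use_url_credentials: bool


def create_potential_cors_request(
    url: str,
    destination: str,
    cors_attribute_state: CorsAttributeState,
    same_origin_fallback: bool = False,
) -> Request:
    # Step 1: "no-cors" only when the attribute is in the No CORS state.
    mode = "no-cors" if cors_attribute_state is CorsAttributeState.NO_CORS else "cors"
    # Step 2: the fallback flag tightens "no-cors" to "same-origin".
    if same_origin_fallback and mode == "no-cors":
        mode = "same-origin"
    # Steps 3 and 4: credentials are included unless the state is Anonymous.
    credentials_mode = "include"
    if cors_attribute_state is CorsAttributeState.ANONYMOUS:
        credentials_mode = "same-origin"
    # Step 5: build the request with the use-URL-credentials flag set.
    return Request(url, destination, mode, credentials_mode, True)
```

The same-origin fallback flag has no effect when the state is Anonymous or Use Credentials, because the mode is already "cors" in those cases.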
The Content-Type metadata of a resource must be obtained and interpreted in a manner consistent with the requirements of MIME Sniffing. [MIMESNIFF]
The computed MIME type of a resource must be found in a manner consistent with the requirements given in MIME Sniffing. [MIMESNIFF]
The rules for sniffing images specifically, the rules for distinguishing if a resource is text or binary, and the rules for sniffing audio and video specifically are also defined in MIME Sniffing. These rules return a MIME type as their result. [MIMESNIFF]
It is imperative that the rules in MIME Sniffing be followed exactly. When a user agent uses different heuristics for content type detection than the server expects, security problems can occur. For more details, see MIME Sniffing. [MIMESNIFF]
The algorithm for extracting a character encoding from a meta element, given a string s, is as follows. It either returns a character encoding or nothing.
Let position be a pointer into s, initially pointing at the start of the string.
Loop: Find the first seven characters in s after position that are an ASCII case-insensitive match for the word "charset". If no such match is found, return nothing.
Skip any ASCII whitespace that immediately follows the word "charset" (there might not be any).
If the next character is not a U+003D EQUALS SIGN (=), then move position to point just before that next character, and jump back to the step labeled loop.
Skip any ASCII whitespace that immediately follows the equals sign (there might not be any).
Process the next character as follows:
This algorithm is distinct from those in the HTTP specifications (for example, HTTP doesn't allow the use of single quotes and requires supporting a backslash-escape mechanism that is not supported by this algorithm). While the algorithm is used in contexts that, historically, were related to HTTP, the syntax as supported by implementations diverged some time ago. [HTTP]
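The scanning loop of the extraction algorithm can be sketched as follows. This is a non-normative illustration under stated assumptions: the per-character cases of the final step are not listed above, so the sketch assumes that a value opened by a double or single quote runs to the next matching quote (returning nothing if the quote is unmatched), and that an unquoted value runs to the first ASCII whitespace or semicolon. The function name is hypothetical, and it returns the raw label rather than mapping it to an encoding.

```python
import re
from typing import Optional

ASCII_WHITESPACE = "\t\n\f\r "
# re.ASCII keeps the case-insensitive match limited to ASCII letters.
CHARSET = re.compile("charset", re.IGNORECASE | re.ASCII)


def extract_charset_label(s: str) -> Optional[str]:
    position = 0
    while True:
        # Loop: find the next "charset", ASCII case-insensitively.
        match = CHARSET.search(s, position)
        if match is None:
            return None
        i = match.end()
        # Skip ASCII whitespace after the word "charset".
        while i < len(s) and s[i] in ASCII_WHITESPACE:
            i += 1
        # Without an equals sign, resume the search from this character.
        if i >= len(s) or s[i] != "=":
            position = i
            continue
        i += 1
        # Skip ASCII whitespace after the equals sign.
        while i < len(s) and s[i] in ASCII_WHITESPACE:
            i += 1
        break

    if i >= len(s):
        return None  # assumption: no next character means nothing
    if s[i] in "\"'":
        # Assumption: quoted value; no backslash escapes are recognized.
        end = s.find(s[i], i + 1)
        return None if end == -1 else s[i + 1:end]
    # Assumption: unquoted value ends at ASCII whitespace, ";", or end of s.
    end = i
    while end < len(s) and s[end] not in ASCII_WHITESPACE and s[end] != ";":
        end += 1
    return s[i:end]
```

For instance, `extract_charset_label("text/html; charset=utf-8")` yields "utf-8", and a string with an unmatched opening quote yields `None`.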