spec.txt | spec.txt | |||
---|---|---|---|---|
--- | --- | |||
title: CommonMark Spec | title: CommonMark Spec | |||
author: John MacFarlane | author: John MacFarlane | |||
version: 0.21 | version: 0.22 | |||
date: 2015-07-14 | date: 2015-08-23 | |||
license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' | license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' | |||
... | ... | |||
# Introduction | # Introduction | |||
## What is Markdown? | ## What is Markdown? | |||
Markdown is a plain text format for writing structured documents, | Markdown is a plain text format for writing structured documents, | |||
based on conventions used for indicating formatting in email and | based on conventions used for indicating formatting in email and | |||
usenet posts. It was developed in 2004 by John Gruber, who wrote | usenet posts. It was developed in 2004 by John Gruber, who wrote | |||
skipping to change at line 207 | skipping to change at line 207 | |||
In the examples, the `→` character is used to represent tabs. | In the examples, the `→` character is used to represent tabs. | |||
# Preliminaries | # Preliminaries | |||
## Characters and lines | ## Characters and lines | |||
Any sequence of [character]s is a valid CommonMark | Any sequence of [character]s is a valid CommonMark | |||
document. | document. | |||
A [character](@character) is a unicode code point. | A [character](@character) is a Unicode code point. Although some | |||
code points (for example, combining accents) do not correspond to | ||||
characters in an intuitive sense, all code points count as characters | ||||
for purposes of this spec. | ||||
This spec does not specify an encoding; it thinks of lines as composed | This spec does not specify an encoding; it thinks of lines as composed | |||
of characters rather than bytes. A conforming parser may be limited | of [character]s rather than bytes. A conforming parser may be limited | |||
to a certain encoding. | to a certain encoding. | |||
A [line](@line) is a sequence of zero or more [character]s | A [line](@line) is a sequence of zero or more [character]s | |||
other than newline (`U+000A`) or carriage return (`U+000D`), | ||||
followed by a [line ending] or by the end of file. | followed by a [line ending] or by the end of file. | |||
A [line ending](@line-ending) is a newline (`U+000A`), carriage return | A [line ending](@line-ending) is a newline (`U+000A`), a carriage return | |||
(`U+000D`), or carriage return + newline. | (`U+000D`) not followed by a newline, or a carriage return and a | |||
following newline. | ||||
A line containing no characters, or a line containing only spaces | A line containing no characters, or a line containing only spaces | |||
(`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line). | (`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line). | |||
The following definitions of character classes will be used in this spec: | The following definitions of character classes will be used in this spec: | |||
A [whitespace character](@whitespace-character) is a space | A [whitespace character](@whitespace-character) is a space | |||
(`U+0020`), tab (`U+0009`), newline (`U+000A`), line tabulation (`U+000B`), | (`U+0020`), tab (`U+0009`), newline (`U+000A`), line tabulation (`U+000B`), | |||
form feed (`U+000C`), or carriage return (`U+000D`). | form feed (`U+000C`), or carriage return (`U+000D`). | |||
[Whitespace](@whitespace) is a sequence of one or more [whitespace | [Whitespace](@whitespace) is a sequence of one or more [whitespace | |||
character]s. | character]s. | |||
A [unicode whitespace character](@unicode-whitespace-character) is | A [Unicode whitespace character](@unicode-whitespace-character) is | |||
any code point in the unicode `Zs` class, or a tab (`U+0009`), | any code point in the Unicode `Zs` class, or a tab (`U+0009`), | |||
carriage return (`U+000D`), newline (`U+000A`), or form feed | carriage return (`U+000D`), newline (`U+000A`), or form feed | |||
(`U+000C`). | (`U+000C`). | |||
[Unicode whitespace](@unicode-whitespace) is a sequence of one | [Unicode whitespace](@unicode-whitespace) is a sequence of one | |||
or more [unicode whitespace character]s. | or more [Unicode whitespace character]s. | |||
A [space](@space) is `U+0020`. | A [space](@space) is `U+0020`. | |||
A [non-whitespace character](@non-space-character) is any character | A [non-whitespace character](@non-whitespace-character) is any character | |||
that is not a [whitespace character]. | that is not a [whitespace character]. | |||
An [ASCII punctuation character](@ascii-punctuation-character) | An [ASCII punctuation character](@ascii-punctuation-character) | |||
is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, | is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, | |||
`*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`, | `*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`, | |||
`[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `|`, `}`, or `~`. | `[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `|`, `}`, or `~`. | |||
A [punctuation character](@punctuation-character) is an [ASCII | A [punctuation character](@punctuation-character) is an [ASCII | |||
punctuation character] or anything in | punctuation character] or anything in | |||
the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`. | the Unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`. | |||
## Tabs | ## Tabs | |||
Tabs in lines are not expanded to [spaces][space]. However, | Tabs in lines are not expanded to [spaces][space]. However, | |||
in contexts where indentation is significant for the | in contexts where indentation is significant for the | |||
document's structure, tabs behave as if they were replaced | document's structure, tabs behave as if they were replaced | |||
by spaces with a tab stop of 4 characters. | by spaces with a tab stop of 4 characters. | |||
. | . | |||
→foo→baz→→bim | →foo→baz→→bim | |||
skipping to change at line 303 | skipping to change at line 309 | |||
. | . | |||
. | . | |||
>→foo→bar | >→foo→bar | |||
. | . | |||
<blockquote> | <blockquote> | |||
<p>foo→bar</p> | <p>foo→bar</p> | |||
</blockquote> | </blockquote> | |||
. | . | |||
. | ||||
foo | ||||
→bar | ||||
. | ||||
<pre><code>foo | ||||
bar | ||||
</code></pre> | ||||
. | ||||
## Insecure characters | ## Insecure characters | |||
For security reasons, the Unicode character `U+0000` must be replaced | For security reasons, the Unicode character `U+0000` must be replaced | |||
with the replacement character (`U+FFFD`). | with the replacement character (`U+FFFD`). | |||
# Blocks and inlines | # Blocks and inlines | |||
We can think of a document as a sequence of | We can think of a document as a sequence of | |||
[blocks](@block)---structural elements like paragraphs, block | [blocks](@block)---structural elements like paragraphs, block | |||
quotations, lists, headers, rules, and code blocks. Some blocks (like | quotations, lists, headers, rules, and code blocks. Some blocks (like | |||
skipping to change at line 564 | skipping to change at line 579 | |||
<hr /> | <hr /> | |||
</li> | </li> | |||
</ul> | </ul> | |||
. | . | |||
## ATX headers | ## ATX headers | |||
An [ATX header](@atx-header) | An [ATX header](@atx-header) | |||
consists of a string of characters, parsed as inline content, between an | consists of a string of characters, parsed as inline content, between an | |||
opening sequence of 1--6 unescaped `#` characters and an optional | opening sequence of 1--6 unescaped `#` characters and an optional | |||
closing sequence of any number of `#` characters. The opening sequence | closing sequence of any number of unescaped `#` characters. | |||
of `#` characters cannot be followed directly by a | The opening sequence of `#` characters cannot be followed directly by a | |||
[non-whitespace character]. The optional closing sequence of `#`s must be | [non-whitespace character]. The optional closing sequence of `#`s must be | |||
preceded by a [space] and may be followed by spaces only. The opening | preceded by a [space] and may be followed by spaces only. The opening | |||
`#` character may be indented 0-3 spaces. The raw contents of the | `#` character may be indented 0-3 spaces. The raw contents of the | |||
header are stripped of leading and trailing spaces before being parsed | header are stripped of leading and trailing spaces before being parsed | |||
as inline content. The header level is equal to the number of `#` | as inline content. The header level is equal to the number of `#` | |||
characters in the opening sequence. | characters in the opening sequence. | |||
Simple headers: | Simple headers: | |||
. | . | |||
skipping to change at line 697 | skipping to change at line 712 | |||
. | . | |||
Spaces are allowed after the closing sequence: | Spaces are allowed after the closing sequence: | |||
. | . | |||
### foo ### | ### foo ### | |||
. | . | |||
<h3>foo</h3> | <h3>foo</h3> | |||
. | . | |||
A sequence of `#` characters with a | A sequence of `#` characters with anything but [space]s following it | |||
[non-whitespace character] following it | ||||
is not a closing sequence, but counts as part of the contents of the | is not a closing sequence, but counts as part of the contents of the | |||
header: | header: | |||
. | . | |||
### foo ### b | ### foo ### b | |||
. | . | |||
<h3>foo ### b</h3> | <h3>foo ### b</h3> | |||
. | . | |||
The closing sequence must be preceded by a space: | The closing sequence must be preceded by a space: | |||
skipping to change at line 1637 | skipping to change at line 1651 | |||
5. **Start condition:** line begins with the string | 5. **Start condition:** line begins with the string | |||
`<![CDATA[`.\ | `<![CDATA[`.\ | |||
**End condition:** line contains the string `]]>`. | **End condition:** line contains the string `]]>`. | |||
6. **Start condition:** line begins the string `<` or `</` | 6. **Start condition:** line begins the string `<` or `</` | |||
followed by one of the strings (case-insensitive) `address`, | followed by one of the strings (case-insensitive) `address`, | |||
`article`, `aside`, `base`, `basefont`, `blockquote`, `body`, | `article`, `aside`, `base`, `basefont`, `blockquote`, `body`, | |||
`caption`, `center`, `col`, `colgroup`, `dd`, `details`, `dialog`, | `caption`, `center`, `col`, `colgroup`, `dd`, `details`, `dialog`, | |||
`dir`, `div`, `dl`, `dt`, `fieldset`, `figcaption`, `figure`, | `dir`, `div`, `dl`, `dt`, `fieldset`, `figcaption`, `figure`, | |||
`footer`, `form`, `frame`, `frameset`, `h1`, `head`, `header`, `hr`, | `footer`, `form`, `frame`, `frameset`, `h1`, `head`, `header`, `hr`, | |||
`html`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`, `meta`, | `html`, `iframe`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`, | |||
`nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`, `pre`, | `meta`, `nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`, | |||
`section`, `source`, `title`, `summary`, `table`, `tbody`, `td`, | `section`, `source`, `summary`, `table`, `tbody`, `td`, | |||
`tfoot`, `th`, `thead`, `title`, `tr`, `track`, `ul`, followed | `tfoot`, `th`, `thead`, `title`, `tr`, `track`, `ul`, followed | |||
by [whitespace], the end of the line, the string `>`, or | by [whitespace], the end of the line, the string `>`, or | |||
the string `/>`.\ | the string `/>`.\ | |||
**End condition:** line is followed by a [blank line]. | **End condition:** line is followed by a [blank line]. | |||
7. **Start condition:** line begins with an [open tag] | 7. **Start condition:** line begins with a complete [open tag] | |||
(with any [tag name]) followed only by [whitespace] or the end | or [closing tag] (with any [tag name] other than `script`, | |||
of the line.\ | `style`, or `pre`) followed only by [whitespace] | |||
or the end of the line.\ | ||||
**End condition:** line is followed by a [blank line]. | **End condition:** line is followed by a [blank line]. | |||
All types of [HTML blocks] except type 7 may interrupt | All types of [HTML blocks] except type 7 may interrupt | |||
a paragraph. Blocks of type 7 may not interrupt a paragraph. | a paragraph. Blocks of type 7 may not interrupt a paragraph. | |||
(This restricted is intended to prevent unwanted interpretation | (This restriction is intended to prevent unwanted interpretation | |||
of long tags inside a wrapped paragraph as starting HTML blocks.) | of long tags inside a wrapped paragraph as starting HTML blocks.) | |||
Some simple examples follow. Here are some basic HTML blocks | Some simple examples follow. Here are some basic HTML blocks | |||
of type 6: | of type 6: | |||
. | . | |||
<table> | <table> | |||
<tr> | <tr> | |||
<td> | <td> | |||
hi | hi | |||
skipping to change at line 1851 | skipping to change at line 1866 | |||
. | . | |||
<i class="foo"> | <i class="foo"> | |||
*bar* | *bar* | |||
</i> | </i> | |||
. | . | |||
<i class="foo"> | <i class="foo"> | |||
*bar* | *bar* | |||
</i> | </i> | |||
. | . | |||
. | ||||
</ins> | ||||
*bar* | ||||
. | ||||
</ins> | ||||
*bar* | ||||
. | ||||
These rules are designed to allow us to work with tags that | These rules are designed to allow us to work with tags that | |||
can function as either block-level or inline-level tags. | can function as either block-level or inline-level tags. | |||
The `<del>` tag is a nice example. We can surround content with | The `<del>` tag is a nice example. We can surround content with | |||
`<del>` tags in three different ways. In this case, we get a raw | `<del>` tags in three different ways. In this case, we get a raw | |||
HTML block, because the `<del>` tag is on a line by itself: | HTML block, because the `<del>` tag is on a line by itself: | |||
. | . | |||
<del> | <del> | |||
*foo* | *foo* | |||
</del> | </del> | |||
skipping to change at line 2814 | skipping to change at line 2837 | |||
> foo | > foo | |||
. | . | |||
<blockquote> | <blockquote> | |||
<p>bar | <p>bar | |||
baz | baz | |||
foo</p> | foo</p> | |||
</blockquote> | </blockquote> | |||
. | . | |||
Laziness only applies to lines that would have been continuations of | Laziness only applies to lines that would have been continuations of | |||
paragraphs had they been prepended with `>`. For example, the | paragraphs had they been prepended with [block quote marker]s. | |||
`>` cannot be omitted in the second line of | For example, the `> ` cannot be omitted in the second line of | |||
``` markdown | ``` markdown | |||
> foo | > foo | |||
> --- | > --- | |||
``` | ``` | |||
without changing the meaning: | without changing the meaning: | |||
. | . | |||
> foo | > foo | |||
--- | --- | |||
. | . | |||
<blockquote> | <blockquote> | |||
<p>foo</p> | <p>foo</p> | |||
</blockquote> | </blockquote> | |||
<hr /> | <hr /> | |||
. | . | |||
Similarly, if we omit the `>` in the second line of | Similarly, if we omit the `> ` in the second line of | |||
``` markdown | ``` markdown | |||
> - foo | > - foo | |||
> - bar | > - bar | |||
``` | ``` | |||
then the block quote ends after the first line: | then the block quote ends after the first line: | |||
. | . | |||
> - foo | > - foo | |||
skipping to change at line 2857 | skipping to change at line 2880 | |||
<blockquote> | <blockquote> | |||
<ul> | <ul> | |||
<li>foo</li> | <li>foo</li> | |||
</ul> | </ul> | |||
</blockquote> | </blockquote> | |||
<ul> | <ul> | |||
<li>bar</li> | <li>bar</li> | |||
</ul> | </ul> | |||
. | . | |||
For the same reason, we can't omit the `>` in front of | For the same reason, we can't omit the `> ` in front of | |||
subsequent lines of an indented or fenced code block: | subsequent lines of an indented or fenced code block: | |||
. | . | |||
> foo | > foo | |||
bar | bar | |||
. | . | |||
<blockquote> | <blockquote> | |||
<pre><code>foo | <pre><code>foo | |||
</code></pre> | </code></pre> | |||
</blockquote> | </blockquote> | |||
skipping to change at line 2884 | skipping to change at line 2907 | |||
foo | foo | |||
``` | ``` | |||
. | . | |||
<blockquote> | <blockquote> | |||
<pre><code></code></pre> | <pre><code></code></pre> | |||
</blockquote> | </blockquote> | |||
<p>foo</p> | <p>foo</p> | |||
<pre><code></code></pre> | <pre><code></code></pre> | |||
. | . | |||
Note that in the following case, we have a paragraph | ||||
continuation line: | ||||
. | ||||
> foo | ||||
- bar | ||||
. | ||||
<blockquote> | ||||
<p>foo | ||||
- bar</p> | ||||
</blockquote> | ||||
. | ||||
To see why, note that in | ||||
```markdown | ||||
> foo | ||||
> - bar | ||||
``` | ||||
the `- bar` is indented too far to start a list, and can't | ||||
be an indented code block because indented code blocks cannot | ||||
interrupt paragraphs, so it is a [paragraph continuation line]. | ||||
A block quote can be empty: | A block quote can be empty: | |||
. | . | |||
> | > | |||
. | . | |||
<blockquote> | <blockquote> | |||
</blockquote> | </blockquote> | |||
. | . | |||
. | . | |||
skipping to change at line 3581 | skipping to change at line 3628 | |||
<pre><code>bar | <pre><code>bar | |||
</code></pre> | </code></pre> | |||
</li> | </li> | |||
<li> | <li> | |||
<pre><code>baz | <pre><code>baz | |||
</code></pre> | </code></pre> | |||
</li> | </li> | |||
</ul> | </ul> | |||
. | . | |||
A list item can begin with at most one blank line. | ||||
In the following example, `foo` is not part of the list | ||||
item: | ||||
. | ||||
- | ||||
foo | ||||
. | ||||
<ul> | ||||
<li></li> | ||||
</ul> | ||||
<p>foo</p> | ||||
. | ||||
Here is an empty bullet list item: | Here is an empty bullet list item: | |||
. | . | |||
- foo | - foo | |||
- | - | |||
- bar | - bar | |||
. | . | |||
<ul> | <ul> | |||
<li>foo</li> | <li>foo</li> | |||
<li></li> | <li></li> | |||
skipping to change at line 4814 | skipping to change at line 4876 | |||
``` | ``` | |||
. | . | |||
<pre><code class="language-foo+bar">foo | <pre><code class="language-foo+bar">foo | |||
</code></pre> | </code></pre> | |||
. | . | |||
## Entities | ## Entities | |||
With the goal of making this standard as HTML-agnostic as possible, all | With the goal of making this standard as HTML-agnostic as possible, all | |||
valid HTML entities (except in code blocks and code spans) | valid HTML entities (except in code blocks and code spans) | |||
are recognized as such and converted into unicode characters before | are recognized as such and converted into Unicode characters before | |||
they are stored in the AST. This means that renderers to formats other | they are stored in the AST. This means that renderers to formats other | |||
than HTML need not be HTML-entity aware. HTML renderers may either escape | than HTML need not be HTML-entity aware. HTML renderers may either escape | |||
unicode characters as entities or leave them as they are. (However, | Unicode characters as entities or leave them as they are. (However, | |||
`"`, `&`, `<`, and `>` must always be rendered as entities.) | `"`, `&`, `<`, and `>` must always be rendered as entities.) | |||
[Named entities](@name-entities) consist of `&` | [Named entities](@name-entities) consist of `&` + any of the valid | |||
+ any of the valid HTML5 entity names + `;`. The | HTML5 entity names + `;`. The | |||
[following document](https://html.spec.whatwg.org/multipage/entities.json) | [following document](https://html.spec.whatwg.org/multipage/entities.json) | |||
is used as an authoritative source of the valid entity names and their | is used as an authoritative source of the valid entity names and their | |||
corresponding codepoints. | corresponding code points. | |||
. | . | |||
& © Æ Ď | & © Æ Ď | |||
¾ ℋ ⅆ | ¾ ℋ ⅆ | |||
∲ ≧̸ | ∲ ≧̸ | |||
. | . | |||
<p> & © Æ Ď | <p> & © Æ Ď | |||
¾ ℋ ⅆ | ¾ ℋ ⅆ | |||
∲ ≧̸</p> | ∲ ≧̸</p> | |||
. | . | |||
[Decimal entities](@decimal-entities) | [Decimal entities](@decimal-entities) | |||
consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these | consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these | |||
entities need to be recognised and transformed into their corresponding | entities need to be recognised and transformed into their corresponding | |||
unicode codepoints. Invalid unicode codepoints will be replaced by | Unicode code points. Invalid Unicode code points will be replaced by | |||
the "unknown codepoint" character (`U+FFFD`). For security reasons, | the "unknown code point" character (`U+FFFD`). For security reasons, | |||
the codepoint `U+0000` will also be replaced by `U+FFFD`. | the code point `U+0000` will also be replaced by `U+FFFD`. | |||
. | . | |||
# Ӓ Ϡ � � | # Ӓ Ϡ � � | |||
. | . | |||
<p># Ӓ Ϡ � �</p> | <p># Ӓ Ϡ � �</p> | |||
. | . | |||
[Hexadecimal entities](@hexadecimal-entities) | [Hexadecimal entities](@hexadecimal-entities) consist of `&#` + either | |||
consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits | `X` or `x` + a string of 1-8 hexadecimal digits + `;`. They will also | |||
+ `;`. They will also be parsed and turned into the corresponding | be parsed and turned into the corresponding Unicode code points in the | |||
unicode codepoints in the AST. | AST. | |||
. | . | |||
" ആ ಫ | " ആ ಫ | |||
. | . | |||
<p>" ആ ಫ</p> | <p>" ആ ಫ</p> | |||
. | . | |||
Here are some nonentities: | Here are some nonentities: | |||
. | . | |||
skipping to change at line 5144 | skipping to change at line 5206 | |||
The rules given below capture all of these patterns, while allowing | The rules given below capture all of these patterns, while allowing | |||
for efficient parsing strategies that do not backtrack. | for efficient parsing strategies that do not backtrack. | |||
First, some definitions. A [delimiter run](@delimiter-run) is either | First, some definitions. A [delimiter run](@delimiter-run) is either | |||
a sequence of one or more `*` characters that is not preceded or | a sequence of one or more `*` characters that is not preceded or | |||
followed by a `*` character, or a sequence of one or more `_` | followed by a `*` character, or a sequence of one or more `_` | |||
characters that is not preceded or followed by a `_` character. | characters that is not preceded or followed by a `_` character. | |||
A [left-flanking delimiter run](@left-flanking-delimiter-run) is | A [left-flanking delimiter run](@left-flanking-delimiter-run) is | |||
a [delimiter run] that is (a) not followed by [unicode whitespace], | a [delimiter run] that is (a) not followed by [Unicode whitespace], | |||
and (b) either not followed by a [punctuation character], or | and (b) either not followed by a [punctuation character], or | |||
preceded by [unicode whitespace] or a [punctuation character]. | preceded by [Unicode whitespace] or a [punctuation character]. | |||
For purposes of this definition, the beginning and the end of | For purposes of this definition, the beginning and the end of | |||
the line count as unicode whitespace. | the line count as Unicode whitespace. | |||
A [right-flanking delimiter run](@right-flanking-delimiter-run) is | A [right-flanking delimiter run](@right-flanking-delimiter-run) is | |||
a [delimiter run] that is (a) not preceded by [unicode whitespace], | a [delimiter run] that is (a) not preceded by [Unicode whitespace], | |||
and (b) either not preceded by a [punctuation character], or | and (b) either not preceded by a [punctuation character], or | |||
followed by [unicode whitespace] or a [punctuation character]. | followed by [Unicode whitespace] or a [punctuation character]. | |||
For purposes of this definition, the beginning and the end of | For purposes of this definition, the beginning and the end of | |||
the line count as unicode whitespace. | the line count as Unicode whitespace. | |||
Here are some examples of delimiter runs. | Here are some examples of delimiter runs. | |||
- left-flanking but not right-flanking: | - left-flanking but not right-flanking: | |||
``` | ``` | |||
***abc | ***abc | |||
_abc | _abc | |||
**"abc" | **"abc" | |||
_"abc" | _"abc" | |||
skipping to change at line 6468 | skipping to change at line 6530 | |||
just a backslash: | just a backslash: | |||
. | . | |||
[link](foo\bar) | [link](foo\bar) | |||
. | . | |||
<p><a href="foo%5Cbar">link</a></p> | <p><a href="foo%5Cbar">link</a></p> | |||
. | . | |||
URL-escaping should be left alone inside the destination, as all | URL-escaping should be left alone inside the destination, as all | |||
URL-escaped characters are also valid URL characters. HTML entities in | URL-escaped characters are also valid URL characters. HTML entities in | |||
the destination will be parsed into the corresponding unicode | the destination will be parsed into the corresponding Unicode | |||
codepoints, as usual, and optionally URL-escaped when written as HTML. | code points, as usual, and optionally URL-escaped when written as HTML. | |||
. | . | |||
[link](foo%20bä) | [link](foo%20bä) | |||
. | . | |||
<p><a href="foo%20b%C3%A4">link</a></p> | <p><a href="foo%20b%C3%A4">link</a></p> | |||
. | . | |||
Note that, because titles can often be parsed as destinations, | Note that, because titles can often be parsed as destinations, | |||
if you try to omit the destination and keep the title, you'll | if you try to omit the destination and keep the title, you'll | |||
get unexpected results: | get unexpected results: | |||
skipping to change at line 6678 | skipping to change at line 6740 | |||
A [link label](@link-label) begins with a left bracket (`[`) and ends | A [link label](@link-label) begins with a left bracket (`[`) and ends | |||
with the first right bracket (`]`) that is not backslash-escaped. | with the first right bracket (`]`) that is not backslash-escaped. | |||
Between these brackets there must be at least one [non-whitespace character]. | Between these brackets there must be at least one [non-whitespace character]. | |||
Unescaped square bracket characters are not allowed in | Unescaped square bracket characters are not allowed in | |||
[link label]s. A link label can have at most 999 | [link label]s. A link label can have at most 999 | |||
characters inside the square brackets. | characters inside the square brackets. | |||
One label [matches](@matches) | One label [matches](@matches) | |||
another just in case their normalized forms are equal. To normalize a | another just in case their normalized forms are equal. To normalize a | |||
label, perform the *unicode case fold* and collapse consecutive internal | label, perform the *Unicode case fold* and collapse consecutive internal | |||
[whitespace] to a single space. If there are multiple | [whitespace] to a single space. If there are multiple | |||
matching reference link definitions, the one that comes first in the | matching reference link definitions, the one that comes first in the | |||
document is used. (It is desirable in such cases to emit a warning.) | document is used. (It is desirable in such cases to emit a warning.) | |||
The contents of the first link label are parsed as inlines, which are | The contents of the first link label are parsed as inlines, which are | |||
used as the link's text. The link's URI and title are provided by the | used as the link's text. The link's URI and title are provided by the | |||
matching [link reference definition]. | matching [link reference definition]. | |||
Here is a simple example: | Here is a simple example: | |||
End of changes. 35 change blocks. | ||||
47 lines changed or deleted | 109 lines changed or added | |||
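
The revised [line ending] definition in the 0.22 column (a newline, a carriage return not followed by a newline, or a carriage return and a following newline) can be illustrated with a small line splitter. This is only a sketch, not taken from any CommonMark implementation: the name `split_lines` is hypothetical, and dropping an empty remainder at the end of the file is an assumption made here rather than something the spec text above settles.

``` python
def split_lines(text: str) -> list[str]:
    """Split text into lines using the three line endings named in 0.22.

    Hypothetical helper for illustration only.
    """
    lines = []
    current = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == "\n":
            # A bare newline (U+000A) ends the current line.
            lines.append("".join(current))
            current = []
        elif ch == "\r":
            # A carriage return (U+000D) ends the line; if a newline
            # follows, both together count as a single line ending.
            lines.append("".join(current))
            current = []
            if i + 1 < n and text[i + 1] == "\n":
                i += 1
        else:
            current.append(ch)
        i += 1
    # Assumption: a non-empty remainder is a final line ended by end of file;
    # an empty remainder is not reported as an extra line.
    if current:
        lines.append("".join(current))
    return lines
```

With this sketch, `split_lines("a\r\nb\rc\nd")` yields four lines, since the `\r\n` pair is consumed as one line ending instead of two.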
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |
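
The definitions of [left-flanking delimiter run] and [right-flanking delimiter run] quoted in the diff, together with the [Unicode whitespace character] and [punctuation character] classes, can be sketched as follows. The function names and the `(start, end)` index convention are assumptions made for this illustration; the sketch relies on Python's standard `unicodedata.category` for the Unicode classes and treats the beginning and end of the line as Unicode whitespace, as the text specifies.

``` python
import unicodedata

# ASCII punctuation characters as listed in the spec text.
ASCII_PUNCT = set("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~")
PUNCT_CLASSES = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}


def is_unicode_whitespace(ch):
    # None stands for the beginning or end of the line,
    # which counts as Unicode whitespace for this definition.
    if ch is None:
        return True
    return ch in "\t\r\n\f" or unicodedata.category(ch) == "Zs"


def is_punctuation(ch):
    if ch is None:
        return False
    return ch in ASCII_PUNCT or unicodedata.category(ch) in PUNCT_CLASSES


def flanking(line, start, end):
    """Return (left_flanking, right_flanking) for the run line[start:end].

    Hypothetical helper: the caller is assumed to pass the bounds of a
    complete delimiter run of `*` or `_` characters.
    """
    before = line[start - 1] if start > 0 else None
    after = line[end] if end < len(line) else None
    left = (not is_unicode_whitespace(after)) and (
        not is_punctuation(after)
        or is_unicode_whitespace(before)
        or is_punctuation(before)
    )
    right = (not is_unicode_whitespace(before)) and (
        not is_punctuation(before)
        or is_unicode_whitespace(after)
        or is_punctuation(after)
    )
    return left, right
```

For the run in `***abc`, `flanking("***abc", 0, 3)` reports left-flanking but not right-flanking, which agrees with the first example listed in the spec text.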