
 spec.txt   spec.txt 
--- ---
title: CommonMark Spec title: CommonMark Spec
author: John MacFarlane author: John MacFarlane
version: 0.21 version: 0.22
date: 2015-07-14 date: 2015-08-23
license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)' license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'
... ...
# Introduction # Introduction
## What is Markdown? ## What is Markdown?
Markdown is a plain text format for writing structured documents, Markdown is a plain text format for writing structured documents,
based on conventions used for indicating formatting in email and based on conventions used for indicating formatting in email and
usenet posts. It was developed in 2004 by John Gruber, who wrote usenet posts. It was developed in 2004 by John Gruber, who wrote
skipping to change at line 207 skipping to change at line 207
In the examples, the `→` character is used to represent tabs. In the examples, the `→` character is used to represent tabs.
# Preliminaries # Preliminaries
## Characters and lines ## Characters and lines
Any sequence of [character]s is a valid CommonMark Any sequence of [character]s is a valid CommonMark
document. document.
A [character](@character) is a unicode code point. A [character](@character) is a Unicode code point. Although some
code points (for example, combining accents) do not correspond to
characters in an intuitive sense, all code points count as characters
for purposes of this spec.
This spec does not specify an encoding; it thinks of lines as composed This spec does not specify an encoding; it thinks of lines as composed
of characters rather than bytes. A conforming parser may be limited of [character]s rather than bytes. A conforming parser may be limited
to a certain encoding. to a certain encoding.
A [line](@line) is a sequence of zero or more [character]s A [line](@line) is a sequence of zero or more [character]s
other than newline (`U+000A`) or carriage return (`U+000D`),
followed by a [line ending] or by the end of file. followed by a [line ending] or by the end of file.
A [line ending](@line-ending) is a newline (`U+000A`), carriage return A [line ending](@line-ending) is a newline (`U+000A`), a carriage return
(`U+000D`), or carriage return + newline. (`U+000D`) not followed by a newline, or a carriage return and a
following newline.
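
As an illustration only, the revised definitions of [line] and [line ending] could be sketched as the following splitter. The function name `split_lines` is hypothetical and not part of the spec; the sketch assumes the input has already been decoded into a string of code points.

```python
def split_lines(text: str) -> list[str]:
    """Split text into lines at LF, CR not followed by LF, or CR+LF."""
    lines = []
    start = 0  # index where the current line begins
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == "\n":
            lines.append(text[start:i])
            i += 1
            start = i
        elif ch == "\r":
            lines.append(text[start:i])
            # A CR immediately followed by LF is one line ending, not two.
            i += 2 if i + 1 < n and text[i + 1] == "\n" else 1
            start = i
        else:
            i += 1
    if start < n:
        # The last line is terminated by the end of file.
        lines.append(text[start:])
    return lines
```

The returned lines never contain `U+000A` or `U+000D`, which matches the wording added to the definition of a line.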
A line containing no characters, or a line containing only spaces A line containing no characters, or a line containing only spaces
(`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line). (`U+0020`) or tabs (`U+0009`), is called a [blank line](@blank-line).
The following definitions of character classes will be used in this spec: The following definitions of character classes will be used in this spec:
A [whitespace character](@whitespace-character) is a space A [whitespace character](@whitespace-character) is a space
(`U+0020`), tab (`U+0009`), newline (`U+000A`), line tabulation (`U+000B`), (`U+0020`), tab (`U+0009`), newline (`U+000A`), line tabulation (`U+000B`),
form feed (`U+000C`), or carriage return (`U+000D`). form feed (`U+000C`), or carriage return (`U+000D`).
[Whitespace](@whitespace) is a sequence of one or more [whitespace [Whitespace](@whitespace) is a sequence of one or more [whitespace
character]s. character]s.
A [unicode whitespace character](@unicode-whitespace-character) is A [Unicode whitespace character](@unicode-whitespace-character) is
any code point in the unicode `Zs` class, or a tab (`U+0009`), any code point in the Unicode `Zs` class, or a tab (`U+0009`),
carriage return (`U+000D`), newline (`U+000A`), or form feed carriage return (`U+000D`), newline (`U+000A`), or form feed
(`U+000C`). (`U+000C`).
[Unicode whitespace](@unicode-whitespace) is a sequence of one [Unicode whitespace](@unicode-whitespace) is a sequence of one
or more [unicode whitespace character]s. or more [Unicode whitespace character]s.
A [space](@space) is `U+0020`. A [space](@space) is `U+0020`.
A [non-whitespace character](@non-space-character) is any character A [non-whitespace character](@non-whitespace-character) is any character
that is not a [whitespace character]. that is not a [whitespace character].
An [ASCII punctuation character](@ascii-punctuation-character) An [ASCII punctuation character](@ascii-punctuation-character)
is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,
`*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`, `*`, `+`, `,`, `-`, `.`, `/`, `:`, `;`, `<`, `=`, `>`, `?`, `@`,
`[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `|`, `}`, or `~`. `[`, `\`, `]`, `^`, `_`, `` ` ``, `{`, `|`, `}`, or `~`.
A [punctuation character](@punctuation-character) is an [ASCII A [punctuation character](@punctuation-character) is an [ASCII
punctuation character] or anything in punctuation character] or anything in
the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`. the Unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.
## Tabs ## Tabs
Tabs in lines are not expanded to [spaces][space]. However, Tabs in lines are not expanded to [spaces][space]. However,
in contexts where indentation is significant for the in contexts where indentation is significant for the
document's structure, tabs behave as if they were replaced document's structure, tabs behave as if they were replaced
by spaces with a tab stop of 4 characters. by spaces with a tab stop of 4 characters.
. .
→foo→baz→→bim →foo→baz→→bim
skipping to change at line 303 skipping to change at line 309
. .
. .
>→foo→bar >→foo→bar
. .
<blockquote> <blockquote>
<p>foo→bar</p> <p>foo→bar</p>
</blockquote> </blockquote>
. .
.
foo
→bar
.
<pre><code>foo
bar
</code></pre>
.
## Insecure characters ## Insecure characters
For security reasons, the Unicode character `U+0000` must be replaced For security reasons, the Unicode character `U+0000` must be replaced
with the replacement character (`U+FFFD`). with the replacement character (`U+FFFD`).
# Blocks and inlines # Blocks and inlines
We can think of a document as a sequence of We can think of a document as a sequence of
[blocks](@block)---structural elements like paragraphs, block [blocks](@block)---structural elements like paragraphs, block
quotations, lists, headers, rules, and code blocks. Some blocks (like quotations, lists, headers, rules, and code blocks. Some blocks (like
skipping to change at line 564 skipping to change at line 579
<hr /> <hr />
</li> </li>
</ul> </ul>
. .
## ATX headers ## ATX headers
An [ATX header](@atx-header) An [ATX header](@atx-header)
consists of a string of characters, parsed as inline content, between an consists of a string of characters, parsed as inline content, between an
opening sequence of 1--6 unescaped `#` characters and an optional opening sequence of 1--6 unescaped `#` characters and an optional
closing sequence of any number of `#` characters. The opening sequence closing sequence of any number of unescaped `#` characters.
of `#` characters cannot be followed directly by a The opening sequence of `#` characters cannot be followed directly by a
[non-whitespace character]. The optional closing sequence of `#`s must be [non-whitespace character]. The optional closing sequence of `#`s must be
preceded by a [space] and may be followed by spaces only. The opening preceded by a [space] and may be followed by spaces only. The opening
`#` character may be indented 0-3 spaces. The raw contents of the `#` character may be indented 0-3 spaces. The raw contents of the
header are stripped of leading and trailing spaces before being parsed header are stripped of leading and trailing spaces before being parsed
as inline content. The header level is equal to the number of `#` as inline content. The header level is equal to the number of `#`
characters in the opening sequence. characters in the opening sequence.
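
The rule above can be approximated by a short recognizer. This is a sketch under stated assumptions, not a reference implementation: `parse_atx_header` is a hypothetical name, the input is a single line without its line ending, and the returned contents are raw (inline parsing, including backslash escapes, is not performed).

```python
import re

# 0-3 spaces of indentation, then 1-6 '#' characters that are not
# followed directly by a non-whitespace character.
_OPENING = re.compile(r"^ {0,3}(#{1,6})(?=[ \t]|$)")

def parse_atx_header(line: str):
    """Return (level, raw_contents) for an ATX header line, else None."""
    m = _OPENING.match(line)
    if m is None:
        return None
    level = len(m.group(1))
    # The closing sequence may be followed by spaces only.
    rest = line[m.end():].rstrip(" ")
    without_hashes = rest.rstrip("#")
    # A trailing run of '#' is a closing sequence only when a space
    # precedes it; this also rejects runs such as `\#` and `#\##`.
    if without_hashes != rest and without_hashes.endswith(" "):
        rest = without_hashes
    return level, rest.strip(" ")
```

For example, `parse_atx_header("### foo ### b")` keeps `foo ### b` as contents, because the `#` run is followed by something other than spaces.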
Simple headers: Simple headers:
. .
skipping to change at line 697 skipping to change at line 712
. .
Spaces are allowed after the closing sequence: Spaces are allowed after the closing sequence:
. .
### foo ### ### foo ###
. .
<h3>foo</h3> <h3>foo</h3>
. .
A sequence of `#` characters with a A sequence of `#` characters with anything but [space]s following it
[non-whitespace character] following it
is not a closing sequence, but counts as part of the contents of the is not a closing sequence, but counts as part of the contents of the
header: header:
. .
### foo ### b ### foo ### b
. .
<h3>foo ### b</h3> <h3>foo ### b</h3>
. .
The closing sequence must be preceded by a space: The closing sequence must be preceded by a space:
skipping to change at line 1637 skipping to change at line 1651
5. **Start condition:** line begins with the string 5. **Start condition:** line begins with the string
`<![CDATA[`.\ `<![CDATA[`.\
**End condition:** line contains the string `]]>`. **End condition:** line contains the string `]]>`.
6. **Start condition:** line begins with the string `<` or `</` 6. **Start condition:** line begins with the string `<` or `</`
followed by one of the strings (case-insensitive) `address`, followed by one of the strings (case-insensitive) `address`,
`article`, `aside`, `base`, `basefont`, `blockquote`, `body`, `article`, `aside`, `base`, `basefont`, `blockquote`, `body`,
`caption`, `center`, `col`, `colgroup`, `dd`, `details`, `dialog`, `caption`, `center`, `col`, `colgroup`, `dd`, `details`, `dialog`,
`dir`, `div`, `dl`, `dt`, `fieldset`, `figcaption`, `figure`, `dir`, `div`, `dl`, `dt`, `fieldset`, `figcaption`, `figure`,
`footer`, `form`, `frame`, `frameset`, `h1`, `head`, `header`, `hr`, `footer`, `form`, `frame`, `frameset`, `h1`, `head`, `header`, `hr`,
`html`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`, `meta`, `html`, `iframe`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`,
`nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`, `pre`, `meta`, `nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`,
`section`, `source`, `title`, `summary`, `table`, `tbody`, `td`, `section`, `source`, `summary`, `table`, `tbody`, `td`,
`tfoot`, `th`, `thead`, `title`, `tr`, `track`, `ul`, followed `tfoot`, `th`, `thead`, `title`, `tr`, `track`, `ul`, followed
by [whitespace], the end of the line, the string `>`, or by [whitespace], the end of the line, the string `>`, or
the string `/>`.\ the string `/>`.\
**End condition:** line is followed by a [blank line]. **End condition:** line is followed by a [blank line].
7. **Start condition:** line begins with an [open tag] 7. **Start condition:** line begins with a complete [open tag]
(with any [tag name]) followed only by [whitespace] or the end or [closing tag] (with any [tag name] other than `script`,
of the line.\ `style`, or `pre`) followed only by [whitespace]
or the end of the line.\
**End condition:** line is followed by a [blank line]. **End condition:** line is followed by a [blank line].
All types of [HTML blocks] except type 7 may interrupt All types of [HTML blocks] except type 7 may interrupt
a paragraph. Blocks of type 7 may not interrupt a paragraph. a paragraph. Blocks of type 7 may not interrupt a paragraph.
(This restricted is intended to prevent unwanted interpretation (This restriction is intended to prevent unwanted interpretation
of long tags inside a wrapped paragraph as starting HTML blocks.) of long tags inside a wrapped paragraph as starting HTML blocks.)
Some simple examples follow. Here are some basic HTML blocks Some simple examples follow. Here are some basic HTML blocks
of type 6: of type 6:
. .
<table> <table>
<tr> <tr>
<td> <td>
hi hi
skipping to change at line 1851 skipping to change at line 1866
. .
<i class="foo"> <i class="foo">
*bar* *bar*
</i> </i>
. .
<i class="foo"> <i class="foo">
*bar* *bar*
</i> </i>
. .
.
</ins>
*bar*
.
</ins>
*bar*
.
These rules are designed to allow us to work with tags that These rules are designed to allow us to work with tags that
can function as either block-level or inline-level tags. can function as either block-level or inline-level tags.
The `<del>` tag is a nice example. We can surround content with The `<del>` tag is a nice example. We can surround content with
`<del>` tags in three different ways. In this case, we get a raw `<del>` tags in three different ways. In this case, we get a raw
HTML block, because the `<del>` tag is on a line by itself: HTML block, because the `<del>` tag is on a line by itself:
. .
<del> <del>
*foo* *foo*
</del> </del>
skipping to change at line 2814 skipping to change at line 2837
> foo > foo
. .
<blockquote> <blockquote>
<p>bar <p>bar
baz baz
foo</p> foo</p>
</blockquote> </blockquote>
. .
Laziness only applies to lines that would have been continuations of Laziness only applies to lines that would have been continuations of
paragraphs had they been prepended with `>`. For example, the paragraphs had they been prepended with [block quote marker]s.
`>` cannot be omitted in the second line of For example, the `> ` cannot be omitted in the second line of
``` markdown ``` markdown
> foo > foo
> --- > ---
``` ```
without changing the meaning: without changing the meaning:
. .
> foo > foo
--- ---
. .
<blockquote> <blockquote>
<p>foo</p> <p>foo</p>
</blockquote> </blockquote>
<hr /> <hr />
. .
Similarly, if we omit the `>` in the second line of Similarly, if we omit the `> ` in the second line of
``` markdown ``` markdown
> - foo > - foo
> - bar > - bar
``` ```
then the block quote ends after the first line: then the block quote ends after the first line:
. .
> - foo > - foo
skipping to change at line 2857 skipping to change at line 2880
<blockquote> <blockquote>
<ul> <ul>
<li>foo</li> <li>foo</li>
</ul> </ul>
</blockquote> </blockquote>
<ul> <ul>
<li>bar</li> <li>bar</li>
</ul> </ul>
. .
For the same reason, we can't omit the `>` in front of For the same reason, we can't omit the `> ` in front of
subsequent lines of an indented or fenced code block: subsequent lines of an indented or fenced code block:
. .
> foo > foo
bar bar
. .
<blockquote> <blockquote>
<pre><code>foo <pre><code>foo
</code></pre> </code></pre>
</blockquote> </blockquote>
skipping to change at line 2884 skipping to change at line 2907
foo foo
``` ```
. .
<blockquote> <blockquote>
<pre><code></code></pre> <pre><code></code></pre>
</blockquote> </blockquote>
<p>foo</p> <p>foo</p>
<pre><code></code></pre> <pre><code></code></pre>
. .
Note that in the following case, we have a paragraph
continuation line:
.
> foo
- bar
.
<blockquote>
<p>foo
- bar</p>
</blockquote>
.
To see why, note that in
```markdown
> foo
> - bar
```
the `- bar` is indented too far to start a list, and can't
be an indented code block because indented code blocks cannot
interrupt paragraphs, so it is a [paragraph continuation line].
A block quote can be empty: A block quote can be empty:
. .
> >
. .
<blockquote> <blockquote>
</blockquote> </blockquote>
. .
. .
skipping to change at line 3581 skipping to change at line 3628
<pre><code>bar <pre><code>bar
</code></pre> </code></pre>
</li> </li>
<li> <li>
<pre><code>baz <pre><code>baz
</code></pre> </code></pre>
</li> </li>
</ul> </ul>
. .
A list item can begin with at most one blank line.
In the following example, `foo` is not part of the list
item:
.
-
foo
.
<ul>
<li></li>
</ul>
<p>foo</p>
.
Here is an empty bullet list item: Here is an empty bullet list item:
. .
- foo - foo
- -
- bar - bar
. .
<ul> <ul>
<li>foo</li> <li>foo</li>
<li></li> <li></li>
skipping to change at line 4814 skipping to change at line 4876
``` ```
. .
<pre><code class="language-foo+bar">foo <pre><code class="language-foo+bar">foo
</code></pre> </code></pre>
. .
## Entities ## Entities
With the goal of making this standard as HTML-agnostic as possible, all With the goal of making this standard as HTML-agnostic as possible, all
valid HTML entities (except in code blocks and code spans) valid HTML entities (except in code blocks and code spans)
are recognized as such and converted into unicode characters before are recognized as such and converted into Unicode characters before
they are stored in the AST. This means that renderers to formats other they are stored in the AST. This means that renderers to formats other
than HTML need not be HTML-entity aware. HTML renderers may either escape than HTML need not be HTML-entity aware. HTML renderers may either escape
unicode characters as entities or leave them as they are. (However, Unicode characters as entities or leave them as they are. (However,
`"`, `&`, `<`, and `>` must always be rendered as entities.) `"`, `&`, `<`, and `>` must always be rendered as entities.)
[Named entities](@name-entities) consist of `&` [Named entities](@name-entities) consist of `&` + any of the valid
+ any of the valid HTML5 entity names + `;`. The HTML5 entity names + `;`. The
[following document](https://html.spec.whatwg.org/multipage/entities.json) [following document](https://html.spec.whatwg.org/multipage/entities.json)
is used as an authoritative source of the valid entity names and their is used as an authoritative source of the valid entity names and their
corresponding codepoints. corresponding code points.
. .
&nbsp; &amp; &copy; &AElig; &Dcaron; &nbsp; &amp; &copy; &AElig; &Dcaron;
&frac34; &HilbertSpace; &DifferentialD; &frac34; &HilbertSpace; &DifferentialD;
&ClockwiseContourIntegral; &ngE; &ClockwiseContourIntegral; &ngE;
. .
<p>  &amp; © Æ Ď <p>  &amp; © Æ Ď
¾ ℋ ⅆ ¾ ℋ ⅆ
∲ ≧̸</p> ∲ ≧̸</p>
. .
[Decimal entities](@decimal-entities) [Decimal entities](@decimal-entities)
consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
entities need to be recognised and transformed into their corresponding entities need to be recognised and transformed into their corresponding
unicode codepoints. Invalid unicode codepoints will be replaced by Unicode code points. Invalid Unicode code points will be replaced by
the "unknown codepoint" character (`U+FFFD`). For security reasons, the "unknown code point" character (`U+FFFD`). For security reasons,
the codepoint `U+0000` will also be replaced by `U+FFFD`. the code point `U+0000` will also be replaced by `U+FFFD`.
. .
&#35; &#1234; &#992; &#98765432; &#0; &#35; &#1234; &#992; &#98765432; &#0;
. .
<p># Ӓ Ϡ � �</p> <p># Ӓ Ϡ � �</p>
. .
[Hexadecimal entities](@hexadecimal-entities) [Hexadecimal entities](@hexadecimal-entities) consist of `&#` + either
consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits `X` or `x` + a string of 1-8 hexadecimal digits + `;`. They will also
+ `;`. They will also be parsed and turned into the corresponding be parsed and turned into the corresponding Unicode code points in the
unicode codepoints in the AST. AST.
. .
&#X22; &#XD06; &#xcab; &#X22; &#XD06; &#xcab;
. .
<p>&quot; ആ ಫ</p> <p>&quot; ആ ಫ</p>
. .
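
A minimal sketch of the decimal and hexadecimal rules follows. The name `decode_numeric_entities` is hypothetical, and treating surrogates and values above `U+10FFFF` as invalid code points is an assumption of this sketch; the spec text only says "invalid".

```python
import re

# `&#` + 1-8 decimal digits + `;`, or `&#` + `X`/`x` + 1-8 hex digits + `;`
_NUMERIC = re.compile(r"&#(?:[Xx]([0-9A-Fa-f]{1,8})|([0-9]{1,8}));")

def decode_numeric_entities(text: str) -> str:
    """Replace numeric entities in text with the code points they name."""
    def repl(m: re.Match) -> str:
        if m.group(1) is not None:
            code = int(m.group(1), 16)
        else:
            code = int(m.group(2))
        # U+0000 is replaced for security reasons; the other two checks
        # are this sketch's reading of "invalid Unicode code points".
        if code == 0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return "\ufffd"
        return chr(code)
    return _NUMERIC.sub(repl, text)
```

Strings with more than eight digits do not match the pattern and are left untouched.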
Here are some nonentities: Here are some nonentities:
. .
skipping to change at line 5144 skipping to change at line 5206
The rules given below capture all of these patterns, while allowing The rules given below capture all of these patterns, while allowing
for efficient parsing strategies that do not backtrack. for efficient parsing strategies that do not backtrack.
First, some definitions. A [delimiter run](@delimiter-run) is either First, some definitions. A [delimiter run](@delimiter-run) is either
a sequence of one or more `*` characters that is not preceded or a sequence of one or more `*` characters that is not preceded or
followed by a `*` character, or a sequence of one or more `_` followed by a `*` character, or a sequence of one or more `_`
characters that is not preceded or followed by a `_` character. characters that is not preceded or followed by a `_` character.
A [left-flanking delimiter run](@left-flanking-delimiter-run) is A [left-flanking delimiter run](@left-flanking-delimiter-run) is
a [delimiter run] that is (a) not followed by [unicode whitespace], a [delimiter run] that is (a) not followed by [Unicode whitespace],
and (b) either not followed by a [punctuation character], or and (b) either not followed by a [punctuation character], or
preceded by [unicode whitespace] or a [punctuation character]. preceded by [Unicode whitespace] or a [punctuation character].
For purposes of this definition, the beginning and the end of For purposes of this definition, the beginning and the end of
the line count as unicode whitespace. the line count as Unicode whitespace.
A [right-flanking delimiter run](@right-flanking-delimiter-run) is A [right-flanking delimiter run](@right-flanking-delimiter-run) is
a [delimiter run] that is (a) not preceded by [unicode whitespace], a [delimiter run] that is (a) not preceded by [Unicode whitespace],
and (b) either not preceded by a [punctuation character], or and (b) either not preceded by a [punctuation character], or
followed by [unicode whitespace] or a [punctuation character]. followed by [Unicode whitespace] or a [punctuation character].
For purposes of this definition, the beginning and the end of For purposes of this definition, the beginning and the end of
the line count as unicode whitespace. the line count as Unicode whitespace.
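
The two definitions can be written out directly as predicates. This is a sketch; the helper names are hypothetical, and the caller is assumed to have already located the delimiter run as the half-open range `line[start:end]`.

```python
import unicodedata

ASCII_PUNCTUATION = set("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~")
PUNCTUATION_CLASSES = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}

def is_unicode_whitespace(ch):
    # None stands for the beginning or end of the line, which counts
    # as Unicode whitespace for purposes of these definitions.
    return ch is None or ch in "\t\r\n\f" or unicodedata.category(ch) == "Zs"

def is_punctuation(ch):
    if ch is None:
        return False
    return ch in ASCII_PUNCTUATION or unicodedata.category(ch) in PUNCTUATION_CLASSES

def flanking(line: str, start: int, end: int):
    """Return (left_flanking, right_flanking) for the run line[start:end]."""
    before = line[start - 1] if start > 0 else None
    after = line[end] if end < len(line) else None
    left = not is_unicode_whitespace(after) and (
        not is_punctuation(after)
        or is_unicode_whitespace(before)
        or is_punctuation(before)
    )
    right = not is_unicode_whitespace(before) and (
        not is_punctuation(before)
        or is_unicode_whitespace(after)
        or is_punctuation(after)
    )
    return left, right
```
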
Here are some examples of delimiter runs. Here are some examples of delimiter runs.
- left-flanking but not right-flanking: - left-flanking but not right-flanking:
``` ```
***abc ***abc
_abc _abc
**"abc" **"abc"
_"abc" _"abc"
skipping to change at line 6468 skipping to change at line 6530
just a backslash: just a backslash:
. .
[link](foo\bar) [link](foo\bar)
. .
<p><a href="foo%5Cbar">link</a></p> <p><a href="foo%5Cbar">link</a></p>
. .
URL-escaping should be left alone inside the destination, as all URL-escaping should be left alone inside the destination, as all
URL-escaped characters are also valid URL characters. HTML entities in URL-escaped characters are also valid URL characters. HTML entities in
the destination will be parsed into the corresponding unicode the destination will be parsed into the corresponding Unicode
codepoints, as usual, and optionally URL-escaped when written as HTML. code points, as usual, and optionally URL-escaped when written as HTML.
. .
[link](foo%20b&auml;) [link](foo%20b&auml;)
. .
<p><a href="foo%20b%C3%A4">link</a></p> <p><a href="foo%20b%C3%A4">link</a></p>
. .
Note that, because titles can often be parsed as destinations, Note that, because titles can often be parsed as destinations,
if you try to omit the destination and keep the title, you'll if you try to omit the destination and keep the title, you'll
get unexpected results: get unexpected results:
skipping to change at line 6678 skipping to change at line 6740
A [link label](@link-label) begins with a left bracket (`[`) and ends A [link label](@link-label) begins with a left bracket (`[`) and ends
with the first right bracket (`]`) that is not backslash-escaped. with the first right bracket (`]`) that is not backslash-escaped.
Between these brackets there must be at least one [non-whitespace character]. Between these brackets there must be at least one [non-whitespace character].
Unescaped square bracket characters are not allowed in Unescaped square bracket characters are not allowed in
[link label]s. A link label can have at most 999 [link label]s. A link label can have at most 999
characters inside the square brackets. characters inside the square brackets.
One label [matches](@matches) One label [matches](@matches)
another just in case their normalized forms are equal. To normalize a another just in case their normalized forms are equal. To normalize a
label, perform the *unicode case fold* and collapse consecutive internal label, perform the *Unicode case fold* and collapse consecutive internal
[whitespace] to a single space. If there are multiple [whitespace] to a single space. If there are multiple
matching reference link definitions, the one that comes first in the matching reference link definitions, the one that comes first in the
document is used. (It is desirable in such cases to emit a warning.) document is used. (It is desirable in such cases to emit a warning.)
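
Label matching as described above might be sketched like this. The names `normalize_label` and `add_definition` are hypothetical. Two points are assumptions of the sketch rather than statements of the spec: leading and trailing whitespace is stripped before comparison, and Python's `str.casefold` is used as the Unicode case fold.

```python
import re

# The spec's whitespace characters: space, tab, newline,
# line tabulation, form feed, carriage return.
_WS_CHARS = " \t\n\x0b\x0c\r"
_WS_RUN = re.compile("[" + _WS_CHARS + "]+")

def normalize_label(label: str) -> str:
    """Case-fold a link label and collapse internal whitespace runs."""
    return _WS_RUN.sub(" ", label.strip(_WS_CHARS)).casefold()

def add_definition(definitions: dict, label: str, destination: str) -> None:
    """Record a definition; the first one for a given label wins."""
    definitions.setdefault(normalize_label(label), destination)
```

With this sketch, `[Foo  BAR]` and `[foo bar]` normalize to the same key, and a later definition for an already-defined label is ignored.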
The contents of the first link label are parsed as inlines, which are The contents of the first link label are parsed as inlines, which are
used as the link's text. The link's URI and title are provided by the used as the link's text. The link's URI and title are provided by the
matching [link reference definition]. matching [link reference definition].
Here is a simple example: Here is a simple example:
 End of changes. 35 change blocks. 
47 lines changed or deleted 109 lines changed or added

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/