W3C

XML 1.0 Specification Errata

Abstract

This document records all known errors in the Extensible Markup Language (XML) 1.0 Specification (W3C Recommendation 10 Feb 1998); this specification has been superseded by the Second Edition of the Extensible Markup Language (XML) 1.0 Specification (W3C Recommendation 6 Oct 2000). For updates see the latest version.

The errata are numbered, classified as Substantive, Editorial or Clarification and listed in reverse chronological order of their date of publication. Early errata (1999-02-17 and before) are neither classified nor dated.

Please email error reports to xml-editor@w3.org.

Known Errors

Errata as of 2000-09-27.

E109 Substantive

Obsoletes E13
Section 2.8
  1. Add a well-formedness constraint applying to production [28] doctypedecl as follows:

    Well-formedness constraint: External Subset
    The external subset, if any, must match the production for extSubset.
  2. Add a new production [28a] as follows:

    DeclSep ::= PEReference | S
  3. In productions [28] doctypedecl and [31] extSubsetDecl, replace "PEReference |S" with "DeclSep".
  4. Add a well-formedness constraint applying to production [28a] DeclSep as follows:

    Well-formedness constraint: PE Between Declarations
    The replacement text of a parameter entity reference in a DeclSep must match the production extSubsetDecl.
  5. In the paragraph preceding production [30], change "any external parameter entities referred to in the DTD" to "any parameter entities referenced in a DeclSep". In the following sentence, change "portions of the contents of the external subset or of external parameter entities" to "portions of the contents of the external subset or of these external parameter entities".
  6. In that same paragraph, delete the sentences added by E13 "The external subset and any external parameter entities referred to in the DTD must match the production for extPE. See 4.3.2 Well-Formed Parsed Entities."
Section 4.3.2
  1. In the first paragraph, replace the sentence "An external parameter entity is well-formed if it matches the production labeled extPE." with "All external parameter entities are well-formed by definition."

  2. Remove production [79] extPE.

Rationale
There were several problems with well-formedness of parameter entities. In particular, nothing was constraining the well-formedness of internal PEs included at the top level (between complete declarations in the DTD).

E108 Substantive

Obsoletes E62
Section 2.3

Rescind E62 and restore productions [6] and [8] to their First Edition state.

Rationale
There were reports of real documents relying on white space other than #x20 in attributes of type NAMES or NMTOKENS. It was thus concluded that full SGML compatibility had already been sacrificed in the First Edition and that it was preferable not to change the status quo.

E107 Clarification

Section 2.8

Change the beginning of the first sentence to read: "XML documents should begin with an XML declaration..."

Section 4.3.1

Change the first sentence to read: "External parsed entities should each begin with a text declaration."


E106 Editorial

Section 4.3.3

In the second paragraph after production [81], replace « "ISO-8859-9" » with « "ISO-8859-n" (where n is the part number) »

Rationale
Make the spec impervious to the addition of new parts to ISO/IEC 8859 (now at -15).

E105 Clarification

Obsoletes E44
Appendix F

Replace the whole of Appendix F with the following:

F Autodetection of Character Encodings (Non-Normative)

The XML encoding declaration functions as an internal label on each entity, indicating which character encoding is in use. Before an XML processor can read the internal label, however, it apparently has to know what character encoding is in use--which is what the internal label is trying to indicate. In the general case, this is a hopeless situation. It is not entirely hopeless in XML, however, because XML limits the general case in two ways: each implementation is assumed to support only a finite set of character encodings, and the XML encoding declaration is restricted in position and content in order to make it feasible to autodetect the character encoding in use in each entity in normal cases. Also, in many cases other sources of information are available in addition to the XML data stream itself. Two cases may be distinguished, depending on whether the XML entity is presented to the processor without, or with, any accompanying (external) information. We consider the first case first.

F.1 Detection Without External Encoding Information

Because each XML entity not accompanied by external encoding information and not in UTF-8 or UTF-16 encoding must begin with an XML encoding declaration, in which the first characters must be '<?xml ', any conforming processor can detect, after two to four octets of input, which of the following cases apply. In reading this list, it may help to know that in UCS-4, '<' is "#x0000003C" and '?' is "#x0000003F", and the Byte Order Mark required of UTF-16 data streams is "#xFEFF". The notation ## is used to denote any byte value except that two consecutive ##s cannot be both 00.

With a Byte Order Mark:

00 00 FE FF UCS-4, big-endian machine (1234 order)
FF FE 00 00 UCS-4, little-endian machine (4321 order)
00 00 FF FE UCS-4, unusual octet order (2143)
FE FF 00 00 UCS-4, unusual octet order (3412)
FE FF ## ## UTF-16, big-endian
FF FE ## ## UTF-16, little-endian
EF BB BF UTF-8

Without a Byte Order Mark:

00 00 00 3C (big-endian, 1234 order)
3C 00 00 00 (little-endian, 4321 order)
00 00 3C 00 (unusual byte order, 2143)
00 3C 00 00 (unusual byte order, 3412)
The four patterns above indicate UCS-4 or other encoding with a 32-bit code unit and ASCII characters encoded as ASCII values. The encoding declaration must be read to determine which of UCS-4 or other supported 32-bit encodings applies.
00 3C 00 3F UTF-16BE or big-endian ISO-10646-UCS-2 or other encoding with a 16-bit code unit in big-endian order and ASCII characters encoded as ASCII values (the encoding declaration must be read to determine which)
3C 00 3F 00 UTF-16LE or little-endian ISO-10646-UCS-2 or other encoding with a 16-bit code unit in little-endian order and ASCII characters encoded as ASCII values (the encoding declaration must be read to determine which)
3C 3F 78 6D UTF-8, ISO 646, ASCII, some part of ISO 8859, Shift-JIS, EUC, or any other 7-bit, 8-bit, or mixed-width encoding which ensures that the characters of ASCII have their normal positions, width, and values; the actual encoding declaration must be read to detect which of these applies, but since all of these encodings use the same bit patterns for the relevant ASCII characters, the encoding declaration itself may be read reliably
4C 6F A7 94 EBCDIC (in some flavor; the full encoding declaration must be read to tell which code page is in use)
Other UTF-8 without an encoding declaration, or else the data stream is mislabeled (lacking a required encoding declaration), corrupt, fragmentary, or enclosed in a wrapper of some kind

Note:

In cases above which do not require reading the encoding declaration to determine the encoding, section 4.3.3 still requires that the encoding declaration, if present, be read and that the encoding name be checked to match the actual encoding of the entity. Also, it is possible that new character encodings will be invented that will make it necessary to use the encoding declaration to determine the encoding, in cases where this is not required at present.

This level of autodetection is enough to read the XML encoding declaration and parse the character-encoding identifier, which is still necessary to distinguish the individual members of each family of encodings (e.g. to tell UTF-8 from 8859, and the parts of 8859 from each other, or to distinguish the specific EBCDIC code page in use, and so on).

Because the contents of the encoding declaration are restricted to characters from the ASCII repertoire (however encoded), a processor can reliably read the entire encoding declaration as soon as it has detected which family of encodings is in use. Since in practice, all widely used character encodings fall into one of the categories above, the XML encoding declaration allows reasonably reliable in-band labeling of character encodings, even when external sources of information at the operating-system or transport-protocol level are unreliable. Character encodings such as UTF-7 that make overloaded usage of ASCII-valued bytes may fail to be reliably detected.

Once the processor has detected the character encoding in use, it can act appropriately, whether by invoking a separate input routine for each case, or by calling the proper conversion function on each character of input.

Like any self-labeling system, the XML encoding declaration will not work if any software changes the entity's character set or encoding without updating the encoding declaration. Implementors of character-encoding routines should be careful to ensure the accuracy of the internal and external information used to label the entity.
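
The detection table in F.1 can be sketched as a lookup on the first four octets. This is a minimal illustration rather than part of the erratum: the function name sniff_encoding_family and the returned labels are hypothetical, and a real processor would go on to read the encoding declaration wherever the table says it must.

```python
def sniff_encoding_family(data: bytes) -> str:
    """Guess the encoding family of an XML entity from its first octets (F.1)."""
    head = data[:4]

    # With a Byte Order Mark. The UCS-4 patterns are tested first because
    # FF FE 00 00 and FE FF 00 00 would otherwise look like UTF-16 marks.
    ucs4_boms = {
        b"\x00\x00\xfe\xff": "UCS-4 (1234, BOM)",
        b"\xff\xfe\x00\x00": "UCS-4 (4321, BOM)",
        b"\x00\x00\xff\xfe": "UCS-4 (2143, BOM)",
        b"\xfe\xff\x00\x00": "UCS-4 (3412, BOM)",
    }
    if head in ucs4_boms:
        return ucs4_boms[head]
    if head[:2] == b"\xfe\xff":
        return "UTF-16BE (BOM)"
    if head[:2] == b"\xff\xfe":
        return "UTF-16LE (BOM)"
    if head[:3] == b"\xef\xbb\xbf":
        return "UTF-8 (BOM)"

    # Without a Byte Order Mark: look for '<' or '<?' or '<?xm'.
    no_bom = {
        b"\x00\x00\x00\x3c": "32-bit (1234)",
        b"\x3c\x00\x00\x00": "32-bit (4321)",
        b"\x00\x00\x3c\x00": "32-bit (2143)",
        b"\x00\x3c\x00\x00": "32-bit (3412)",
        b"\x00\x3c\x00\x3f": "16-bit big-endian",
        b"\x3c\x00\x3f\x00": "16-bit little-endian",
        b"\x3c\x3f\x78\x6d": "ASCII-compatible",
        b"\x4c\x6f\xa7\x94": "EBCDIC",
    }
    # "Other" row: UTF-8 without an encoding declaration, or a damaged stream.
    return no_bom.get(head, "UTF-8 (assumed)")
```

Every result except the BOM cases and the final fallback names only a family; the declared encoding name still has to be read and checked as the Note above describes.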

F.2 Priorities in the Presence of External Encoding Information

The second possible case occurs when the XML entity is accompanied by encoding information, as in some file systems and some network protocols. When multiple sources of information are available, their relative priority and the preferred method of handling conflict should be specified as part of the higher-level protocol used to deliver XML. In particular, please refer to [IETF RFC 2376] or its successor, which defines the text/xml and application/xml MIME types and provides some useful guidance. In the interests of interoperability, however, the following rule is recommended.

  • If an XML entity is in a file, the Byte-Order Mark and encoding declaration are used (if present) to determine the character encoding.


E104 Clarification

Obsoletes E86
Section 2.11

Replace the second paragraph with the following:

To simplify the tasks of applications, the characters passed to an application by the XML processor must be as if the XML processor normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character.
Rationale

The original description wrongly had the effect of requiring normalization of sequences such as that resulting from

  <!ENTITY % e '<!ENTITY f "a&#xD;&#xA;b">'>

and was thus not equivalent to normalization on input.
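
The normalization described in the replacement paragraph can be sketched as follows. The function name normalize_line_breaks is hypothetical, and the sketch assumes the entity has already been decoded to a string. Because it runs on the raw input before parsing, character references such as those in the rationale's example are left untouched: at that stage they are ordinary text, not carriage returns or line feeds.

```python
def normalize_line_breaks(text: str) -> str:
    """Translate #xD #xA pairs and lone #xD characters to a single #xA."""
    # Replace the two-character sequence first so that it yields one #xA,
    # then convert any remaining #xD that was not followed by #xA.
    return text.replace("\r\n", "\n").replace("\r", "\n")


raw = 'line1\r\nline2\rline3 <!ENTITY f "a&#xD;&#xA;b">'
normalized = normalize_line_breaks(raw)
```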


E103 Clarification

Further clarifies E80
Section 4.6

Amend the first sentence of the text added by E80 to read:

If the entities lt or amp are declared, they must be declared as internal entities whose replacement text is a character reference to the respective character (less-than sign or ampersand) being escaped;
Rationale
It was still not clear how to declare lt and amp, in particular that &lt; (resp. &amp;) must denote less-than sign (resp. ampersand) and nothing else.

E102 Clarification

Appendix E

Prepend the following sentence to the first sentence of the first paragraph:

As noted in 3.2.1 Element Content, it is required that content models in element type declarations be deterministic. This requirement is for compatibility with SGML (which calls deterministic content models "unambiguous");
Rationale
It was not clear that deterministic content models are mandatory in XML and why.

E101 Editorial

Status of this document

To the sentence "Please report errors in this document to xml-editor@w3.org.", append "archives are available."


E100 Editorial

Status of this document

In the paragraph beginning "This document specifies a syntax...", add a sentence as follows:

The English version of this specification is the only normative version. However, for translations of this document, see http://www.w3.org/XML/#trans.

There is no E99

Errata as of 2000-08-10.

E98 Clarification Source: XML Core WG list [members only]

Section 2.3

Replace the Note with the following:

The "Namespaces in XML" Recommendation [NAMESPACES] assigns a meaning to names containing colon characters. Therefore authors should not use the colon in XML names except for namespace purposes, but XML processors must accept the colon as a name character.
Appendix A.2

Add a non-normative reference:

NAMESPACES
Namespaces in XML, eds. Tim Bray, Dave Hollander, Andrew Layman. 14 January 1999. Available at http://www.w3.org/TR/REC-xml-names/.

Errata as of 2000-07-27.

E97 Clarification Source: XML Core WG list [members only]

Obsoletes E40
Section 3.1

Replace the first sentence of the paragraph immediately preceding production [44] with the following:

Definition: An element with no content is said to be empty. The representation of an empty element is either a start-tag immediately followed by an end-tag, or an empty-element tag.
Rationale
As the original sentence was in the conditional, there wasn't a real definition of an empty element.

E96 Editorial Source: XML Core WG list [members only]

Section 4.4.8

Add the following at the end of the sole paragraph:

This behavior does not apply to parameter entity references within entity values; these are described in 4.4.5 Included in Literal.

Errata as of 2000-07-13.

E95 Editorial Source: xml-editor list

Section 3.3.3
Section 4.4.3
Appendix E
Replace all occurrences of "parser" with "processor" (or "XML processor" if appropriate).
Rationale
In these cases 'processor' is really what is meant, and the term 'parser' is just a leftover from SGML terminology.

E94 Clarification Source: xml-editor list

Section 4.3.1
Add a sentence at the end as follows:
The text declaration in the external parsed entity is not considered part of its replacement text.

E93 Substantive Source: xml-editor list

Section 6
Rewrite the definitions of [^a-z], [^#xN-#xN], [^abc] and [^#xN#xN#xN] to make explicit that the universal character set is restricted to the characters allowed by production [2] Char.

E92 Clarification Source: xml-editor list

Section 4.1
Change the last sentence of the Entity Declared VC to:
Similarly, the declaration of a general entity must precede any attribute-list declaration containing a default value with a direct or indirect reference to that general entity.
Rationale
The spec was unclear on the fact that the declaration of a general entity must precede a reference to it, even when it is an indirect reference.

There is no E91


E90 Substantive Source: XML Core WG list [members only]

Obsoletes E41 and part of E63
Section 3.4
Delete the sentence "Note that for reliable parsing [...] detected", and the sentence added by erratum E63 "Parameter entity references are not recognized within an ignored conditional section".

Add the following to section 3.4 at the end of the third paragraph (i.e., after "are ignored"):

The contents of an ignored conditional section are parsed by ignoring all characters after the "[" following the keyword, except conditional section starts "<![" and ends "]]>", until the matching conditional section end is found. Parameter entity references are not recognised in this process.
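
The scan described above can be sketched as a depth counter over the two delimiters. This is an illustration only: the function name skip_ignored_section and its arguments are hypothetical, with start assumed to be the index just past the "[" that follows the IGNORE keyword.

```python
def skip_ignored_section(dtd: str, start: int) -> int:
    """Return the index just past the "]]>" that closes the ignored section."""
    depth = 1  # the ignored section itself is already open
    i = start
    while i < len(dtd):
        if dtd.startswith("<![", i):
            # Nested conditional section start: must be matched by its own end.
            depth += 1
            i += 3
        elif dtd.startswith("]]>", i):
            depth -= 1
            i += 3
            if depth == 0:
                return i
        else:
            # Every other character is ignored, including "%" and "&", so
            # parameter entity references are never recognised here.
            i += 1
    raise ValueError("unterminated ignored conditional section")


dtd = "<![IGNORE[ a <![INCLUDE[ %pe; ]]> b ]]><!ELEMENT x EMPTY>"
start = dtd.index("[", 3) + 1
rest = dtd[skip_ignored_section(dtd, start):]
```

In this usage, rest holds the declaration that follows the ignored section, and the reference %pe; inside it is never looked up.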

Add a validity constraint applying to productions [62] includeSect and [63] ignoreSect:

Validity Constraint: Proper Conditional Section/PE Nesting.
If any of the <![, [, or ]]> of a conditional section is contained in the replacement text for a parameter-entity reference, all of them must be contained in the same replacement text.
Section 4.4
Change the definition of "Reference in DTD" (which was already changed by E41) to read:
as a reference within either the internal or external subsets of the DTD, but outside of an EntityValue, AttValue, PI, Comment, SystemLiteral, PubidLiteral or the contents of an ignored conditional section (see section 3.4: Conditional Sections).
Rationale
How conditional sections are parsed and what type of markup is recognized inside them was still not clear.

E89 Substantive Source: XML Core WG list [members only]

Obsoletes part of E39
Section 2.4
Remove the text added by E39 to this section.
Change the first paragraph so that instead of ending "... document type declarations, and processing instructions", it ends:
... document type declarations, processing instructions, XML declarations, text declarations, and whitespace at the top-level of the document entity (that is, outside the document element and not inside any other markup).
Rationale
The XML declarations and text declarations were forgotten from the list of markup because they were perceived as PIs from their syntax. The top-level whitespace is clarified as markup because it matches S (production [3]) during parsing, is not reported by SAX, is not allowed by DOM and is excluded by the Infoset.

E88 Substantive Source: I18N issues with the XML Specification [members only]

Throughout the whole document
Change all occurrences of "URI" to "URI reference".
Rationale
The spec is wrong in using "URI" where "URI reference" is really meant.

E87 Clarification Source: XML Core WG list [members only]

Section 1.2
Prepend the phrase "Marks a sentence describing " to the definitions of "for compatibility" and "for interoperability".
Rationale
It was not clear that the scope of these qualifiers is the sentence in which they appear.

E86 Substantive Source: XML Core WG list [members only]

Obsoleted by E104
Section 2.11
Replace the second paragraph with the following:
To simplify the tasks of applications, processors must normalize line breaks in parsed entities to #xA by either:
  1. translating the two-character sequence #xD #xA and any #xD that is not followed by #xA to #xA on input, before parsing, or
  2. using some other method such that the characters passed to the application are the same as if it did (1).
Rationale
The original description wrongly had the effect of requiring normalization of sequences such as that resulting from
  <!ENTITY % e '<!ENTITY f "a&#xD;&#xA;b">'>
and was thus not equivalent to normalization on input.

E85 Substantive Source: I18N issues with the XML Specification [members only]

Section 1.2
In the definition of match, remove the following sentence:
At user option, processors may normalize such characters to some canonical form.
Rationale
If normalization is allowed at user option, we can have processors disagreeing on the well-formedness of documents.

E84 Substantive Source: I18N issues with the XML Specification [members only]

Appendix B
From the first sentence of the first paragraph, remove ", without diacritics".
Rationale
Latin characters with diacritics are of course still base characters.

E83 Substantive Source: xml-editor list

Section 3.1
From the description of "No < in Attribute Values" WFC, remove the phrase within parentheses: (other than "&lt;").
Rationale
This is related to E65 which states that &lt; et al. are not magic.

E82 Substantive Source: xml-editor list

Section 2.8
Immediately after productions [28]-[29], add a paragraph:
Note: it is possible to construct a well-formed document containing a doctypedecl that neither points to an external subset nor contains an internal subset.

E81 Substantive Source: xml-editor list

Section 2.10
In the third paragraph, replace the sentence:
When declared, it must be given as an enumerated type whose only possible values are "default" and "preserve".
with:
When declared, it must be given as an enumerated type whose values are one or both of "default" and "preserve".
Add an example after the existing one (in the same table):
<!ATTLIST pre xml:space (preserve) #FIXED 'preserve'>
Rationale
The wording in the spec was ambiguous as to whether the value of the xml:space attribute could be limited to one of the two possible values.

E80 Clarification Source: XML Core WG list [members only]

Further clarified by E103
Section 4.6
Replace the last sentence before the example box with:
If the entities lt or amp are declared, they must be declared as internal entities whose replacement text is a character reference to the character being escaped; the double escaping is required for these entities so that references to them produce a well-formed result. If the entities gt, apos or quot are declared, they must be declared as internal entities whose replacement text is the single character being escaped (or a character reference to that character; the double escaping here is unnecessary but harmless). For example:
Delete the sentence after the box.
Rationale
It was unclear whether the example declarations in 4.6 are normative or merely examples and what declarations are acceptable.

E79 Clarification Source: I18N issues with the XML Specification [members only]

Section 4.3.3
Append the following to the next to last paragraph (containing "It is a fatal error when an XML processor..."):

It is a fatal error if an XML entity is determined (via default, encoding declaration or higher-level protocol) to be in a certain encoding but contains octet sequences that are not legal in that encoding. It is also a fatal error if an XML entity contains no encoding declaration and its content is not legal UTF-8 or UTF-16.

Rationale
The current spec is not clear on how illegal charsets should be treated.

E78 Substantive Source: I18N issues with the XML Specification [members only]

Obsoletes E49
Section 2.8
Change the sentence following the next to last example to read:

The system identifier "hello.dtd" gives the address (a URI reference) of a DTD for the document.

Section 4.2.2
Replace the paragraph about non-ASCII characters in URIs to read as follows:
URI references require encoding and escaping of certain characters. The disallowed characters include all non-ASCII characters, plus the excluded characters listed in Section 2.4 of [IETF RFC2396], except for the number sign (#) and percent sign (%) characters and the square bracket characters re-allowed in [IETF RFC2732]. Disallowed characters must be escaped as follows:
  1. Each disallowed character is converted to UTF-8 [IETF RFC2279] as one or more bytes.
  2. Any octets corresponding to a disallowed character are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).
  3. The original character is replaced by the resulting character sequence.
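
The three escaping steps can be sketched as below. The function name escape_uri_reference is hypothetical. The set of disallowed ASCII characters is an assumption transcribed from the excluded characters of RFC 2396 section 2.4.3 (control characters, space, delimiters and "unwise" characters), minus the number sign, percent sign and square brackets that the paragraph above re-allows.

```python
# Assumed reading of RFC 2396 section 2.4.3, minus '#', '%', '[' and ']'.
EXCLUDED_ASCII = (
    set('<>"{}|\\^`')
    | {chr(c) for c in range(0x21)}  # control characters and space
    | {"\x7f"}
)


def escape_uri_reference(ref: str) -> str:
    """Escape disallowed characters in a system identifier as %HH sequences."""
    out = []
    for ch in ref:
        if ord(ch) > 0x7F or ch in EXCLUDED_ASCII:
            # Step 1: convert to UTF-8; step 2: escape each octet as %HH.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    # Step 3: the original characters are replaced by the escaped sequences.
    return "".join(out)
```

Existing %HH escapes pass through unchanged because the percent sign is not in the disallowed set, so applying the function twice gives the same result as applying it once.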
Appendix A.2
Add to non-normative references:
IETF RFC2279
IETF (Internet Engineering Task Force). RFC 2279: UTF-8, a transformation format of ISO 10646, ed. F. Yergeau. 1998.

E77 Clarification Source: XML Core WG list [members only]

Section 4.3.3
Append to the first paragraph:
The terms 'UTF-8' and 'UTF-16' in this specification do not apply to character encodings labeled with any other labels, even if the encodings or the labels are very similar to UTF-8 or UTF-16.
Appendix A.2
Add to non-normative references:
IETF RFC2781
IETF (Internet Engineering Task Force). RFC 2781: UTF-16, an encoding of ISO 10646, ed. P. Hoffman, F. Yergeau. 2000.
Rationale
XML encoded in UTF-16 must have a BOM (byte order mark), and UTF-16BE and UTF-16LE don't.

E76 Clarification Source: XML Core WG list [members only]

Obsoletes E26
Section 4.2.2
Change the beginning of the paragraph after the definition of the Notation declared VC to read:
The SystemLiteral is called the entity's system identifier. It is a URI reference, meant to be dereferenced to obtain input for the XML processor to construct the entity's replacement text. It is an error for a fragment identifier (beginning with a # character) to be part of a system identifier.
Rationale
The SystemLiteral really is a URI reference and not a URI. However, since an XML processor has no use of a fragment identifier in this case, it is considered an error to have one.

E75 Clarification Source: XML Core WG list [members only]

Section 2.8
Add the following after the paragraph beginning "The markup declarations may ...":
Parameter entity references are recognised anywhere in the DTD (internal and external subsets and external parameter entities), except in literals, processing instructions, comments and the contents of ignored conditional sections (see Section 3.4: Conditional Sections). They are also recognised in entity value literals. The use of parameter entities in the internal subset is restricted as described below.
Rationale
The specification currently does not define where parameter entity references are recognized. One has to know this from SGML.

E74 Editorial Source: Martin Dürst

Appendix F
Remove "All other heuristics and sources of information are solely for error recovery." from the first item of the second list.
Rationale
This opened all kinds of backdoors for heuristics that are undesired.

E73 Substantive Source:

Obsoletes E31, E60, and part of E38
Section 2.12
Change the last sentence of the first paragraph to:
The values of the attribute are language identifiers as defined by [IETF RFC 1766], "Tags for the Identification of Languages" or its successor on the IETF Standards Track.
Replace productions [33] to [38] and all the following text, down to but excluding the sentence "For example" just before the examples, with the following:
Note: RFC 1766 tags are constructed from two-letter language codes as defined by [ISO 639], from two-letter country codes as defined by [ISO 3166] or from language identifiers registered with the Internet Assigned Numbers Authority [IANA-LANGCODES]. It is expected that the successor to [IETF RFC 1766] will introduce three-letter language codes for languages not presently covered by [ISO 639].
Rationale
The XML processor does not deal with the value of xml:lang, it just passes it on to the application. Checking its correctness at this level has no benefit and hurts with updates to RFC1766 (forthcoming). The spec must still impose the semantics of xml:lang by pointing to RFC 1766.

E72 Clarification Source:

Section 2.3
Add a motherhood note after production [9] as follows:
Note: although the EntityValue production allows the definition of an entity consisting of a single explicit < in the literal (e.g. <!ENTITY mylt "<">), it is strongly advised to avoid this practice since any reference to that entity will cause a well-formedness error.

E71 Editorial Source: xml-editor list

Section 3.1
Change production [43] to
  [43] content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*
Rationale
The intent of the CharData production was to exclude the string "]]>" from being parseable as CharData and therefore content. However, this string could still be parsed as content with 2 adjacent CharData productions.

E70 Substantive Source: XML Core WG list [members only]

Obsoletes E24 and E61
Section 3.3.3
Replace the whole of section 3.3.3 with the following:

Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.

  1. All line breaks must have been normalized on input to #xA as described in section 2.11, so the rest of this algorithm operates on text normalized in this way.
  2. Begin with a normalized value consisting of the empty string.
  3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:
    1. For a character reference, append the referenced character to the normalized value.
    2. For an entity reference, recursively apply step (3) of this algorithm to the replacement text of the entity.
    3. For a whitespace character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.
    4. For another character, append the character to the normalized value.

If the attribute type is not CDATA, then the XML processor must further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.

Note that if the unnormalized attribute value contains a character reference to a whitespace character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a whitespace character (not a reference), which is replaced with a space character (#x20) in the normalized value and also contrasts with the case where the unnormalized value contains an entity reference whose replacement text contains a whitespace character; being recursively processed, the whitespace character is replaced with a space character (#x20) in the normalized value.

All attributes for which no declaration has been read should be treated by a non-validating parser as if declared CDATA.

Here are a few examples of attribute normalization. Given the declarations:
<!ENTITY d "&#xD;">
<!ENTITY a "&#xA;">
<!ENTITY da "&#xD;&#xA;">

the attribute specifications in the left column would be normalized to the character sequences of the middle column if the attribute a is declared NMTOKENS and to those of the right columns if a is declared CDATA.

Attribute specification              | a is NMTOKENS               | a is CDATA
a="                                  | x y z                       | #x20 #x20 x y z
                                     |                             |
xyz"                                 |                             |
a="&d;&d;A&a;&a;B&da;"               | A #x20 B                    | #x20 #x20 A #x20 #x20 B #x20 #x20
a="&#xd;&#xd;A&#xa;&#xa;B&#xd;&#xa;" | #xD #xD A #xA #xA B #xD #xA | #xD #xD A #xA #xA B #xD #xA

It is noteworthy that the last example is not valid (but still well-formed) if a is declared to be of type NMTOKENS.
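
The normalization algorithm can be sketched as below. The names normalize_attribute and _expand are hypothetical, the entities argument is assumed to map each entity name to its replacement text (so the entity d above maps to a single #xD character), and the sketch does no well-formedness checking or detection of recursive entities.

```python
import re

# One token per match: hex character reference, decimal character
# reference, entity reference, or any single character.
_TOKEN = re.compile(r"&#x([0-9a-fA-F]+);|&#([0-9]+);|&([^;&#\s]+);|(.)", re.DOTALL)


def _expand(text, entities, out):
    """Step 3 of the algorithm, applied recursively to entity replacement text."""
    for hex_ref, dec_ref, name, ch in (m.groups() for m in _TOKEN.finditer(text)):
        if hex_ref is not None:
            out.append(chr(int(hex_ref, 16)))  # referenced character, as is
        elif dec_ref is not None:
            out.append(chr(int(dec_ref)))
        elif name is not None:
            _expand(entities[name], entities, out)
        elif ch in " \r\n\t":
            out.append(" ")  # literal whitespace becomes #x20
        else:
            out.append(ch)


def normalize_attribute(raw, entities, cdata=True):
    """Normalize an attribute value whose line breaks are already #xA."""
    out = []  # step 2: start from the empty string
    _expand(raw, entities, out)
    value = "".join(out)
    if not cdata:
        # Non-CDATA types: trim and collapse space (#x20) characters only.
        value = re.sub(" +", " ", value.strip(" "))
    return value


entities = {"d": "\r", "a": "\n", "da": "\r\n"}
second_row_cdata = normalize_attribute("&d;&d;A&a;&a;B&da;", entities)
second_row_tokens = normalize_attribute("&d;&d;A&a;&a;B&da;", entities, cdata=False)
```

Run against the three example rows, the sketch reproduces both result columns, including the last row where the character references survive unchanged under either attribute type.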

Rationale
The description of line-break handling was still confusing and suffered from the same defect as section 2.11, which was corrected in erratum E24.

E69 Substantive Source: XML Core WG list [members only]

Obsoletes E16
Section 2.2
In the first paragraph of section 2.2, just after the sentence beginning "Legal characters are tab, carriage return..." add the following:
The versions of Unicode and ISO 10646 cited in the references were current at the time this document was prepared. Since new characters may be added to these standards by amendments, XML processors must accept any character in the range specified for Char. XML processors may at user option check that the data characters in the document are legal characters in a particular version of Unicode or ISO 10646.
Amended to take into account i18n comments:
New characters may be added to the Unicode and ISO 10646 standards cited in the references by amendments or new editions. Consequently, XML processors must accept any character in the range specified for Char.

To take into account Petr Kuzel's comments and really finish E16, also amend the sentence to read: "Legal characters, defined by production Char below, are tab...".


E68 Substantive Source: XML Core WG list [members only]

Section 3.3.1
Add a VC to production [58] NotationType as follows:

Validity constraint: No Notation on Empty Element
For compatibility, an attribute of type NOTATION may not be declared on an element declared EMPTY.

Rationale
Since a NOTATION attribute is meant to indicate the notation (format) of the contents of an element, it doesn't make sense to have one on an EMPTY element. SGML doesn't allow it, so a VC is necessary to maintain SGML compatibility.

Errata as of 2000-04-19.

E67 Editorial Source: I18N issues with the XML Specification [members only]

Obsoletes E37
Appendix A.1
Add entries for Unicode 3.0 and ISO/IEC 10646-1:2000:
ISO/IEC 10646-1:2000
ISO (International Organization for Standardization). ISO/IEC 10646-1:2000. Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane. [Geneva]: International Organization for Standardization, 2000.
Unicode 3.0
The Unicode Consortium, The Unicode Standard, Version 3.0, Addison-Wesley, 2000. ISBN 0-201-61633-5.
Section 2.2
In the first paragraph, right after the reference to ISO/IEC 10646, add "(see also [ISO/IEC 10646-1:2000])". Right after the reference to Unicode, add "(see also D21 in section 3.6 of [Unicode 3.0])".
Section 4.3.3
Amend the first sentence of the second paragraph to read:

Entities encoded in UTF-16 must begin with the Byte Order Mark described by Annex F of [ISO/IEC 10646-1:1993], Annex H of [ISO/IEC 10646-1:2000], section 2.4 of [Unicode 2.0] and section 2.7 of [Unicode 3.0] (the ZERO WIDTH NO-BREAK SPACE character, #xFEFF).

Appendix B
In the first line after the productions and in the seventh item of the bulleted list, change "Unicode" to "Unicode 2.0".

E66 Editorial Source: I18N issues with the XML Specification [members only]

Status of this document
Delete the next to last paragraph, beginning "This specification uses the term URI..."
Appendix A.2
Delete the entries for RFC 1738 and RFC 1808. Add entries for RFC 2396 and RFC 2732:
IETF RFC2396
IETF (Internet Engineering Task Force). RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax, T. Berners-Lee, R. Fielding, L. Masinter. 1998.
IETF RFC2732
IETF (Internet Engineering Task Force). RFC 2732: Format for Literal IPv6 Addresses in URL's, R. Hinden, B. Carpenter, L. Masinter. 1999.
Section 4.2.2
Within the second sentence of the paragraph after the "Notation declared" VC, add references to RFCs 2396 and 2732 as follows:

...(as defined in [IETF RFC2396], updated by [IETF RFC2732])...

Rationale:
The XML spec was relying on an Internet-Draft and some obsolete RFCs for its definition of URL.

Errata as of 2000-04-15.

E65 Substantive Source: XML Core WG list [members only]

Section 4.4.2
Remove the following text from the second sentence: ", except that the replacement text of entities used to escape markup delimiters (the entities amp, lt, gt, apos, quot) is always treated as data".
Rationale:
This was a leftover from the time the definitions of the predefined entities were changed to use double quoting, in the late stages of the drafting of the XML 1.0 spec. The only thing magic about &lt; et al. is that they are predefined when not validating.

E64 Clarification Source: XML Core WG list [members only]

Section 2.9
Append to the first paragraph:
or in parameter entities. An external markup declaration is defined as a markup declaration occurring in the external subset or in a parameter entity (external or internal, internal parameter entities being included because non-validating processors are not required to read them).

Amend the first sentence of the paragraph following production [32] to read:

In a standalone document declaration, the value "yes" indicates that there are no external markup declarations which affect the information passed from the XML processor to the application.
Rationale
External markup declarations as defined here are precisely the ones that need not be processed by a non-validating processor and hence the ones material to the standalone declaration.

E63 Clarification Source: XML Core WG list [members only]

Partially obsoleted by E90
Section 2.5
Append to the first paragraph: "Parameter entity references are not recognized within comments".
Section 2.6
Append to the last paragraph: "Parameter entity references are not recognized within processing instructions."
Section 3.4
Append the following to the second sentence of the second paragraph after the table of productions: "; parameter entity references are not recognized within an ignored conditional section."

Errata as of 2000-04-09.

E62 Substantive Source: XML Core WG list [members only]

Obsoleted by E108
Section 2.3
Change productions [6] Names and [8] Nmtokens to use #x20 (a single space character) instead of S:
"[6] Names ::= Name (#x20 Name)*"
"[8] Nmtokens ::= Nmtoken (#x20 Nmtoken)*"
Rationale:
This change is necessary to preserve SGML compatibility. In principle it makes previously valid documents invalid, but it is believed that only contrived documents, not real ones, are affected.
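
The effect of replacing S with #x20 can be sketched with two regular expressions. This is only an illustration: NAME is a simplified, ASCII-only stand-in for the Name production (an assumption, not the real definition), and the helper name matches is hypothetical.

```python
import re

# Simplified ASCII-only stand-in for the Name production (assumption).
NAME = r"[A-Za-z_:][A-Za-z0-9._:\-]*"

# Before the erratum: Names ::= Name (S Name)*
NAMES_OLD = re.compile(rf"{NAME}(?:[ \t\r\n]+{NAME})*")
# After the erratum: Names ::= Name (#x20 Name)*
NAMES_NEW = re.compile(rf"{NAME}(?:\x20{NAME})*")


def matches(pattern: "re.Pattern[str]", value: str) -> bool:
    """Return True if the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None
```

Under this sketch a value such as "a  b" (two spaces) or "a\tb" matches the old pattern but not the new one, while "a b" matches both.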

Errata as of 2000-02-17.

E61 Clarification Source: Richard Tobin

Obsoleted by E70
Section 3.3.3
Further clarify E24 by adding a new paragraph after the paragraph following the bulleted list (beginning "If the attribute type is not CDATA, then the XML processor must..."):
Note that if the unnormalized attribute value contains a character reference to a whitespace character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a whitespace character (not a reference), which is replaced with a space character (#x20) in the normalized value and also contrasts with the case where the unnormalized value contains an entity reference whose replacement text contains a whitespace character; being recursively processed, the whitespace character is replaced with a space character (#x20) in the normalized value.
Rationale:
It was not completely clear how an attribute value containing a character reference to a whitespace character other than space is supposed to be normalised.

Errata as of 2000-01-31.

E60 Editorial Source: XML Core WG list [members only]

Obsoleted by E73
Section 2.12
Change the first sentence of the paragraph following the bullet list following production [38] to read:

There may be any number of Subcode segments; if the Langcode is an ISO639Code, and if the first subcode segment exists and consists of two letters, then it must be a country code from [ISO 3166], "Codes for the representation of names of countries."

Errata as of 2000-01-25.

E59 Editorial Source: xml-editor list

Obsoletes E28
Section 3
Change item number 2 of the list of valid cases for the "Element Valid" VC to read:
  2. The declaration matches children and the sequence of child elements belongs to the language generated by the regular expression in the content model, with optional white space (characters matching the nonterminal S) between the start tag and the first child element, between child elements or between the last child element and the end tag. Note that a CDATA section containing only white space does not match the nonterminal S, and hence cannot appear in these positions.

E58 Editorial Source: xml-editor list

Appendix A.1
Rename the existing IANA entry IANA-CHARSETS (but keep the "IANA" link target to avoid breaking external links).
Section 4.3.3
Adjust the [IANA] reference in 4.3.3 accordingly.
Appendix A.2
Add a new entry IANA-LANGCODES as follows:
IANA-LANGCODES
(Internet Assigned Numbers Authority) Registry of language tags. See http://www.isi.edu/in-notes/iana/assignments/languages/.
Section 2.12
Change the [IANA] reference to point to this new entry.

E57 Clarification Source: xml-editor list

Section 4.3.3
Amend the second paragraph after production [81] to read (the first sentence is actually unchanged):

In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" should be used for the various encodings and transformations of Unicode / ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ... "ISO-8859-9" should be used for the parts of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" should be used for the various encoded forms of JIS X-0208-1997. It is recommended that character encodings registered (as charsets) with the Internet Assigned Numbers Authority [IANA], other than those just listed, should be referred to using their registered names; other encodings should use names starting with an "x-" prefix. XML processors should match character encoding names in a case-insensitive way and should either interpret an IANA-registered name as the encoding registered at IANA for that name or treat it as unknown (processors are of course not required to support all IANA-registered encodings).
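
The case-insensitive matching recommended above might be sketched as follows. The table SUPPORTED_ENCODINGS and the function resolve_encoding_name are hypothetical; a real processor would carry its own list of supported encodings.

```python
from typing import Optional

# Hypothetical table of encodings this processor supports, keyed by
# the lower-cased name.
SUPPORTED_ENCODINGS = {
    "utf-8": "UTF-8",
    "utf-16": "UTF-16",
    "iso-8859-1": "ISO-8859-1",
}


def resolve_encoding_name(declared: str) -> Optional[str]:
    """Match a declared encoding name without regard to case.

    Returns the canonical name, or None when the name is treated as unknown.
    """
    return SUPPORTED_ENCODINGS.get(declared.lower())
```

A name that is not in the table is simply reported as unknown, in line with the statement that processors are not required to support all IANA-registered encodings.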


E56 Clarification Source: xml-editor list

Section 4.3.3
Amend the second sentence of the first paragraph to read "All XML processors must be able to read entities in both the UTF-8 and UTF-16 encodings."

Errata as of 2000-01-17.

E55 Editorial Source: minutes XML-Syntax 1999-02-10 [members only] E10

Obsoletes E12
Section 3.2.1
The term "content model" needs to be marked as a formally defined term.

E54 Editorial Source: minutes XML-Syntax 1999-02-24 [members only] E40

XML version of the spec
There are a couple of entities declared in the internal subset with names starting with 'xml' or 'XML', which is against the statement in 2.3 that such names are reserved for future standardization.

E53 Editorial Source: xml-editor list

Section 4
In the last paragraph before section 4.1, "Parameter entities" should be a defined term (bold).

E52 Editorial Source: xml-editor list

Section 3.2.1
In each of productions [49] and [50], the first instance of cp should be a hyperlink like the second instance.

E51 Editorial Source: xml-editor list

Section 4.4
There are incorrect hyperlinks in the row identified by "Occurs as Attribute Value" of the first table of section 4.4.

In the column headed "Character", "Not recognized" hyperlinks to "#not recognized" instead of "#not-recognized" (missing dash). In the columns headed "Internal General" and "External Parsed General", "Forbidden" hyperlinks to "#not-recognized" instead of "#forbidden".

Errata as of 2000-01-06.

E50 Substantive Source: minutes XML-Syntax 1999-05-26 [members only] E73

Section 3.2.1
Change the grammar for 'choice' in production [49] from:
"choice ::= '(' S? cp ( S? '|' S? cp )* S? ')'"
to:
"choice ::= '(' S? cp ( S? '|' S? cp )+ S? ')'"
(which amounts to changing the * into a +).
Rationale:
Eliminate unnecessary ambiguity in the grammar for cp and children, which serves no purpose and confuses some implementors.
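
The ambiguity can be seen in a small sketch. With the * form, a group holding a single content particle, such as "(a)", matches choice as well as seq (production [50], which keeps its *); with the + form it matches only seq. The patterns below are a simplification: cp is reduced to an ASCII-only name with no nesting or occurrence indicators, which is an assumption made for brevity.

```python
import re

# Simplified stand-in for cp: an ASCII-only name (assumption).
NAME = r"[A-Za-z_:][A-Za-z0-9._:\-]*"
S = r"[ \t\r\n]*"  # plays the role of S? in the productions

# choice ::= '(' S? cp ( S? '|' S? cp )* S? ')'   (before the erratum)
CHOICE_OLD = re.compile(rf"\({S}{NAME}(?:{S}\|{S}{NAME})*{S}\)")
# choice ::= '(' S? cp ( S? '|' S? cp )+ S? ')'   (after the erratum)
CHOICE_NEW = re.compile(rf"\({S}{NAME}(?:{S}\|{S}{NAME})+{S}\)")


def is_choice(pattern: "re.Pattern[str]", group: str) -> bool:
    """Return True if the whole group matches the given choice pattern."""
    return pattern.fullmatch(group) is not None
```

Groups with at least one "|" are unaffected by the change; only the single-particle case stops matching choice.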

E49 Substantive Source: minutes XML-Syntax 1999-05-19 [members only] E67

Obsoleted by E78
Section 4.2.2
Amend the paragraph beginning "An XML processor should handle a non-ASCII character..." to read as follows: "Some URIs may contain characters that are either reserved (see [IETF RFC2396], section 2.2) or non-ASCII. An XML processor should handle such a character in a URI by representing the character in UTF-8 as one or more bytes, and then escaping these bytes with the URI escaping mechanism (i.e., by converting each byte to %HH, where HH is the hexadecimal notation of the byte value)."
Rationale:
The original discussed only non-ASCII characters; the amended text also covers reserved characters.
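
The escaping step described above (UTF-8 bytes, each written as %HH) can be sketched as follows. The function names are hypothetical, and the sketch escapes only non-ASCII characters in a whole URI; deciding which reserved characters need escaping depends on where they occur and is left to the caller.

```python
def escape_uri_char(ch: str) -> str:
    """Represent one character in UTF-8 and write each byte as %HH."""
    return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))


def escape_non_ascii(uri: str) -> str:
    """Escape every non-ASCII character in uri; leave ASCII untouched."""
    return "".join(
        escape_uri_char(ch) if ord(ch) > 0x7F else ch for ch in uri
    )
```

For example, U+00E9 becomes the two bytes C3 A9 in UTF-8 and is therefore written as %C3%A9.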

E48 Clarification Source: minutes XML-Syntax 1999-02-24 [members only] E31 and minutes XML-Syntax 1999-05-19 [members only] E65

Appendix F
Modify the text from the paragraph beginning "The second possible case occurs when the XML entity..." to the end of the appendix to read:

The second possible case occurs when the XML entity is accompanied by encoding information, as in some file systems and some network protocols. When multiple sources of information are available, their relative priority and the preferred method of handling conflict should be specified as part of the higher-level protocol used to deliver XML. In particular, please refer to [IETF RFC2376] "XML Media Types" which defines the text/xml and application/xml MIME types and provides some useful guidance. In the interests of interoperability, however, the following rule is recommended.

Appendix A.2
Add a non-normative reference:
IETF RFC2376
IETF (Internet Engineering Task Force). RFC 2376: XML Media Types, ed. E. Whitehead, M. Murata. 1998.
Rationale:
Take RFC 2376 into account.

E47 Substantive Source: minutes XML-Syntax 1999-05-19 [members only] E62

Section 4.3.3
Prepend the following to the last sentence of the paragraph immediately preceding production [80]: "In the absence of external character encoding information (such as MIME headers), ".
Rationale:
The original only covered the case where no external information was available.

E46 Clarification Source: minutes XML-Syntax 1999-05-19 [members only] E60

Section 3.1
Append the following to the first paragraph after production [41]: "Note that the order of attribute specifications in a start-tag or empty-element tag is not significant."

E45 Substantive Source: minutes XML-Syntax 1999-05-19 [members only] E58

Section 3.1
Change the second sentence of the paragraph right after production [44] to read: "For interoperability, the empty-element tag should be used, and should only be used, for elements which are declared EMPTY."
Rationale:
"For interoperability" and "must" were combined inappropriately.

E44 Substantive Source: minutes XML-Syntax 1999-05-19 [members only] E56 and E64

Obsoletes E1
Obsoleted by E105
Appendix F
Append the following to the second paragraph: "The notation ## is used to denote any byte value except 00." Adjust the itemized list of detection cases to read as follows:
With a Byte Order Mark:
 00 00 FE FF: UCS-4, big-endian machine (1234 order)
 FF FE 00 00: UCS-4, little-endian machine (4321 order)
 FE FF 00 ##:  UTF-16, big-endian
 FF FE ## 00:  UTF-16, little-endian
 EF BB BF: UTF-8
Without a Byte Order Mark:
 00 00 00 3C: UCS-4, big-endian machine (1234 order)
 3C 00 00 00: UCS-4, little-endian machine (4321 order)
 00 00 3C 00: UCS-4, unusual octet order (2143)
 00 3C 00 00: UCS-4, unusual octet order (3412)
 00 3C ## ##, 
 00 25 ## ##,
 00 20 ## ##,
 00 09 ## ##,
 00 0D ## ## or
 00 0A ## ##: Big-endian UTF-16 or ISO-10646-UCS-2. Note that, absent
              an encoding declaration, these cases are strictly
              speaking in error.
 3C 00 ## ##,
 25 00 ## ##,
 20 00 ## ##,
 09 00 ## ##,
 0D 00 ## ## or
 0A 00 ## ##: Little-endian UTF-16 or ISO-10646-UCS-2. Note that, absent
              an encoding declaration, these cases are strictly
              speaking in error.
 3C 3F 78 6D: UTF-8, ISO 646, ASCII, some part of ISO 8859, Shift-JIS,
              EUC, or any other 7-bit, 8-bit, or mixed-width encoding
              which ensures that the characters of ASCII have their
              normal positions, width, and values; the actual encoding
              declaration must be read to detect which of these
              applies, but since all of these encodings use the same
              bit patterns for the ASCII characters, the encoding
              declaration itself may be read reliably  
 4C 6F A7 94: EBCDIC (in some flavor; the full encoding declaration
              must be read to tell which code page is in use)
 other: UTF-8 without an encoding declaration, or else the data stream
        is corrupt, fragmentary, or enclosed in a wrapper of some kind

Add the following to the second paragraph after the list (this also takes care of the previous erratum on UTF-7): "Note: Since external parsed entities in UTF-16 may begin with any character, this autodetection does not always work. Also, because of the overloaded usage it makes of ASCII-valued bytes, the UTF-7 encoding may fail to be reliably detected."

Rationale:
The original version did not distinguish UCS-2, cases without a Byte Order Mark, UTF-8 with a BOM, etc.
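
Part of the detection list can be sketched as a lookup on the first four bytes. This is a partial illustration only: it covers the Byte Order Mark rows and the rows without a BOM whose four bytes are all fixed, and it deliberately leaves out the UTF-16/UCS-2 rows written with ## wildcards and no BOM. The function name detect_encoding and the returned labels are hypothetical.

```python
def detect_encoding(data: bytes) -> str:
    """Guess an encoding family from the first four bytes of an entity."""
    head = data[:4]

    def nonzero(i: int) -> bool:
        # "##" in the list denotes any byte value except 00.
        return len(head) > i and head[i] != 0x00

    # Rows with a Byte Order Mark.
    if head == b"\x00\x00\xfe\xff":
        return "UCS-4, big-endian (1234)"
    if head == b"\xff\xfe\x00\x00":
        return "UCS-4, little-endian (4321)"
    if head[:3] == b"\xfe\xff\x00" and nonzero(3):
        return "UTF-16, big-endian"
    if head[:2] == b"\xff\xfe" and nonzero(2) and head[3:4] == b"\x00":
        return "UTF-16, little-endian"
    if head[:3] == b"\xef\xbb\xbf":
        return "UTF-8"

    # Rows without a Byte Order Mark whose four bytes are all fixed.
    fixed = {
        b"\x00\x00\x00\x3c": "UCS-4, big-endian (1234)",
        b"\x3c\x00\x00\x00": "UCS-4, little-endian (4321)",
        b"\x00\x00\x3c\x00": "UCS-4, unusual octet order (2143)",
        b"\x00\x3c\x00\x00": "UCS-4, unusual octet order (3412)",
        b"\x3c\x3f\x78\x6d": "ASCII-compatible; read the encoding declaration",
        b"\x4c\x6f\xa7\x94": "EBCDIC; read the encoding declaration",
    }
    # Anything else, including the rows this sketch omits, is "other".
    return fixed.get(head, "other")
```

The order of the checks matters for FF FE: the all-fixed UCS-4 pattern FF FE 00 00 is tested before the UTF-16 pattern FF FE ## 00, and the two cannot both match because ## excludes 00.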

E43 Editorial and Clarification Source: minutes XML-Syntax 1999-05-12 [members only] and minutes XML-Syntax 1999-05-19 [members only] E55

Appendix C
Change "conformant SGML document" to "conforming SGML documents".
Delete the word "valid" from the first sentence, since even well-formed but not valid XML documents are also conforming SGML documents.
Appendix A
Add a reference to the WebSGML amendment (Annex K of ISO 8879).

E42 Clarification Source: minutes XML-Syntax 1999-05-12 [members only] E52

Section 6
Change the first sentence of the second paragraph (after the "symbol ::= expression" example) to read: "Symbols are written with an initial capital letter if they are the start symbol of a regular language, otherwise with an initial lower case letter."

E41 Substantive Source: minutes XML-Syntax 1999-05-12 [members only] E51

Obsoleted by E90
Section 4.4
Change the definition corresponding to "Reference in DTD" to read: "as a reference within either the internal or external subsets of the DTD, but outside of an EntityValue, AttValue, PI, Comment, SystemLiteral or PubidLiteral." (with suitable links).
Rationale:
"PI, Comment, SystemLiteral or PubidLiteral" added to maintain compatibility with SGML.

E40 Substantive Source: minutes XML-Syntax 1999-05-12 [members only] E50

Obsoleted by E97
Section 3.1
In the first sentence of the paragraph immediately following production [43], change "must" to "should".
Rationale:
For an element containing only white space, "must" is unenforceable by a processor that doesn't know the content model of the element.

E39 Clarification Source: minutes XML-Syntax 1999-05-12 [members only] E49

Partially obsoleted by E89
Section 2.4
Add the following to the second paragraph: "Note that text that matches the nonterminal S (production [3]) is markup, not character data".
Section 2.10
In the first sentence of the first paragraph, remove the phrase ", denoted by the nonterminal S in this specification" from within the parentheses.
Rationale:
Clarify the distinction between white space corresponding to production [3] and other white space.

E38 Substantive Source: minutes XML-Syntax 1999-05-12 [members only] E48

Partially obsoleted by E73
Section 2.12
Add a paragraph immediately after production [38]: "The following is a non-normative summary of the definition of language codes in RFC 1766."
Appendix A
Move the references to ISO 639 and ISO 3166 from A.1 (normative) to A.2 (other).
Rationale:
Makes clear the original intent of having RFC 1766 normative and the rest (prose, ISO 639, ISO 3166) informative.

E37 Editorial Source: minutes XML-Syntax 1999-05-12 [members only] E47

Obsoleted by E67
Section 4.3.3
In the first sentence of the second paragraph, correct the reference to 10646 to "ISO/IEC 10646 Annex F" (instead of Annex E) and the reference to Unicode to "Unicode Section 2.4" (instead of Appendix B).

E36 Clarification Source: minutes XML-Syntax 1999-05-12 [members only] E46

Obsoletes part of E5
Section 4.3.3
Correct E5 to read "It is a fatal error for a TextDecl to occur other than at the beginning of an external entity." (It is a fatal error, not merely an error).

E35 Editorial Source: minutes XML-Syntax 1999-05-12 [members only] E72

Section 2.2
In the first paragraph, remove the word "graphic" from the third sentence (beginning with "Legal characters are tab, carriage return...").

E34 Substantive Source: minutes XML-Syntax 1999-05-12 [members only] E42a

Section 4.1
In the first sentence of the definition of the "Entity Declared" WFC, change the phrase "the Name given in the entity reference must match that in an entity declaration" to "for an entity reference that does not occur within the external subset or a parameter entity, the Name given in the entity reference must match that in an entity declaration that does not occur within the external subset or a parameter entity".
Rationale:
Suppose a standalone document containing an entity reference, the entity being declared in the external subset. Without this change, a processor that doesn't read the external subset would find a violation of the WFC, whereas a processor that does read it wouldn't. The change ensures that all processors are able to determine whether the WFC is met for standalone documents.

E33 Substantive Source: minutes XML-Syntax 1999-02-24 [members only] E45

Section 5.1
Amend the last sentence of the last paragraph to read: "Except when "standalone='yes'", they must not process entity declarations or attribute-list declarations encountered after a reference to a parameter entity that is not read, since the entity may have contained overriding declarations."
Rationale:
Without the addition of 'Except when "standalone='yes'"', there is no guarantee that making a document standalone will cause all XML processors to report the same results to the application.

E32 Editorial Source: minutes XML-Syntax 1999-02-24 [members only] E44

Withdrawn
Section 2.8
In the second sentence of the paragraph after production [27], change "document type definition" to "document type declaration".

E31 Substantive Source: minutes XML-Syntax 1999-02-24 [members only] E43

Obsoleted by E73
Section 3.1
Add a validity constraint to production [41] as follows: "Validity Constraint: Valid xml:lang: if the Name in an attribute specification is xml:lang, then the value, after normalization as an NMTOKEN, must match production [33]".
Rationale:
Despite a very clear intention, expressed by a full page of prose, there was nothing in the spec to enforce the validity of xml:lang.

E30 Editorial Source: minutes XML-Syntax 1999-02-24 [members only] E41

Section 2.3
Reword the second sentence of the paragraph after production [3] as follows: "A letter consists of an alphabetic or syllabic base character or an ideographic character."
Appendix B
Remove "; these classes combine to form the class of letters" from the first sentence.
Rationale:
The text was in contradiction with production [84].

E29 Substantive Source: minutes XML-Syntax 1999-02-24 [members only] E38

Section 4.1
From the definition of the "Entity Declared" WFC, remove the sentence "The declaration of a parameter entity must precede any reference to it." Remove the word "Similarly" in the next sentence for editorial cleanliness.
Rationale:
This WFC does not apply to production [69] PEReference, so the offending sentence is a non sequitur. The sentence is present in the text of the Entity Declared VC, which does apply to [69].

E28 Clarification Source: minutes XML-Syntax 1999-02-24 [members only] E34

Obsoleted by E59
Section 3
To item number 2 of the list of valid cases for the "Element Valid" VC, add the following: "Note that a CDATA section containing only white space does not match the nonterminal S, and hence cannot appear between pairs of child elements."

E27 Clarification Source: minutes XML-Syntax 1999-02-24 [members only] E33

Section 2.5
After the example, add a paragraph reading "Note that the grammar does not allow a comment ending in '--->'. The following example is not well-formed." and an example: "<!-- B+, B, or B--->"
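
Why '--->' is excluded can be checked against a regular expression modelled on production [15] Comment, which in the XML 1.0 specification reads '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'. The sketch below is an approximation: it does not enforce the Char ranges, and the function name is hypothetical.

```python
import re

# Each step of the comment body is either a non-hyphen character or a
# hyphen followed by a non-hyphen character, so the body can neither
# contain "--" nor end with "-".
COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_well_formed_comment(text: str) -> bool:
    """Return True if text as a whole matches the comment pattern."""
    return COMMENT.fullmatch(text) is not None
```

The string "<!-- B+, B, or B--->" fails because its body would have to end with a hyphen, which the grammar has no way to produce.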

E26 Clarification Source: minutes XML-Syntax 1999-02-24 [members only] E32

Obsoleted by E76
Section 4.2.2
Modify the second sentence of the paragraph immediately following the "Notation Declared" VC to read as follows: "It is a URI, meant to be dereferenced to obtain input for the XML processor to construct the entity's replacement text."
Rationale:
It wasn't clear to some that the URI should be dereferenced and the resulting byte stream treated as input to the XML processor to construct the entity's replacement text.

E25 Clarification Source: minutes XML-Syntax 1999-02-24 [members only] E30

Section 4
Amend the first sentence of the third paragraph to read: "An unparsed entity is a resource whose contents may or may not be text, and if text, may be other than XML."
Rationale:
The original text could be interpreted as saying that unparsed entities which are text can't be XML, which is wrong.

E24 Clarification Source: minutes XML-Syntax 1999-02-24 [members only] E28 and E37

Obsoleted by E70
Section 3.3.3
Replace the first paragraph, the itemized list of steps and the following paragraph with the following:

Before the value of an attribute is passed to the application or checked for validity, but after the end-of-line normalization described in section 2.11 has been performed, the XML processor must normalize the attribute value as follows:

If the attribute type is not CDATA, then the XML processor must further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.

Rationale:
The fact that the existing text describes an algorithm for filling in an initially empty string with the normalized value was widely misunderstood. There was also confusion regarding white space treatment.
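
The extra step for attributes whose type is not CDATA can be sketched as below. It covers only that final step (trimming and collapsing #x20 characters) and assumes the earlier normalization has already been applied; the function name is hypothetical.

```python
import re


def normalize_non_cdata(value: str) -> str:
    """Collapse runs of #x20 to one space and drop leading/trailing #x20."""
    return re.sub(r"\x20+", " ", value).strip(" ")
```

Only the space character (#x20) is affected at this stage, so a tab that survived the earlier normalization, for example one that came from a character reference, is left in place.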

E23 Editorial Source: minutes XML-Syntax 1999-02-24 [members only] E26

Section 4.3
Amend the last paragraph to read: "Examples of text declarations containing encoding declarations:"

E22 Substantive Source: minutes XML-Syntax 1999-02-24 [members only] E25

Section 4.7
Add a Validity Constraint to production [82] as follows:

"Validity Constraint: Unique Notation Name: only one notation declaration can declare a given Name."

Rationale:
The spec as written allows multiple declarations of NOTATIONs with the same name, which is wrong.

E21 Substantive Source: minutes XML-Syntax 1999-02-24 [members only] E23

Section 5.1
Change the third paragraph to read: "Validating processors must, at user option, report violations of the constraints expressed by...".
Rationale:
"at user option" was missing, in contradiction with section 1.2.

E20 Clarification Source: minutes XML-Syntax 1999-02-17 [members only] E20

Section 6
To the item about A B add the sentence: "Concatenation has higher precedence than alternation; thus A B | C D is identical to (A B) | (C D)." To the items about A+ and A* add analogous sentences.

E19 Clarification Source: minutes XML-Syntax 1999-02-17 [members only] E19

Section 3.2.1
Change the sentence

"For interoperability, if a parameter-entity reference appears in a choice, seq, or Mixed construct, its replacement text should not be empty, and neither the first nor last non-blank character of the replacement text should be a connector (| or ,)."

to

"For interoperability, if a parameter-entity reference appears in a choice, seq, or Mixed construct, its replacement text should contain at least one non-blank character, and neither the first nor last non-blank character of the replacement text should be a connector (| or ,)."

Rationale:
Per 4.4.8, parameter entities are always padded with one space at each end, so the replacement text is never empty. Interoperability thus requires that there be at least one non-blank character.

E18 Clarification Source: minutes XML-Syntax 1999-02-17 [members only] E18

Section 2.4
Delete the second sentence of the third paragraph, which reads: "They are also legal within the literal entity value of an internal entity declaration; see "4.3.2 Well-Formed Parsed Entities". "
Rationale:
This sentence is bogus. When & or < are in a literal entity value they are being used as a markup delimiter, thus the whole second sentence is just confusing static.

Errata as of 1999-02-17.

E17

Section 2.1
In the list headed "Matching the document production implies that:", the second list item has forward references to "start-tag" and "end-tag" which need to be marked as defined terms.

E16

Obsoleted by E69
Section 2.2
The sentence "Legal characters are tab, carriage return, line feed, and the legal graphic characters of Unicode and ISO/IEC 10646" is somewhat in conflict with production [2] for Char, whose ranges include many character positions that are not yet defined by the Unicode/ISO 10646 standards. Change the text to make it clear that production [2] is normative; in practical terms this means that newly-added characters such as the Euro (&#x20ac; or &#8364;) are legal in XML documents.

E15

Section 2.8
In production [24], the quotation-mark literals aren't quoted. Should be ("'" VersionNum "'" | '"' VersionNum '"')

E14

Section 2.8
Just before production [28], the word "fuller" should be "further".

E13

Obsoleted by E109
Section 2.8
To the paragraph before production [30], add "The external subset and any external parameter entities referred to in the DTD must match the production for extPE. See 4.3.2 Well-Formed Parsed Entities".

E12

Obsoleted by E55
Section 3.2.1
The term "element content" needs to be marked as a formally defined term.

E11

Section 3.2.1
Change the word "parenthetized" to "parenthesized."

E10

Section 3.2.2
The name of the literal token PCDATA has no justification. Add to the paragraph after production [51]: 'The keyword PCDATA derives historically from the term "parsed character data."'

E9

Section 3.3
The text "For interoperability, writers of DTDs may choose to provide at most one attribute-list declaration for a given element type, at most one attribute definition for a given attribute name, and at least one attribute definition in each attribute-list declaration." could be read as forbidding more than one attribute of the same name in a DTD. Change it to read "For interoperability, writers of DTDs may choose to provide at most one attribute-list declaration for a given element type, at most one attribute definition for a given attribute name in an attribute-list declaration, and at least one attribute definition in each attribute-list declaration."

E8

Section 3.3.1
Immediately before production [54], delete ', as noted:' and read '. The validity constraints noted in the grammar are applied after the attribute value has been normalized as described in 3.3 Attribute-List Declarations'.

E7

Section 3.3.1
The spec as written allows multiple attributes of type NOTATION on a single element, which defeats the purpose; add a Validity Constraint to production [58] as follows: "Validity Constraint: One Notation per Element Type: No element type may have more than one NOTATION attribute specified."

E6

Section 4.
In the first sentence, the phrase "identified by name" should be replaced by "identified by entity name". Also, the phrase "see below" should be removed, and the phrase "document entity" made into a term-definition link.

E5

Partially obsoleted by E36
Section 4.3.3
In the paragraph beginning "In the absence of information", delete the phrase "for an encoding declaration to occur other than at the beginning of an external entity". Add a new paragraph after that one reading "It is an error for a TextDecl to occur other than at the beginning of an external entity."

E4

Section 4.4.5
The first example has "&YN;" - it should be "%YN;".

E3

Section 6.
The notation used in Productions [13] ([-'()+,./:=?;!*#@$_%]) and [26] ([a-zA-Z0-9_.:]) is not described in the notation section, although the semantics are obvious. Need to add descriptions to the first definition-list in the Notation section.

E2

Appendix A.2
The citations for the papers by Anne Brüggemann-Klein need improving: 'A. Brüggemann-Klein und D. Wood. Deterministic Regular Languages. Extended abstract in A. Finkel, M. Jantzen, Hrsg., STACS 1992, S. 173-184. Springer-Verlag, Berlin 1992. Lecture Notes in Computer Science 577. Full version titled "One-Unambiguous Regular Languages" in Information and Computation 140 (2): 229--253, February 1998.' and (to replace the Regular Expressions into Finite Automata) 'A. Brüggemann-Klein. Formal Models in Document Processing. Habilitationsschrift. Faculty of Mathematics at the University of Freiburg, 1993, available at ftp://ftp.informatik.uni-freiburg.de/documents/papers/brueggem/habil.ps.'

E1

Obsoleted by E44
Appendix F
Add a note that the algorithm given here does not work for UTF-7.

Last updated $Date: 2000/10/07 00:19:17 $ by $Author: fyergeau $
xml-editor