Processing XML 1.1 documents with XML Schema 1.0 processors

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is a Working Group Note prepared by the W3C XML Schema Working Group, as part of the W3C XML Activity, and published on 11 May 2005. It describes methods of supporting XML 1.1 documents with schema processors designed to support XML Schema 1.0.

XML Schema 1.0 parts 1 and 2 refer normatively to XML 1.0 and make no explicit provision for support of later versions of the XML specification. This lack is sometimes advanced as a reason why W3C specifications which depend on XML Schema should not support XML 1.1. But there are strong reasons to encourage the wide adoption of XML 1.1, which is more successfully internationalized than XML 1.0. At the time this Note is published, the question of how best to support XML 1.1 in XML Schema is still open.

This Note offers strategies for supporting XML 1.1, based on the implementation experience of some members of the XML Schema Working Group. It is hoped that the techniques described here will be helpful to other implementors and to users. Equally, the Working Group hopes that this Note will elicit discussion in the larger XML community concerning the best way for the XML Schema Working Group to balance the competing demands of flexibility in references to other specifications, stability, and interoperability. This Note is published with the full consensus of the XML Schema Working Group.

Comments on this document and the issues it raises are welcome; please send them to www-xml-schema-comments@w3.org.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This document may be updated, replaced or obsoleted by other documents at any time. The XML Schema Working Group does not currently expect to produce further versions or revisions of this document, but experience with the subject matter of this Note may lead to changes in the normative text of future versions of the XML Schema specification.

1 Introduction

As published, the XML Schema specification references XML 1.0 and XML Namespaces 1.0 explicitly. It also incorporates by reference certain key definitions, in particular those of the Char, Name, QName and S character classes. The contents of these classes have changed in XML 1.1 and XML Namespaces 1.1. Nothing in the existing XML Schema specification specifically bars the processing of infosets produced by XML 1.1-conformant parsers. However, if such infosets exploit any of the relevant changes in XML 1.1, conformant XML Schema 1.0 processors will not accept them as valid.
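The following small instance illustrates two of the changed definitions. It was constructed for illustration and is not drawn from the specifications. It is well-formed XML 1.1, but it would not be well-formed under the XML 1.0 rules that XML Schema 1.0 refers to. The comments indicate which change each construct depends on.

```xml
<?xml version='1.1'?>
<!-- Name: the element name starts with U+0133 (ĳ), which XML 1.1 accepts
     as a name start character but the XML 1.0 character tables exclude. -->
<!-- Char: &#x3; refers to the C0 control U+0003, which XML 1.1 admits
     (as a character reference only) but XML 1.0 does not. -->
<ĳs>&#x3;</ĳs>
```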

The XML Schema WG has judged that any changes to the existing specification to support XML 1.1 go beyond what could be considered as errata, and so will have to wait for a new version of the specification. As this may take some time, this Note addresses the question of what should be done in the interim to best serve the XML community.

In the sections which follow, a non-normative strategy is set out, suggesting a number of changes that processors implementing the XML Schema specification can make to enable sensible and interoperable support for XML 1.1. Strictly speaking, any implementation of XML Schema employing such a strategy is non-conformant to the current version of the XML Schema specification. The XML Schema WG nonetheless believes that interoperability will best be served by the availability of such non-conformant processors, until a subsequent version of XML Schema that addresses this issue normatively is approved.

2 Survey of XML 1.1 challenges for XML Schema 1.0

Consider the following four cases:

1. C1 vs. C0 control characters in content, e.g. #x83 vs. #x03
2. Old vs. new name characters in element names, e.g. y (25th letter of the English alphabet) vs. ĳ (25th letter of the Dutch alphabet)
3. Old vs. new name characters in ID-typed content, e.g. y vs. ĳ
4. LF vs. NEL in length-specified list-typed content

(ĳ, U+0133 (#x133), is common in Dutch, e.g. in the word ĳs, English "ice-cream". It is a good example of a character that XML 1.0 arbitrarily and irritatingly disallows as a name character, but that XML 1.1 allows.)

In each of the above cases, the first alternative is OK and has the same behaviour with respect to Schema validation in both XML 1.0 and XML 1.1, whereas the second alternative either is not Schema-valid under the strict XML 1.0 interpretation (1-3) or might be expected to have different behaviour between XML 1.0 and XML 1.1 (4).

In other words, if you used a conformant XML Schema validator on the following four instances (Figure 1), using the same schema document (Figure 2) each time, all four would have validity problems.

<?xml version='1.0'?>
<root>There's an &amp;#3; here: &#3;</root>

<?xml version='1.0'?>
<ĳs/>

<?xml version='1.0'?>
<root id="ĳ"/>

<?xml version='1.0'?>
<!-- There's a NEL character (U+0085) between the 'a' and the 'b' below -->
<root list="a…b"/>

Figure 1: The four instances

<?xml version='1.0'?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="root">
  <xs:annotation>
   <xs:documentation>String content, id attr of type ID,
                     list attr of type [list of token], length 2
   </xs:documentation>
  </xs:annotation>

  <xs:complexType>
   <xs:simpleContent>
    <xs:extension base="xs:string">

     <xs:attribute name="id" type="xs:ID"/>
     
     <xs:attribute name="list">
      <xs:simpleType>
       <xs:restriction>
        <xs:simpleType>
         <xs:list itemType="xs:token"/>
        </xs:simpleType>
        <xs:length value="2"/>
       </xs:restriction>
      </xs:simpleType>
     </xs:attribute>

    </xs:extension>
   </xs:simpleContent>
  </xs:complexType>
 </xs:element>
 
 <xs:element name="ĳs"/>
 
</xs:schema>

Figure 2: Schema for use with XML documents in Figure 1
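Case 4 differs from the others because it turns on end-of-line handling rather than on a character class. The following sketch assumes a parser that applies the XML 1.1 end-of-line and attribute-value normalization rules. Suppose the fourth instance in Figure 1 were labelled version='1.1'. The parser would first translate the literal NEL to #xA, and attribute-value normalization would then turn it into a space. The schema processor would therefore receive the same attribute value as for this instance:

```xml
<?xml version='1.1'?>
<!-- Same attribute value as the NEL instance after XML 1.1 normalization:
     the list has two xs:token items, so the length="2" facet is satisfied. -->
<root list="a b"/>
```

Under XML 1.0, by contrast, NEL is an ordinary character rather than white space. The value is therefore a single token, and it fails the length facet in Figure 2.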