XQuery and XPath Data Model 3.1

1 Introduction

This document defines the XQuery and XPath Data Model 3.1, which is the data model of [XML Path Language (XPath) 3.1], [XSL Transformations (XSLT) Version 3.0], and [XQuery 3.1: An XML Query Language].

The XQuery and XPath Data Model 3.1 (henceforth "data model") serves two purposes. First, it defines the information contained in the input to an XSLT or XQuery processor. Second, it defines all permissible values of expressions in the XSLT, XQuery, and XPath languages. A language is closed with respect to a data model if the value of every expression in the language is guaranteed to be in the data model. XSLT 3.0, XQuery 3.1, and XPath 3.1 are all closed with respect to the data model.

The data model describes items similar to those of the [Infoset] (henceforth "Infoset"). It is written to provide a data model suitable for XPath, XQuery and XSLT, which was not a goal of the Infoset, and this leads to a number of differences, some of which are:

Support for XML Schema types. The XML Schema recommendations define features, such as structures ([Schema Part 1]) and simple data types ([Schema Part 2]), that extend the Infoset with precise type information.
Representation of collections of documents and of complex values. ([XQuery 3.1 Requirements])
Support for typed atomic values.
Support for ordered, heterogeneous sequences.

As with the Infoset, the XQuery and XPath Data Model 3.1 specifies what information in the documents is accessible but does not specify the programming-language interfaces or bindings used to represent or access the data.

The data model can represent various values including not only the input and the output of a stylesheet or query, but all values of expressions used during the intermediate calculations. Examples include the input document or document repository (represented as a Document Node or a sequence of Document Nodes), the result of a path expression (represented as a sequence of nodes), the result of an arithmetic or a logical expression (represented as an atomic value), a sequence expression resulting in a sequence of items, etc.

This document provides a precise definition of the properties of nodes in the XQuery and XPath Data Model 3.1, how they are accessed, and how they relate to values in the Infoset and PSVI.

2 Concepts

This section outlines a number of general concepts that apply throughout this specification.

In this document, examples and material labeled as "Note" are provided for explanatory purposes and are not normative.

2.1 Terminology

For a full glossary of terms, see D Glossary.

In this specification the words must, must not, should, should not, may and recommended are to be interpreted as described in [RFC 2119].

This specification distinguishes between the data model as a general concept and specific items (documents, elements, atomic values, etc.) that are concrete examples of the data model by identifying all concrete examples as instances of the data model.

[Definition: Every instance of the data model is a sequence.]

[Definition: A sequence is an ordered collection of zero or more items.] A sequence cannot be a member of a sequence. A single item appearing on its own is modeled as a sequence containing one item. Sequences are defined in 2.5 Sequences.

[Definition: An item is either a node, a function, or an atomic value.]

Sometimes it is necessary to distinguish the case where a particular property has no value in the data model. The canonical example of such a case is the namespace URI property of an expanded QName that is not in any namespace. For such properties, it is convenient to be able to speak of "the state of having no value". [Definition: When a property has no value, we say that it is absent.]

Every node is one of the seven kinds of nodes defined in Section 6 Nodes. Nodes form a tree. Each node has at most one parent (reachable via the dm:parent accessor) and descendant nodes that are reachable directly or indirectly via the dm:children, dm:attributes, and dm:namespace-nodes accessors.

[Definition: The root node is the topmost node of a tree, the node with no parent.] Every tree has exactly one root node and every other node can be reached from exactly one root node.

Note:

The term “root node” is merely a designator, based on position, for one of the nodes in the tree without implying what kind of a node it is. In the XPath 1.0 data model the root node was a kind of node.

[Definition: A tree whose root node is a Document Node is referred to as a document.]

[Definition: A tree whose root node is not a Document Node is referred to as a fragment.]
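As a non-normative illustration, the definitions of root node, document, and fragment can be sketched in Python. The `Node` class, its `kind` and `parent` fields, and the helper names are hypothetical stand-ins for this example only; `parent` plays the role of the dm:parent accessor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # eq=False: nodes compare by identity, not by value
class Node:
    kind: str                        # e.g. "document", "element", "text"
    parent: Optional["Node"] = None  # stand-in for the dm:parent accessor


def root(node: Node) -> Node:
    """Follow parent links until reaching the node that has no parent."""
    while node.parent is not None:
        node = node.parent
    return node


def tree_kind(node: Node) -> str:
    """Classify the tree containing `node` as a document or a fragment."""
    return "document" if root(node).kind == "document" else "fragment"


doc = Node("document")
chapter = Node("element", parent=doc)
title = Node("text", parent=chapter)

orphan = Node("element")             # an element with no parent
note = Node("text", parent=orphan)
```

Here `tree_kind(title)` is `"document"` because the walk ends at a Document Node, while `tree_kind(note)` is `"fragment"` because the root of that tree is an Element Node.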

[Definition: An atomic value is a value in the value space of an atomic type and is labeled with the name of that atomic type.]

[Definition: An atomic type is a primitive simple type or a type derived by restriction from another atomic type.] (Types derived by list or union are not atomic.)

[Definition: The primitive simple types are the types defined in 2.1.1 Types adopted from XML Schema.]

A type is represented in the data model by an expanded-QName.

[Definition: An expanded-QName is a set of three values consisting of a possibly empty prefix, a possibly empty namespace URI, and a local name.] See 3.3.3 QNames and NOTATIONS.

[Definition: Implementation-defined indicates an aspect that may differ between implementations, but must be specified by the implementor for each particular implementation.]

[Definition: Implementation-dependent indicates an aspect that may differ between implementations, is not specified by this or any W3C specification, and is not required to be specified by the implementor for any particular implementation.]

Within this specification, the term URI refers to a Uniform Resource Identifier as defined in [RFC 3986] and extended in [RFC 3987] with the new name IRI. The term URI has been retained in preference to IRI to avoid introducing new names for concepts such as “Base URI” that are defined or referenced across the whole family of XML specifications.

In all cases where this specification leaves the behavior implementation-defined or implementation-dependent, the implementation has the option of providing mechanisms that allow the user to influence the behavior.

2.1.1 Types adopted from XML Schema

The data model adopts the following types:

The 19 types defined in Section 3.2 Primitive datatypes^XS2 of [Schema Part 2].
Three built-in list types: xs:NMTOKENS, xs:IDREFS, and xs:ENTITIES.

The following types which were originally defined in [XQuery 1.0 and XPath 2.0 Data Model (XDM)] and were subsequently adopted by [Schema 1.1 Part 2]: xs:anyAtomicType, xs:dayTimeDuration, xs:yearMonthDuration.
In the case of a processor that supports [Schema 1.1 Part 2], the data model also includes: the new union type xs:error (a type with no instances) and the new derived type xs:dateTimeStamp.
The following types, although they use the xs: namespace, are defined here in the data model and not in XML Schema: xs:untypedAtomic, and xs:numeric, a union type whose members are xs:double, xs:float and xs:decimal.

2.2 Notation

In addition to prose, this specification defines a set of accessor functions to explain the data model. The accessors are shown with the prefix dm:. This prefix is always shown in italics to emphasize that these functions are abstract; they exist to explain the interface between the data model and specifications that rely on the data model: they are not accessible directly from the host language.

Several prefixes are used throughout this document for notational convenience. The following bindings are assumed.

xs: bound to http://www.w3.org/2001/XMLSchema
xsi: bound to http://www.w3.org/2001/XMLSchema-instance
fn: bound to http://www.w3.org/2005/xpath-functions

In practice, any prefix that is bound to the appropriate URI may be used.

The signature of accessor functions is shown using the same style as [XQuery and XPath Functions and Operators 3.1], described in Section 1.4 Function signatures and descriptions ^FO31.

This document relies on the [Infoset] and Post-Schema-Validation Infoset (PSVI). Information items and properties are indicated by the styles information item and [infoset property], respectively.

Some aspects of type assignment rely on the ability to access properties of the schema components. Such properties are indicated by the style {component property}. Note that this does not mean a lightweight schema processor cannot be used, it only means that the application must have some mechanism to access the necessary properties.

2.3 Node Identity

Each node has a unique identity. The identity of a node is distinct from its value or other visible properties; nodes may be distinct even when they have the same values for all intrinsic properties other than their identity. (The identity of atomic values, by contrast, is determined solely by their intrinsic properties. No two distinct integers, for example, have the same value; every instance of the value “5” as an integer is identical to every other instance of the value “5” as an integer.)

Note:

The concept of node identity should not be confused with the concept of a unique ID, which is a unique name assigned to an element by the author to represent references using ID/IDREF correlation.
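The difference between node identity and atomic-value identity can be illustrated with a small Python sketch. This is an analogy only: the `ElementNode` class is hypothetical, Python object identity (`is`) stands in for node identity, and Python value equality (`==`) stands in for the identity of atomic values.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity-based comparison for nodes
class ElementNode:
    name: str
    string_value: str


a = ElementNode("price", "5")
b = ElementNode("price", "5")  # same visible properties as `a`

same_node = a is a             # a node is identical to itself
distinct_nodes = a is not b    # equal properties, yet two distinct nodes
atomic_identical = (5 == 5)    # atomic values are determined by value alone
```

The two element nodes carry the same name and string value but remain distinct, whereas every occurrence of the integer 5 is the same value.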

2.4 Document Order

[Definition: A document order is defined among all the nodes accessible during a given query or transformation. Document order is a total ordering, although the relative order of some nodes is implementation-dependent. Informally, document order is the order in which nodes appear in the XML serialization of a document.] [Definition: Document order is stable, which means that the relative order of two nodes will not change during the processing of a given query or transformation, even if this order is implementation-dependent.]

Within a tree, document order satisfies the following constraints:

The root node is the first node.
Every node occurs before all of its children and descendants.
Namespace Nodes immediately follow the Element Node with which they are associated. The relative order of Namespace Nodes is stable but implementation-dependent.
Attribute Nodes immediately follow the Namespace Nodes of the element with which they are associated. If there are no Namespace Nodes associated with a given element, then the Attribute Nodes associated with that element immediately follow the element. The relative order of Attribute Nodes is stable but implementation-dependent.
The relative order of siblings is the order in which they occur in the children property of their parent node.
Children and descendants occur before following siblings.

The relative order of nodes in distinct trees is stable but implementation-dependent, subject to the following constraint: If any node in a given tree, T1, occurs before any node in a different tree, T2, then all nodes in T1 are before all nodes in T2.
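The within-tree constraints above amount to a pre-order traversal in which each element is followed by its Namespace Nodes, then its Attribute Nodes, then its children. The following Python sketch is a non-normative illustration; the `Node` class and its fields are hypothetical. The relative order of Namespace Nodes and of Attribute Nodes is implementation-dependent, and this sketch simply assumes list order.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass(eq=False)
class Node:
    kind: str                       # "document", "element", "attribute", ...
    name: str = ""
    namespaces: List["Node"] = field(default_factory=list)
    attributes: List["Node"] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def document_order(node: Node) -> Iterator[Node]:
    """Yield the nodes of the tree rooted at `node` in document order."""
    yield node                       # a node precedes its descendants
    yield from node.namespaces       # namespace nodes follow the element
    yield from node.attributes       # attribute nodes follow namespace nodes
    for child in node.children:      # siblings in children-property order
        yield from document_order(child)


doc = Node("document", children=[
    Node("element", "a",
         namespaces=[Node("namespace", "xml")],
         attributes=[Node("attribute", "id")],
         children=[Node("element", "b"), Node("text", "t")]),
])

order = [(n.kind, n.name) for n in document_order(doc)]
```

Because each child's whole subtree is emitted before the next child is visited, children and descendants occur before following siblings, as the last constraint requires.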

2.5 Sequences

An important characteristic of the data model is that there is no distinction between an item (a node, function, or atomic value) and a singleton sequence containing that item. An item is equivalent to a singleton sequence containing that item and vice versa.

A sequence may contain any mixture of nodes, functions, and atomic values. When a node is added to a sequence its identity remains the same. Consequently a node may occur in more than one sequence and a sequence may contain duplicate items.

Sequences never contain other sequences; if sequences are combined, the result is always a “flattened” sequence. In other words, appending “(d e)” to “(a b c)” produces a sequence of length 5: “(a b c d e)”. It does not produce a sequence of length 4: “(a b c (d e))”; such a nested sequence never occurs.
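The flattening rule and the equivalence between an item and a singleton sequence can be sketched in Python as follows. This is a non-normative illustration that assumes a sequence is modeled as a Python tuple; the names `to_sequence` and `concat` are hypothetical.

```python
from typing import Any, Tuple

Sequence = Tuple[Any, ...]  # hypothetical stand-in for an XDM sequence


def to_sequence(value: Any) -> Sequence:
    """Treat a single item as a singleton sequence; leave sequences as-is."""
    return value if isinstance(value, tuple) else (value,)


def concat(*operands: Any) -> Sequence:
    """Combine items and sequences into one flat sequence."""
    result: Sequence = ()
    for operand in operands:
        result += to_sequence(operand)  # members are spliced in, never nested
    return result


combined = concat(("a", "b", "c"), ("d", "e"))
singleton = concat("a")
```

Here `combined` has length 5 and contains no nested tuple, and `singleton` is the one-item sequence `("a",)`.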

Note:

Sequences replace node-sets from XPath 1.0. In XPath 1.0, node-sets do not contain duplicates. In generalizing node-sets to sequences in XPath 3.0 and XPath 3.1, duplicate removal is provided by functions on node sequences.

Note:

Arrays and maps are derived from functions and therefore can also be contained within sequences.

2.6 Namespace Names

The specifications [Namespaces in XML] and [Namespaces in XML 1.1] introduce the concept of a namespace name. In [Namespaces in XML] a namespace name is required to be a URI; in [Namespaces in XML 1.1] it is required to be an IRI; but both specifications explicitly do not require a processor to check that namespace names appearing in an instance document are in fact valid URIs or IRIs.

[Definition: This specification uses the term Namespace URI to refer to a namespace name, whether or not it is a valid URI or IRI]. Following the lead of [Namespaces in XML] and [Namespaces in XML 1.1], processors implementing this data model may raise an error if a namespace name is not a valid URI or IRI (depending on whether they support [Namespaces in XML] or [Namespaces in XML 1.1]), but they are not required to make any checks. Note that the use of a relative reference as a namespace name is deprecated and is defined to be meaningless, but it is not an error. Namespace names, whatever form they take, are treated as character strings and compared for equality using codepoint-by-codepoint comparison, subject only to whitespace normalization if they appear in a context (for example, within an attribute value) where this is appropriate.
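As a non-normative illustration of codepoint-by-codepoint comparison, the following Python sketch compares namespace names as plain character strings. The function name and the example URIs are hypothetical. Python string equality compares code points without any normalization, which matches the comparison described above.

```python
def same_namespace_uri(a: str, b: str) -> bool:
    """Compare two namespace names codepoint by codepoint."""
    return a == b  # no case folding, no %-escape or URI normalization


identical = same_namespace_uri("http://example.com/ns",
                               "http://example.com/ns")
case_differs = same_namespace_uri("http://example.com/ns",
                                  "HTTP://EXAMPLE.COM/ns")
escape_differs = same_namespace_uri("http://example.com/~user",
                                    "http://example.com/%7Euser")
```

Only the first pair denotes the same namespace. The other two pairs might be considered equivalent by a URI-aware comparison, but as namespace names they are different strings.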

In some interfaces, namespace names are held as values of type xs:anyURI. For example, the namespace part of an expanded QName is defined to be a value of type xs:anyURI. In [Schema Part 2], the type xs:anyURI imposes some restrictions on the value space, but there is latitude for implementors to decide exactly what these restrictions are. In [Schema 1.1 Part 2] there are no restrictions on the form of an xs:anyURI value, so any sequence of characters is acceptable within the value space. In this and related specifications, the use of the type xs:anyURI to hold a namespace name does not imply any restrictions on the value space beyond those described in this section: implementations may reject character strings that are not valid URIs or IRIs, but they are not required to do so.

2.7 Schema Information

The data model supports strongly typed languages such as [XML Path Language (XPath) 3.1] and [XQuery 3.1: An XML Query Language] that have a type system based on [Schema Part 1]. To achieve this, the data model includes (by reference) the Schema Component Model described in [Schema Part 1].

Note:

The Schema Component Model includes a number of kinds of component, such as type definitions and element and attribute declarations, and defines the properties and relationships of these components. Many of these components and properties are not used by the language specifications that rely on XDM, and where this is the case, there is no requirement for implementations to make them visible. However, this specification makes no attempt to define the minimal subset of the schema component model that is needed to support the semantics of XPath and XQuery processing.

There are two main areas where the language semantics depend on information in schema components:

Expressions are evaluated with respect to a static context, which includes schema components, specifically type definitions, element declarations, and attribute declarations. The names of such components may be used in language constructs only if the components are present in the static context.
Values including element and attribute nodes, and atomic values, have a property called a type annotation whose value is a type: this is a reference to a type definition in the Schema Component Model.

Every item in the data model has both a value and a type. In addition to nodes, the data model can represent atomic values like the number 5 or the string “Hello World.” For each of these atomic values, the data model contains both the value of the item (such as 5 or “Hello World”) and its type. The property that holds the type is sometimes referred to as the type annotation: its value is a type definition component as defined in the Schema Component Model. This may be a built-in type (a type with a name such as xs:integer or xs:string), or a user-defined type.

There is a constraint that the total set of components used during expression processing (both statically and dynamically) must constitute a valid schema. This implies, for example, that this total set does not include two different types with the same expanded name.

Note:

This makes it the responsibility of the processor to ensure that the schema components used in the static context of a query or expression during static analysis are consistent with the schema components used to validate documents during query or expression evaluation. This specification does not say how this should be achieved.

It is also a constraint that the schema available to the processor must contain at least the components and properties needed to correctly implement the semantics of the XPath and XQuery language. For example, this means that given a node with a particular type annotation T, and a function that expects an argument of type S, there must be sufficient information available to the processor to establish whether or not T is derived from S. As with other consistency constraints described in this data model, it is a precondition that these constraints are satisfied; the specifications do not speculate on what happens if they are not.

2.7.1 Representation of Types

The data model uses expanded-QNames to represent the names of schema types, which include the built-in types defined by [Schema Part 2], five additional types defined by this specification, and may include other user- or implementation-defined types.

For XML Schema types, the namespace name of the expanded-QName is the {target namespace} property of the type definition, and its local name is the {name} property of the type definition.

The data model relies on the fact that an expanded-QName uniquely identifies every named type. Although it is possible for different schemas to define different types with the same expanded-QName, at most one of them can be used in any given validation episode. The data model cannot support environments where different types with the same expanded-QName are available.

The scope over which the names of anonymous types must be meaningful and distinct depends on the processing context. It is the responsibility of the host language to define the size and scope of the processing context.

2.7.2 Predefined Types

In addition to the 19 types defined in Section 3.2 Primitive datatypes^XS2 of [Schema Part 2], the data model defines five additional types: xs:anyAtomicType, xs:untyped, xs:untypedAtomic, xs:dayTimeDuration, and xs:yearMonthDuration. These types are defined in the XML Schema namespace with permission of the XML Schema Working Group; in implementations that support [Schema 1.1 Part 2], the XSD 1.1 definitions of xs:anyAtomicType, xs:dayTimeDuration, and xs:yearMonthDuration supersede the definitions in this specification.

xs:untyped

The datatype xs:untyped denotes the dynamic type of an element node that has not been validated, or has been validated in skip mode. The properties of xs:untyped are the same as the properties of xs:anyType except for the base type and name. The base type of xs:untyped is xs:anyType. No predefined types are derived from xs:untyped and no such derivations are allowed.

xs:untypedAtomic

The datatype xs:untypedAtomic denotes untyped atomic data, such as text that has not been assigned a more specific type. An attribute that has been validated in skip mode is represented in the Data Model by an attribute node with the type xs:untypedAtomic. No predefined types are derived from xs:untypedAtomic and no such derivations are allowed.

xs:anyAtomicType

The datatype xs:anyAtomicType is an atomic type that includes all atomic values (and no values that are not atomic). Its base type is xs:anySimpleType, from which all simple types, including atomic, list, and union types, are derived. All primitive atomic types, such as xs:decimal and xs:string, have xs:anyAtomicType as their base type.

No type may be derived from xs:anyAtomicType by restriction, union, or list.

xs:dayTimeDuration

The type xs:dayTimeDuration is derived from xs:duration by restricting its lexical representation to contain only the days, hours, minutes and seconds components. The value space of xs:dayTimeDuration is the set of fractional second values. The components of xs:dayTimeDuration correspond to the day, hour, minute and second components defined in Section 5.5.3.2 of [ISO 8601]. An xs:dayTimeDuration is derived from xs:duration as follows:

<xs:simpleType name='dayTimeDuration'>
  <xs:restriction base='xs:duration'>
    <xs:pattern value="[^YM]*[DT].*"/>
  </xs:restriction>
</xs:simpleType>

xs:yearMonthDuration

The type xs:yearMonthDuration is derived from xs:duration by restricting its lexical representation to contain only the year and month components. The value space of xs:yearMonthDuration is the set of xs:integer month values. The year and month components of xs:yearMonthDuration correspond to the Gregorian year and month components defined in section 5.5.3.2 of [ISO 8601], respectively.

The type xs:yearMonthDuration is derived from xs:duration as follows:

<xs:simpleType name='yearMonthDuration'>
  <xs:restriction base='xs:duration'>
    <xs:pattern value="[^DT]*"/>
  </xs:restriction>
</xs:simpleType>
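The effect of the two pattern facets can be checked with a short Python sketch. This is a non-normative illustration: it assumes the input is already a valid xs:duration lexical form (the patterns alone do not validate that), and it uses `re.fullmatch` because XSD patterns must match the whole string. The helper names are hypothetical.

```python
import re

# The pattern facets from the two simple type definitions.
DAY_TIME_PATTERN = re.compile(r"[^YM]*[DT].*")
YEAR_MONTH_PATTERN = re.compile(r"[^DT]*")


def matches_day_time(lexical: str) -> bool:
    """True if a valid xs:duration string satisfies the dayTimeDuration pattern."""
    return DAY_TIME_PATTERN.fullmatch(lexical) is not None


def matches_year_month(lexical: str) -> bool:
    """True if a valid xs:duration string satisfies the yearMonthDuration pattern."""
    return YEAR_MONTH_PATTERN.fullmatch(lexical) is not None
```

For example, `P3DT4H` and `PT5M` satisfy only the dayTimeDuration pattern (in `PT5M` the `M` follows `T` and so denotes minutes), `P1Y2M` satisfies only the yearMonthDuration pattern, and `P1Y2DT3H` satisfies neither.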

A schema for xs:dayTimeDuration and xs:yearMonthDuration is provided in C Schema for the Extended XS Namespace.

2.7.3 XML and XSD Versions

Some of the types defined in XML Schema have differing definitions in XSD 1.0 and XSD 1.1; furthermore, some types are defined by reference to other specifications including XML and XML Namespaces, and these too may vary from one version of the specification to the next.

As a general policy, implementations of data types should support the latest definitive version of any referenced specification, even if that is published after the date of this specification.

This means, for example, that the xs:string data type should (at the time of writing) support the set of characters defined by the Char production in XML 1.1 Second Edition. Similarly, the xs:anyURI data type should support the definition used in XSD 1.1 (which allows any sequence of characters), and the xs:NCName data type should support the definition based on the syntax of a name as defined in both XML 1.1 Second Edition and XML 1.0 Fifth Edition (which are the same).

In practice interoperability problems can arise both because specifications are not always in synchronization with each other (for example, XSD 1.0 contains references to dated versions of XML 1.0 other than the latest version), and also because implementations may use third-party components (such as XML parsers, serializers, and schema validators) that were built against different versions of the base specifications. For these reasons, use of the latest version of referenced specifications is generally recommended but not required. It is implementation-dependent how a processor handles any such conflicts.

[Definition: A string is a sequence of zero or more characters, or equivalently, a value in the value space of the xs:string data type.]

[Definition: A character is an instance of the Char production in [XML].]

[Definition: A codepoint is a non-negative integer assigned to a character by the Unicode consortium, or reserved for future assignment to a character.]

2.7.4 Type System

The diagrams below show how nodes, functions, primitive simple types, and user defined types fit together into a type system. This type system comprises two distinct subsystems that both include the primitive simple types. In the diagrams, connecting lines represent relationships between derived types and the types from which they are derived; the arrowheads point toward the type from which they are derived. The dashed line represents relationships not present in this diagram, but that appear in one of the other diagrams. Dotted lines represent additional relationships that follow an evident pattern. The information that appears in each diagram is recapitulated in tabular form.

The types xs:IDREFS, xs:NMTOKENS, xs:ENTITIES, and xs:numeric, together with user-defined list types and user-defined union types, are special in that they are lists or unions rather than types derived by extension or restriction.

The first diagram and its corresponding table illustrate the relationship of various item types. Item types in the data model form a directed graph, rather than a hierarchy or lattice: in the relationship defined by the derived-from(A, B) function, some types are derived from more than one other type. Examples include functions (function(xs:string) as xs:int is substitutable for function(xs:NCName) as xs:int and also for function(xs:string) as xs:decimal), and union types (A is substitutable for union(A, B) and also for union(A, C)). In XDM, item types include node types, function types, and built-in atomic types. The diagram, which shows only hierarchic relationships, is therefore a simplification of the full model.

In the table, each type whose name is indented is derived from the type whose name appears nearest above it with one less level of indentation.

item
	xs:anyAtomicType
	node
		attribute
			user-defined attribute types
		comment
		document
			user-defined document types
		element
			user-defined element types
		namespace
		processing-instruction
		text
	function(*)
		array(*)
		map(*)

The next diagram and table illustrate the "any type" type subsystem, in which all types are derived from the distinguished type xs:anyType.

Type hierarchy graphic, anyType hierarchy

In the table, each type whose name is indented is derived from the type whose name appears nearest above it with one less level of indentation.

xs:anyType
	xs:anySimpleType
		xs:anyAtomicType
		list types
			xs:IDREFS
			xs:NMTOKENS
			xs:ENTITIES
			user-defined list types
		union types
			xs:numeric
			user-defined union types
	complex types
		xs:untyped
		user-defined complex types

The final diagram and table show all of the atomic types, including the primitive simple types and the built-in types derived from the primitive simple types. This includes all the built-in datatypes defined in [Schema Part 2].

Type hierarchy graphic, anyAtomicType hierarchy

In the table, each type whose name is indented is derived from the type whose name appears nearest above it with one less level of indentation.

xs:untypedAtomic
xs:dateTime
	xs:dateTimeStamp
xs:date
xs:time
xs:duration
	xs:yearMonthDuration
	xs:dayTimeDuration
xs:float
xs:double
xs:decimal
	xs:integer
		xs:nonPositiveInteger
			xs:negativeInteger
		xs:long
			xs:int
				xs:short
					xs:byte
		xs:nonNegativeInteger
			xs:unsignedLong
				xs:unsignedInt
					xs:unsignedShort
						xs:unsignedByte
			xs:positiveInteger
xs:gYearMonth
xs:gYear
xs:gMonthDay
xs:gDay
xs:gMonth
xs:string
	xs:normalizedString
		xs:token
			xs:language
			xs:NMTOKEN
			xs:Name
				xs:NCName
					xs:ID
					xs:IDREF
					xs:ENTITY
xs:boolean
xs:base64Binary
xs:hexBinary
xs:anyURI
xs:QName
xs:NOTATION
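The derivation relationships recorded in these tables can be queried by walking the chain of base types. The following Python sketch is a non-normative illustration that encodes only a small subset of the tables; the `BASE` mapping and the `derives_from` helper are hypothetical, and the sketch assumes that a type counts as derived from itself.

```python
from typing import Optional

# Subset of the tables above: each type mapped to the type it is derived from.
BASE = {
    "xs:anySimpleType": "xs:anyType",
    "xs:anyAtomicType": "xs:anySimpleType",
    "xs:decimal": "xs:anyAtomicType",
    "xs:integer": "xs:decimal",
    "xs:long": "xs:integer",
    "xs:int": "xs:long",
    "xs:short": "xs:int",
    "xs:byte": "xs:short",
    "xs:string": "xs:anyAtomicType",
    "xs:normalizedString": "xs:string",
    "xs:token": "xs:normalizedString",
    "xs:duration": "xs:anyAtomicType",
    "xs:dayTimeDuration": "xs:duration",
}


def derives_from(t: str, s: str) -> bool:
    """True if type `t` is `s` or reaches `s` by following base types."""
    current: Optional[str] = t
    while current is not None:
        if current == s:
            return True
        current = BASE.get(current)  # None once the top of the chain is passed
    return False
```

With this subset, `derives_from("xs:byte", "xs:decimal")` holds because xs:byte reaches xs:decimal through xs:short, xs:int, xs:long, and xs:integer, whereas `derives_from("xs:token", "xs:decimal")` does not.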

2.7.5 Atomic Values

An atomic value can be constructed from a lexical representation. Given a string and an atomic type, the atomic value is constructed in such a way as to be consistent with schema validation. If the string does not represent a valid value of the type, an error is raised. When xs:untypedAtomic is specified as the type, no validation takes place. The details of the construction are described in Section 18 Constructor functions ^FO31 and the related Section 19 Casting ^FO31 section of [XQuery and XPath Functions and Operators 3.1].
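The construction of an atomic value from a string and a type name can be sketched as follows. This Python fragment is a non-normative illustration that covers only xs:untypedAtomic, xs:integer, and xs:boolean; the `AtomicValue` class and the `construct` function are hypothetical, and the normative rules are those of the constructor and casting sections cited above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicValue:
    type_name: str   # the atomic value is labeled with its type name
    value: object


_INTEGER = re.compile(r"[+-]?[0-9]+")
_BOOLEAN = {"true": True, "1": True, "false": False, "0": False}


def construct(lexical: str, type_name: str) -> AtomicValue:
    """Build an atomic value from a lexical form (three types only)."""
    if type_name == "xs:untypedAtomic":
        return AtomicValue(type_name, lexical)       # no validation
    text = lexical.strip(" \t\r\n")                  # whitespace is collapsed
    if type_name == "xs:integer" and _INTEGER.fullmatch(text):
        return AtomicValue(type_name, int(text))
    if type_name == "xs:boolean" and text in _BOOLEAN:
        return AtomicValue(type_name, _BOOLEAN[text])
    # Invalid lexical form, or a type this sketch does not cover.
    raise ValueError(f"cannot construct {type_name} from {lexical!r}")
```

A call such as `construct("4.2", "xs:integer")` raises an error because the string is not in the lexical space of xs:integer, while `construct("4.2", "xs:untypedAtomic")` succeeds without any check.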

2.7.6 String Values

A string value can be constructed from an atomic value. Such a value is constructed by converting the atomic value to its string representation as described in Section 19 Casting ^FO31. Using the canonical lexical representation for atomic values is not always compatible with XPath 1.0. These and other backwards incompatibilities are described in the backwards-compatibility section of [XML Path Language (XPath) 3.1].

2.7.7 Negative Zero

The xs:float and xs:double data types in the data model have the same value space as in XML Schema 1.1 ([Schema 1.1 Part 2]). Specifically they include both negative and positive zero, and in this respect they differ from XML Schema 1.0.

To accommodate this difference, when converting from an xs:string to an xs:float or xs:double, it is implementation-defined whether the lexical value “-0” (and similar forms such as “-0.0”) converts to negative zero or to positive zero in the value space.
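The distinction between negative and positive zero can be observed in any IEEE 754 implementation. The following Python sketch is a non-normative illustration; the helper name is hypothetical. Python's `float` happens to map the string “-0” to negative zero, which is one of the two outcomes this specification permits.

```python
import math


def is_negative_zero(x: float) -> bool:
    """True only for IEEE 754 negative zero."""
    return x == 0.0 and math.copysign(1.0, x) < 0


parsed = float("-0")        # Python chooses negative zero here
equal_to_zero = parsed == 0.0
```

The two zeros compare equal, so a sign-aware test such as `math.copysign` is needed to tell them apart.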

2.8 Other Items

The XPath Data Model is the abstraction over which XPath expressions are evaluated. Historically, all of the items in the data model could be derived directly (nodes) or indirectly (typed values, sequences) from an XML document. However, as the XPath expression language has matured, new features have been added which require additional types of items to appear in the data model. These items have no direct XML serialization, but they are nevertheless part of the data model.

2.8.1 Functions

[Definition: A function is an item that can be called.] Functions cannot be compared for identity, equality, or otherwise, and have no serialization.

A function has the following properties:

name (xs:QName): An expanded QName, possibly absent.
parameter names (xs:QName*): A list of distinct names, one for each of the function's parameters.
signature (a FunctionTest of the form Annotation* TypedFunctionTest): The TypedFunctionTest^XP31 has one SequenceType^XP31 for each parameter, and one SequenceType for the function's result. [Definition: A function signature represents the type of a function.] The presence of annotations is language dependent; functions defined in languages, such as XPath, that have no mechanism for defining annotations will create functions in the data model with zero annotations.
implementation: This enables the function, when it's called, to map instances of its parameter types into an instance of its result type. The implementation is either:
- a host language expression, which is always associated with a static context, or
- an implementation-dependent function implementation, which is optionally associated with both a static and a dynamic context.
nonlocal variable bindings (a mapping from xs:QName to item()*): This provides a value for each of the function's free variables (i.e., variables referenced by the function's implementation, other than locals and parameters).

[Definition: A function's arity is the number of its parameters.] The number of names in a function's parameter names, and the number of parameter types in its signature, must equal the function's arity.

The space of all possible function signatures forms a hierarchy of function types. All function types are a subtype of function(*), which is itself a subtype of item(). Subtypes of function(*) are partitioned into discrete types, each representing functions that accept a particular number of arguments. Function types which have the same arity can be subtypes of each other based on their argument and return types. This subtype relationship is defined in Section 2.5.6 SequenceType Subtype Relationships ^XP31. For example:

function(item()) as item() is a subtype of function(*)
function(item()) as xs:integer is a subtype of function(item()) as item()
function(item()) as item() is a subtype of function(xs:string) as item()
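The examples above follow the usual rule for function subtyping: parameter types are compared contravariantly and the result type covariantly. The following Python sketch is a non-normative simplification. It assumes a tiny hypothetical type table (`PARENT`), ignores occurrence indicators and annotations, and does not model function(*), which every function type is a subtype of.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical fragment of the item type hierarchy.
PARENT = {
    "xs:integer": "xs:decimal",
    "xs:decimal": "xs:anyAtomicType",
    "xs:string": "xs:anyAtomicType",
    "xs:anyAtomicType": "item()",
}


def item_subtype(a: str, b: str) -> bool:
    """True if item type `a` is `b` or is derived from `b`."""
    current: Optional[str] = a
    while current is not None:
        if current == b:
            return True
        current = PARENT.get(current)
    return False


@dataclass(frozen=True)
class FunctionType:
    params: Tuple[str, ...]
    result: str


def function_subtype(a: FunctionType, b: FunctionType) -> bool:
    """True if function type `a` is a subtype of function type `b`."""
    if len(a.params) != len(b.params):      # different arity: unrelated
        return False
    params_ok = all(item_subtype(pb, pa)    # contravariant parameters
                    for pa, pb in zip(a.params, b.params))
    return params_ok and item_subtype(a.result, b.result)  # covariant result


item_to_item = FunctionType(("item()",), "item()")
item_to_integer = FunctionType(("item()",), "xs:integer")
string_to_item = FunctionType(("xs:string",), "item()")
```

Under these assumptions `item_to_integer` is a subtype of `item_to_item`, and `item_to_item` is a subtype of `string_to_item`, but neither relationship holds in the reverse direction.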

2.8.2 Map Items

[Definition: A map item is a value that represents a map (sometimes called a hash or an associative array).] A map is logically a collection of key/value pairs. Each key in the map is unique (there is no other key to which it is equal) and has associated with it a value that is a single item or sequence of items. There is no uniqueness constraint on values. The semantics of equality are described in Section 17.1.1 op:same-key ^FO31.

Note:

This version of the XPath Data Model does not specify whether or not maps have any identity other than their values. No operations currently defined on maps depend on the notion of map identity. Other specifications, for example, the XQuery Update Facility, may make identity of maps explicit.

There is a single accessor associated with maps; it is defined in the following section.

2.8.2.1 `map-entries` Accessor

dm:map-entries($map as map()) as array(array(item()))

The dm:map-entries accessor returns an array of arrays. For each key/value pair in the $map, an array will be constructed with the key in position 1 and the value in position 2. The array returned by dm:map-entries is the array of the arrays constructed for the key/value pairs. The order of the members in the array returned is implementation-dependent.
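
As an informal illustration only, the shape of the result can be sketched in Python by modelling a map as a dict and an array as a list. The function name map_entries and this choice of representation are assumptions made for the example; accessors are not callable functions.

```python
def map_entries(m):
    """Return one [key, value] pair per entry of the dict m.

    Position 1 of each inner list holds the key and position 2 the value.
    """
    return [[key, value] for key, value in m.items()]
```

In this sketch the pairs happen to follow the dict's insertion order, but a caller should not rely on any particular order, since the order of the members is implementation-dependent.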

2.8.3 Array Items

[Definition: An array item is a value that represents an array.] An array is an ordered list of values; these values are called the members of the array. Unlike sequences, a member of an array can be any value (including a sequence or an array). The number of members in an array is called its size, and they are referenced by their position, in the range 1 to the size of the array.

Note:

This version of the XPath Data Model does not specify whether or not arrays have any identity other than their values. No operations currently defined on arrays depend on the notion of array identity. Other specifications, for example, the XQuery Update Facility, may make identity of arrays explicit.

The accessors associated with arrays are defined in the following sections.

2.8.3.1 `array-size` Accessor

dm:array-size($array as array()) as xs:nonNegativeInteger

The dm:array-size accessor returns the number of members in the array.

2.8.3.2 `array-get` Accessor

dm:array-get($array as array(), $position as xs:positiveInteger) as item()*

The dm:array-get accessor returns the value in the array at the position $position. An error is raised if the array does not contain a value at that position. For all positions greater than 0 and less than or equal to the array size, dm:array-get will return a value.
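
The 1-based positions used by the two array accessors can be illustrated with a short Python sketch. It models an array as a list whose members may themselves be lists (sequences); the function names and the use of IndexError are assumptions made for the example, not behavior defined by this specification.

```python
def array_size(array):
    """Number of members in the array."""
    return len(array)


def array_get(array, position):
    """Return the member at a 1-based position, or raise if out of range."""
    if not 1 <= position <= len(array):
        raise IndexError(f"no member at position {position}")
    return array[position - 1]
```

For an array of size 3, positions 1, 2 and 3 each return a member, and any other position raises an error.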

3 Data Model Construction

This section describes the constraints on instances of the data model.

The data model supports well-formed XML documents conforming to [Namespaces in XML] or [Namespaces in XML 1.1]. Documents that are not well-formed are, by definition, not XML. XML documents that do not conform to [Namespaces in XML] or [Namespaces in XML 1.1] are not supported (nor are they supported by [Infoset]).

In other words, the data model supports the following classes of XML documents:

Well-formed documents conforming to [Namespaces in XML] or [Namespaces in XML 1.1],
DTD-valid documents conforming to [Namespaces in XML] or [Namespaces in XML 1.1], and
W3C XML Schema-validated documents.

This document describes how to construct an instance of the data model from an infoset ([Infoset]) or a Post Schema Validation Infoset (PSVI), the augmented infoset produced by an XML Schema validation episode.

An instance of the data model can also be constructed directly through application APIs, or from non-XML sources such as relational tables in a database. Data model construction from sources other than an Infoset or PSVI is implementation-defined. Regardless of how an instance of the data model is constructed, every node and atomic value in the data model must have a typed-value that is consistent with its type.

The data model supports some kinds of values that are not supported by [Infoset]. Examples of these are document fragments and sequences of Document Nodes. The data model also supports values that are not nodes. Examples of these are sequences of atomic values, or sequences mixing nodes and atomic values. These are necessary to be able to represent the results of intermediate expressions in the data model during expression processing.

3.1 Direct Construction

Although this document describes construction of an instance of the data model in terms of infoset properties, an infoset is not a necessary precondition for building an instance of the data model.

There are no constraints on how an instance of the data model may be constructed directly, save that the resulting instance must satisfy all of the constraints described in this document.

3.2 Construction from an Infoset

An instance of the data model can be constructed from an infoset that satisfies the following general constraints:

All general and external parsed entities must be fully expanded. The Infoset must not contain any unexpanded entity reference information items.
The infoset must provide all of the properties identified as "required" in this document. The properties identified as "optional" may be used, if they are present. All other properties are ignored.

An instance of the data model constructed from an information set must be consistent with the description provided for each node kind.

Furthermore, construction of an instance of the data model from an Infoset is only guaranteed to be well-defined for Infosets that could have been derived from a conforming XML document.

3.3 Construction from a PSVI

An instance of the data model can be constructed from a PSVI, whose element and attribute information items have been strictly assessed, laxly assessed, or have not been assessed. Constructing an instance of the data model from a PSVI must be consistent with the description provided in this section and with the description provided for each node kind.

Data model construction requires that the PSVI provide unique names for all anonymous schema types.

Note:

[Schema Part 1] does not require all schema processors to provide unique names for anonymous schema types. In order to build an instance of the data model from a PSVI produced by a processor that does not provide the names, some post-processing will be required in order to assure that they are all uniquely identified before construction begins.

[Definition: An incompletely validated document is an XML document that has a corresponding schema but whose schema-validity assessment has resulted in one or more element or attribute information items being assigned values other than 'valid' for the [validity] property in the PSVI.]

The data model supports incompletely validated documents. Elements and attributes that are not valid are treated as having unknown types.

The most significant difference between Infoset construction and PSVI construction occurs in the area of schema type assignment. Other differences can also arise from schema processing: default attribute and element values may be provided, white space normalization of element content may occur, and the user-supplied lexical form of elements and attributes with atomic schema types may be lost.

3.3.1 Mapping PSVI Additions to Node Properties

A PSVI element or attribute information item may have a [validity] property. The [validity] property may be "valid", "invalid", or "notKnown" and reflects the outcome of schema-validity assessment. In the data model, precise schema type information is exposed for Element and Attribute Nodes that are "valid". Nodes that are not "valid" are treated as if they were simply well-formed XML and only very general schema type information is associated with them.

3.3.1.1 Element and Attribute Node Types

The precise definition of the schema type of an element or attribute information item depends on the properties of the PSVI. In the PSVI, [Schema Part 1] defines a [type definition] property as well as the [type definition namespace], [type definition name] and [type definition anonymous] properties, which are effectively short-cut terms for properties of the type definition. Further, the [element declaration] and [attribute declaration] properties are defined for elements and attributes, respectively. These declarations in turn will identify the [type definition] declared for the element or attribute. To distinguish the [type definition] given in the PSVI for the element or attribute instance from the [type definition] associated with the declaration, the former is referred to below as the actual type and the latter as the declared type of the element or attribute instance in question.

The type depends on the declared type, the actual type, and the [validity] and [validation attempted] properties in the PSVI. If:

The [validity] and [validation attempted] properties exist and have the values "valid" and "full", respectively, the schema type of an element or attribute information item is represented by an expanded-QName whose namespace and local name correspond to the first applicable items in the following list:
- If the declared type exists and is a union and the actual type is (not the same as the declared type, and not a type derived from the declared type, but) one of the member types of the union, or derived from one of its member types:
  - If the {name} property of the declared type is present: the {target namespace} and {name} properties of the declared type.
  - If the {name} property of the declared type is absent: the namespace and local name of the anonymous type name supplied for the declared type.
- If there is no declared type, and the actual type is a union, then:
  - If the {name} property of the actual type is present: the {target namespace} and {name} properties of the actual type.
  - If the {name} property of the actual type is absent: the namespace and local name of the anonymous type name supplied for the actual type.
- Otherwise:
  - If [type definition anonymous] is false: the {target namespace} and {name} properties of the actual type.
  - If [type definition anonymous] is true: the namespace and local name of the anonymous type name supplied for the actual type.
The [validity] property exists and is "invalid", or the [validation attempted] property exists and is "partial", the schema type of an element is xs:anyType and the type of an attribute is xs:anySimpleType.
The [validity] property exists and is "notKnown", the schema type of an element is xs:anyType and the type of an attribute is xs:anySimpleType.
The [validity] or [validation attempted] properties do not exist, the schema type of an element is xs:untyped and the type of an attribute is xs:untypedAtomic.

The prefix associated with the type names is implementation-dependent.
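
The four top-level cases above can be summarized as a small decision function. The Python sketch below is illustrative only: it assumes that the cases are tried in the order listed, models a property that does not exist as None, and takes the precise type name (computed by the nested rules of the first case) as an argument rather than deriving it. The function name and parameters are hypothetical.

```python
def schema_type(kind, validity, validation_attempted, precise_type):
    """Pick the schema type name for an "element" or "attribute" item.

    validity and validation_attempted are PSVI property values, or None
    when the property does not exist. precise_type is the name that the
    first case would assign (its computation is not shown here).
    """
    general = "xs:anyType" if kind == "element" else "xs:anySimpleType"
    untyped = "xs:untyped" if kind == "element" else "xs:untypedAtomic"

    if validity == "valid" and validation_attempted == "full":
        return precise_type
    if validity == "invalid" or validation_attempted == "partial":
        return general
    if validity == "notKnown":
        return general
    if validity is None or validation_attempted is None:
        return untyped
    # Combinations not covered by the four cases are left undefined here.
    raise ValueError("combination not covered by the rules sketched here")
```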

3.3.1.2 Typed Value Determination

This section describes how the typed value of an Element or Attribute Node is computed from an element or attribute PSVI information item, where the information item has either a simple type or a complex type with simple content. For other kinds of Element Nodes, see 6.2.4 Construction from a PSVI; for other kinds of Attribute Nodes, see 6.3.4 Construction from a PSVI.

The typed value of Attribute Nodes and some Element Nodes is a sequence of atomic values. The types of the items in the typed value of a node may differ from the type of the node itself. This section describes how the typed value of a node is derived from the properties of an information item in a PSVI.

The types of the items in the typed value of a node are determined as follows. The process begins with a type, T. If the schema type of the node itself, as represented in the PSVI, is a complex type with simple content, then T is the {content type} of the schema type of the node; otherwise, T is the schema type of the node itself. For each primitive or ordinary simple type T, the W3C XML Schema specification defines a function M mapping the lexical representation of a value onto the value itself.

Note:

For atomic and list types, the mapping is the “lexical mapping” defined for T in [Schema Part 2]; for union types, the mapping is the lexical mapping defined in [Schema Part 2] modified as appropriate by any applicable rules in [Schema Part 1]. The mapping, so modified, is a function (in the mathematical sense) which maps to a single value even in cases where the lexical mapping proper maps to multiple values.

The typed value is determined as follows:

If the nilled property of the node in question is true, then the typed value is the empty sequence.
If T is xs:anySimpleType or xs:anyAtomicType, the typed value is the [schema normalized value] as an instance of xs:untypedAtomic.
Otherwise, the typed value is the result of applying M to the string value as an instance of the appropriate value type, where the appropriate value type is the [member type definition] if T is a union type, otherwise it is simply T.

The typed value determination process is guaranteed to result in a sequence of atomic values, each having a well-defined atomic type. This sequence of atomic values, in turn, determines the typed-value property of the node in the data model.
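
A minimal Python sketch of the three rules follows. It makes several assumptions that are not part of this specification: the mapping M is approximated by Python conversions for three atomic types only, list types are not handled, a single string argument stands in for both the [schema normalized value] and the string value, and no validation of the lexical form is performed. All names are hypothetical.

```python
from decimal import Decimal

# Hypothetical stand-ins for the mapping M of a few atomic types.
LEXICAL_MAPPING = {
    "xs:integer": int,
    "xs:decimal": Decimal,
    "xs:string": str,
}


def typed_value(nilled, t, normalized_value, member_type=None):
    """Return the typed value as a list of (type name, value) pairs.

    member_type plays the role of [member type definition] and is only
    supplied when t is a union type.
    """
    if nilled:
        return []  # the empty sequence
    if t in ("xs:anySimpleType", "xs:anyAtomicType"):
        return [("xs:untypedAtomic", normalized_value)]
    value_type = member_type if member_type is not None else t
    return [(value_type, LEXICAL_MAPPING[value_type](normalized_value))]
```

For example, a node of a union type whose content "47" was validated by an xs:integer member type yields a single xs:integer item in this sketch.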

3.3.1.3 Relationship Between Typed-Value and String-Value

Element and attribute nodes have both typed-value and string-value properties. However, implementations are allowed some flexibility in how these properties are stored. An implementation may choose to store the string-value only and derive the typed-value from it, or to store the typed-value only and derive the string-value from it, or to store both the string-value and the typed-value.

In order to permit these various implementation strategies, some variations in the string value of a node are defined as insignificant. Implementations that store only the typed value of a node are permitted to return a string value that is different from the original lexical form of the node content. For example, consider the following element:

<offset xsi:type="xs:integer">0030</offset>

Assuming that the node is valid, it has a typed value of 30 as an xs:integer. An implementation may return either "30" or "0030" as the string value of the node. Any string that is a valid lexical representation of the typed value is acceptable. In this specification, we express this rule by saying that the relationship between the string value of a node and its typed value must be "consistent with schema validation."
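
For the xs:integer case in this example, the rule can be sketched as a check that a candidate string value is a lexical representation of the typed value. The Python below is an illustration restricted to xs:integer; the function name is hypothetical, and the regular expression assumes that whitespace has already been normalized.

```python
import re

# Lexical form of xs:integer: an optional sign followed by decimal digits.
_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def is_consistent_integer_string(string_value, typed_value):
    """True if string_value is an xs:integer lexical form denoting typed_value."""
    if _INTEGER_LEXICAL.fullmatch(string_value) is None:
        return False
    return int(string_value) == typed_value
```

Under this check both "30" and "0030" are acceptable string values for the typed value 30, whereas "30.0" and "31" are not.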

If an implementation stores only the string-value of a node, the following considerations apply:

Where union types occur, the implementation must be able to deliver the typed-value as an instance of the appropriate member type. For example, if the type of an element node is my:integer-or-string, which is defined as a union of xs:integer and xs:string, and the string-value of the node is "47", the implementation must be able to deliver the typed-value of the node as either the integer 47 or the string "47", depending on which member type validated the element.
Where types of xs:QName, xs:NOTATION, or types derived from one of these types occur, the implementation must be able to deliver the typed-value as a triple including a local name, a namespace prefix, and a namespace URI, even though the namespace URI is not part of the string-value (see 3.3.3 QNames and NOTATIONS).
Where an element with a complex type and element-only content occurs, it is an error to attempt to access the typed-value of the Element Node.

If an implementation stores only the typed-value of a node, it must be prepared to construct string values from not only the node, but in some cases also the descendants of that node. For example, an element with a complex type and element-only content has no typed-value but does have a string-value that is the concatenation of the string-values of all its Text Node descendants in document order.
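
The concatenation described here can be illustrated with Python's standard xml.etree.ElementTree module, which is used purely as a convenient tree representation and is not an implementation of this data model. The function name string_value is an assumption made for the example.

```python
import xml.etree.ElementTree as ET


def string_value(element):
    """Concatenate all descendant text of element in document order."""
    return "".join(element.itertext())


doc = ET.fromstring("<a><b>x</b><c>y<d>z</d></c></a>")
result = string_value(doc)
```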

A further caveat applies if an implementation stores the typed value of a node. If a new data model is constructed by copying portions of another data model, and the copy operation does not preserve inherited namespaces, and the type is a union type that is sensitive to the namespace context, then the typed value may be different from what would be obtained by revalidating the node within its new namespace context. Although this may stretch the semantics of “consistent with schema validation”, we accept this possibility; it is not an error.

3.3.1.4 Pattern Facets

Creating a subtype by restriction generally reduces the value space of the original schema type. For example, expressing a hat size as a restriction of decimal with a minimum value of 6.5 and maximum value of 8.0 creates a schema type whose valid values are only those in the range 6.5 to 8.0.

The pattern facet is different because it restricts the lexical space of the schema type, not its value space. Expressing a three-digit number as a restriction of integer with the pattern facet “[0-9]{3}” creates a schema type whose valid values are only those with a lexical form consisting of three digits.

The pattern facet is not reversible in practice. A given point in the value space might have several lexical representations. In general, there's no practical way to determine which, if any, of these representations satisfies the pattern facet of the type.

As a consequence, pattern facets are not respected when mapping to an Infoset or during serialization, and values in the data model that were originally valid with respect to a schema that contains pattern-based restrictions may be invalid after serialization.
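
The effect can be seen with the three-digit example. The Python sketch below assumes an implementation that stores only the typed value and serializes integers without leading zeros (as Python's str does); both the function name and that serialization choice are assumptions made for the illustration.

```python
import re

# The pattern facet from the example: exactly three decimal digits.
THREE_DIGITS = re.compile(r"[0-9]{3}")


def satisfies_pattern(lexical):
    """True if the lexical form matches the pattern facet in full."""
    return THREE_DIGITS.fullmatch(lexical) is not None


original = "007"            # valid against the pattern facet
value = int(original)       # only the value 7 is kept
serialized = str(value)     # one of several lexical forms of that value
```

Here the original lexical form satisfies the pattern, but the serialized form does not, even though both denote the same point in the value space.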

3.3.2 Dates and Times

The date and time types require special attention. This section applies to implementations that store the typed value of xs:dateTime, xs:date, xs:time, xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gMonth, xs:gDay, and types that are derived from them. These are known collectively as the date/time types in this specification.

The values of the date/time types are represented in the data model using seven components:

year: An xs:integer.
month: An xs:integer between 1 and 12, inclusive.
day: An xs:integer between 1 and 31, inclusive, possibly restricted further depending on the values of month and year.
hour: An xs:integer between 0 and 23, inclusive.
minute: An xs:integer between 0 and 59, inclusive.
second: An xs:decimal greater than or equal to zero and less than 60. Leap seconds are not supported.
timezone: An xs:dayTimeDuration between -PT14H00M and PT14H00M, inclusive. All timezone values must be an integral number of minutes.

Components that are intrinsic to the datatype (for example, day, month, and year in a xs:date) are required; components that can never be part of a datatype (for example, years in a xs:time) must be missing. Missing components are represented by the empty sequence. When a component is present, it contains the “local value” that has not been normalized in any way. The timezone component is optional for all the date/time datatypes.

Thus, the lexical xs:dateTime representation “2003-01-02T11:30:00-05:00” is stored as “{2003, 1, 2, 11, 30, 0.0, -PT05H00M}”. The value of the lexical representation “2003-01-16T16:30:00” is stored as “{2003, 1, 16, 16, 30, 0.0, ()}” because it has no timezone. The value of the lexical xs:gDay representation “---30+10:30” is stored as “{(), (), 30, (), (), (), PT10H30M}”.

The lexical form “24:00:00” is normalized in the component model. As a xs:time, it is stored as “{(), (), (), 0, 0, 0.0, ()}” and the xs:dateTime representation “1999-12-31T24:00:00” is stored as “{2000, 1, 1, 0, 0, 0.0, ()}”.
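
The component model for xs:dateTime can be sketched in Python as follows. This is an illustration under stated assumptions, not a conforming parser: a missing component (the empty sequence) is modelled as None, the timezone is modelled as a datetime.timedelta, range checks on the components are omitted, and the rollover for “24:00:00” relies on datetime.date, so it only works for years that type supports. The function name is hypothetical.

```python
import re
from datetime import date, timedelta
from decimal import Decimal

_DATETIME = re.compile(
    r"(-?[0-9]{4,})-([0-9]{2})-([0-9]{2})"
    r"T([0-9]{2}):([0-9]{2}):([0-9]{2}(?:\.[0-9]+)?)"
    r"(Z|[+-][0-9]{2}:[0-9]{2})?"
)


def datetime_components(lexical):
    """Split an xs:dateTime lexical form into the seven components."""
    m = _DATETIME.fullmatch(lexical)
    if m is None:
        raise ValueError(f"not an xs:dateTime lexical form: {lexical!r}")
    year, month, day, hour, minute = (int(m.group(i)) for i in range(1, 6))
    second = Decimal(m.group(6))
    if hour == 24:
        if minute != 0 or second != 0:
            raise ValueError("hour 24 is only allowed as 24:00:00")
        # 24:00:00 is normalized to 00:00:00 on the following day.
        rolled = date(year, month, day) + timedelta(days=1)
        year, month, day, hour = rolled.year, rolled.month, rolled.day, 0
    tz = m.group(7)
    if tz is None:
        timezone = None  # no timezone: the component is missing
    elif tz == "Z":
        timezone = timedelta(0)
    else:
        sign = -1 if tz[0] == "-" else 1
        timezone = sign * timedelta(hours=int(tz[1:3]), minutes=int(tz[4:6]))
    return (year, month, day, hour, minute, second, timezone)
```

Apart from the “24:00:00” case, the components hold the local value as written; no adjustment to UTC is performed.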

Note:

Implementations are permitted to store date/time values in any representation that's convenient for them, provided that the individual properties can be accessed and modified.

3.3.3 QNames and NOTATIONS

The QName and NOTATION data types require special attention. The following sections apply to xs:QName, xs:NOTATION, and types derived from them. These types are referred to collectively as “qualified names”.

As defined in XML Schema, the lexical space for qualified names includes a local name and an optional namespace prefix. The value space for qualified names contains a local name and an optional namespace URI. Therefore, it is not possible to derive a lexical value from the typed value, or vice versa, without access to some context that defines the namespace bindings.

When qualified names exist as values of nodes in a well-formed document, it is always possible to determine such a namespace context. However, the data model also allows qualified names to exist as freestanding atomic values, or as the name or value of a parentless attribute node, and in these cases no namespace context is available.

In this Data Model, therefore, the value space for qualified names contains a local-name, an optional namespace URI, and an optional prefix. The prefix is used only when producing a lexical representation of the value, that is, when casting the value to a string. The prefix plays no part in other operations involving qualified names: in particular, two qualified names are equal if their local names and namespace URIs match, regardless of whether they have the same prefix.
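
One way to picture this three-part value is a record in which the prefix is carried along but excluded from comparison. The Python sketch below uses a dataclass for that purpose; the class name, field names and the lexical method are assumptions made for the illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class QName:
    local_name: str
    namespace_uri: Optional[str] = None
    # compare=False keeps the prefix out of equality and hashing.
    prefix: Optional[str] = field(default=None, compare=False)

    def lexical(self):
        """Lexical form: the prefix is used only here."""
        if self.prefix:
            return f"{self.prefix}:{self.local_name}"
        return self.local_name
```

Two such values with the same local name and namespace URI compare equal even when their prefixes differ, while their lexical forms may still differ.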

The following consistency constraints apply:

If the namespace URI of a qualified name is absent, then the prefix must also be absent.
For every element node whose name has a prefix, the prefix must be one that has a binding to the namespace URI of the element name in the namespaces property of the element.
For every element node whose name has no prefix, the element must have a binding for the empty prefix to the namespace URI of the element name, or must have no binding for the empty prefix in the case where the name of the element has no namespace URI.
For every attribute node whose name has a prefix, the attribute node must either be parentless, or the prefix must be one that has a binding to the namespace URI of the attribute name in the namespaces property of the parent element.
For every qualified name that contains a prefix and that is included in the typed value of an element node, or of an attribute node that has an element node as its parent, the prefix must be one that is bound to the namespace URI of the qualified name in the namespaces property of that element.
For every qualified name that contains a namespace URI and no prefix, and that is included in the typed value of an element node, or of an attribute node that has an element node as its parent, that element node must have a binding for the empty prefix to that namespace URI in its namespaces property.
For every qualified name that contains neither a namespace URI nor a prefix, and that is included in the typed value of an element node, or of an attribute node that has an element node as its parent, that node must not have a binding for the empty prefix.
No qualified name that contains a prefix may be included in the typed value of an attribute node that has no parent.
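
As an informal illustration, the constraints that concern element names (the first three in the list above) can be checked as follows. The Python sketch models the namespaces property as a dict from prefix to namespace URI, with the empty string standing for the empty prefix and None for an absent prefix or URI; these representations and the function name are assumptions made for the example.

```python
def element_name_is_consistent(prefix, namespace_uri, namespaces):
    """Check an element name against the element's namespace bindings.

    namespaces maps each prefix to a namespace URI; "" is the empty prefix.
    """
    if namespace_uri is None:
        # No namespace URI: no prefix, and no binding for the empty prefix.
        return prefix is None and "" not in namespaces
    key = prefix if prefix is not None else ""
    # The (possibly empty) prefix must be bound to the name's namespace URI.
    return namespaces.get(key) == namespace_uri
```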

4 Infoset Mapping

This specification describes how to map each kind of node to the corresponding information item. This mapping produces an Infoset; it does not and cannot produce a PSVI. Validation must be used to obtain a PSVI for a (portion of a) data model instance.

5 Accessors

A set of accessors is defined on nodes in the data model. For consistency, all the accessors are defined on every kind of node, although several accessors return a constant empty sequence on some kinds of nodes.

In order for processors to be able to operate on instances of the data model, the model must expose the properties of the items it contains. The data model does this by defining a family of accessor functions. These are not functions in the literal sense; they are not available for users or applications to call directly. Rather they are descriptions of the information that an implementation of the data model must expose to applications. Functions and operators available to end-users are described in [XQuery and XPath Functions and Operators 3.1].

Some typed values in the data model are absent. Attempting to access an absent typed value is an error. Behavior in these cases is implementation-defined and the host language is responsible for determining the result.

5.1 `attributes` Accessor

dm:attributes($n as node()) as attribute()*

The dm:attributes accessor returns the attributes of a node as a sequence containing zero or more Attribute Nodes. The order of Attribute Nodes is stable but implementation-dependent.