Publication Manifest

Abstract

This specification defines a general manifest format for expressing information about a digital publication. It uses [schema.org] metadata augmented to include various structural properties about publications, serialized in [json-ld11], to enable interoperability between publishing formats while accommodating variances in the information that needs to be expressed.

4. Publication Manifest

4.1 Requirements

The following properties MUST be set in the manifest:

context
conformsTo

The following properties are RECOMMENDED:

type
id

The priority of all other properties and resource relations is OPTIONAL, but MAY be modified by implementations of the manifest format.
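
As a rough illustration of the requirement levels above, the following sketch reports which required and recommended properties are absent from a parsed manifest. It assumes the context is authored under the `@context` key, as in the examples of this specification; the function name `check_requirements` is hypothetical.

```python
REQUIRED = ("@context", "conformsTo")
RECOMMENDED = ("type", "id")


def check_requirements(manifest: dict) -> dict:
    """Report which required and recommended properties are missing."""
    return {
        "missing_required": [p for p in REQUIRED if p not in manifest],
        "missing_recommended": [p for p in RECOMMENDED if p not in manifest],
    }


report = check_requirements({
    "@context": ["https://schema.org", "https://www.w3.org/ns/pub-context"],
    "type": "Book",
})
print(report)
```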

Note

Some properties are implicitly required, as they are compiled from alternative information when not explicitly authored. See § A. Internal Representation Data Model for more information.

4.2 Value Categories

This section describes the categories of values that can be used with properties of the publication manifest.

4.2.1 Literals

When a manifest property expects a literal text string — one that is not language-dependent, such as a code value or date — as its value, the value MUST be expressed as a [json] string.

Literal values are not changed during processing of the manifest, unlike other values which might be, for example, converted to objects.

4.2.2 Numbers

When a manifest property expects a number as its value, the value MUST be expressed as a [json] number.

4.2.3 Booleans

When a manifest property expects a boolean as its value, the value MUST be expressed as an [ecmascript] Boolean value (true or false).

4.2.4 Explicit and Implied Objects

Various manifest properties are expected to be expressed as [json] objects. Although the use of explicit objects is usually advised, the following sections identify cases where it is also acceptable to use string values. These strings are automatically translated into objects during processing of the manifest by a user agent (the exact mapping of text values to objects is included in each definition).

4.2.4.1 Localizable Strings

When a manifest property expects a localizable text string as its value, the value MUST be expressed as one of:

a [json] string value; or
a LocalizableString.

A single string value represents an implied object whose value property is the string's text and whose language and base direction are determined from other information in the manifest.

As localizable strings are intended to facilitate multiple language representations of a value, properties that accept a localizable string always accept an array of these values. For this reason, although only a single string or object has to be authored, such values are converted to arrays for consistency of processing.

A LocalizableString is a [json] object consisting of the following properties:

Term	Description	Required Value	Value Category	[schema.org] Mapping
`value`	The value of the localizable string. REQUIRED.	Text.	Literal	(None)
`language`	The language of the value. OPTIONAL.	A well-formed language tag [bcp47].	Literal	(None)
`direction`	The base direction of the value. OPTIONAL.	`ltr` or `rtl`	Literal	(None)

The meanings of the base direction values are:

ltr: indicates that the textual value is explicitly directionally set to left-to-right text.
rtl: indicates that the textual value is explicitly directionally set to right-to-left text.

A missing base direction value means that the direction of the textual value is determined by the first character with a strong directionality, following the rules of the Unicode Bidirectional Algorithm [bidi].

Example 1 : Set the language of a string

{
    "value"     : "孔子",
    "language"  : "zh"
}

Example 2 : Set the language and the base direction of a string

{
    "value"     : "HTML היא שפת סימון.",
    "language"  : "he",
    "direction" : "rtl"
}

Note

If the base direction value were not set in the last example, the text would be displayed, following the Unicode Bidirectional Algorithm [bidi] and due to the presence of a Latin character starting the string, as:

HTML היא שפת סימון.

However, that would be incorrect. The extra direction value is necessary to control the display to yield:

HTML היא שפת סימון.

Note that the value field in the example represents the text as it is stored in memory, hence the discrepancy between it and the two renderings depicted here. Text editors might also display the JSON value differently (e.g., using the Unicode Bidirectional Algorithm only).

See also the [string-meta] document for further explanations and examples.
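
The conversion of authored values into an array of LocalizableString objects, as described in this section, could be sketched as follows. This is a minimal illustration rather than a normative algorithm: the function name `to_localizable_strings` is hypothetical, and inheritance of the global language and base direction is deliberately left out.

```python
def to_localizable_strings(value):
    """Normalize a string, object, or mixed list into a list of objects."""
    items = value if isinstance(value, list) else [value]
    result = []
    for item in items:
        if isinstance(item, str):
            # A plain string is an implied object with only a value.
            result.append({"value": item})
        elif isinstance(item, dict) and "value" in item:
            result.append(dict(item))
        else:
            raise ValueError(f"not a localizable string: {item!r}")
    return result


print(to_localizable_strings("孔子"))
print(to_localizable_strings(["孔子", {"value": "Confucius", "language": "en"}]))
```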

4.2.4.2 Entities

When a manifest property expects an entity (i.e., an individual or organization responsible for the various aspects of creation), its value MUST be expressed either as:

a [json] string value; or
an Entity.

A single string value represents an instance of an Entity object whose name property is the string's text and whose type is assumed to be Person [schema.org].

An Entity is defined as an instance of either the [schema.org] Person or Organization type with the following minimal property set:

Term	Description	Required Value	Value Category	[schema.org] Mapping
`type`	The type of entity. OPTIONAL.	One or more Text. Sequence MUST include "`Person`" or "`Organization`".	Array of Literals	(None)
`name`	Name of the entity. REQUIRED.	One or more Text.	Array of Localizable Strings	`name`
`id`	A canonical identifier associated with the entity. OPTIONAL.	A URL record [url].	Identifier	(None)
`url`	An address associated with the entity. OPTIONAL.	A valid URL string [url].	URL	`url`
`identifier`	An identifier associated with the entity (e.g., ORCID). OPTIONAL.	One or more Text.	Array of Literals	`identifier`

Note

This minimal set of properties is not restrictive. Authors can include any additional properties defined for the [schema.org] Person or Organization types, as appropriate. User agents are similarly not limited to interpreting only the preceding properties.

Example 3 : Using a string instead of a Person object.

The following author name is expressed as a string:

{
    …
    "author" : "Edgar Allan Poe",
    …
}

but, in the context of creators, it is equivalent to:

{
    …
    "author" : {
        "type" : "Person",
        "name" : "Edgar Allan Poe"
    },
    …
}

(See § 4.7.1.5 Creators for further details.)
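
The string shorthand for entities might be expanded along these lines. The sketch assumes a parsed manifest value; the function name `to_entity` is hypothetical, and explicit objects are copied unchanged rather than validated.

```python
def to_entity(value):
    """Expand a string shorthand into an Entity; copy explicit objects."""
    if isinstance(value, str):
        # A bare string is assumed to name a Person.
        return {"type": "Person", "name": value}
    if isinstance(value, dict):
        return dict(value)
    raise ValueError(f"not an entity: {value!r}")


print(to_entity("Lewis Carroll"))
```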

4.2.4.3 Linked Resources

When a manifest property links to one or more resources, it MUST be expressed either as:

a [json] string encoding the URL of the resources; or
an instance of a LinkedResource.

A string value represents an implied LinkedResource object whose url property is set to the string value.

A LinkedResource object is defined as follows:

Term	Description	Required Value	Value Category	[schema.org] Mapping
`type`	The type of resource. OPTIONAL.	One or more Text. Sequence MUST include "`LinkedResource`".	Array of Literals	(None)
`url`	Location of the resource. REQUIRED.	A valid URL string [url]. Refer to the property definitions that accept this type for additional restrictions.	URL	`url`
`encodingFormat`	Media type of the resource (e.g., `text/html`). OPTIONAL.	MIME Media Type [rfc2046].	Literal	`encodingFormat`
`name`	Name of the item. OPTIONAL.	One or more Text.	Array of Localizable Strings	`name`
`description`	Description of the item. OPTIONAL.	One or more Text.	Array of Localizable Strings	`description`
`rel`	The relation of the resource to the publication. OPTIONAL.	One or more relations. Keywords are ASCII case-insensitive [infra] and MUST be compared as such.	Array of Literals	(None)
`integrity`	A cryptographic hashing of the resource that allows its integrity to be verified. OPTIONAL.	One or more whitespace-separated sets of integrity metadata [sri]. The value MUST conform to the metadata definition [sri]. Refer to [sri] for the list of cryptographic hashing functions that user agents are expected to support.	Literal	(None)
`duration`	Overall duration of a time-based media resource. OPTIONAL.	Duration value as defined by [iso8601-1].	Literal	`duration` (Property)
`alternate`	References to one or more reformulation(s) of the resource in alternative formats, where the `encodingFormat` specifies the format of the reformulation. OPTIONAL.	One or more of: a string, representing the URL of the resource reformulation in an alternative format; or an instance of a `LinkedResource` object. A string value represents an implied `LinkedResource` object whose `url` property is set to the string value.	Array of Linked Resources	(None)

Although user agent support for the integrity property is OPTIONAL, user agents that support cryptographic hashing comparisons using this property MUST do so in accordance with [sri].

This specification only defines the alternate property for selecting from alternative formats (i.e., based on encodingFormat or by inspecting URLs). Profiles MAY extend this behaviour to allow selection based on other criteria. The process for selecting an alternate is described in § B. Selecting an Alternate Resource.

Note

When defining a LinkedResource object, it is advised to always specify the media type of the resource using the encodingFormat property. Doing so allows user agents to more readily determine the usability of the resource.

Example 4 : A resource with a SHA-256 hashing of its content.

{
    "type"           : "LinkedResource",
    "url"            : "chapter1.html",
    "encodingFormat" : "text/html",
    "name"           : "Chapter 1 - Loomings",
    "integrity"      : "sha256-13AE04E21177BABEDFDE721577615A638341F963731EA936BBB8C3862F57CDFC"
}
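
For readers unfamiliar with [sri], integrity metadata has the form of a hash algorithm name, a hyphen, and the base64-encoded digest. The following sketch computes and checks such a value for SHA-256 only. The function names are hypothetical, and the sketch ignores the optional parts of the [sri] syntax as well as the rules for choosing among several algorithms.

```python
import base64
import hashlib


def sha256_integrity(data: bytes) -> str:
    """Build SHA-256 integrity metadata for the given resource bytes."""
    digest = hashlib.sha256(data).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


def matches_integrity(data: bytes, integrity: str) -> bool:
    """Return True if any whitespace-separated entry matches the data."""
    expected = sha256_integrity(data)
    return any(entry == expected for entry in integrity.split())


content = b"<html><body>Call me Ishmael.</body></html>"
metadata = sha256_integrity(content)
print(metadata, matches_integrity(content, metadata))
```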

Example 5 : A resource with its alternate formats.

{
    "type"           : "LinkedResource",
    "url"            : "chapter1.mp3",
    "encodingFormat" : "audio/mpeg",
    "name"           : "Chapter 1 - Loomings",
    "alternate"      : [
        "chapter1.html",
        {
            "type": "LinkedResource",
            "url": "chapter1.json",
            "encodingFormat": "application/vnd.syncnarr+json",
            "duration": "PT1669S"
        }
    ]
}

Example 6 : Resource list that includes one link using a relative URL as a string ('datatypes.svg') and two that display the various properties of a LinkedResource object.

{
    …
    "resources" : [
        "datatypes.svg",
        {
            "type"            : "LinkedResource",
            "url"             : "test-utf8.csv",
            "encodingFormat"  : "text/csv",
            "name"            : "Test Results",
            "description"     : "CSV file containing the full data set used."
        },
        {
            "type"            : "LinkedResource",
            "url"             : "terminology.html",
            "encodingFormat"  : "text/html",
            "rel"             : "glossary"
        }
    ],
    …
}
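
The string shorthand for linked resources and the ASCII case-insensitive comparison of rel keywords could be handled as in the sketch below. The function names are hypothetical, and assigning the `LinkedResource` type to implied objects is an assumption made for the illustration.

```python
def ascii_lower(text: str) -> str:
    """Lowercase only the ASCII letters A-Z, leaving other characters as-is."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def to_linked_resource(value):
    """Expand a URL string into an implied LinkedResource object."""
    if isinstance(value, str):
        return {"type": "LinkedResource", "url": value}
    return dict(value)


def has_rel(resource: dict, keyword: str) -> bool:
    """Check for a relation keyword using ASCII case-insensitive matching."""
    rel = resource.get("rel", [])
    rels = [rel] if isinstance(rel, str) else rel
    return ascii_lower(keyword) in (ascii_lower(r) for r in rels)


glossary = {"type": "LinkedResource", "url": "terminology.html", "rel": "Glossary"}
print(to_linked_resource("datatypes.svg"), has_rel(glossary, "glossary"))
```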

4.2.4.4 Objects

When a manifest property expects a type of object not defined in this section, or by a profile, it MUST be expressed as a [json] object (i.e., the property's value will not be processed to create an object).

4.2.5 URLs

URLs are used to identify resources associated with a digital publication. When a property expects a URL value, it MUST be a valid URL string [url].

In the case of relative-URL strings, these are resolved to absolute-URL strings using a base URL [url].

The base URL for relative-URL strings is determined as follows:

In the case of an embedded manifest, it is the document base URL of the embedding document [json-ld11].
In the case of a linked manifest, it is the URL of the manifest resource.
In the case of a digital publication format that uses another means of discovering the manifest, it is defined by the format.

As a consequence, relative-URL strings in embedded manifests are resolved against the URL of the embedding document unless that document declares a base URL (i.e., in a <base> element in its header).
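
Resolution against a base URL can be approximated with Python's standard library, as in the sketch below. Note that `urllib.parse.urljoin` follows RFC 3986 rather than the parsing rules of [url], so edge cases can differ. The manifest URL used here is a made-up example.

```python
from urllib.parse import urljoin


def resolve(url: str, base_url: str) -> str:
    """Resolve a possibly relative URL string against a base URL."""
    return urljoin(base_url, url)


# Hypothetical location of a linked manifest.
manifest_url = "https://publisher.example.org/book/manifest.jsonld"

print(resolve("chapter1.html", manifest_url))
print(resolve("https://cdn.example.org/cover.jpg", manifest_url))
```

Relative references are resolved against the directory of the base URL, while absolute URLs pass through unchanged.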

4.2.6 Identifiers

Identifiers are used to refer to a digital publication and the entities responsible for its creation in a persistent and unambiguous manner. URLs, URNs, DOIs, ISBNs, and PURLs are all examples of persistent identifiers frequently used in publishing.

Identifiers MUST be expressed as URL records [url].

4.2.7 Arrays

When a manifest property allows one or more values of its respective type (e.g., literal, object, or URL), these values are expressed as a [json] array. When a property value is a single element, however, the array syntax MAY be omitted.

Example 7 : Using a text string instead of an array.

As a digital publication typically contains many resources, this declaration of a single resource:

{
    …
    "resources" : "datatypes.svg",
    …
}

is equivalent to the array:

{
    …
    "resources" : ["datatypes.svg"],
    …
}
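
A processor can make the two forms above interchangeable by wrapping single values before use. The helper name `as_array` in this small sketch is hypothetical.

```python
def as_array(value):
    """Wrap a single value in a list; leave lists untouched."""
    return value if isinstance(value, list) else [value]


print(as_array("datatypes.svg"))
print(as_array(["datatypes.svg"]))
```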

4.3 Manifest Contexts

A manifest MUST set its JSON-LD context [json-ld11] with the following two components, in the specified order:

the [schema.org] context: https://schema.org
the publication context: https://www.w3.org/ns/pub-context

Note

Although Schema.org is often referenced using the http URI scheme, the vocabulary is being migrated to use the secure https scheme as its default. As a result, only the https scheme is recognized in the publication manifest context.

Example 8 : Setting the context declaration.

{
    "@context" : [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context"
    ],
    …
}

The publication context document adds features to the properties defined in Schema.org (e.g., the requirement for the creator property to be order preserving).

Profiles of this specification MAY require additional context URLs, but such URLs MUST be ordered after these two components.

The context can be extended by including additional parameters — such as the global language and direction declarations — in an object following the publication context.

Example 9

{
    "@context" : [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context",
        {
            "language" : "es"
        }
    ],
    …
}
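
The ordering requirement on the context can be checked with a comparison of the first two entries, as in this sketch. The function name `has_valid_context` is hypothetical, and nothing is verified about entries that follow the two required components.

```python
REQUIRED_CONTEXT = ["https://schema.org", "https://www.w3.org/ns/pub-context"]


def has_valid_context(manifest: dict) -> bool:
    """Check that the context starts with the two required URLs, in order."""
    context = manifest.get("@context")
    return isinstance(context, list) and context[:2] == REQUIRED_CONTEXT


manifest = {
    "@context": [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context",
        {"language": "es"},
    ]
}
print(has_valid_context(manifest))
```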

4.4 Manifest Language and Direction

Each natural language property value in a manifest (e.g., title, creators) has a default natural language, which is the language that it is expressed in (e.g., English, French, Chinese). It also has a natural base direction in which it is written — the display direction, either left-to-right or right-to-left.

The digital publication manifest provides the ability to set both these concepts globally as well as on individual items to aid user agents in interpreting and presenting the metadata.

Note

The ability to set the base direction is a JSON-LD 1.1 [json-ld11] feature. In other words, the Publication Manifest has a dependency on that version of the JSON-LD specification (as opposed to the earlier 1.0 [json-ld10] version).

4.4.1 Global Declarations

The global language and base direction declarations for natural language manifest properties are set in the context using the language and direction keywords [json-ld11], respectively. These values are used to expand simple string values into localizable strings during the processing of the manifest, as well as to provide a language and the base direction for localizable strings that omit one.

The value of language MUST be a well-formed language tag [bcp47].

The value of direction MUST have one of the following values:

"ltr": indicates that the textual values are explicitly directionally set to left-to-right text.
"rtl": indicates that the textual values are explicitly directionally set to right-to-left text.

The global language and base direction declaration, when present, MUST follow the publication context.

Default values are not specified for the global language or base direction.

Example 10 : Declaring French as the default language for the manifest.

{
    "@context": [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context", 
        {
            "language": "fr"
        }
    ],
    …
}

Example 11 : Declaring Azeri as the default language, with the base direction set to right-to-left.

{
    "@context": [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context", 
        {
            "language": "az",
            "direction": "rtl"
        }
    ],
    …
}
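
Reading the global declarations back out of the context could look like the following sketch. It assumes that, if several context objects set the same keyword, the later one wins; that choice and the function name `global_declarations` belong to this illustration only.

```python
def global_declarations(manifest: dict):
    """Return the (language, direction) pair declared in the context."""
    language = direction = None
    context = manifest.get("@context", [])
    entries = context if isinstance(context, list) else [context]
    for entry in entries:
        if isinstance(entry, dict):
            # Assumption: a later declaration overrides an earlier one.
            language = entry.get("language", language)
            direction = entry.get("direction", direction)
    return language, direction


manifest = {
    "@context": [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context",
        {"language": "az", "direction": "rtl"},
    ]
}
print(global_declarations(manifest))
```

When no declaration is present, both values stay `None`, reflecting that no defaults are specified.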

4.4.2 Item-Specific Declarations

It is possible to set the language or a base direction locally for any natural language value in the manifest using a localizable string:

Example 12 : Providing the author name in English for a Chinese publication.

{
    "@context" : [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context",
        {
            "language": "zh"
        }
    ],
    "type"     : "Book",
    …
    "author" : {
        "type" : "Person",
        "name" : [
            "孔子",
            {
                "value" : "Confucius",
                "language" : "en"
            }
        ]
    }
}

Example 13 : A publication in Arabic with the title also given in English.

{
    "@context" : [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context",
        {
            "language": "ar"
        }
    ],
    "type"     : "Book",
    …
    "name" : [
        {
            "value": "HTML و CSS: تصميم و إنشاء مواقع الويب",
            "direction": "rtl"
        },
        {
            "value"    : "HTML and CSS: Design and Build Websites",
            "language" : "en"
        }
    ]
}

The extra base direction setting for the Arabic title (i.e., HTML و CSS: تصميم و إنشاء مواقع الويب) is necessary to yield the correct display.

The possible values of the language and direction keywords [json-ld11] are the same as for the global declaration. Furthermore, either value can also be the JSON value null, indicating that no explicit language or direction, respectively, is set.

Note

Setting the value of language to null can be useful if a value (e.g., the name of an organization) is commonly used without any associated language (e.g., "Google").

A local declaration of the language or the base direction takes precedence over the corresponding global declaration.
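
The precedence rule, including the special role of null, can be sketched as a lookup that distinguishes an absent key from a key that is explicitly set to null. The function name `effective` is hypothetical; JSON null is represented by Python's `None`.

```python
def effective(item: dict, key: str, global_value):
    """Return the local value if the key is present, else the global one."""
    # An explicit null (None) is a local declaration and is kept as-is.
    return item[key] if key in item else global_value


titles = [
    {"value": "Titre"},
    {"value": "Title", "language": "en"},
    {"value": "Google", "language": None},
]
print([effective(t, "language", "fr") for t in titles])
```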

4.5 Publication Types

A digital publication's manifest defines its Publication Type using the type keyword [json-ld11]. The type MAY be mapped onto any [schema.org] type, but CreativeWork is assumed as the default when no type is specified.

Example 14 : Setting a publication's type to CreativeWork.

{
    "@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
    "type"     : "CreativeWork",
    …
}

More specific subtypes of CreativeWork, such as Article, Book, TechArticle, and Course can be used instead of, or in addition to, CreativeWork.

Example 15 : Setting a publication's type to Book.

{
    "@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
    "type"     : "Book",
    …
}

Each Schema.org type defines a set of properties that are valid for use with it. To ensure that the manifest can be validated and processed by Schema.org-aware processors, the manifest SHOULD contain only the properties associated with the selected type.

If properties from more than one type are needed, the manifest MAY include multiple type declarations.

Example 16 : Setting the type property for a publication that combines properties from Book and VisualArtwork.

{
    "@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
    "type"     : ["Book", "VisualArtwork"],
    …
}

User agents SHOULD NOT fail to process manifests that are not valid to their declared Schema.org type(s).
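
A processor might read the publication type as in the sketch below, which applies the CreativeWork default and accepts both the single-value and array forms. The function name `publication_types` is hypothetical.

```python
def publication_types(manifest: dict) -> list:
    """Return the declared types, defaulting to CreativeWork."""
    value = manifest.get("type")
    if value is None:
        return ["CreativeWork"]
    return value if isinstance(value, list) else [value]


print(publication_types({}))
print(publication_types({"type": ["Book", "VisualArtwork"]}))
```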

Note

Refer to the Schema.org site for the complete list of CreativeWork subtypes.

4.6 Profile Conformance

A digital publication indicates the profile its manifest and content conform to using the conformsTo property.

Term	Description	Required Value	Value Category	[dcterms] Mapping
`conformsTo`	URL of the profile.	An absolute-URL-with-fragment string [url].	Array of Literals	conformsTo

The URL to use for a profile is defined in its respective specification.

Note

The conformsTo property can also be used to indicate conformance to other specifications and standards (e.g., to [wcag21]).

Example 17 : Identify that a digital publication conforms to the W3C Audiobooks specification.

{
    …
    "conformsTo" : "https://www.w3.org/TR/audiobooks/",
    …
}

4.7 Properties

4.7.1 Descriptive Properties

4.7.1.1 Abridged

The abridged property provides information on whether or not a digital publication has been shortened from its original form.

Term	Description	Required Value	Value Category	[schema.org] Mapping
`abridged`	Indicates whether the book is an abridged edition.	Either `true` or `false`.	Boolean	`abridged` (Book)

Example 18 : Setting that a publication is abridged.

{
    …
    "abridged" : true,
    …
}

4.7.1.2 Accessibility

The accessibility properties provide information about the suitability of a digital publication for consumption by users with different preferred reading modalities. These properties typically supplement an evaluation against established accessibility criteria, such as those provided in [wcag21].

The following properties are categorized as accessibility properties:

Term	Description	Required Value	Value Category	[schema.org] Mapping
`accessMode`	The human sensory perceptual system or cognitive faculty through which a person may process or perceive information.	One or more Text.	Array of Literals	`accessMode` (CreativeWork)
`accessModeSufficient`	A list of single or combined access modes that are sufficient to understand all the intellectual content of a resource.	One or more ItemList.	Array of Object	`accessModeSufficient` (CreativeWork)
`accessibilityFeature`	Content features of the resource, such as accessible media, alternatives and supported enhancements for accessibility.	One or more Text.	Array of Literals	`accessibilityFeature` (CreativeWork)
`accessibilityHazard`	A characteristic of the described resource that is physiologically dangerous to some users.	One or more Text.	Array of Literals	`accessibilityHazard` (CreativeWork)
`accessibilitySummary`	A human-readable summary of specific accessibility features or deficiencies that is consistent with the other accessibility metadata.	Text.	Array of Localizable Strings	`accessibilitySummary` (CreativeWork)

Note

Detailed descriptions of these properties, including the expected values to use with them, are available at [webschemas-a11y].

Note

A reference to a detailed accessibility report can also be provided if more information is needed than can be expressed by these properties.

Example 19 : Setting accessibility metadata for a publication that provides alternative text and long descriptions appropriate for each image, enabling it to be read in purely textual form.

{
    …
    "accessMode"              : ["textual", "visual"],
    "accessibilityFeature"    : ["alternativeText", "longDescription"],
    "accessModeSufficient"    : [
        {
            "type"            : "ItemList",
            "itemListElement" : ["textual", "visual"]
        },
        {
            "type"            : "ItemList",
            "itemListElement" : ["textual"]
        }
    ],
    …
}

4.7.1.3 Address

An address is a URL that identifies the source location of a digital publication. It is expressed using the url property.

Term	Description	Required Value	Value Category	[schema.org] Mapping
`url`	URL of the publication.	A valid URL string [url].	Array of URLs	`url` (Thing)

A digital publication MAY have more than one address, but all the addresses MUST resolve to the same document.

Note

The publication's address can also be used as value for an identifier link relation [link-relation].

Example 20 : Setting the address of the publication.

{
    …
    "url" : "https://publisher.example.org/frankenstein",
    …
}

4.7.1.4 Canonical Identifier

A digital publication's canonical identifier property provides a unique identifier for a digital publication. It is expressed using the id property.

Term	Description	Required Value	Value Category	[schema.org] Mapping
`id`	Preferred version of the publication.	A URL record [url].	Identifier	(None)

Note

Ensuring uniqueness of canonical identifiers is outside the scope of this specification. The actual achievable uniqueness depends on such factors as the conventions of the identifier scheme used and the degree of control over assignment of identifiers.

If a canonical identifier is not provided in the manifest, or the value is an invalid URL, the digital publication does not have a canonical identifier. User agents MUST NOT attempt to construct a canonical identifier from any other identifiers provided in the manifest.

The specification of the canonical identifier MAY be complemented by the inclusion of additional types of identifiers using the identifier property [schema.org] and/or its subtypes.

Example 21 : Setting the canonical identifier and the address as URLs.

{
    …
    "id"  : "http://www.w3.org/TR/tabular-data-model/",
    "url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
    …
}

Example 22 : Using a URN for the canonical identifier.

{
    …
    "id"  : "urn:isbn:9780123456789",
    "url" : "https://publisher.example.org/wuthering-heights",
    …
}

4.7.1.5 Creators

A creator is an individual or organization responsible for the creation of a digital publication.

The following properties are categorized as creators:

Term	Description	Required Value	Value Category	[schema.org] Mapping
`artist`	The primary artist for the publication, in a medium other than pencils or digital line art.	One or more `Person`.	Array of Entities	`artist` (VisualArtwork)
`author`	The author of the publication.	One or more `Person` and/or `Organization`.	Array of Entities	`author` (CreativeWork)
`colorist`	The individual who adds color to inked drawings.	One or more `Person`.	Array of Entities	`colorist` (VisualArtwork)
`contributor`	Contributor whose role does not fit to one of the other roles in this table.	One or more `Person` and/or `Organization`.	Array of Entities	`contributor` (CreativeWork)
`creator`	The creator of the publication. Use of this property might lead to inconsistent results in user agents. It is marked as a synonym for author in [schema.org], but there is no guidance on which takes precedence or how to combine them. It is advised to use only one or the other, with preference given to the more specific author property.	One or more `Person` and/or `Organization`.	Array of Entities	`creator` (CreativeWork)
`editor`	The editor of the publication.	One or more `Person`.	Array of Entities	`editor` (CreativeWork)
`illustrator`	The illustrator of the publication.	One or more `Person`.	Array of Entities	`illustrator` (Book)
`inker`	The individual who traces over the pencil drawings in ink.	One or more `Person`.	Array of Entities	`inker` (VisualArtwork)
`letterer`	The individual who adds lettering, including speech balloons and sound effects, to artwork.	One or more `Person`.	Array of Entities	`letterer` (VisualArtwork)
`penciler`	The individual who draws the primary narrative artwork.	One or more `Person`.	Array of Entities	`penciler` (VisualArtwork)
`publisher`	The publisher of the publication.	One or more `Person` and/or `Organization`.	Array of Entities	`publisher` (CreativeWork)
`readBy`	A person who reads (performs) the publication (for audiobooks).	One or more `Person`.	Array of Entities	`readBy` (Audiobook)
`translator`	The translator of the publication.	One or more `Person` and/or `Organization`.	Array of Entities	`translator` (CreativeWork)

Creators MUST be represented either as:

a [json] string encoding the name of a Person [schema.org]; or
an instance of a Person or Organization [schema.org].

A single string value is a shorthand for a [schema.org] Person whose name property is set to that string value. (See also § 4.2.4.2 Entities.)

The manifest MAY include more than one of each type of creator.

Example 23 : Setting the author of a book.

{
    …
    "url"      : "https://publisher.example.org/alice-in-wonderland",
    "author"   : {
        "type"  : "Person",
        "name"  : "Lewis Carroll"
    }
}

Example 24 : Separating editors, authors, and publisher. Some persons are expressed as simple strings instead of objects.

{
    …
    "author"     : [
        "Jeni Tennison",
        {
            "type"       : "Person",
            "name"       : "Gregg Kellogg"
        },
        {
            "type"       : "Person",
            "name"       : "Ivan Herman",
            "id"         : "https://www.w3.org/People/Ivan/",
            "identifier" : "0000-0003-0782-2704"
        }
    ],
    "editor"    : [
        "Jeni Tennison",
        {
            "type" : "Person",
            "name" : "Gregg Kellogg"
        }
    ],
    "publisher" : {
        "type" : "Organization",
        "name" : "World Wide Web Consortium",
        "id"   : "https://www.w3.org/"
    },
    …
}

4.7.1.6 Duration

The global duration indicates the overall length of a time-based digital publication (e.g., an audiobook or a book consisting of a series of video clips). It is expressed using the duration property.

Term	Description	Required Value	Value Category	[schema.org] Mapping
`duration`	Overall duration of a time-based publication.	Duration value as defined by [iso8601-1].	Literal	`duration` (Property)

Example 25 : Setting the global duration in seconds.

{
    …
    "type"     : "Audiobook",
    "id"       : "https://example.org/flatland-a-romance-of-many-dimensions/",
    "url"      : "https://w3c.github.io/pub-manifest/experiments/audiobook/",
    "name"     : "Flatland: A Romance of Many Dimensions",
    …
    "duration" : "PT15153S",
    …
}
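
Duration values such as the one above can be converted to seconds with a small parser. The sketch below is limited to the days, hours, minutes, and seconds designators; years, months, and weeks, which the ISO syntax also allows, are left out because months and years have no fixed length in seconds. The function name `duration_seconds` is hypothetical.

```python
import re

# Subset of the ISO 8601 duration syntax: PnDTnHnMnS.
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def duration_seconds(value: str) -> float:
    """Convert a supported ISO 8601 duration string to seconds."""
    match = _DURATION.fullmatch(value)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    parts = {k: float(v) if v else 0.0 for k, v in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])


print(duration_seconds("PT15153S"))
```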

Note

The relevant Wikipedia page gives a concise description of the ISO duration syntax.

4.7.1.7 Last Modification Date

The last modification date is the date when a digital publication was last updated (i.e., whenever changes were last made to any of the resources of the publication, including the manifest). It is expressed using the dateModified property.

Term	Description	Required Value	Value Category	[schema.org] Mapping
`dateModified`	Last modification date of the publication.	A `Date` or `DateTime` value [schema.org], both expressed in ISO 8601 Date, or Date Time formats, respectively [iso8601-1].	Literal	`dateModified` (CreativeWork)

The last modification date does not necessarily reflect all changes to a publication (e.g., if a digital publication format allows references to third-party content). User agents SHOULD check the last modification date of individual resources to determine if they have changed and need updating.

Example 26 : Setting the last modification date of the publication.

{
    …
    "dateModified" : "2015-12-17",
    …
}

4.7.1.8 Publication Date

The publication date is the date on which a digital publication was originally published. It represents a static event in the lifecycle of a publication and allows subsequent revisions to be identified and compared. It is expressed using the datePublished property.

Term	Description	Required Value	Value Category	[schema.org] Mapping
`datePublished`	Creation date of the publication.	A `Date` or `DateTime`, both expressed in ISO 8601 Date, or Date Time formats, respectively [iso8601-1].	Literal	`datePublished` (CreativeWork)

The exact moment of publication is intentionally left open to interpretation: it could be when the publication is first made available or could be a point in time before publication when the publication is considered final.

Example 27 : Setting the publication and last modification dates of the publication.

{
    …
    "datePublished" : "2015-12-17",
    "dateModified"  : "2016-01-30",
    …
}

4.7.1.9 Publication Language

A digital publication has at least one natural language, which is the language that the content is expressed in (e.g., English, French, Chinese). The manifest includes the following property to set this concept, which can influence, for example, the behavior of a user agent (e.g., to preload a dictionary or text-to-speech engine).

Term	Description	Required Value	Value Category	[schema.org] Mapping
`inLanguage`	Default language for the publication.	One or more well-formed language tags [bcp47].	Array of Literals	`inLanguage` (Property)

The natural language MUST be a well-formed language tag [bcp47].

If a user agent requires the publication language and it is not available in the manifest, or the obtained value is not well-formed [bcp47], the user agent MAY attempt to determine the publication language when generating its internal representation. This specification does not mandate how such a language tag is created. The user agent might:

use the language declaration of the manifest;
use the first language declaration found in a resource in the default reading order; or
calculate the language using an algorithm of its own design.

If a user agent requires a primary language for the publication and more than one language is specified, the first entry in the inLanguage array MUST be recognized as the primary.
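
The selection of the primary language could be sketched as follows. The function name `primary_language` is hypothetical, and the regular expression is a deliberately simplified shape check that is looser than the well-formedness grammar of [bcp47]. Returning `None` when the first entry fails the check is also an assumption of this sketch.

```python
import re
from typing import List, Optional, Union

# Simplified shape check: a 2-8 letter primary subtag followed by optional
# 1-8 character alphanumeric subtags. This is an assumption for illustration
# and is looser than the full well-formedness grammar in BCP 47.
_TAG_SHAPE = re.compile(r"^[A-Za-z]{2,8}(-[A-Za-z0-9]{1,8})*$")


def primary_language(in_language: Union[str, List[str], None]) -> Optional[str]:
    """Return the first inLanguage entry if it looks like a language tag."""
    if in_language is None:
        return None
    # A single string is treated like a one-element array.
    tags = [in_language] if isinstance(in_language, str) else list(in_language)
    if not tags:
        return None
    first = tags[0]
    if isinstance(first, str) and _TAG_SHAPE.match(first):
        return first
    return None  # the caller may fall back to a detection strategy of its own
```

A caller that receives `None` can then apply one of the fallback strategies listed above.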

Note

It is important to differentiate the language of the publication from the language of the individual resources that compose it. If such resources are, for example, in HTML, the language needs to be set in those resources, too. The language of the publication is not inherited.

4.7.1.10 Reading Progression Direction

The reading progression direction establishes the reading direction from one resource to the next within a digital publication. It is used to adapt such publication-level interactions as menu position, touch gestures, swap direction, and tap zones for next and previous page. The reading progression is expressed using the readingProgression property.

Term	Description	Required Value	Value Category	[schema.org] Mapping
`readingProgression`	Reading progression direction from one resource to the other.	One of: `ltr` or `rtl`.	Literal	(None)

The value of this property MUST be either:

ltr: left-to-right; or
rtl: right-to-left.

The default value is ltr. If the readingProgression is not set, user agents MUST use the default value when generating their internal representation.

This property has no effect on the rendering of the individual primary resources; it is only relevant for the progression direction from one resource to the other.

Example 28 : Setting the reading progression explicitly to ltr (left-to-right).

{
    …
    "readingProgression" : "ltr",
    …
}
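
A minimal sketch of applying the default value, assuming the manifest has already been parsed into a Python dictionary. The function name is hypothetical. Falling back to the default for an unrecognized value is an assumption of this sketch, since this section only defines the behavior for an unset property.

```python
DEFAULT_PROGRESSION = "ltr"
ALLOWED_PROGRESSIONS = ("ltr", "rtl")


def reading_progression(manifest: dict) -> str:
    """Return the effective reading progression for a parsed manifest."""
    value = manifest.get("readingProgression")
    if value in ALLOWED_PROGRESSIONS:
        return value
    # Unset: the default applies. Treating an unrecognized value the same way
    # is an assumption of this sketch, not something stated in this section.
    return DEFAULT_PROGRESSION
```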

4.7.1.11 Title

The title provides the human-readable name of a digital publication. It is expressed using the name property.

Term	Description	Required Value	Value Category	[schema.org] Mapping
`name`	Human-readable title of the publication.	One or more Text.	Array of Localizable Strings	`name` (Thing)

If a title is not included in the manifest, the user agent MUST create one. The process for obtaining the title is defined in § 7.4.3 Add Default Values.

Note

A user agent is not expected to produce a meaningful title [wcag21] for a publication when one is not specified.

Example 29 : Setting the title of a book explicitly.

{
    …
    "name" : "Heart of Darkness",
    …
}

4.7.2 Resource Categorization Properties

Publication resources are specified via the default reading order, the resource list, and the links, as defined in this section. These lists contain references to informative resources like the privacy policy, and structural resources like the table of contents.

Note

It is not necessary to include a reference to the manifest in any of these lists.

4.7.2.1 Default Reading Order

The default reading order is a specific progression through a set of digital publication resources. A user might follow alternative pathways through the content, but in the absence of such interaction the default reading order defines the expected progression from one resource to the next.

The default reading order is expressed using the readingOrder property.

Term	Description	Required Value	Value Category	[schema.org] Mapping
`readingOrder`	Order of progression through the resources of a digital publication.	One or more `LinkedResource`.	Array of Linked Resources	(None)

Each element of the readingOrder property MUST be expressed either as:

a [json] string representing the URL of the resource; or
an instance of a LinkedResource object.

A single string value represents an instance of a LinkedResource object whose url property is the string's text.

The order of items is significant.

The URLs expressed in the reading order MAY include fragment identifiers, although profiles of this specification MAY restrict both their use as well as what schemes and features are supported. Fragment identifiers are to be interpreted as defined by their respective specifications (e.g., the start location to move the user to, or the range of content to render before moving to the next item in the reading order).

Resources SHOULD NOT be listed more than once in the reading order, as this can lead to unexpected results in user agents (e.g., links to the resource might not resolve to the right instance in the reading order).

The default reading order MAY be omitted when a digital publication consists only of the resource that links to the manifest. When the default reading order is absent, user agents MUST include an entry for the linking resource when compiling the internal representation. See § 7.4.3 Add Default Values for more information.

The default reading order MUST include at least one resource after processing of the manifest.

Example 30 : Expressing the reading order as a simple list of URLs.

{
    …
    "readingOrder" : [
        "html/title.html",
        "html/copyright.html",
        "html/introduction.html",
        "html/epigraph.html",
        "html/c001.html",
        …
    ],
    …
}

Example 31 : Expressing the reading order as LinkedResource objects to provide more information.

{
    …
    "readingOrder" : [
        {
            "type"           : "LinkedResource",
            "url"            : "html/title.html",
            "encodingFormat" : "text/html",
            "name"           : "Title page"
        },
        {
            "type"           : "LinkedResource",
            "url"            : "html/copyright.html",
            "encodingFormat" : "text/html",
            "name"           : "Copyright page"
        },
        …
    ],
    …
}
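
The equivalence between the two forms shown in Examples 30 and 31 can be sketched as a small expansion step. The function name `to_linked_resources` is hypothetical, and the expanded map mirrors the authored form of Example 31 rather than any particular internal representation.

```python
from typing import Any, Dict, List


def to_linked_resources(value: Any) -> List[Dict[str, Any]]:
    """Expand a list of URL strings and objects into LinkedResource maps."""
    items = value if isinstance(value, list) else [value]  # allow a single value
    expanded = []
    for item in items:
        if isinstance(item, str):
            # A bare string stands for a LinkedResource whose url is that string.
            expanded.append({"type": "LinkedResource", "url": item})
        elif isinstance(item, dict):
            expanded.append(dict(item))  # copy so the input is left untouched
        # Anything else is skipped here; a real processor would report it.
    return expanded


order = to_linked_resources([
    "html/title.html",
    {"type": "LinkedResource", "url": "html/c001.html", "name": "Chapter 1"},
])
```

The list order is preserved, which matters because the order of items in the reading order is significant.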

4.7.2.2 Resource List

The resource list enumerates any additional resources used in the processing or rendering of a digital publication that are not already listed in the default reading order. It is expressed using the resources property.

Term	Description	Required Value	Value Category	[schema.org] Mapping
`resources`	List of additional publication resources used in the processing or rendering of a publication.	One or more `LinkedResource`.	Array of Linked Resources	(None)

Each element of the resources property MUST be expressed either as:

a [json] string representing the URL of the resource; or
an instance of a LinkedResource object.

A single string value represents an instance of a LinkedResource object whose url property is the string's text.

The order of items is not significant.

To avoid conflicting information about a resource, a particular resource's URL SHOULD NOT be repeated within the resource list.

The URLs expressed in the resource list SHOULD NOT include fragment identifiers.

The completeness of the resource list can affect the usability of a digital publication in certain reading scenarios (e.g., the ability to read it offline). For this reason, it is strongly advised to provide a comprehensive list of all of the publication's constituent resources beyond those listed in the default reading order.

In some cases, a comprehensive list of these resources might not be easily achieved (e.g., third-party scripts that reference resources from deep within their source), but a user agent SHOULD still be able to render a publication even if some of these resources are not identified as belonging to the publication (e.g., if it is taken offline without them).

Example 32 : Expressing the list of resources via a combination of simple URL strings and LinkedResource objects.

{
    …
    "resources"  : [
        "datatypes.html",
        "datatypes.svg",
        "datatypes.png",
        "diff.html",
        {
            "type"           : "LinkedResource",
            "url"            : "test-utf8.csv",
            "encodingFormat" : "text/csv"
        },
        {
            "type"           : "LinkedResource",
            "url"            : "test-utf8-bom.csv",
            "encodingFormat" : "text/csv"
        },
        …
    ],
    …
}
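
An authoring tool might flag the two SHOULD NOT conditions for the resource list along these lines. The function name is hypothetical, and URLs are compared exactly as written, without resolving them against a base URL, which is a simplification.

```python
from typing import Any, List
from urllib.parse import urldefrag


def resource_list_warnings(resources: List[Any]) -> List[str]:
    """Report repeated URLs and fragment identifiers in a resource list.

    Entries are assumed to be URL strings or LinkedResource dictionaries.
    """
    warnings = []
    seen = set()
    for entry in resources:
        url = entry if isinstance(entry, str) else entry.get("url", "")
        _, fragment = urldefrag(url)  # split off any "#fragment" part
        if fragment:
            warnings.append(f"fragment identifier in {url}")
        if url in seen:
            warnings.append(f"repeated URL {url}")
        seen.add(url)
    return warnings
```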

4.7.2.3 Links

The Links list is used to provide a list of resources that are not required for the processing and rendering of a digital publication (i.e., the content of the publication remains unaffected even if these resources are not available). Links are expressed using the links property.

Term	Description	Required Value	Value Category	[schema.org] Mapping
`links`	List of resources associated with a publication but not required for its processing or rendering.	One or more `LinkedResource`.	Array of Linked Resources	(None)

Each element of the links property MUST be expressed either as:

a [json] string representing the URL of the resource; or
an instance of a LinkedResource object.

A single string value represents an instance of a LinkedResource object whose url property is the string's text.

The order of items is not significant.

It is RECOMMENDED to use LinkedResource objects with their rel values set.

Linked resources are typically made available to user agents to augment or enhance the processing or rendering, such as:

a privacy policy or license that the user agent can offer a link to from a shelf;
a metadata record that the user agent can use to discover and display more information about the publication; or
a dictionary of terms the user agent can process to provide enhanced language help.

Links can also be used to identify resources used in the online rendering of a publication, but that are not essential to include when the publication is taken offline or packaged (e.g., to minimize the size). These include:

large font files that enhance the appearance of the publication but are not vital to its display (i.e., a fallback font will suffice); or
third-party scripts that are not intended for use when a publication is taken offline or packaged (e.g., tracking scripts).

The links list SHOULD include resources necessary to render a linked resource (e.g., scripts, images, style sheets).

Resources listed in the links list MUST NOT be listed in the default reading order or resource list.

User agents MAY ignore linked resources and are not required to take them offline with a publication. These resources SHOULD NOT be included when packaging a publication.
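
The requirement that linked resources not also appear in the default reading order or resource list could be checked with a set intersection. This is a sketch with hypothetical function names, and it compares URLs as authored rather than after resolution.

```python
from typing import Any, Dict, List, Set


def _urls(entries: List[Any]) -> Set[str]:
    """Collect URLs from a list of URL strings and LinkedResource maps."""
    return {e if isinstance(e, str) else e.get("url", "") for e in entries}


def misplaced_links(manifest: Dict[str, Any]) -> Set[str]:
    """Return URLs that appear in links and also in readingOrder or resources."""
    publication = _urls(manifest.get("readingOrder", [])) | _urls(manifest.get("resources", []))
    return _urls(manifest.get("links", [])) & publication


example = {
    "readingOrder": ["c1.html"],
    "resources": ["style.css"],
    "links": ["style.css", {"url": "https://example.org/onix.xml"}],
}
```

An empty result means no linked resource is duplicated in the other two lists.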

4.7.3 Extensibility

The manifest is designed to provide a basic set of properties for use by user agents in presenting and rendering a digital publication, but MAY be extended in the following ways:

by the provision of linked metadata records; or
through the inclusion of additional properties in the manifest.

This specification does not define how such additional properties are compiled, stored or exposed by user agents in their internal representation of the manifest. A user agent MAY ignore some or all extended properties.

4.7.3.1 Linked records

The manifest MAY be extended through links to metadata records, such as an ONIX [onix] or BibTeX [bibtex] record, using a LinkedResource object, where:

the rel property of the LinkedResource includes a relevant identifier (e.g., if the linked record contains descriptive metadata, the describedby identifier [iana-link-relations] can be used);
the value of the encodingFormat identifies the MIME media type [rfc2046] defined for that particular type of record, if applicable.

Linked records are included in the resource list when they are part of the publication (i.e., are needed for more than just manifest extensibility). Otherwise, they are included in the links list.

Example 33 : Linking to an external ONIX for Books metadata record.

{
    …
    "links"  : [
        {
            "type"            : "LinkedResource",
            "url"             : "https://www.publisher.example.org/time-machine/onix.xml",
            "encodingFormat"  : "application/onix+xml",
            "rel"             : "describedby"
        },
        …
    ],
    …
}

Editor's note

The application/onix+xml MIME type has not yet been registered by IANA at the time of writing this document and is included in the example for illustrative purposes only.

4.7.3.2 Additional Manifest Properties

Additional properties MAY be included directly in the manifest using public schemes like [schema.org] or [dcterms]. Proprietary terms MAY be used, but it is RECOMMENDED that such terms be included using Compact IRIs [json-ld11], with prefixes defined as part of the context.

Note

Proper use of prefixes and compact IRIs is necessary to use a manifest with a full JSON-LD processor, but is not a requirement for the processing algorithm defined by this specification. Validation of prefixed terms has to be carried out separately if full JSON-LD processing is expected.

Example 34 : Extending the basic data set using a vocabulary prefix declaration.

{
    "@context" : [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context",
        {
            "language" : "en",
            "ex"       : "https://example.org/vocab#"
        }
    ],
    …
    "ex:region" : "North America",
    …
}

The Schema.org context file [schema.org] defines several prefixes for commonly used vocabularies, such as the Dublin Core Terms (dcterms) [dcterms] and Element Set (dc) [dc11], the FOAF vocabulary (foaf) [foaf], and the Bibliographic Ontology (bibo) [bibo]. Properties from these vocabularies can be used without their prefixes having to be declared.

Example 35 : Extending the basic data using the Schema.org 'copyrightYear' and 'copyrightHolder' terms.

{
    …
    "copyrightYear"   : "2015",
    "copyrightHolder" : "World Wide Web Consortium",
    …
}

Example 36 : Extending the basic data set using the Dublin Core 'subject' term with the 2012 ACM Classification terms.

{
    …
    "dcterms:subject" : ["Web data description languages","Data integration","Data Exchange"],
    …
}
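
To make the role of prefixes concrete, the sketch below expands a compact IRI by plain string concatenation. It is not the JSON-LD expansion algorithm, and the function name is hypothetical. The prefix map is supplied by hand here, whereas a JSON-LD processor would obtain it from the context.

```python
from typing import Dict


def expand_compact_iri(term: str, prefixes: Dict[str, str]) -> str:
    """Expand prefix:suffix using a prefix map; return other terms unchanged."""
    prefix, sep, suffix = term.partition(":")
    # A suffix starting with "//" indicates an absolute IRI such as https://...
    if sep and prefix in prefixes and not suffix.startswith("//"):
        return prefixes[prefix] + suffix
    return term


# Namespace IRI of the Dublin Core Terms vocabulary.
PREFIXES = {"dcterms": "http://purl.org/dc/terms/"}
```

Because expansion is simple concatenation, a prefix IRI normally ends in a delimiter such as `/` or `#`.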

4.8 Resource Relations

4.8.1 Structural Resources

4.8.1.1 Cover

The cover is a resource that user agents can use to present a digital publication (e.g., in a library or bookshelf, or when initially loading the publication).

The cover is identified by the cover link relation.

The link to the cover MUST NOT be specified in the links list.

Editor's note

The cover term is not currently registered in the IANA link relations, but the Working Group expects to add it.

Example 37 : Identifying an HTML cover page.

{
    …
    "resources" : [
        {
            "type"           : "LinkedResource",
            "url"            : "cover.html",
            "encodingFormat" : "text/html",
            "rel"            : "cover"
        },
        …
    ],
    …
}

If the cover is an image (whether embedded in an HTML resource or not), it is strongly advised to follow Success Criterion 1.1.1 [wcag21] for the provision of alternative text and extended descriptions. For image formats that do not provide the ability to embed this information, the name and description properties of LinkedResource can be used to provide alternative text and extended descriptions, respectively. In these cases, the name property SHOULD always be set — the property can be left empty for decorative images.

Example 38 : Identifying a cover image. Alternative text and a description are provided in the name and description properties, respectively.

{
    …
    "resources" : [
        {
            "type"           : "LinkedResource",
            "url"            : "whale-image.jpg",
            "encodingFormat" : "image/jpeg",
            "rel"            : "cover",
            "name"           : "Moby Dick attacking hunters",
            "description"    : "A white whale is seen surfacing from the water to attack a small whaling boat"
        },
        …
    ],
    …
}

Example 39 : A decorative cover. The name property is left empty.

{
    …
    "resources" : [
        {
            "type"           : "LinkedResource",
            "url"            : "cover.jpg",
            "encodingFormat" : "image/jpeg",
            "rel"            : "cover",
            "name"           : "",
        },
        …
    ],
    …
}

If a user agent requires alternative text for a cover image to make an interface accessible, and the name property is not specified, it MAY attempt to construct the alternative text from the publication metadata. This specification does not mandate how such alternative text is created. One method is to construct the alternative text as a string that identifies the image as the cover, followed by the publication title.
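
One possible reading of that method, as a sketch only: the function name and the "Cover: " wording are arbitrary choices made for illustration and are not prescribed by this specification.

```python
from typing import Any, Dict, Optional


def cover_alt_text(cover: Dict[str, Any], title: Optional[str]) -> str:
    """Pick alternative text for a cover image LinkedResource."""
    if "name" in cover:
        # An authored name wins, including "" for a decorative cover.
        return cover["name"]
    if title:
        return f"Cover: {title}"  # wording is an arbitrary choice of this sketch
    return "Cover"
```

Checking for the presence of `name`, not its truthiness, keeps an intentionally empty value from being overwritten.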

Only one resource MAY be identified as the cover, but additional covers MAY be specified using the alternate property (e.g., to provide alternative dimensions or resolution).

Example 40 : Providing a cover image in JPEG and SVG formats.

{
    …
    "resources" : [
        {
            "type"           : "LinkedResource",
            "url"            : "lilliput.jpg",
            "encodingFormat" : "image/jpeg",
            "rel"            : "cover",
            "alternate"      : [
                 {
                     "type"           : "LinkedResource",
                     "url"            : "lilliput.svg",
                     "encodingFormat" : "image/svg+xml",
                     "rel"            : "cover"
                 }
            ]
        },
        …
    ],
    …
}

4.8.1.2 Page List

The page list is a navigational aid that contains a list of static page demarcation points within a digital publication.

The page list is identified by the pagelist link relation.

Editor's note

The pagelist term is not currently registered in the IANA link relations but the Working Group expects to add it.

Only one resource MAY be identified as containing a page list. If multiple instances are specified, user agents MUST use the first instance encountered, with precedence given to the reading order.

The link to the page list MUST NOT be specified in the links list.

Example 41 : Identifying the resource that contains the page list.

{
    …
    "resources" : [
        {
            "type" : "LinkedResource",
            "url"  : "toc_file.html",
            "rel"  : "pagelist"
        },
        …
    ],
    …
}

4.8.1.3 Table of Contents

The table of contents is a navigational aid that provides links to the major structural sections of a digital publication.

The resource that contains the table of contents is identified by the contents link relation [iana-link-relations]. The table of contents proper is the first element inside that resource with the role value doc-toc, as defined in § C.2 HTML Structure.

Only one resource MAY be identified as containing the table of contents. If multiple instances are specified, user agents MUST use the first instance encountered, with precedence given to resources in the reading order.

Profiles of this specification MAY define how to locate a resource containing the table of contents when no resource is identified by the contents relation.

The link to the table of contents MUST NOT be specified in the links list.

The RECOMMENDED structure and processing model for the table of contents is defined in § C. Machine-Processable Table of Contents.

Example 42 : Identifying the resource that contains the table of contents.

{
    …
    "resources" : [
        {
            "type" : "LinkedResource",
            "url"  : "toc_file.html",
            "rel"  : "contents"
        },
        …
    ],
    …
}
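
The "first instance encountered, with precedence given to the reading order" rule for structural resources could be implemented roughly as follows. The function name is hypothetical. The links list is not searched because structural resources are not allowed there.

```python
from typing import Any, Dict, Optional


def first_with_rel(manifest: Dict[str, Any], rel: str) -> Optional[Dict[str, Any]]:
    """Return the first LinkedResource carrying rel, reading order first."""
    for list_name in ("readingOrder", "resources"):
        for entry in manifest.get(list_name, []):
            if not isinstance(entry, dict):
                continue  # bare URL strings carry no rel value
            rels = entry.get("rel", [])
            if isinstance(rels, str):
                rels = [rels]  # accept a single value as well as a list
            if rel in rels:
                return entry
    return None


example = {
    "readingOrder": ["c1.html", {"url": "toc.html", "rel": "contents"}],
    "resources": [{"url": "nav.html", "rel": ["contents", "pagelist"]}],
}
```

Here `toc.html` is chosen for `contents` even though `nav.html` also carries that relation, because the reading order is searched first.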

4.8.2 Informative Resources

4.8.2.1 Accessibility Report

An accessibility report provides information about the suitability of a digital publication for consumption by users with varying preferred reading modalities. These reports typically identify the result of an evaluation against established accessibility criteria, such as those provided in [wcag21], and are an important source of information in determining the usability of a publication.

An accessibility report is identified using the accessibility-report link relation.

Editor's note

The accessibility-report term is not currently registered in the IANA link relations but the Working Group expects to add it.

It is helpful to include the report as a resource of the publication so that it is available, for example, when a publication is read offline.

Note

Providing the accessibility report in a human-readable format, such as HTML [html], helps ensure that it can be accessed and understood by users. Augmenting the report with machine-processable metadata, such as provided in Schema.org [schema.org], will additionally aid in machine processing.

Example 43 : Setting a link to an accessibility report.

{
    …
    "resources" : [
        …
        {
            "type" : "LinkedResource",
            "url"  : "https://www.publisher.example.org/sherlock-holmes-accessibility.html",
            "rel"  : "accessibility-report"
        },
        …
    ],
    …
}

4.8.2.2 Preview

Not all digital publications will be available to all users (e.g., they might be restricted to registered users of a site). In such cases, the publisher might wish to provide a preview of the content to entice users to access the full version.

A preview is identified using the preview link relation [iana-link-relations].

Previews MAY be located externally or included as resources of digital publications.

Example 44 : Identifying a preview as an audio resource of a digital publication.

{
    …
    "links" : [
        {
            "type"           : "LinkedResource",
            "url"            : "preview.mp3",
            "encodingFormat" : "audio/mpeg",
            "rel"            : "preview"
        },
        …
    ],
    …
}

Example 45 : Identifying a preview via an external link.

{
    …
    "links" : [
        {
            "type"           : "LinkedResource",
            "url"            : "https://publisher.example.org/jekyll-hyde-preview.html",
            "encodingFormat" : "text/html",
            "rel"            : "preview"
        },
        …
    ],
    …
}

4.8.2.3 Privacy Policy

Users often have the legal right to know and control what information is collected about them, how such information is stored and for how long, whether it is personally identifiable, and how it can be expunged. Including a statement that addresses such privacy concerns is consequently an important part of publishing digital publications. Even if no information is collected, such a declaration increases the trust users have in the content.

A link to a privacy policy can be included in the manifest for this purpose. It is helpful to include the privacy policy as a resource of the publication so that it is available, for example, when a publication is read offline.

A privacy policy is identified using the privacy-policy link relation [iana-link-relations].

Example 46 : Identifying a privacy policy via an external link.

{
    …
    "resources"  : [
        …
        {
            "type"           : "LinkedResource",
            "url"            : "https://www.w3.org/Consortium/Legal/privacy-statement-20140324",
            "encodingFormat" : "text/html",
            "rel"            : "privacy-policy"
        },
        …
    ],
    …
}

4.8.3 Extensions

If additional relations beyond those defined in this specification need to be expressed, the rel property can be extended in one of the following ways:

using relations defined in a relation vocabulary (e.g., the IANA link registry [iana-link-relations] and microformats existing rel values [mfrel]); or
using extension relation types [rfc8288].

7. Processing a Manifest

This section depends on the Infra Standard [infra].

7.1 Introduction

This section is non-normative.

Although a digital publication's manifest is authored as [json-ld11], the steps for processing a manifest described in this section detail how a user agent transforms the manifest into its internal representation of the data. The algorithm describes the process using the terminology and data types defined in [infra] and, if successful, results in an [infra] map of the data being returned.

Note

An actual implementation of this algorithm will use the corresponding constructs and data types of whatever language is used.

7.2 Error Handling

The following error types are used in the processing algorithm:

validation error: a non-terminating error that occurs when the value of a key does not match its expected input.
fatal error: a terminating error that results, for example, when a manifest cannot be processed or does not match critical validity constraints.

User agents SHOULD expose both validation and fatal errors, but this specification does not prescribe the way this is done.

For validation errors, user agents SHOULD differentiate the severity of the error (i.e., whether a required or recommended practice has been violated).
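
One way an implementation might model the two error types is to collect validation errors in a log and raise an exception for fatal errors. The class names and the severity labels "required" and "recommended" are assumptions of this sketch. The specification does not prescribe how errors are exposed.

```python
from typing import List, Tuple


class FatalError(Exception):
    """Terminating error: processing of the manifest stops."""


class ErrorLog:
    """Collects non-terminating validation errors for later reporting."""

    def __init__(self) -> None:
        self.validation_errors: List[Tuple[str, str]] = []

    def validation_error(self, message: str, severity: str = "required") -> None:
        # Record the problem together with its severity and keep going.
        self.validation_errors.append((severity, message))

    def fatal_error(self, message: str) -> None:
        raise FatalError(message)  # abort the current processing run


log = ErrorLog()
log.validation_error("type is not set", severity="recommended")
try:
    log.fatal_error("manifest is not a JSON object")
    outcome = "processed"
except FatalError as err:
    outcome = f"failure: {err}"
```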

7.3 Processing Contexts

Some steps in the processing algorithm depend on the expected value category of a term, so the context in which a term is used can affect processing (e.g., url expects an Array of URLs only when it is a direct property of the Publication Manifest). To differentiate these uses, a context is provided to certain function calls. This context is set to the type of object that initiates the processing call.

The default list of recognized types includes Person, Organization and LinkedResource. Profiles MAY extend this list to include additional object types.

If a context is not provided to a function, the term being processed is considered part of the global context (i.e., it is a direct child of the manifest).
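
A context-sensitive lookup of this kind could be sketched with a table keyed by context. The table below is deliberately incomplete and lists only a few terms discussed in this document. The names `ARRAY_TERMS` and `expects_array` are hypothetical.

```python
from typing import Optional

# Partial, illustrative table: which terms expect an array in which context.
# None stands for the global context (a direct child of the manifest).
# Deliberately incomplete: terms not listed are simply unknown to this sketch.
ARRAY_TERMS = {
    None: {"url", "name", "inLanguage", "readingOrder", "resources", "links"},
    "LinkedResource": {"alternate"},
}


def expects_array(term: str, context: Optional[str] = None) -> bool:
    """Tell whether term is listed as expecting an array inside context."""
    return term in ARRAY_TERMS.get(context, set())
```

Omitting the `context` argument corresponds to processing a term in the global context.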

Note

When extending the list of recognized types, the normalize data function might also need to be extended to ensure that all objects have their type specified (e.g., when string values are automatically expanded to objects).

7.4 Generate the Internal Representation

This algorithm takes the following arguments:

text: a UTF-8 string representing the manifest.
base: a URL string that represents the base URL for the manifest.
document: the HTML Document (DOM) Node [html] of the document that references the manifest, when available.

Note

This algorithm does not describe how the manifest is discovered and obtained. The steps by which to do so are defined by each digital publication format.

To generate the internal representation, run the following steps:

Let processed be an empty map that will contain the internal representation of the manifest.
Let manifest be the result of parsing JSON into Infra values given text. If manifest is not a map, fatal error, return failure.

Explanation

Publication manifests have to be expressed as JSON objects, not arrays. After converting the manifest to [infra] types, an additional check is made that the resulting structure is a map.
(§ 4.3 Manifest Contexts) If manifest["@context"] is not set to a list, or the first and second items in manifest["@context"] are not the string values "https://schema.org" and "https://www.w3.org/ns/pub-context", in this order, fatal error, return failure.

Explanation

If the context URLs are not set as expected, the JSON data does not represent a publication manifest.
(§ 4.6 Profile Conformance) Let processed["profile"] be the profile the manifest conforms to. Set processed["profile"] as follows:
1. If manifest["conformsTo"] is not set, or does not include a profile the user agent recognizes as capable of processing and/or rendering, the user agent SHOULD inspect the media type(s) of the resources in the reading order to determine if the publication matches a profile it is capable of processing or rendering. If so, validation error, set processed["profile"] to the matching profile. Otherwise, fatal error, return failure.
2. Otherwise, set processed["profile"] to the first URL in manifest["conformsTo"] the user agent is capable of processing and/or rendering.
Note

The value of manifest["conformsTo"] could be a string or a list at this step in the process.

Explanation

The profile the publication conforms to determines any additional extension steps that have to be performed during processing. These steps are defined by their respective specifications.

The new term profile is created because conformsTo is not restricted to profile identifiers (i.e., the new term provides a persistent identifier of the profile within the internal representation).
(§ 4.4.1 Global Declarations) Let lang be the global language and dir be the global direction obtained from this step. Set each initially to an empty string.

For each context of manifest["@context"], moving from the last item to the first, if context is a map:
1. if lang is an empty string and context["language"] is defined, set lang to context["language"];
2. if dir is an empty string and context["direction"] is defined, set dir to context["direction"];
3. if neither lang nor dir is an empty string, then break.
If lang is neither an empty string nor a well-formed [bcp47] language tag, validation error, set lang to an empty string.

If dir is neither an empty string nor one of the values "ltr" or "rtl", validation error, set dir to an empty string.

Explanation

The global language and direction declarations obtained here are used to set the language and base direction, respectively, for localizable strings without a declaration.

The iterator moves backwards through @context as the last language and direction declarations override any earlier ones.
(§ 4.3 Manifest Contexts) If a profile requires additional validation of the manifest context, those steps are performed here.

Explanation

This extension step allows verification of any information a profile requires be present in the manifest context (e.g., additional context URLs or parameters). These steps have to be performed at this point, as @context terms are removed as part of the data normalization in the next step. A more general step for processing profile data is provided at a later step.
For each term → value of manifest, set processed[term] to the result, when successful, of calling normalize data given term, value, lang, dir and base. If failure is returned, do not add term to processed.

Explanation

The data normalization steps standardize the incoming manifest data to remove any authoring conveniences, such as the ability to use strings where objects or arrays are expected. The resulting processed data are added to the processed variable and are operated on in subsequent steps.
Set processed to the result of running data validation given processed.

Explanation

The data validation checks ensure that the incoming data matches its expected value categories. Any restrictions on the expected values are also enforced at this step, and any invalid data is removed from the final representation.
If a profile specifies additional processing functions that need to be run, those steps are executed at this point.
Set processed to the result of running add default values, when successful, given processed and document, when specified. Otherwise, terminate processing, return failure.

Explanation

This step checks if any information missing from the manifest can be obtained from the HTML document that links to the manifest, or from other sources.
Return processed.

Note

For a visualization of the resulting structure, see § A. Internal Representation Data Model.
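
As an informal illustration of two of the early steps above, the following sketch checks the manifest context and extracts the global language and direction. It assumes the manifest has already been parsed into Python values. The function and class names are hypothetical, and the well-formedness check on the language tag is left out.

```python
from typing import Any, Dict, Tuple

REQUIRED_CONTEXTS = ["https://schema.org", "https://www.w3.org/ns/pub-context"]


class FatalError(Exception):
    """Raised where the algorithm says "fatal error, return failure"."""


def check_context(manifest: Dict[str, Any]) -> None:
    """Require the two context URLs, in order, at the start of @context."""
    context = manifest.get("@context")
    if not isinstance(context, list) or context[:2] != REQUIRED_CONTEXTS:
        raise FatalError("unexpected @context")


def global_declarations(manifest: Dict[str, Any]) -> Tuple[str, str]:
    """Return (lang, dir), letting later @context entries override earlier ones."""
    lang, direction = "", ""
    for context in reversed(manifest["@context"]):  # last item first
        if not isinstance(context, dict):
            continue
        if lang == "" and "language" in context:
            lang = context["language"]
        if direction == "" and "direction" in context:
            direction = context["direction"]
        if lang != "" and direction != "":
            break
    if direction not in ("", "ltr", "rtl"):
        direction = ""  # a validation error in the algorithm; value is discarded
    # The well-formedness check on lang is omitted from this sketch.
    return lang, direction


example = {"@context": REQUIRED_CONTEXTS + [
    {"language": "en", "direction": "rtl"},
    {"language": "fr"},
]}
```

With `example`, the later declaration supplies the language and the earlier one supplies the direction, since the later map does not declare a direction.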

7.4.1 Normalize Data

To normalize data for a property term's value, with the global language lang, global direction dir, base URL base, and optional context context, run these steps:

Let normalized be the value of value.

Explanation

The data normalization steps are performed on the copy of the incoming value held in the normalized variable defined in this step. This variable is returned at the end of a successful normalization process.
(§ 4.3 Manifest Contexts) If term is @context, return failure.

Explanation

@context provides information for the initial processing of the manifest, but is not retained in the internal data representation. Returning a failure signals to remove the term.
(§ 4.2.7 Arrays) If, depending on context, term expects an array and value is not a list, set normalized to the list: « value ».
Explanation

Various terms require their values to be arrays, but, for the sake of convenience, authors are allowed to use a single value instead of a one-element array. For example,
Example 51
```
{
    …
    "name"   : "Et dukkehjem",
    "author" : "Henrik Ibsen",
    …
}
```
yields:
Example 52
```
«[
    …
    "name"   → « "Et dukkehjem" »,
    "author" → « "Henrik Ibsen" »,
    …
]»
```
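The wrapping rule can be sketched in Python as follows. This is a minimal illustration rather than part of the specification: `ARRAY_TERMS` is a hypothetical, abbreviated stand-in for the set of terms that expect arrays, and `wrap_in_list` is an illustrative helper name.

```python
# Hypothetical, abbreviated set of terms that expect arrays (an assumption
# for this sketch; in the algorithm this depends on the term's context).
ARRAY_TERMS = {"name", "author"}


def wrap_in_list(term, value):
    """Wrap a single value in a one-element list when the term expects an array."""
    if term in ARRAY_TERMS and not isinstance(value, list):
        return [value]
    return value


manifest = {"name": "Et dukkehjem", "author": "Henrik Ibsen"}
normalized = {term: wrap_in_list(term, value) for term, value in manifest.items()}
```

Values that are already lists pass through unchanged, so the rule is safe to apply to every array-valued term.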
(§ 4.2.4.2 Entities) If, depending on context, term expects an array of entities, for each entity of normalized:
1. if entity is a string, set entity to the map:
```
«[
    "type" → « "Person" »,
    "name" → entity
]»
```
2. otherwise, if entity is not a map, validation error, remove entity from normalized.
3. otherwise, if entity["type"] is not set, set it to the list: « "Person" ». If entity["type"] is set but does not include the value Person or Organization, append the value Person to the list.
Explanation

Creators (authors, editors, etc.) are expected to be explicitly defined as objects, but, for the sake of convenience, only their name has to be specified in the manifest. For example:
Example 53
```
{
    …
    "author": "Ralph Ellison",
    …
}
```
This rule converts such string values to maps with a default type of Person, yielding the following for the preceding example:
Example 54
```
«[
    …
    "author" → « 
        «[
            "type" → « "Person" »,
            "name" → "Ralph Ellison"
        ]»
    »,
    …
]»
```
For simplicity, the conversion of name to a localizable string is described by a later step.
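A rough Python sketch of the three entity rules is shown below. It assumes the value has already been wrapped in a list and that any authored `type` is already a list; `normalize_entities` is an illustrative name, not one defined by this specification.

```python
def normalize_entities(entities):
    """Apply the entity rules to an already list-wrapped value."""
    result = []
    for entity in entities:
        if isinstance(entity, str):
            # Rule 1: a bare string becomes a Person with that name.
            entity = {"type": ["Person"], "name": entity}
        elif not isinstance(entity, dict):
            # Rule 2: validation error, drop values that are not maps.
            continue
        else:
            # Rule 3: make sure a Person or Organization type is present.
            entity = dict(entity)  # shallow copy, the input is left untouched
            types = entity.get("type")
            if types is None:
                entity["type"] = ["Person"]
            elif "Person" not in types and "Organization" not in types:
                entity["type"] = list(types) + ["Person"]
        result.append(entity)
    return result


authors = normalize_entities(["Ralph Ellison"])
```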

(§ 4.2.4.1 Localizable Strings) If, depending on context, term expects an array of localizable strings, for each item of normalized:

if item is a string, set item to the map:
```
«[
    "value" → item,
    "language" → lang,
    "direction" → dir
]»
```
if lang or dir is not set, or is an empty string, remove item["language"] or item["direction"], respectively.
otherwise, if item is not a map, validation error, remove item from normalized.
otherwise, process the map in item as follows:
1. If item["language"] is not set, set it to the value of lang when lang is set and is not an empty string.
  
  Otherwise, if item["language"] is null, remove item["language"].
2. If item["direction"] is not set, set it to the value of dir when dir is set and is not an empty string.
  
  Otherwise, if item["direction"] is null, remove item["direction"].

Explanation

Natural language text values are expected to be explicitly defined as localizable string objects, but, for the sake of convenience, can be simple strings in the manifest. For example, if no language information has been provided via the global language declaration then:

Example 55

{
    "@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
    "name"     : ["La Comédie humaine"],
    …
}

yields:

Example 56

«[
    "name"     → «
        «[
            "value" → "La Comédie humaine"
        ]»
    »,
    …
]»

If, however, an explicit language has been provided in the manifest, that language is added to the localizable string object. For example,

Example 57

{
    "@context" : [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context",
        {"language": "fr"}
    ],
    "name"     : ["La Comédie humaine"],
    …
}

yields:

Example 58

«[
    "name"     → «
        «[
            "value"    → "La Comédie humaine",
            "language" → "fr"
        ]»
    »,
    …
]»

A local setting or a local null value prevents the global value from taking effect.

Example 59

{
    "@context" : [
        "https://schema.org",
        "https://www.w3.org/ns/pub-context", 
        {"language":"fr"}
    ],
    …
    "name" : [{
        "value" : "La Comédie humaine"
    }],
    "publisher" : [{
        "type":["Organization"],
        "name":[{
            "value": "Hachette",
            "language": null
        }]
    }],
    …
}

yields:

Example 60

«[
    "name"     → «
        «[
            "value"    → "La Comédie humaine",
            "language" → "fr"
        ]»
    »,
    "publisher"    → «
        «[
            "type" → « "Organization" »,
            "name" → «
                «[
                    "value" → "Hachette"
                ]»
            »
        ]»
    »,
    …
]»
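The interaction between global and local language settings can be sketched in Python as follows. This is an illustrative approximation only: `normalize_localizable` is a hypothetical helper name, and a JSON `null` is assumed to arrive as Python `None` (as it would from `json.loads`).

```python
def normalize_localizable(items, lang=None, direction=None):
    """Turn strings into localizable-string maps and apply global defaults."""
    result = []
    for item in items:
        if isinstance(item, str):
            item = {"value": item}
            # Unset or empty global values are simply not added.
            if lang:
                item["language"] = lang
            if direction:
                item["direction"] = direction
        elif not isinstance(item, dict):
            continue  # validation error: neither a string nor a map
        else:
            item = dict(item)  # shallow copy, the input is left untouched
            for key, default in (("language", lang), ("direction", direction)):
                if key not in item:
                    if default:
                        item[key] = default
                elif item[key] is None:
                    # An explicit null blocks the global value.
                    del item[key]
        result.append(item)
    return result


title = normalize_localizable(["La Comédie humaine"], lang="fr")
publisher = normalize_localizable([{"value": "Hachette", "language": None}], lang="fr")
```

Here `title` picks up the global language, while the explicit `null` on the publisher name leaves that entry without a language.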

(§ 4.2.4.3 Linked Resources) If, depending on context, term expects an array of LinkedResources, for each resource of normalized:
1. if resource is a string, convert resource to the map:
```
«[
    "type" → « "LinkedResource" »,
    "url" → resource
]»
```
2. otherwise, if resource is not a map, validation error, remove resource from normalized.
3. otherwise, if resource["type"] is not set, set it to the list: « "LinkedResource" ». If resource["type"] is set but does not include the value LinkedResource, append that value to the list.
Explanation

Resource links are expected to be explicitly defined as objects of type LinkedResource, but, for the sake of convenience, only their absolute or relative URL has to be specified in the manifest. For example,
Example 61
```
{
    …
    "resources" : [
        "css/book.css",
        …
    ],
    …
}
```
This step converts the string values to objects, yielding the following for the preceding example:
Example 62
```
«[
    …
    "resources" → «
        «[
            "type" → « "LinkedResource" »,
            "url"  → "css/book.css"
        ]»,
        …
    »,
    …
]»
```
For simplicity, the conversion of relative paths to absolute is described by a later step.
(§ 4.2.5 URLs) If, depending on context, term expects a URL or array of URLs:
1. if normalized is a string, set normalized to the result of running convert to absolute URL, when successful, given normalized and base. If failure is returned, return failure.
2. otherwise, if normalized is a list, for each item of normalized, set item to the result of running convert to absolute URL, when successful, given item and base. If failure is returned, remove item from normalized.
3. otherwise, validation error, return failure.
Explanation

Relative URLs in the manifest are resolved against the base value to obtain absolute URLs. For example:
```
"url": "chapter01.html"
```
for a publication hosted at https://example.org/publications/wuthering-heights/ would yield:
```
"url" → "https://example.org/publications/wuthering-heights/chapter01.html"
```
(§ 8. Modular Extensions, extension point) If a profile defines processing steps for profile-specific terms, those steps are executed at this point.
Recursively check normalized as follows to ensure that all properties get normalized:
1. if normalized is a list, for each item of normalized that is a map:
  1. if item["type"] is set and includes a recognized type, for each key → keyValue of item, set item[key] to the result of running normalize data, when successful, given key, keyValue, lang, dir, base and using item["type"] as the context. If failure is returned, remove item[key].
  2. otherwise, do nothing.
2. otherwise, if normalized is a map:
  1. if normalized["type"] is set and includes a recognized type, for each key → keyValue of normalized, set normalized[key] to the result of running normalize data, when successful, given key, keyValue, lang, dir, base and using normalized["type"] as the context. If failure is returned, remove normalized[key].
  2. otherwise, do nothing.
3. otherwise, do nothing.
Explanation

To ensure that all the properties in the manifest get processed, this step recursively checks normalized for additional map entries to process. If normalized is a list, each item is inspected to determine if it is a map that can be processed.

If a failure is returned, the corresponding entry is removed from its map.
Return normalized.

7.4.1.1 Convert to Absolute URL

To convert to absolute URL url, with a base URL base, run the following steps:

If url or base is not a string, or is an empty string, validation error, return failure.

Explanation

This step checks that both url and base are non-empty strings before attempting to use them.
Set url to the result of running the URL parser [url], when successful, with url as input and base as the base URL. If failure is returned, validation error, return failure.

Explanation

This step calls the URL parser function on the url to be processed. If the url is not an absolute URL, the parser converts it to one using the base URL.

If parsing returns a failure, a failure is returned to the caller to indicate to remove the URL.
Return url.
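As a rough illustration, the steps above could be approximated in Python with the standard library. Note the assumptions: `urllib.parse.urljoin` follows RFC 3986 rather than the URL Standard's parser, so edge cases can differ, and returning `None` to signal failure is a choice made for this sketch only.

```python
from urllib.parse import urljoin, urlsplit


def convert_to_absolute_url(url, base):
    """Return an absolute URL string, or None to signal failure."""
    # Both inputs have to be non-empty strings.
    if not isinstance(url, str) or not isinstance(base, str) or not url or not base:
        return None
    try:
        absolute = urljoin(base, url)
        # Treat a result without a scheme as a parse failure.
        if not urlsplit(absolute).scheme:
            return None
    except ValueError:
        return None
    return absolute


resolved = convert_to_absolute_url(
    "chapter01.html", "https://example.org/publications/wuthering-heights/"
)
```

The trailing slash on the base matters: without it, the last path segment of the base is replaced rather than extended.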

7.4.2 Data Validation

To perform data validation on map data, run the following steps:

For each term → value of data, set data[term] to the result of running the global data checks, when successful, given term and value. If failure is returned, remove data[term].

Explanation

This step passes each entry to a set of global validation checks that need to be run on the value and recursively on any properties within the value.

A failure is returned if the property is invalid and has to be removed.
If a profile specifies data validation checks, those steps are executed at this point.

Explanation

Profile validation steps are prioritized over the default steps so that if profiles have, for example, different default values to apply, those values get applied.
(§ 4.5 Publication Types) If data["type"] is not set or is an empty list, validation error, set data["type"] to « "CreativeWork" ».
(§ 4.7.1.2 Accessibility) If data["accessModeSufficient"] is set, for each item of data["accessModeSufficient"], if item["type"] is not set or does not contain "ItemList", remove item from data["accessModeSufficient"].
(§ 4.7.1.4 Canonical Identifier) If data["id"] is not set or is an empty string, validation error.
(§ 4.7.1.6 Duration) If data["duration"] is set and is not a valid duration value, per [iso8601-1], validation error, remove data["duration"].
(§ 4.7.1.7 Last Modification Date) If data["dateModified"] is set and is not a valid date or date-time per [iso8601-1], validation error, remove data["dateModified"].
(§ 4.7.1.8 Publication Date) If data["datePublished"] is set and is not a valid date or date-time per [iso8601-1], validation error, remove data["datePublished"].
(§ 4.7.1.9 Publication Language) If data["inLanguage"] is set, for each item of data["inLanguage"], if item is not well-formed [bcp47], validation error, remove item from data["inLanguage"].
(§ 4.7.1.10 Reading Progression Direction) If data["readingProgression"] is not set, set it to "ltr". Otherwise, if it is not one of the required directional values, validation error, set it to "ltr".
(§ 5. Publication Resources) Obtain and verify the unique URLs within the publication bounds as follows:
1. If data["readingOrder"] is set, let readingOrderURLs be the result of running get unique URLs given data["readingOrder"]. Otherwise, let readingOrderURLs be an empty ordered set.
2. If data["resources"] is set, let resourcesURLs be the result of running get unique URLs given data["resources"]. Otherwise, let resourcesURLs be an empty ordered set.
3. Set data["uniqueResources"] to the union of readingOrderURLs and resourcesURLs.
Explanation

This step gets the list of unique URLs within the reading order and the resource list. It then sets data["uniqueResources"] to the union of these two sets, which represents the complete list of unique resources within the bounds of the publication.

This step also warns if either the readingOrder or resources contains duplicate resource declarations. The validation errors are emitted as part of obtaining the unique URLs from each list.
(§ 4.7.2.3 Links) If data["links"] is set, for each link in data["links"]:
1. let url be the result of running URL serializer [url] on link["url"] with the exclude fragment flag set.
2. if data["uniqueResources"] contains url, validation error, remove link from data["links"], then continue.
3. if link["rel"] is not set or is an empty list, validation error, then continue.
4. if link["rel"] contains any of the case-insensitive values "contents", "pagelist" or "cover", validation error, remove link from data["links"].
Explanation

After obtaining the list of unique publication resources in the previous step, the links property is checked to ensure that any linked resources are not also listed as publication resources.

If the link does not specify a rel value, a warning is raised. If its rel property specifies a structural resource, the link is removed, as structural resources have to be within the publication bounds.
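The link checks can be sketched in Python as follows. This is a simplified illustration: `check_links` is a hypothetical helper, validation errors are collected in a plain list, and `urllib.parse.urldefrag` stands in for the URL serializer with the exclude fragment flag set.

```python
from urllib.parse import urldefrag

# Structural relations that are only allowed within the publication bounds.
STRUCTURAL_RELS = {"contents", "pagelist", "cover"}


def check_links(links, unique_resources, errors):
    """Return the links that survive the checks; record problems in errors."""
    kept = []
    for link in links:
        url = urldefrag(link["url"]).url  # serialize without the fragment
        if url in unique_resources:
            errors.append(f"link duplicates a publication resource: {url}")
            continue  # the link is removed
        rels = link.get("rel") or []
        if not rels:
            errors.append(f"link has no rel value: {url}")
            kept.append(link)  # warning only, the link is kept
            continue
        if any(rel.lower() in STRUCTURAL_RELS for rel in rels):
            errors.append(f"structural resource listed in links: {url}")
            continue  # the link is removed
        kept.append(link)
    return kept


errors = []
links = [
    {"url": "https://example.org/c1.html#note", "rel": ["related"]},
    {"url": "https://example.org/toc.html", "rel": ["Contents"]},
    {"url": "https://example.org/terms.html"},
    {"url": "https://example.org/privacy.html", "rel": ["privacy-policy"]},
]
kept = check_links(links, {"https://example.org/c1.html"}, errors)
```

In this run the first two links are removed, the third is kept with a warning, and the fourth passes cleanly.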
(§ 4.8.1 Structural Resources) Verify the use of structural relations as follows:
1. Set resources to the value of data["readingOrder"], when defined, otherwise to an empty list. Extend resources with data["resources"], when defined.
2. If more than one item in resources has a rel entry that contains the case-insensitive value "contents", validation error.
3. If more than one item in resources has a rel entry that contains the case-insensitive value "pagelist", validation error.
4. If more than one item in resources has a rel entry that contains the case-insensitive value "cover", validation error.
  
  If the cover(s) have an encodingFormat entry that specifies an image media type (image/*), and do not have a name entry, validation error.
Explanation

This step checks the resources specified in the reading order and resource list to verify that no more than one table of contents, page list, and cover has been specified.

For covers, it also checks that a name has been set on image-based formats for accessibility purposes.
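A minimal Python sketch of these structural checks follows. It assumes `rel` values are already lists and `encodingFormat` is a plain string; `check_structural_relations` is an illustrative name, and errors are collected in a list instead of being reported by a user agent.

```python
def check_structural_relations(data, errors):
    """Record validation errors for repeated structural relations."""
    resources = list(data.get("readingOrder", [])) + list(data.get("resources", []))
    for relation in ("contents", "pagelist", "cover"):
        # Relation values are compared case-insensitively.
        matches = [
            res for res in resources
            if relation in (rel.lower() for rel in res.get("rel", []))
        ]
        if len(matches) > 1:
            errors.append(f"more than one resource with rel '{relation}'")
        if relation == "cover":
            for cover in matches:
                is_image = cover.get("encodingFormat", "").startswith("image/")
                if is_image and not cover.get("name"):
                    errors.append(f"image cover without a name: {cover['url']}")


errors = []
check_structural_relations(
    {
        "readingOrder": [{"url": "toc.html", "rel": ["contents"]}],
        "resources": [
            {"url": "toc2.html", "rel": ["Contents"]},
            {"url": "cover.jpg", "rel": ["cover"], "encodingFormat": "image/jpeg"},
        ],
    },
    errors,
)
```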
For each term → value of data, if running remove empty arrays given the variables term and value returns failure, remove data[term].

Explanation

As the processing of the manifest involves removing invalid values at various stages, the final data structure might end up with some lists that no longer contain any values. This step iterates back over the data and removes any such empty lists.
Return data.

7.4.2.1 Global Data Checks

To process the global data checks on a property term's value with an optional context context, run these steps:

(§ 4.2 Value Categories) If term has a known value category, set value to the result of calling verify value category, when successful, given the variables term, value and context. If failure is returned, return failure.

Otherwise, return value.

Explanation

This step verifies that the value of the term matches the expected category required for the term. For example, the abridged term requires a boolean value, so any other value used with the term will result in a failure.

If a failure occurs calling the function, this step also returns a failure so that the property is removed from the final data set.

Terms without a known value category are not processed, so the incoming value is returned.
Recursively descend into value as follows to check any sub-properties first:
1. if value is a map:
  1. if value["type"] includes a recognized type, for each key → keyValue of value, set value[key] to the result of running global data checks, when successful, given key, keyValue and using value["type"] as the context. If failure is returned, remove value[key].
  2. otherwise, do nothing.
2. otherwise, if value is a list, for each item of value, if item is a map:
  1. if item["type"] includes a recognized type, for each key → keyValue of item, set item[key] to the result of running global data checks, when successful, given key, keyValue and using item["type"] as the context. If failure is returned, remove item[key].
  2. otherwise, do nothing.
3. otherwise, do nothing.
Explanation

To ensure that all the properties in the manifest get processed, this step recursively checks each entry for additional map entries to process. If the value is a list, each item is inspected to determine if it is a map that can be processed.

Its placement also ensures that all subproperties are checked first, so that the higher-level checks later in the step are tested after any invalid values are removed.
(§ 4.4.1 Global Declarations and § 4.4.2 Item-Specific Declarations) If term expects an array of LocalizableStrings, for each item of value:
- if item["value"] is not set, remove item from value.
- if item["language"] is set and its value is not well-formed [bcp47], validation error, remove item["language"].
- if item["direction"] is set and its value is not one of "ltr" or "rtl", validation error, remove item["direction"].
Explanation

This step checks that localizable strings have values, that their language declarations are well formed, and that their direction declarations have either the value "ltr" or "rtl".
(§ 4.2.4.2 Entities) If term expects an array of entities, for each item of value, check whether item["name"] is set:
- If not, validation error, remove item from value.
- If so, for each name of item["name"], if name["value"] is not set, or is an empty string, remove name from item["name"].
Explanation

This step ensures that all entities have a name. Entities without a name are removed.
(§ 4.2.4.3 Linked Resources) If term expects an array of LinkedResources, for each resource of value:
- if resource["url"] is not set, or its value is an empty string, validation error, remove resource from value, then continue.
  
  Otherwise, if resource["url"] is not a valid URL [url], validation error, remove resource from value, then continue.
- if resource["duration"] is set and is not a valid duration value, per [iso8601-1], validation error, remove resource["duration"].
Explanation

This step performs the following two checks on the terms of a LinkedResource:
1. If a URL is not specified, or is invalid, the LinkedResource is removed.
2. If the duration of the resource is specified and is not a valid ISO 8601 duration value, the duration property is removed.
Return value.

7.4.2.2 Verify Value Category

To verify value category of a property term's value with a context context, run these steps:

If, depending on the context, term expects an array:
1. if value is not a list, validation error, return failure.
2. otherwise, for each item of value:
  1. if item does not match the expected value category of the array, validation error, remove item from value, then continue.
  2. if item is a map, for each key → keyValue of item, if key has an expected value category, set item[key] to the result of running verify value category given key, keyValue, and using item["type"] as the context. If the result of processing item is an empty map, validation error, remove item from value.
  If the result of processing value is an empty array, validation error, return failure.
Otherwise, if, depending on the context, term expects a map:
1. if value is not a map, validation error, return failure.
2. otherwise, for each key → keyValue of value, if key has an expected value category, set value[key] to the result of running verify value category given key, keyValue and using value["type"] as the context. If the result of processing value is an empty map, validation error, return failure.
Note

This step currently only exists for use by profiles. The properties defined in this specification all accept arrays of objects.
Otherwise, if, depending on the context, value does not match the expected value category of term, validation error, return failure.
Return value.

Explanation

This function checks that the value of the term being processed matches its expected value category. The function is recursively called when the value is a list or map to ensure that all properties in the manifest get checked.

7.4.2.3 Get Unique URLs

To get unique URLs from resources, run the following steps:

Let uniqueURLs be an empty ordered set.
For each resource of resources:
1. let url be the result of running URL serializer [url] on resource["url"] with exclude fragment flag set.
2. if uniqueURLs contains url, validation error. Otherwise, append url to uniqueURLs.
3. if resource["alternate"] is set, for each alternate of resource["alternate"]:
  1. let alt_url be the result of running URL serializer [url] on alternate["url"] with exclude fragment flag set.
  2. if uniqueURLs contains alt_url, validation error.
  3. otherwise, append alt_url to uniqueURLs.
Return uniqueURLs.

Explanation

This function takes a list of LinkedResource objects — from either the reading order or resource list — and returns the set of unique URLs. If duplicates are encountered, warnings are issued.
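The function, together with the union computed by the data validation steps, might be sketched in Python as below. The sketch makes some assumptions: URLs are already absolute, `urllib.parse.urldefrag` approximates the URL serializer with the exclude fragment flag set, and a dictionary is used as an insertion-ordered set.

```python
from urllib.parse import urldefrag


def get_unique_urls(resources, errors):
    """Return the fragment-free URLs of resources, in order, without duplicates."""
    unique = {}  # dict keys act as an insertion-ordered set
    for resource in resources:
        # A resource and its alternates are checked the same way.
        candidates = [resource] + list(resource.get("alternate", []))
        for entry in candidates:
            url = urldefrag(entry["url"]).url
            if url in unique:
                errors.append(f"duplicate resource: {url}")
            else:
                unique[url] = None
    return list(unique)


errors = []
reading_order = [
    {"url": "https://example.org/c1.html#start"},
    {"url": "https://example.org/c1.html"},
]
resources = [
    {
        "url": "https://example.org/book.css",
        "alternate": [{"url": "https://example.org/book.min.css"}],
    },
    {"url": "https://example.org/c1.html"},
]
reading_urls = get_unique_urls(reading_order, errors)
resource_urls = get_unique_urls(resources, errors)
# Ordered union of the two sets.
unique_resources = list(dict.fromkeys(reading_urls + resource_urls))
```

Duplicates are only reported within a single list; a URL that appears in both the reading order and the resource list is merged silently by the union.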

7.4.2.4 Remove Empty Arrays

To remove empty arrays from a property term's value, run these steps:

If value is an empty list, return failure.
Otherwise, if value is a map, for each key → keyValue of value, if running remove empty arrays given key and keyValue returns failure, remove value[key].

Explanation

This function checks that the value of the term being processed is not an empty list. A term that initially has a list can lose entries as it gets processed (i.e., when the list items are invalid).
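One possible Python rendering of this function is sketched below. It is an approximation with two stated liberties: failure is represented by `False`, and the term argument is dropped because the sketch does not need it.

```python
def remove_empty_arrays(value):
    """Return False (failure) for an empty list; prune nested maps in place."""
    if isinstance(value, list) and not value:
        return False
    if isinstance(value, dict):
        # Iterate over a copy of the keys, since entries may be deleted.
        for key in list(value):
            if not remove_empty_arrays(value[key]):
                del value[key]
    return True


data = {"name": [{"value": "x"}], "author": [], "publisher": {"name": []}}
for term in list(data):
    if not remove_empty_arrays(data[term]):
        del data[term]
```

As in the steps above, the recursion only descends into maps, so maps nested inside lists are not inspected.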

7.4.3 Add Default Values

To add default values for missing properties in map data with an optional HTML Document (DOM) Node [html] document, run the following steps:

(§ 4.7.1.11 Title) If data["name"] is not set:
- Let title be an empty map. Set its values as follows:
  - if document is set, if the title element [html] of document is set and is not empty, set title["value"] to the text content of the title element.
    
    Set title["language"] to the language [html], if available, and title["direction"] to the base direction [html] if that value is available and its value is either "ltr" or "rtl".
  - otherwise, validation error, generate a value for title["value"] (see the separate note for details). Set title["language"] and title["direction"] as appropriate for the generated title.
- Set data["name"] to the list: « title ».
Explanation

This step adds the content of the title element of document when the name property is not specified in the manifest. For example:
Example 63
```
<html>
<head lang="en">
    <title>The Golden Bough</title>
    …
    <script type="application/ld+json">
    {
        "@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
        …
    }
    </script>
```
yields:
Example 64
```
«[
    …
    "name" → «
        «[
            "value"    → "The Golden Bough",
            "language" → "en"
        ]»
    »,
    …
]»
```
(§ 4.7.2.1 Default Reading Order and § 6.1 Linking) If data["readingOrder"] is not set:
- if either document or document.URL is not set, fatal error, return failure.
- set data["readingOrder"] to an empty list and append the map «[ "url" → document.URL ]».
- append document.URL to data["uniqueResources"].
Explanation

If the Digital Publication consists only of the referencing document, the default reading order can be omitted; it will consist, automatically, of that single resource.
If a profile specifies default values the user agent has to generate, those steps are executed at this point.
(§ 6.1 Linking) If document.URL is set and data["uniqueResources"] does not contain document.URL, validation error.

Explanation

If the page that links to the manifest is not listed as a unique resource of the publication after processing core and extension default value rules, an error is raised as it has to be a publication resource.
Return data.
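The default value rules can be sketched in Python as follows. This is a simplified illustration, not a conforming implementation: the linking document is modeled as a hypothetical dictionary with `title`, `lang`, `dir` and `url` keys instead of a DOM node, the generated title "Untitled" is a placeholder chosen for the sketch, and `None` signals a fatal error.

```python
def add_default_values(data, document=None, errors=None):
    """Fill in a missing title and reading order; return None on fatal error."""
    errors = [] if errors is None else errors
    if "name" not in data:
        title = {}
        if document and document.get("title"):
            title["value"] = document["title"]
            if document.get("lang"):
                title["language"] = document["lang"]
            if document.get("dir") in ("ltr", "rtl"):
                title["direction"] = document["dir"]
        else:
            errors.append("no title available, generating one")
            title["value"] = "Untitled"  # hypothetical generated title
        data["name"] = [title]
    if "readingOrder" not in data:
        if not document or not document.get("url"):
            return None  # fatal error: no reading order can be derived
        data["readingOrder"] = [{"url": document["url"]}]
        data.setdefault("uniqueResources", []).append(document["url"])
    if (
        document
        and document.get("url")
        and document["url"] not in data.get("uniqueResources", [])
    ):
        errors.append("linking document is not a publication resource")
    return data


doc = {
    "title": "The Golden Bough",
    "lang": "en",
    "url": "https://example.org/golden-bough/",
}
result = add_default_values({}, doc)
```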

C. Machine-Processable Table of Contents

C.1 Introduction

This section is non-normative.

To facilitate navigation within pages and across sites, HTML uses the nav element [html] to express lists of links. Although generic in nature by default, the purpose of a nav element can be more specifically identified by use of the role attribute [html]. In particular, the doc-toc role from the [dpub-aria-1.0] vocabulary identifies the nav element as the digital publication's table of contents.

Including an identifiable table of contents is an accessible way to produce any digital publication, but due to the flexibility of HTML markup, it also presents challenges for user agents trying to extract a meaningful hierarchy of links (e.g., to provide a custom view available from any page). To avoid duplicating the tables of contents for different uses, this section defines a syntax that is both human friendly and commonly used while still providing enough structure for user agent extraction.

Authors have a choice of lists (ordered or unordered) to construct their table of contents. By tagging each link within these lists in anchor tags (a elements), user agents can easily differentiate the information they need from any peripheral content (asides) or stylistic tagging that has also been added. The table of contents can consist of both active links (with an href attribute) and inactive links (excluding the href attribute), providing additional flexibility in how the table of contents is constructed (e.g., to omit links to certain headings or only link to certain content in a preview).

Note, however, that user agents are not required to preserve the presentational aspects of the table of contents (i.e., the user agent is typically extracting the information in order to present it in a common way across all publications). User agents are only expected to retain the text content of the link elements, for example, so text styling, inline images and other non-text content might be lost. Similarly, list styling and even how many levels deep of linking to display are at the discretion of the user agent. For this reason, linking to the presentational table of contents so that users are not limited to the machine-processed one is advised.

C.2 HTML Structure

The table of contents is expressed via an [html] element (typically a nav element). This element MUST be identified by the role attribute [html] value "doc-toc" [dpub-aria-1.0], and MUST be the first element in the document in document tree order [dom] with that role value. The element MAY be hidden from users.

The manifest SHOULD identify the resource that contains the table of contents.

Although the content model of the nav element is not restricted, user agents will only be able to extract a usable table of contents when the following markup guidelines are followed:

Table of Contents Title

Although a title for the table of contents is optional, to avoid having a user agent generate a placeholder title when one is needed, it is advised to add one. Titles are specified using any of the [html] h1 through h6 elements. Note that only the first such element is recognized as the title. If a heading element is not found before the list of links, user agents will assume that one has not been specified.

List of Links

The first [html] ol or ul list element encountered in the nav element is assumed to contain the list that defines the links into the content. This list will be found even if it is nested inside of div elements, for example, as the algorithm ignores elements that are not relevant to its processing. The list cannot occur inside of any skipped elements, however, since their internal contents are not evaluated.

If the nav element does not contain one of these elements, then user agents will not register the digital publication as containing a usable table of contents (e.g., a machine-rendered option will not be available).

Branches

If the table of contents is considered as a tree of links, then each list item (li element) inside of the list of links represents one branch. Each of these branches has to have a name and optional destination in order to be presented to users, and this information is obtained from the first a element found within the list item, wherever it is nested (again, excluding any a elements inside of skipped elements).

The link destination for the branch is obtained from the a element's href attribute, when specified. This attribute can be omitted if a link is not available (e.g., in a preview) or not relevant (e.g., a grouping header). When providing a link into the content, it is also possible to specify the relation of the linked document (in a rel attribute) and the media type of the linked resource (in a type attribute).

After finding the a element that labels the branch, user agents will continue to inspect the markup for another list element (i.e., sub-branches). If a list is found, it is similarly processed to extract its links, and so on, until there are no more nested branches left to process.

Skipped Elements

A small set of elements is skipped when parsing the table of contents to avoid misinterpretation. These are the [html] sectioning content elements and sectioning root elements. They are skipped because they can define their own outlines (i.e., they can represent embedded content that is self-contained and not necessarily related to the structure of content links).

Any element that has its hidden attribute set is also skipped, since hidden elements are not intended to be directly accessed by users.

Although these elements can be included in the nav element, care has to be taken not to embed important content within them (e.g., do not wrap a section element around the list item that contains all the links into the content).

Ignored Elements

All elements that are not relevant to extracting the table of contents, and are not skipped, are ignored. Unlike skipped elements, ignoring means that user agents will continue to search inside them for relevant content, allowing greater flexibility in terms of the tagging that can be used.

C.2.1 Examples

This section is non-normative.

Example 65 : A basic multi-level table of contents.

Note that different list types can be used for the different levels.

<nav role="doc-toc">
   <h2>Contents</h2>

   <ol>
      <li>
        <a href="discourses.html">ZARATHUSTRA'S DISCOURSES.</a>
         <ul>
            <li><a href="discourses.html#s01">THE THREE METAMORPHOSES.</a></li>
            <li><a href="discourses.html#s02">THE ACADEMIC CHAIRS OF VIRTUE.</a></li>
            <li><a href="discourses.html#s03">BACKWORLDSMEN.</a></li>
            …
         </ul>
      </li>
      …
   </ol>
</nav>

Example 66 : A table of contents with ignored content.

The supplementary descriptive information is ignored by user agents.

<nav role="doc-toc">
   <h2>Contents</h2>

   <ol>
      <li>
         <div class="title"><a href="c01.html">CHAPTER I</a></div>
         <div class="description">Biographical and Introductory.</div>
      </li>
      <li>
         <div class="title"><a href="c02.html">CHAPTER II</a></div>
         <div class="description">A New System of Alternating Current Motors and Transformers.</div>
      </li>
      …
   </ol>
</nav>

Example 67 : A table of contents for a preview.

The a elements that link to content the user does not have access to do not include href attributes.

<nav role="doc-toc">
   <h2>Contents</h2>

  <ol>
     <li><a href="xmas_carol.html">Marley's Ghost</a></li>
     <li><a>The First of Three Spirits</a></li>
     <li><a>The Second of Three Spirits</a></li>
     <li><a>The Last of the Spirits</a></li>
     <li><a>The End of It</a></li>
  </ol>

   …
</nav>

Example 68 : A table of contents with unlinked headings.

In this example, the author names are not relevant link locations so href attributes are not included on their enclosing a elements.

<nav role="doc-toc">
   <h2>Contents</h2>

   <ol>
      <li>
         <a>Faraday, Michael</a>
         <ol>
            <li><a href="faraday.html#s01">Experimental Researches in Electricity</a></li>
            <li><a href="faraday.html#s02">The Chemical History of a Candle</a></li>
         </ol>
      </li>
      <li>
         <a>Forel, Auguste</a>
         <ol>
            <li><a href="forel.html">The Senses of Insects</a></li>
         </ol>
      </li>
      …
   </ol>
</nav>

C.3 User Agent Processing

This section depends on the Infra Standard [infra].

This section defines an algorithm for extracting a table of contents from a nav element. It is defined in terms of a walk over the nodes of a DOM tree, in tree order [dom], with each node being visited when it is entered and when it is exited during the walk. Each time a node is visited, it can be seen as triggering an enter or exit event. In some steps, user agents are provided a choice in how to process the content to provide flexibility for different presentation models.
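The enter and exit events described above can be illustrated with a minimal sketch. It assumes well-formed markup with explicit end tags and uses Python's standard html.parser module, which reports start and end tags in document order without building a DOM. The names EnterExitWalker, walk, and VOID are hypothetical and are not defined by this specification.

```python
from html.parser import HTMLParser

# Void elements have no end tag, so they are entered and exited at once.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class EnterExitWalker(HTMLParser):
    """Records an ("enter", tag) or ("exit", tag) event for each element."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("enter", tag))
        if tag in VOID:
            self.events.append(("exit", tag))

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.events.append(("exit", tag))


def walk(markup):
    """Return the enter/exit events for the markup, in tree order."""
    walker = EnterExitWalker()
    walker.feed(markup)
    walker.close()
    return walker.events


# walk("<ol><li><a>One</a></li></ol>") returns:
# [("enter", "ol"), ("enter", "li"), ("enter", "a"),
#  ("exit", "a"), ("exit", "li"), ("exit", "ol")]
```

A user agent that already has a DOM would more likely walk the tree directly; the event list is used here only to make the order of visits visible.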

Note

This algorithm is not defined in purely event-driven terms, as inspecting all descendant nodes is not always necessary to obtain the needed information from the DOM. In some cases, an element and all its descendants are skipped immediately after the element is processed on enter. An event-based approach could be applied, but the algorithm would need to be modified to process or ignore the skipped nodes.

Note

User agents can process and internalize the resulting structure using any language that can represent the final form of the data.
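As one possible illustration of such an internal form, the following sketch models the maps that the algorithm in this section builds: a root with a name and entries, and branches that also carry url, type, and rel. The class names TocEntry and Toc are hypothetical. Python's None stands in for null, and the default empty list mirrors the initial « » value of entries.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TocEntry:
    """One branch of the table of contents."""
    name: Optional[str] = None
    url: Optional[str] = None
    type: Optional[str] = None
    rel: Optional[List[str]] = None
    # None once the branch is known to have no sub-branches.
    entries: Optional[List["TocEntry"]] = field(default_factory=list)


@dataclass
class Toc:
    """The root of the table of contents."""
    name: Optional[str] = ""
    entries: Optional[List[TocEntry]] = field(default_factory=list)


# Building a small structure by hand:
section = TocEntry(name="Section 1", url="http://example.com/contents.html#s1",
                   type="text/html")
section.entries.append(
    TocEntry(name="Section 1.1", url="http://example.com/contents.html#s1.1",
             type="text/html", entries=None))
toc = Toc(name="Contents", entries=[section])
```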

For the purposes of this algorithm, a list element is defined as either an [html] ol or ul element.

The following algorithm MUST be applied to a walk of a DOM subtree rooted at the first element in document order with the role attribute value doc-toc, regardless of whether the element has been declaratively hidden [html] or styled by CSS not to be visible:

Note

The rules for locating the resource containing the table of contents element are defined in § 4.8.1.3 Table of Contents.

If a table of contents element is not found, the publication does not have a table of contents that can be used for machine rendering purposes.

Let toc be the map «[ "name" → "", "entries" → « » ]» representing the table of contents.

Explanation

This step initializes the map that will store the title and the branches of the table of contents. In this map:
1. toc["name"] represents the title of the table of contents.
2. toc["entries"] represents the branches of the table of contents.

Initialize the stack branches to hold branches of the table of contents as they are created.

Explanation

The stack is used to hold branches that are not yet complete. As a new sub-branch is encountered, the parent gets pushed onto the stack so it can be retrieved later.

Let current_toc_node be a variable set to null.

Explanation

current_toc_node is used to hold the map that represents the branch of the table of contents that is currently being processed.

Walk over the DOM in tree order [dom], starting with the element the table of contents is being built from, and trigger the first relevant step below for each element as the walk enters and exits it.
1. When entering a heading content element:
  
  Run these steps:
  1. If branches is empty, and toc["name"] is an empty string, set toc["name"] to one of the following:
    - the descendant content of the element (to preserve any HTML tags);
    - the text string obtained from the descendant content (e.g., by calculating the accessible name [accname-1.1] of the element).
    If the resulting value of toc["name"] is an empty string (e.g., after removing any presentational elements and trimming all leading and trailing whitespace), set toc["name"] either to a placeholder value or to null.
  2. Skip further processing of the element and continue to the next.
  Explanation
  
  This step identifies the heading for the table of contents. A heading is only processed if the value of toc["name"] is an empty string (i.e., no headings have yet been encountered).
  
  Whether a user agent sets name to the descendant content of the heading element, or generates a text string from it, depends on whether it will re-use any descendant tagging in the presentation (e.g., to retain images, MathML, ruby and other content that does not translate to text easily).
  Example 69 : Visualization of the toc object with a heading.
```
«[
    "name"    → "Contents",
    "entries" → « »
]»
```
  If toc["name"] is no longer an empty string (i.e., it holds a heading or is null), then either a previous heading has already been encountered, or content has been encountered that indicates the nav element does not have a heading (e.g., a list has already been processed, and a heading cannot follow the list of links).
  Example 70 : Visualization of the toc object without a heading.
```
«[
    "name"    → null,
    "entries" → « »
]»
```
  If a heading is not specified, the user agent can provide its own for later use.
2. When entering a list element:
  
  Run these steps:
  1. If toc["name"] is an empty string, set toc["name"] to null.
  2. If current_toc_node is not null:
    1. If current_toc_node["entries"] is null or a non-empty list, skip further processing of the element and continue to the next.
    2. Otherwise, push current_toc_node onto branches and then set current_toc_node to null.
  3. Otherwise, if branches is empty:
    1. If toc["entries"] is null or a non-empty list, skip further processing of the element and continue to the next.
    2. Otherwise, do nothing.
  Explanation
  
  This algorithm does not process multiple lists in a single branch or at the root of the nav element, so if a list has already been encountered (the entries property contains one or more branches or is set to null), this list is skipped.
  
  If a list is encountered and the table of contents (toc) still does not have a name (i.e., no heading element has been encountered), the table of contents is assumed not to have a heading (i.e., the heading for the table of contents cannot appear after the first list of entries). The value of the name property is changed from an empty string to null so that any headings encountered later are ignored.
3. When exiting a list element:
  1. If branches is not empty, pop the top map from branches and set current_toc_node to it.
  2. Otherwise, if toc["entries"] is an empty list, set it to null.
  Explanation
  
  This step resets current_toc_node back to the parent object after all of its child branches have been processed.
  
  If there are no branches in the stack, toc["entries"] is set to null if it does not contain any items (to avoid processing any further lists at the root level).
4. When entering a list item element, set current_toc_node to the following map:
```
«[
    "name" → null,
    "url"  → null,
    "type" → null,
    "rel"  → null,
    "entries" → « »
]»
```
  Explanation
  
  Each list item represents a possible new branch in the table of contents, so whenever one is encountered a new blank object is created in current_toc_node.
  
  This object gets populated with information as a descendant a element and list are encountered.
5. When exiting a list item element:
  
  Run these steps:
  1. If current_toc_node["entries"] is an empty list, set it to null.
  2. If current_toc_node["name"] is null or an empty string:
    1. If current_toc_node["entries"] is not null, set current_toc_node["name"] to a placeholder value or null.
    2. Otherwise, set current_toc_node to null and exit this processing step.
  3. If branches is not empty, append current_toc_node to the entries property of the map at the top of branches. Otherwise, append current_toc_node to toc["entries"].
  4. Set current_toc_node to null.
  Explanation
  
  Exiting a list item indicates that processing of the current branch is complete. Before adding this branch to its parent's entries array, the branch needs to be tested to see if it has a name and/or any sub-branches. If it does not have a name but has sub-branches, the branch is kept. The user agent can either supply a placeholder value of its own creation or set the value to null. If it does not have a name or any branches, it is invalid and is discarded.
  
  To determine where to merge the branch, the stack is checked. If there are no items in the stack, the branch is added to the entries property of the root toc object (i.e., it is a top-level branch). Otherwise, it is added to the entries property of the map at the top of the stack (i.e., its parent branch).
  
  As a final step, current_toc_node is reset back to null.
  Example 71 : Visualization of a branch merge.
  
  If the following map is in branches:
```
«[
    "name"    → "Section 1",
    "url"     → "http://example.com/contents.html#s1",
    "type"    → "text/html",
    "rel"     → null,
    "entries" → « »
]»
```
  And the following map is in current_toc_node:
```
«[
    "name"    → "Section 1.1",
    "url"     → "http://example.com/contents.html#s1.1",
    "type"    → "text/html",
    "rel"     → null,
    "entries" → null
]»
```
  Then only the following single object remains after merging:
```
«[
    "name"    → "Section 1",
    "url"     → "http://example.com/contents.html#s1",
    "type"    → "text/html",
    "rel"     → null,
    "entries" → «
        «[
            "name"    → "Section 1.1",
            "url"     → "http://example.com/contents.html#s1.1",
            "type"    → "text/html",
            "rel"     → null,
            "entries" → null
        ]»
    »
]»
```
6. When entering an anchor element and current_toc_node is not null:
  
  Run these steps:
  1. If current_toc_node["name"] is not null, do nothing.
  2. Otherwise:
    1. Set current_toc_node["name"] to one of the following:
      - the descendant content of the anchor element (to preserve any HTML tags);
      - the text string obtained from the descendant content (e.g., by calculating the accessible name [accname-1.1] of the element).
    2. If the element has an href attribute and the URL in the attribute resolves to a resource in uniqueResources, set current_toc_node["url"] to the value.
    3. If the element has a type attribute, and the value of the attribute is not an empty string after trimming leading and trailing white space, set current_toc_node["type"] to the trimmed value.
    4. If the element has a rel attribute, and the value of the attribute is not an empty string after trimming leading and trailing white space, split the trimmed value on whitespace and set current_toc_node["rel"] to the resulting list of tokens.
    Skip further processing of the element and continue to the next.
  Explanation
  
  This step processes anchor tags to obtain values for the name and url properties of a branch.
  
  If the name of the current branch is already defined, then processing of this element is terminated (i.e., to avoid processing multiple links for a single branch).
  
  Whether a user agent sets the name of the entry to the descendant content of the a element, or generates a text string from it, depends on whether it will re-use any descendant tagging in the presentation (e.g., to retain images, MathML, ruby and other content that does not translate to text easily).
  
  In addition to having an href attribute specified, it is necessary that it resolve to a resource that belongs to the digital publication to meet the requirements of this specification. If not, the branch is retained but the entry will not be linkable.
  
  Additional information about the target of the link — the type of resource and its relation — is also retained.
  Example 72 : Visualization of a link to an SVG image.
```
«[
    "name"    → "In the Beginning",
    "url"     → "http://example.com/page1.svg",
    "type"    → "image/svg+xml",
    "rel"     → null,
    "entries" → « »
]»
```
7. When entering a sectioning content element, a sectioning root element, or an element with a hidden attribute:
  
  Skip further processing of the element and continue to the next.
  
  Explanation
  
  As sectioning and sectioning root elements can define their own outlines, descending into them poses problems for generating the table of contents (i.e., they may contain content that is not directly related). As a result, they are skipped over when encountered to prevent their child content from being processed.
8. Otherwise: do nothing.
  
  Explanation
  
  For all other elements, this step allows their descendant elements to continue to be processed.
After completing the DOM walk, if toc["entries"] contains a non-empty list, return toc. Otherwise, return null.

Explanation

If the entries array in the root toc object does not contain any branches (either because no list was found in the nav element or the list did not contain any conforming list items), then the algorithm did not produce a usable table of contents.
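The algorithm as a whole can be sketched as follows. This is an illustrative, non-normative sketch that makes several assumptions: the input is the well-formed serialized markup of the doc-toc element itself; names are reduced to whitespace-normalized text rather than preserved markup; a missing name is set to null rather than a placeholder; the root nav element is exempted from the skip rule for sectioning content; and the stored url is the absolute URL, which is matched against uniqueResources after removing its fragment. TocExtractor, extract_toc, base_url, and unique_resources are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6", "hgroup"}
SKIPPED = {"article", "aside", "nav", "section",        # sectioning content
           "blockquote", "body", "details", "dialog",   # sectioning roots
           "fieldset", "figure", "td"}


class TocExtractor(HTMLParser):
    def __init__(self, base_url, unique_resources):
        super().__init__()
        self.base, self.resources = base_url, set(unique_resources)
        self.toc = {"name": "", "entries": []}
        self.branches, self.current = [], None
        self.root_seen = False
        self.skip, self.target, self.text = 0, None, []

    def _skip(self, target=None):
        # Skip the element being entered; optionally capture its text as a name.
        self.skip, self.target, self.text = 1, target, []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in VOID:
            return  # void elements have no end tag and match no step
        if not self.root_seen:
            self.root_seen = True  # assumption: the root is exempt from skipping
        elif self.skip:
            self.skip += 1
        elif tag in HEADINGS:
            usable = not self.branches and self.toc["name"] == ""
            self._skip(self.toc if usable else None)
        elif tag in ("ol", "ul"):
            self._enter_list()
        elif tag == "li":
            self.current = dict(name=None, url=None, type=None, rel=None, entries=[])
        elif tag == "a" and self.current is not None:
            if self.current["name"] is None:
                self._enter_anchor(attrs)
        elif tag in SKIPPED or "hidden" in attrs:
            self._skip()

    def _enter_list(self):
        self.toc["name"] = self.toc["name"] or None  # "" becomes null
        if self.current is None:
            if not self.branches and self.toc["entries"] != []:
                self._skip()
        elif self.current["entries"] != []:
            self._skip()
        else:
            self.branches.append(self.current)
            self.current = None

    def _enter_anchor(self, attrs):
        node, href = self.current, attrs.get("href")
        if href is not None:
            url = urljoin(self.base, href)
            if urldefrag(url).url in self.resources:
                node["url"] = url
        node["type"] = (attrs.get("type") or "").strip() or None
        node["rel"] = (attrs.get("rel") or "").split() or None
        self._skip(node)

    def handle_data(self, data):
        self.text.append(data)  # discarded unless a capture target is set

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.skip:
            self.skip -= 1
            if self.skip == 0 and self.target is not None:
                empty = None if self.target is self.toc else ""
                self.target["name"] = " ".join("".join(self.text).split()) or empty
                self.target = None
        elif tag in ("ol", "ul"):
            if self.branches:
                self.current = self.branches.pop()
            elif self.toc["entries"] == []:
                self.toc["entries"] = None
        elif tag == "li" and self.current is not None:
            node, self.current = self.current, None
            node["entries"] = node["entries"] or None
            node["name"] = node["name"] or None  # null rather than a placeholder
            parent = self.branches[-1] if self.branches else self.toc
            if (node["name"] or node["entries"]) and parent["entries"] is not None:
                parent["entries"].append(node)


def extract_toc(markup, base_url, unique_resources):
    parser = TocExtractor(base_url, unique_resources)
    parser.feed(markup)
    parser.close()
    return parser.toc if parser.toc["entries"] else None
```

The skip counter stands in for "skip further processing of the element": while it is non-zero, nested start and end tags only adjust the counter, so skipped lists, headings, and sectioning elements never trigger the enter or exit steps. A user agent that needs to preserve descendant markup in names (e.g., images or ruby) would capture the element content instead of its text.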

Name	Publication Manifest
`abridged`	§ 4.7.1.1 Abridged
`accessMode`	§ 4.7.1.2 Accessibility
`accessModeSufficient`	§ 4.7.1.2 Accessibility
`accessibilityFeature`	§ 4.7.1.2 Accessibility
`accessibilityHazard`	§ 4.7.1.2 Accessibility
`accessibilitySummary`	§ 4.7.1.2 Accessibility
`artist`	§ 4.7.1.5 Creators
`author`	§ 4.7.1.5 Creators
`conformsTo`	§ 4.6 Profile Conformance
`@context`	§ 4.3 Manifest Contexts
`contributor`	§ 4.7.1.5 Creators
`creator`	§ 4.7.1.5 Creators
`dateModified`	§ 4.7.1.7 Last Modification Date
`datePublished`	§ 4.7.1.8 Publication Date
`direction`	§ 4.4.1 Global Declarations
`duration`	§ 4.7.1.6 Duration
`editor`	§ 4.7.1.5 Creators
`id`	§ 4.7.1.4 Canonical Identifier
`illustrator`	§ 4.7.1.5 Creators
`inker`	§ 4.7.1.5 Creators
`inLanguage`	§ 4.7.1.9 Publication Language
`language`	§ 4.4.1 Global Declarations
`letterer`	§ 4.7.1.5 Creators
`link`	§ 4.7.2.3 Links
`name`	§ 4.7.1.11 Title
`penciler`	§ 4.7.1.5 Creators
`publisher`	§ 4.7.1.5 Creators
`readBy`	§ 4.7.1.5 Creators
`readingOrder`	§ 4.7.2.1 Default Reading Order
`readingProgression`	§ 4.7.1.10 Reading Progression Direction
`resources`	§ 4.7.2.2 Resource List
`translator`	§ 4.7.1.5 Creators
`type`	§ 4.5 Publication Types
`url`	§ 4.7.1.3 Address

Name	Publication Manifest
`accessibility-report`	§ 4.8.2.1 Accessibility Report
`contents`	§ 4.8.1.3 Table of Contents
`cover`	§ 4.8.1.1 Cover
`pagelist`	§ 4.8.1.2 Page List
`privacy-policy`	§ 4.8.2.3 Privacy Policy
`preview`	§ 4.8.2.2 Preview

Publication Manifest

W3C Recommendation 10 November 2020
