ANSI/NISO Z39.86-200x Version 1.0.0
File Specifications for the Digital Talking Book

Final Draft

September 28, 2001

Abstract: This standard defines the format and content of the electronic file set that comprises a digital talking book (DTB). It uses established and new specifications to delineate the structure of DTBs whose content can range from XML text only, to text with corresponding spoken audio, to audio with little or no text. DTBs are designed to make print material accessible and navigable for blind or otherwise print-disabled persons.

Copyright © 2001 by National Information Standards Organization


Table of Contents

  1. General Information
  2. Overview
  3. The DTB Package File
  4. Content Format for Text
  5. Audio File Formats
  6. Image File Formats
  7. Synchronization of Media Files
  8. Navigation Control File (NCX)
  9. Portable Bookmarks and Highlights
  10. Resource File
  11. Packaging Files for Distribution
  12. Presentation Styles
  13. Types of DTB
  14. Digital Rights Management
  15. Time-Scale Modification
  16. Conformance
  17. References to Other Specifications/Documents

Foreword

(This foreword is not a part of the American National Standard for Digital Talking Books... . It is included for information only.)

This standard presents the file specifications for digital talking books (DTBs) for blind, visually impaired, physically handicapped, learning-disabled, or otherwise print-disabled readers. For many years, "talking books" have been made available to print-disabled readers on analog media such as phonograph records and audiocassettes. These media serve their users well in providing human-speech recordings of a wide array of print material in increasingly robust and cost-effective formats. However, analog media are limited in several respects when compared to a print book. First, they are by their nature linear presentations, which leaves much to be desired when reading reference works, textbooks, magazines, and other materials which are often accessed randomly. Digital media offer readers the ability to move around a book or magazine as freely as (and more efficiently than) a sighted reader flips through a print book. Second, analog recordings do not allow users to interact with the book, placing bookmarks, highlighting material, and so forth. A DTB offers this capability, storing the bookmarks and highlights separate from, but associated with, the DTB itself. Third, talking book users have long complained that they do not have access to the spelling of the words they hear. As will be explained below, some DTBs will include a file containing the full text of the work, synchronized with the audio presentation, thereby allowing readers to locate specific words and hear them spelled. Finally, analog audio offers readers only one version of the document. If, for example, a book contains footnotes, they are either read where referenced, which burdens the casual reader with unwanted interruptions, or grouped at a location out of the flow of the text, making them difficult for interested readers to access. A DTB allows the user to easily skip over or read footnotes. 
The Digital Talking Book offers the print-disabled user a significantly enhanced reading experience -- one that is much closer to that of the sighted reader using a print book. This standard describes the various files that make up a DTB and specifies how each must be formatted.

DTBs go far beyond the limits imposed on analog audio books because they can include not just the audio rendition of the work, but the full textual content and images as well. Because the textual content file is synchronized with the audio file, a DTB offers multiple sensory inputs to readers, a great benefit to learning-disabled readers, for example. Some visually impaired readers may choose to listen to most of the book, but find that inspecting the images provides information not available in the narrative flow. Others may opt to skip the audio presentation altogether and instead view the text file via screen-enlarging software. Braille readers may prefer to read some or all of the document via a refreshable Braille display device connected to their DTB player and accessing the textual content file.

Digital Talking Books are not tied to a single distribution medium. CD-ROMs will be used first, but DTBs will be portable to any digital distribution medium capable of handling the large files associated with digital audio recordings. Regardless of how a DTB is distributed, however, it will normally be in the context of a digital rights management system.

The initiative behind this document grew from a desire to standardize DTB file structures, in the hope that it might prevent a recurrence of the multiple formats currently used for talking books throughout the world. This document benefited greatly from the work of the DAISY Consortium, whose members had broken much of the ground covered in this standard and who contributed enormously to the solution of the many problems encountered.

NISO Voting Members

To be added by NISO.

NISO Board of Directors

To be added by NISO.

Standards Committee AQ

Standards Committee AQ on Digital Talking Books had the following members at the time this standard was approved:

Acknowledgements

Standards Committee AQ gratefully acknowledges the contributions made by the DAISY Consortium (www.daisy.org) to this work. The Consortium created a series of open international specifications (DAISY 2.0 ©1998, DAISY 2.01 ©1999, and DAISY 2.02 ©2001) that formed the foundation on which this standard is built. DAISY representatives have served on Committee AQ since its inception, and knowledge gained in their work on DAISY projects greatly informed the complex discussions and decisions leading to the creation of this document. In addition, they hosted several list-servs on which many issues critical to DTB work in general, and to this standard specifically, were discussed and resolved. It is no exaggeration to state that without their groundbreaking efforts and their ongoing contributions to Committee work, this standard would not exist in anything like its current level of sophistication.

In addition, the Committee wishes to thank the following individuals for their substantial assistance to the process of creating the standard: Robert Berkovitz, Sensimetrics Corporation; Harvey Bingham; Mike Brown; John Churchill, Recording for the Blind and Dyslexic; Manon Gaudet, VisuAide, Inc.; Al Gilman; Markus Gylling, Swedish Library of Talking Books and Braille; Steve Jacobs, NCR Corporation; Lynn Leith, Canadian National Institute for the Blind; Tatsu Nishizawa, Plextor Corporation; Dave Pawson, Royal National Institute for the Blind; James Pritchett, Recording for the Blind and Dyslexic; Dr. Gregg Vanderheiden, TRACE Research and Development Center, University of Wisconsin; Mr. Paul Vassallo, National Institute of Standards & Technology; with special thanks to members of the DAISY Consortium's Specifications and Guidelines Work Team and DTD Work Team. Thanks also to these members of the W3C Synchronized Multimedia (SYMM) Working Group: Dick Bulterman, Oratrix; Wo Chang, NIST; Lloyd Rutledge, CWI; Patrick Schmitz, Microsoft.

1. General Information

1.1 Purpose and Scope of Standard

(This section is informative.)

This standard establishes file specifications for digital talking books (DTBs) for blind, visually impaired, physically handicapped, learning-disabled, or otherwise print-disabled readers. Its purpose is to ensure interoperability across service organizations and vendors providing content and playback systems to the target population.

This standard provides specifications applicable to all aspects of digital talking book production and rendering, including authoring tools for DTBs, hardware- or software-based playback devices, and compliance-testing software.

1.2 Definitions

(This section is informative.)

The following abbreviations, acronyms, phrases, and terms are used in this standard as defined below. In the following definitions and throughout the standard, bracketed items correspond to entries in section 17, "References to Other Specifications/Documents," where the full URL is provided for each reference.

Accessible
Fully usable by the target population.
CSS
Cascading Style Sheets [CSS] is a mechanism for adding style (e.g. fonts, colors, spacing, formatting) to HTML or XML documents.
DRM
Digital Rights Management is a system of tools and processes that protect intellectual property when it is encoded and distributed in digital form.
DTB
The Digital Talking Book content data set that complies with the specifications in this standard.
DTBook
An XML element set (dtbook.dtd) that defines the markup for the textual content of a DTB.
DTD
The Document Type Definition file contains machine- and human-readable rules that define allowable XML markup for a particular application.
FIXED
When used in definitions of attributes, means the attribute has a single, fixed value specified in the DTD.
Fragment Identifier
A means to address a named place in a document. For reference within the current document, the reference part is to a named target, and begins with "#". See URI for addressing into another document.
Global navigation
Movement to user-selected portions of a document, with that movement enabled by the NCX. Navigation targets may be headings representing the hierarchical structure of the document or specific points such as pages, notes, sidebars, etc.
IMPLIED
When used in definitions of attributes, means the attribute is optional, as opposed to REQUIRED.
Informative
An explanatory part of this standard. Contrast with Normative.
Local navigation
Movement within a document at a granularity finer than that provided by the NCX. For example, navigation by paragraph or sentence, or within a table or nested list. Precise local navigation can be controlled by the textual content file or the SMIL file(s); the granularity is limited by the degree to which the textual content file has been marked up or the level to which synchronization has been applied in the SMIL file(s). Time-based movement through a document (similar to fast-forward and rewind on an analog cassette) may also be implemented.
Manifest
A component of the Package File, the Manifest lists all files included in the DTB.
May
In this standard, the word may means that a course of action is optional.
Media Unit
A single object on which a DTB is stored for distribution to the reader. For example, a single CD-ROM disk.
Must
In this standard, the word must is to be interpreted as a mandatory requirement on the content or implementation. The term shall has the same definition as must.
NCX
The Navigation Control file for XML applications (NCX) provides the reader efficient and flexible access to the hierarchical structure of a DTB as well as direct access to selected elements such as page numbers, notes, figures, etc.
Normative
A portion of the standard that supplies precise specifications rather than background or explanation. Contrast with Informative. Notes within a normative section may be informative.
OEBF
The Open eBook Forum [OEBF] is an organization formed to create and maintain standards and promote the successful adoption of electronic books. The Open eBook Publication Structure Version 1.0.1 provides a specification for representing the content of a book when it is converted from print to electronic form. This DTB standard utilizes a subset (the Package File) of that specification.
OPF
Open eBook Forum Package File. See Package File.
Package File
The Open eBook Forum Package File (OPF) is an XML file conforming to the oebpkg101.dtd that contains administrative information about the DTB, the files that comprise it, and how these files interrelate.
Playback
With regard to implementations, playback refers to the methods used to render the DTB content. Playback may include audio, Braille, large print, and synthetic speech as appropriate for the content and as supported by the playback system.
Playback System
The hardware/software platform which renders the contents of a DTB to a user. Synonymous with Player.
Player
See Playback System.
Reader
The person reading the digital talking book. Synonymous with User.
REQUIRED
When used in definitions of attributes, means the attribute is required, as opposed to IMPLIED.
Shall
See Must.
Should
In this standard, the word should means that a course of action is recommended but not required.
SMIL
The Synchronized Multimedia Integration Language [SMIL] is a W3C recommendation (SMIL 2.0) utilized in this standard to control the synchronized presentation of content in multiple media.
Spine
A component of the Package File, the Spine lists in default reading order the SMIL files included in the DTB.
Target population
The target population consists of blind, visually impaired, physically handicapped, learning-disabled, and otherwise print-disabled readers.
Textual Content File
The content of the subject document in a character set specified by ISO/IEC 10646 [ISO 10646] to which XML markup valid to the DTBook DTD has been applied.
TSM
Time-scale modification. Varying playback rate (both slower and faster than real time) while maintaining constant pitch.
URI
A Uniform Resource Identifier is a compact string of characters for identifying resources: documents, images, audio files, etc. Within a DTB, they are most likely to appear as attribute values for various XML elements, used as a way of identifying other documents or files either in whole or part. For the purposes of this specification, URIs must adhere to the syntax defined in RFC 2396 [RFC 2396]. A URI may include a fragment identifier suffix beginning with "#" that matches some named anchor in the target document. See Fragment Identifier.
User
See Reader.
XML
The Extensible Markup Language [XML] is a standardized language for marking up files containing structured information.
XSL
Extensible Stylesheet Language: A series of recommendations by the World Wide Web Consortium which describes how XML documents can be transformed and rearranged [XSLT], then formatted [XSL] for screen, handheld device, paper or audio presentation.
XSLT
A language for transforming XML documents into other XML documents. [XSLT] is designed for use as part of XSL. See XSL.
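
(The following sketch is informative.) As an illustration of the URI and Fragment Identifier definitions above, the short Python example below separates a reference into its document part and its fragment using the standard-library function urllib.parse.urldefrag. The file name chapter1.smil and the id values par_0042 and note_7 are hypothetical and do not come from this standard.

```python
from urllib.parse import urldefrag


def split_reference(uri: str) -> tuple[str, str]:
    """Return (document, fragment) for a URI reference.

    The document part is empty when the reference points into the
    current document (a bare "#name" reference).
    """
    document, fragment = urldefrag(uri)
    return document, fragment


# Hypothetical references, for illustration only.
print(split_reference("chapter1.smil#par_0042"))
print(split_reference("#note_7"))
```

A reference with an empty document part addresses a named target in the current document; otherwise the fragment names an anchor in the other document.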

1.3 Strategy

(This section is informative.)

This standard is based primarily on a variety of widely used standards and specifications, including several from the World Wide Web Consortium and the Open eBook Forum. Wherever applicable and appropriate standards or specifications existed they were used. The use of these specifications and technologies is intended to promote a fast and consistent adoption of this standard for the target population, while encouraging its extension into mainstream use.

1.4 Accessibility Issues

(This section is informative.)

Digital Talking Book files, streams, transformation processes and players have been designed to present their content to people with a wide range of abilities and disabilities. They are designed to allow presentation in forms other than conventional print, due to the inaccessibility of printed documents to these users. To the greatest extent possible, files, streams, transformation processes and players should make information available in as many presentation modes as practical, including human-narrated audio, Braille, synthesized speech and, for players with visual display, large print with user-specifiable size and text re-wrapping, as well as text and audio synchronization and other enhancements for persons with learning disabilities. The controls of players should be easily used by people with a wide range of manual dexterity. Further, tools for producing DTBs should be designed from the outset to be usable by people who are blind, visually impaired, or have other reading disabilities.

During the development of this standard, an advisory document, DTB Playback Device Features List, was created. Although it is not a normative part of this standard, player developers may find useful accessibility concepts embodied in it.

In addition to the provisions of this standard, valuable supplemental information is available from the guidelines and techniques produced by the World Wide Web Consortium's Web Accessibility Initiative. At this time, these documents include:

(This section is normative.)

It is not expected that all modes of presentation will be available in all players and documents, but it is strongly recommended that multiple equivalent presentations be made available to users whenever possible. Historically, products marketed to specific user groups with disabilities have sometimes proven unusable. Not all players need to be accessible to all target groups, but any device compliant with this standard must be accessible to the target group for which it is advertised. It is also strongly recommended that DTB production tools and processes be made accessible to persons with disabilities.

1.5 Relationship to Other Specifications

(This section is informative.)

This standard is based on the specific versions of the standards and specifications referenced herein, which are used as defined, except as noted by this document. Any refinement or replacement of a referenced specification by a newer or different version is not directly applicable to this standard. Conformance to this standard is based on the versions of the standards and specifications in effect at the time of this writing.

1.5.1 Relationship to Unicode

(This section is normative.)

Playback systems must support at least UTF-8 and UTF-16 encodings.
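
(The following sketch is informative.) One way a software player might accept both encodings is shown below in Python. It assumes that UTF-16 input begins with a byte-order mark and that everything else is UTF-8. The function name decode_text is hypothetical. In practice a conforming XML parser would normally perform this detection itself, so the sketch only illustrates the requirement and is not a recommended implementation.

```python
import codecs


def decode_text(raw: bytes) -> str:
    """Decode bytes as UTF-16 if a UTF-16 byte-order mark is present, else as UTF-8."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the byte-order mark to choose the byte order.
        return raw.decode("utf-16")
    # "utf-8-sig" accepts UTF-8 with or without a leading byte-order mark.
    return raw.decode("utf-8-sig")


sample = "Digital Talking Book"
assert decode_text(sample.encode("utf-16")) == sample
assert decode_text(sample.encode("utf-8")) == sample
```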

1.6 Patent Rights

(This section is informative.)

It is possible that compliance with this standard may require the use of one or more inventions covered by patent rights. It is believed that all companies claiming such rights have agreed to grant a license under such rights that they hold on reasonable and nondiscriminatory terms and conditions to any applicant.

Producers of DTB systems or any component thereof are responsible for obtaining the appropriate licenses for any and all technology defined by the relevant standards and specifications referenced by this standard.

Issues surrounding the protection of intellectual property embodied in the works distributed as digital talking books are discussed in section 14, Digital Rights Management.

1.7 Maintenance Agency

(This section is informative.)

The maintenance agency designated in Appendix 7 will be responsible for reviewing and acting upon suggestions for modifications to this standard. Questions concerning the implementation of this standard and requests for information should be sent to the maintenance agency.

A list of errata relating to this standard will be maintained at http://www.loc.gov/nls/z3986/v100/errata.html.

2. Overview

(This section is informative.)

A digital talking book (DTB) is a collection of electronic files arranged to present information to the target population via alternative media, namely, human or synthetic speech, refreshable Braille, or visual display, e.g., large print. When these files are created and assembled into a DTB in accordance with this standard, they make possible a wide range of features such as rapid, flexible navigation; bookmarking and highlighting; keyword searching; spelling of words on demand; and user control over the presentation of selected items (e.g., footnotes, page numbers, etc.). Such features enable readers with visual and physical disabilities to access the information in DTBs flexibly and efficiently, and allow sighted users with learning or reading disabilities to receive the information through multiple senses. For a full discussion of these capabilities, see the "Document Navigation Features List" [Navigation Features], developed as the user requirements document on which this standard was based. A document written during the development of this standard, Theory Behind the DTBook DTD [DTBook Theory], also describes the navigational capabilities of a DTB in some detail. The content of DTBs will range from audio alone, through a combination of audio, text, and images, to text alone. Section 13 describes these various types of DTB.

DTB players will also be produced with a variety of capabilities. The simplest might be portable devices with audio-only capabilities. More complex portable players could include text-to-speech capabilities as well as audio output for recorded human speech. The most comprehensive playback systems are expected to be PC-based, supporting visual and audio output, text-to-speech capability, and output to a Braille display. The Playback Device Features List [Player Features] mentioned above presents the committee's priorities for a range of functions across three types of playback device.

The files comprising a DTB fall into ten categories, as described below:

Package File
The Package File, drawn from the Open eBook Publication Structure 1.0.1, contains administrative information about the DTB and the files that comprise it. A valid XML version 1.0 file, it contains a set of metadata describing the DTB, a list (the manifest) of the files that make up the DTB, and a spine that defines the default reading order of the document. See section 3, "Package File."
Textual Content File
A DTB may contain part or all of the text of the document, as an XML 1.0 file marked up in accordance with the document type definition (DTD) defined for this standard, dtbook.dtd. (See Appendix 1, "DTBook DTD.") The textual content file enables properly-configured playback devices to spell words on demand, carry out keyword searches, and permit finely-grained navigation. It may also be accessed directly via refreshable Braille display, synthetic speech, or screen-enlarging software. See section 4, "Content Format for Text."
Audio Files
A DTB may include human or synthetic speech recordings of the document, embodied in audio files encoded in one of a specified group of audio formats. Section 5, "Audio File Formats," presents the formats specified by this standard.
Image Files
In addition to text and audio, DTBs may include images which can be presented on players with visual displays. Section 6, "Image File Formats," lists the formats specified by this standard.
Synchronization Files
To synchronize the different media files of a DTB during playback, this standard specifies the use of the World Wide Web Consortium's (W3C) Synchronized Multimedia Integration Language (SMIL), SMIL 2.0 version, an XML 1.0 application. The DTB SMIL files define a sequence of media events. During each event, text elements and corresponding audio clips as well as any additional visual elements are presented simultaneously. DTB players utilize the synchronization information to both index into the audio presentation and to track, during audio playback, the corresponding position in the textual content file. This standard utilizes a subset of the full SMIL 2.0 specification. See section 7, "Synchronization of Media Files," for discussion of these issues and Appendix 2, "DTB-Specific SMIL DTD," for the DTD that defines the DTB SMIL application.
Navigation Control File
The DTB system supports two modes of navigation, global and local. Global navigation -- movement by structure (chapter, section, subsection) and by other selected points such as pages, figures, or notes -- is effected through the Navigation Control file for XML applications (NCX). The NCX presents a dynamic view of the document's hierarchical structure, allowing the user to move through the document in large steps corresponding to its major divisions, or in progressively smaller steps down to a limit set by the document's detail. Text, audio, and image elements present to the user the document's headings, and id-based links point to the SMIL presentation at the corresponding locations. Appendix 3 contains the XML 1.0 DTD for the NCX. Local (more finely-grained) navigation is not handled by the NCX but is enabled through the textual content file or SMIL file(s), or through time-based movement through the audio presentation, depending on the document and on the player. See section 8, "Navigation Control File (NCX)," and Appendix 3, "NCX DTD" for specifications related to the NCX.
Bookmark/Highlight File
This standard supports user-set, exportable bookmarks and highlights to which text and audio notes may be applied. Specifications for the XML 1.0 file for portable bookmarks and highlights are presented in section 9, "Portable Bookmarks and Highlights" and Appendix 4, "DTD for Portable Bookmarks/Highlights."
Resource File
The resource file contains or references various text segments, audio clips, and/or images that provide alternative representations of navigational information -- for example, feedback on the user's current location in the document. It supplies information normally presented in a print book via typographical clues. See section 10, "Resource File," and Appendix 5, "DTD for Resource File" for file specifications.
Distribution Information File
Given the great size of audio files, even when heavily compressed, it will be common for large books to span several media units. Section 11, "Packaging Files for Distribution," describes how the "distInfo" file maps the location of each SMIL file to a specific media unit, e.g., disk 1 of 3. It also explains how, when several books are distributed on the same media unit, the distInfo file stores information about each book for presentation to the reader. Appendix 6, "Distribution Information DTD," presents the document type definition for "distInfo" files.
Presentation Styles
Section 12, "Presentation Styles," discusses how the presentation of a DTB in various media may be controlled through the use of optional style sheets.

3. The DTB Package File

(This section is normative.)

A DTB conforming to this standard must include exactly one Package File which must be a valid XML 1.0 document conforming to the Open eBook Forum™ (OEBF) 1.0.1 package DTD (oebpkg101.dtd) and its associated entity reference (oeb1.ent). The full specification, DTD, and entity reference for the OEBF package file are available for download from the OEBF site [OEBF]. The Package File must be named with the extension ".opf".

A Package File conforming to this standard must comply with all aspects of section 2 of the OEBF Publication Structure 1.0.1, with the following two exceptions:

(This section is informative.)

The Package File, drawn from the OEBF Publication Structure 1.0.1, contains administrative information about the DTB, the files that comprise it, and how these files interrelate. This section, drawn largely from the Publication Structure, provides only a brief summary of the function of each section with an example illustrating how it is applied to the DTB. Please see section 2 of the full OEBF Publication Structure 1.0.1 for complete details on the Package File.

The Publication Structure describes the major parts of the Package File as follows:

Here is an informal outline of the package file:

<?xml version="1.0"?>
<!DOCTYPE package PUBLIC "+//ISBN 0-9673008-1-9//DTD OEB 1.0.1 Package//EN"
"http://openebook.org/dtds/oeb-1.0.1/oebpkg101.dtd">
          
<package>
         <metadata>...</metadata>
         <manifest>...</manifest>
         <spine>...</spine>
         <tours>...</tours>
         <guide>...</guide>
</package>
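
(The following sketch is informative.) To illustrate how the manifest and spine in the outline above relate, the Python example below resolves the spine into a default reading order of SMIL files. It assumes the OEBF 1.0.1 conventions in which each manifest item carries id and href attributes and each spine itemref carries an idref attribute. The file names, id values, and media-type values in the sample are hypothetical, and the function name reading_order is not defined by this standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical package fragment; metadata omitted for brevity.
SAMPLE = """<package unique-identifier="uid">
  <manifest>
    <item id="smil-1" href="chapter1.smil" media-type="application/smil"/>
    <item id="smil-2" href="chapter2.smil" media-type="application/smil"/>
    <item id="audio-1" href="chapter1.mp3" media-type="audio/mpeg"/>
  </manifest>
  <spine>
    <itemref idref="smil-1"/>
    <itemref idref="smil-2"/>
  </spine>
</package>"""


def reading_order(package_xml: str) -> list[str]:
    """Return the hrefs of the spine entries in default reading order."""
    package = ET.fromstring(package_xml)
    # Map each manifest item id to the file it names.
    hrefs = {item.get("id"): item.get("href") for item in package.iter("item")}
    # Follow the spine; a KeyError signals an itemref with no manifest entry.
    return [hrefs[ref.get("idref")] for ref in package.iter("itemref")]


print(reading_order(SAMPLE))
```

Files listed in the manifest but not in the spine, such as the audio file in this sample, are part of the DTB but do not appear in the reading order.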

3.1 Package Identity

(This section is normative.)

The package must include a value for its unique-identifier attribute. This is required because more than one dc:Identifier may be present in a DTB's Package File metadata and the unique-identifier specifies which dc:Identifier element provides the package's primary identifier. The value of unique-identifier must match the id attribute of one and only one dc:Identifier element which is a descendant of the package element.

The primary identifier of the DTB must be globally unique.

(This section is informative.)

Example 3.1:


     
<package unique-identifier="uid">
    <metadata>
        <dc-metadata...>
            <dc:Identifier id="uid" scheme="DTB">uk-rnib-db02006</dc:Identifier>
...
</package>
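
(The following sketch is informative.) The matching rule in section 3.1 can be checked mechanically. The Python example below, offered as one possible approach rather than a required one, looks up the dc:Identifier whose id equals the package's unique-identifier and rejects packages where zero or several elements match. The function name primary_identifier is hypothetical, and the sample completes Example 3.1 with closing tags and a namespace declaration so that it can be parsed.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.0/"

# Example 3.1, completed so that it is well-formed (illustrative only).
SAMPLE = """<package unique-identifier="uid">
  <metadata>
    <dc-metadata xmlns:dc="http://purl.org/dc/elements/1.0/">
      <dc:Identifier id="uid" scheme="DTB">uk-rnib-db02006</dc:Identifier>
      <dc:Identifier scheme="DOI">10.1000/DX44998</dc:Identifier>
    </dc-metadata>
  </metadata>
</package>"""


def primary_identifier(package_xml: str) -> str:
    """Return the text of the dc:Identifier named by unique-identifier."""
    package = ET.fromstring(package_xml)
    uid = package.get("unique-identifier")
    if not uid:
        raise ValueError("package has no unique-identifier attribute")
    matches = [
        el for el in package.iter(f"{{{DC_NS}}}Identifier") if el.get("id") == uid
    ]
    if len(matches) != 1:
        raise ValueError(f"expected one dc:Identifier with id={uid!r}, found {len(matches)}")
    return (matches[0].text or "").strip()


print(primary_identifier(SAMPLE))
```

The check covers only the matching rule. Whether the identifier is globally unique cannot be determined from the Package File alone.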

3.2 Publication Metadata

(This section is normative.)

This portion of the Package File contains the information about a DTB that would normally be found in a library catalog record. It includes data about the DTB itself (e.g., title, author, producer, format, and narrator) as well as information about the source publication (usually a print book) such as publisher, edition, copyright statement, etc.

The Package File must contain exactly one metadata element which must contain one and only one dc-metadata element holding Dublin Core [DC] metadata and must contain supplemental metadata in an x-metadata element. The x-metadata element must contain at least one instance of the meta element, which uses name and content attributes to define its value (see section 3.2.3, "X-Metadata").
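
(The following sketch is informative.) The structural requirements in the preceding paragraph can be expressed as a small check. The Python example below is a sketch under the assumption that the metadata, dc-metadata, x-metadata, and meta elements carry no namespace prefix, as in the examples in this section. The function name metadata_problems and the sample strings are hypothetical. It does not replace validation against the OEBF package DTD.

```python
import xml.etree.ElementTree as ET

GOOD = """<package unique-identifier="uid">
  <metadata>
    <dc-metadata/>
    <x-metadata>
      <meta name="dtb:narrator" content="Lowenstein, Ralph"/>
    </x-metadata>
  </metadata>
</package>"""

BAD = """<package unique-identifier="uid">
  <metadata>
    <dc-metadata/>
  </metadata>
</package>"""


def metadata_problems(package_xml: str) -> list[str]:
    """Return a list of violations of the metadata structure rules (empty if none)."""
    package = ET.fromstring(package_xml)
    metadata = package.findall("metadata")
    if len(metadata) != 1:
        return [f"expected exactly one metadata element, found {len(metadata)}"]
    problems = []
    if len(metadata[0].findall("dc-metadata")) != 1:
        problems.append("metadata must contain exactly one dc-metadata element")
    x_metadata = metadata[0].find("x-metadata")
    if x_metadata is None:
        problems.append("metadata must contain an x-metadata element")
    elif not x_metadata.findall("meta"):
        problems.append("x-metadata must contain at least one meta element")
    return problems


print(metadata_problems(GOOD))
print(metadata_problems(BAD))
```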

3.2.1 Dublin Core Metadata

(This section is normative.)

The use of Dublin Core metadata within a compliant DTB must conform to the following description from the OEBF Publication Structure 1.0.1:

The dc-metadata element contains specific publication-level metadata as defined by the Dublin Core initiative (http://purl.org/dc/). The descriptions below are included for convenience, and the Dublin Core's own definitions take precedence (see http://www.ietf.org/rfc/rfc2413.txt).

The dc-metadata element can contain any number of instances of any Dublin Core elements. Dublin Core element names begin with the "dc:" prefix followed by a leading uppercase letter. Dublin Core metadata elements may occur in any order; in fact, multiple instances of the same element type (multiple dc:Creator elements, for example) can be interspersed with other metadata elements without change of meaning.

For upwards-compatibility, the element metadata in an OEB package is required to have an attribute of xmlns:dc="http://purl.org/dc/elements/1.0/" and xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/".

Following are brief definitions of the Dublin Core elements. See the Publication Structure and the Dublin Core itself for more complete descriptions. The attributes "xml:lang" and "id" can be applied to all "dc:..." elements. Additional attributes can be used with several elements as detailed below. Note that all Dublin Core element types may be repeated (occur more than once) within dc-metadata.
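
(The following sketch is informative.) Because Dublin Core elements may repeat and may appear in any order, software reading dc-metadata will generally need to collect values per element type rather than assume a single value. The Python example below shows one way to do this. The function name dublin_core is hypothetical, and the sample values are taken from the example metadata used elsewhere in this section.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DC_NS = "http://purl.org/dc/elements/1.0/"

SAMPLE = """<dc-metadata xmlns:dc="http://purl.org/dc/elements/1.0/">
  <dc:Subject>library information networks</dc:Subject>
  <dc:Language>en</dc:Language>
  <dc:Subject>libraries and the blind--standards--U.S.</dc:Subject>
</dc-metadata>"""


def dublin_core(dc_metadata_xml: str) -> dict[str, list[str]]:
    """Group Dublin Core values by element name, keeping document order per name."""
    root = ET.fromstring(dc_metadata_xml)
    prefix = f"{{{DC_NS}}}"
    values = defaultdict(list)
    for child in root:
        if child.tag.startswith(prefix):
            name = child.tag[len(prefix):]
            # Collapse line breaks and runs of spaces inside the element text.
            values[name].append(" ".join((child.text or "").split()))
    return dict(values)


print(dublin_core(SAMPLE))
```

Interleaving the dc:Language element between the two dc:Subject elements does not change the result for either element type.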

3.2.2 DTB ID Scheme

(This section is informative.)

Various schemes are available for identifying digital publications. In the DTB domain, the requirements for an identifier are simply to identify the publication in a manner that is highly likely to be globally unique. A major purpose of the uniqueness requirement is to prevent filename collisions among bookmark files.

To meet this base requirement, a simple DTB id scheme may be used. A DTB identifier under this scheme consists of a hyphen-separated string consisting of a two-letter country code drawn from [ISO 3166], an agency code unique within its country, and an identifier unique within the agency. For example, us-afb-x12345.

This scheme will provide a simple solution to the uniqueness requirement that will serve DTB-publishers' needs in the short term. In the longer term, as the requirements of a global library of alternative format materials become more important, other more sophisticated mechanisms should certainly be employed.
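
(The following sketch is informative.) The simple id scheme described above can be parsed by splitting on the first two hyphens. The Python example below makes two assumptions that this section does not state: the agency code itself contains no hyphen, and the country code is checked only for being two letters, not against the [ISO 3166] list. The function name parse_dtb_id is hypothetical.

```python
import re

_COUNTRY = re.compile(r"[A-Za-z]{2}")


def parse_dtb_id(identifier: str) -> tuple[str, str, str]:
    """Split a DTB identifier into (country, agency, local id)."""
    parts = identifier.split("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected country-agency-id, got {identifier!r}")
    country, agency, local_id = parts
    if not _COUNTRY.fullmatch(country):
        raise ValueError(f"country code must be two letters, got {country!r}")
    return country, agency, local_id


print(parse_dtb_id("us-afb-x12345"))
```

Any hyphens after the second one are kept as part of the agency-assigned identifier.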

3.2.3 X-Metadata

(This section is normative.)

The following names were developed for the DTB application to supply information that the Dublin Core element set does not cover. These names may appear only within the x-metadata containing element, as values of the name attribute on the meta element. Each x-metadata name below is shown as either "Repeatable" (it may be used more than once) or "Not repeatable."

(This example is informative)

Example 3.2:


...
<metadata> 
     <dc-metadata xmlns:dc="http://purl.org/dc/elements/1.0/"
     xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/">
          <dc:Title>Revised Standards and Guidelines of
          Service for the Library of Congress Network of Libraries for the 
          Blind and Physically Handicapped 1995</dc:Title>
          <dc:Subject>library information networks</dc:Subject> 
          <dc:Subject>libraries and the physically
          handicapped--standards--U.S.</dc:Subject>
          <dc:Subject>libraries and the blind--standards--U.S.</dc:Subject> 
          <dc:Identifier id="uid" scheme="DTB">us-nls-db00001</dc:Identifier>
          <dc:Identifier  scheme="DOI">10.1000/DX44998</dc:Identifier>
          <dc:Creator role="aut">American Library Association. Association of 
          Specialized and Cooperative Library Agencies</dc:Creator>
          <dc:Publisher>National Library Service for the Blind and Physically
          Handicapped, Library of Congress</dc:Publisher>
          <dc:Date>2000-06-22</dc:Date>
          <dc:Source>0-8389-7797-9</dc:Source>
          <dc:Language>en</dc:Language>
          <dc:Format>ANSI/NISO Z39.86-200x v1.0.0</dc:Format>
          <dc:Description>A document developed to improve library service for blind and
          physically disabled persons by providing a tool for assessing the current status of those services
          and for developing long-range plans.</dc:Description>
     </dc-metadata>
     <x-metadata>
          <meta name="dtb:sourceDate" content="1995" />  
          <meta name="dtb:sourcePublisher" content="American Library Association" />
          <meta name="dtb:sourceRights" content="copyright 1995, American Library Association" />
          <meta name="dtb:narrator" content="Lowenstein, Ralph" />
          <meta name="dtb:producer" content="American Foundation for the Blind" />
          <meta name="dtb:multimediaType" content="audioNcx" />
          <meta name="dtb:totalTime" content="06:22:34.143" />
     </x-metadata>

</metadata>
...

3.3 Manifest

(This section is normative.)

The manifest, which is a child of the package element, must contain a complete list of all of the files (documents, audio files, images, style sheets, etc.) that make up a given DTB, including the package file itself. The distInfo file and any associated audio changeMsgs are not considered part of the DTB and thus shall not be listed (see section 11, "Packaging Files for Distribution"). Each file is referenced by an item element. Each item must have an href attribute, which gives the URI of the referenced file and must be unique within the manifest. This URI must not include fragment identifiers; if relative, it is interpreted as relative to the package file itself. Further, any relative URIs contained within an XML file listed in the manifest are considered to be relative to the referring file.
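
(This example is informative.)

The two resolution rules above (manifest hrefs are relative to the package file; references inside a listed XML file are relative to that file) can be sketched with Python's standard urljoin. The function names and the example.org locations in the usage note are hypothetical.

```python
from urllib.parse import urljoin


def resolve_manifest_href(package_uri: str, href: str) -> str:
    """Resolve an item href against the URI of the package file itself."""
    if "#" in href:
        # Manifest hrefs must not include fragment identifiers.
        raise ValueError(f"manifest href carries a fragment: {href!r}")
    return urljoin(package_uri, href)


def resolve_reference(referring_file_uri: str, reference: str) -> str:
    """Resolve a URI found inside a listed XML file against that file."""
    return urljoin(referring_file_uri, reference)
```

For instance, with a package file at http://example.org/books/rs/rs.opf, the href "smil/rsfwd.smil" resolves to http://example.org/books/rs/smil/rsfwd.smil, and a reference "../rs_fwdx.mp3" inside that SMIL file resolves relative to the SMIL file, not to the package file.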

In addition, each item must have a media-type attribute containing the MIME media type of the file, and an id attribute. The id is utilized primarily when a manifest item is referenced by the spine. The manifest also includes fallback declarations for files of types not supported by this standard (see OEBF Publication Structure for details). Support for the fallback mechanism is not required by this standard. The NCX entry in the Package File manifest must have an id value equal to "ncx". The Resource File entry in the Package File manifest must have an id value equal to "resource". The item elements listing SMIL files in the manifest must have a media-type attribute of "application/smil". The item elements for the NCX, textual content file(s), Package File, and Resource File must have media-type attribute values of "text/xml". The order of item elements within the manifest is not significant.

(This example is informative)

A sample manifest for a DTB with audio, structure, and text follows (multimediaType=audioFullText):

Example 3.3:


...
<manifest>
     
     <item id="opf" href="rs.opf" media-type="text/xml" />
     <item id="text" href="rs.xml" media-type="text/xml" />
     <item id="text_style" href="dtbbase.css" media-type="text/css2" />
     <item id="ncx" href="rs.ncx" media-type="text/xml" />
     <item id="ncx_style" href="ncx16.css" media-type="text/css2" /> 
     <item id="SMIL" href="rs.smil" media-type="application/smil" /> 
     <item id="foreword" href="rs_fwdx.mp3" media-type="audio/mp3" />
     <item id="standards" href="rs_stdx.mp3" media-type="audio/mp3" />
     <item id="appendices" href="rs_app.mp3" media-type="audio/mp3" />
     <item id="index" href="rs_index.mp3" media-type="audio/mp3" />
     <item id="fig_01" href="fig1.png" media-type="image/png" />
     <item id="resource" href="rs.res" media-type="text/xml" />
     <item id="resource_audio" href="res.mp3" media-type="audio/mp3" />


</manifest>
...

Here is a manifest for an audio-only version of the above DTB (multimediaType=audioNcx), where separate SMIL files were created for each segment of the book.

Example 3.4:


...
<manifest>
     <item id="opf" href="rs.opf" media-type="text/xml" />
     <item id="ncx" href="rs.ncx" media-type="text/xml" /> 
     <item id="foreword" href="rs_fwdx.mp3" media-type="audio/mp3" />
     <item id="standards" href="rs_stdx.mp3" media-type="audio/mp3" />
     <item id="appendices" href="rs_app.mp3" media-type="audio/mp3" />
     <item id="index" href="rs_index.mp3" media-type="audio/mp3" />
     <item id="SMIL1" href="rsfwd.smil" media-type="application/smil" />
     <item id="SMIL3" href="rsapp.smil" media-type="application/smil" />
     <item id="SMIL4" href="rsind.smil" media-type="application/smil" /> 
     <item id="SMIL2" href="rsstd.smil" media-type="application/smil" /> 

</manifest>
...
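
(This example is informative.)

Several of the manifest rules in this section lend themselves to mechanical checking. The following Python sketch is one possible checker, not a complete conformance test. The function name is hypothetical, the sketch assumes that the manifest elements carry no XML namespace, and it assumes that SMIL files can be recognized by a ".smil" extension, which this section does not itself state.

```python
import xml.etree.ElementTree as ET


def check_manifest(manifest_xml):
    """Return a list of problem descriptions; an empty list means none found."""
    problems = []
    seen_hrefs = set()
    by_id = {}
    for item in ET.fromstring(manifest_xml).iter("item"):
        item_id = item.get("id")
        href = item.get("href")
        media_type = item.get("media-type")
        label = item_id or href or "<unnamed item>"
        if not item_id:
            problems.append(f"{label}: missing id")
        else:
            by_id[item_id] = item
        if not media_type:
            problems.append(f"{label}: missing media-type")
        if not href:
            problems.append(f"{label}: missing href")
            continue
        if "#" in href:
            problems.append(f"{label}: href contains a fragment identifier")
        if href in seen_hrefs:
            problems.append(f"{label}: href {href} is not unique")
        seen_hrefs.add(href)
        # Assumption: SMIL files are recognized by their .smil extension.
        if href.lower().endswith(".smil") and media_type != "application/smil":
            problems.append(f"{label}: SMIL file must be application/smil")
    if "ncx" not in by_id:
        problems.append('no item with id "ncx"')
    # The NCX and Resource File entries, when present, must be text/xml.
    for fixed_id in ("ncx", "resource"):
        entry = by_id.get(fixed_id)
        if entry is not None and entry.get("media-type") != "text/xml":
            problems.append(f"{fixed_id}: media-type must be text/xml")
    return problems
```

Applied to either sample manifest above, with the surrounding "..." lines removed, the checker is expected to report no problems.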

3.4 Spine

(This section is normative.)

The spine, a child of the package element, shall consist of a list of one or more itemref elements whose order defines the default linear reading order for the DTB. Each itemref must contain an idref which points to the id of a SMIL file listed in the manifest. Only SMIL files can be referenced by itemrefs in the spine. The itemrefs must be listed in the spine in the order in which the SMIL files are to be presented. A player must consult the spine when it reaches the end of a SMIL file to determine which file to render next.

(The following examples are informative.)

The first of the following examples shows the spine that corresponds to the first of the two manifest examples above:

Example 3.5:


<spine>

     <itemref idref="SMIL" />

</spine>

The following spine matches the second manifest example above. The correct reading order is presented here. Note that it does not match the order of files in the manifest, where order is not significant.

Example 3.6:


<spine>

      <itemref idref="SMIL1" /> 
      <itemref idref="SMIL2" />
      <itemref idref="SMIL3" />
      <itemref idref="SMIL4" /> 

</spine>
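
(This example is informative.)

A player's use of the spine can be sketched as follows: resolve each itemref to its manifest item, confirm that the item is a SMIL file, and look up the next file when the current one ends. The function names are hypothetical, and the sketch assumes a package document whose elements carry no XML namespace.

```python
import xml.etree.ElementTree as ET


def reading_order(package_xml):
    """Return the hrefs of the SMIL files in the order given by the spine."""
    package = ET.fromstring(package_xml)
    items = {item.get("id"): item for item in package.iter("item")}
    order = []
    for itemref in package.iter("itemref"):
        idref = itemref.get("idref")
        item = items.get(idref)
        if item is None:
            raise ValueError(f"spine idref {idref!r} is not in the manifest")
        if item.get("media-type") != "application/smil":
            raise ValueError(f"spine idref {idref!r} is not a SMIL file")
        order.append(item.get("href"))
    if not order:
        raise ValueError("the spine must contain at least one itemref")
    return order


def next_smil(order, current_href):
    """Return the file to render after current_href, or None at the end."""
    position = order.index(current_href)
    return order[position + 1] if position + 1 < len(order) else None
```

For the second manifest and spine above, reading_order yields rsfwd.smil, rsstd.smil, rsapp.smil, rsind.smil, regardless of the order of the item elements in the manifest.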

3.5 Tours

(This section is normative.)

Compliant players are not required to support tours.

(This section is informative.)

The tours element is an optional child of the package element. The OEBF Publication Structure describes tours as follows: "Much as a tour guide might assemble points of interest into a set of sightseers' tours, a content provider may assemble selected parts of a publication into a set of tours to enable convenient navigation. ... Reading systems may use tours to provide various access sequences to parts of the publication, such as selective views for various reading purposes, reader expertise levels, etc." Because of inherent differences between the structures of a DTB and the OEBF tours, it is not feasible to implement tours in a DTB prepared in accordance with this standard. If a producer wishes to provide the functionality described above, it may partially achieve it by producing customized navLists in the NCX.

3.6 Guide

(This section is normative.)

Compliant players are not required to support guides.

(This section is informative.)

As specified in the OEBF Publication Structure, the guide, a child of the package element, lists the key structural features of the DTB, such as the table of contents, introduction, bibliography, etc. to enable playback devices to provide convenient access to them. Because DTBs include a mandatory NCX that satisfies a more rigorous and detailed access requirement, the guide is not expected to be used in DTBs.


4. Content Format for Text

4.1 Introduction

(This section is normative.)

This standard defines an XML 1.0 Document Type Definition -- DTBook -- for markup of the textual content files of books and other publications presented in digital talking book format. To be compliant with this standard, a textual content file of a DTB must be a valid XML file conforming to dtbook100.dtd, which can be found in Appendix 1, "DTBook DTD." The version attribute on the dtbook element must be present and contain the value specified in the above-named DTD. Because XML parsers will not enforce the presence of this attribute, it must be checked by other means.

A DTB that includes textual content will, in most cases, contain only one textual content file. However, when necessary (with a very large book, for example), a DTB can contain multiple textual content files, each of which must be valid to the DTBook DTD.

DTB content producers may extend the base DTD by including one or more new elements or full modules for special situations. To remain conformant with this standard, such extensions of the DTD must employ the mechanisms specified by XML 1.0. See section 4.2.2, "Modular Extension of the DTD."

4.2 Using the DTBook Element Set

(This section is informative.)

A document developed during the creation of this standard, Theory Behind the DTBook DTD [DTBook Theory], discusses the rationale underlying the DTBook element set and the benefits it provides to digital talking book applications.

An alphabetical listing of the DTBook elements, with definitions, is included in section 4.3. Two documents external to this standard provide detailed information on the use of the element set. First, an expanded version of the DTD, in HTML format, (see [DTBook HTML]) provides full detail on each element, describing where it can be used and which elements can be used within it, along with an expanded list of attributes.

Second, a comprehensive set of guidelines for applying DTBook markup is available from the DAISY Consortium. These Structure Guidelines [StructGuide] describe the correct application of the DTBook element set, emphasize the importance of capturing the structure of the text content, and provide detailed examples of the use of all DTBook elements.

The DTBook element set has considerable application outside of the digital talking book as well. It was designed to enable the production of documents in a variety of accessible formats. At least one U.S. Braille translation software package has implemented a facility that imports DTBook documents and automatically translates and formats them in Grade 2 Braille. It is expected that similar automated processes will be developed for converting properly marked-up documents into large print and for rendering DTBook documents in Braille, synthetic speech, and large print "on the fly." Finally, an attribute called "showin" is incorporated in the DTBook element set to control the display of selected segments of a DTBook document. For example, descriptions of a graph might vary between Braille and large print editions; "showin" could allow only the appropriate version to show in each edition, although both would be present in the DTBook document.

This standard does not mandate the degree of markup to be applied to a textual content file. However, the richer the markup, the greater the functionality available to the reader.

For more information on XML 1.0 markup and DTD usage, see the W3C XML site [XML].

4.2.1 DTBook Markup Related to SMIL

(This section is normative.)

To ensure efficient player operation with DTBs containing textual content files, the smilref attribute must be present and non-empty for each element in the textual content file referenced by a SMIL file. The smilref value shall normally be the URI of the SMIL time container (par or seq) containing the media object that references a given element. However, in a text-only DTB consisting of a sequence of text media objects, smilref contains the URI of the media object that references the element. The smilref attribute permits the DTB player to resume SMIL-based playback following text-based navigation, full-text searches, etc.
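
(This example is informative.)

The requirement above can be checked mechanically by collecting every element id that a SMIL file references and confirming that the corresponding element carries a non-empty smilref. The following Python sketch makes several assumptions for illustration: the function name is hypothetical, the SMIL and DTBook elements carry no XML namespace, and text media objects are text elements whose src attribute has the form "file#id".

```python
import xml.etree.ElementTree as ET


def missing_smilrefs(text_xml, smil_xml, text_filename):
    """Return ids of referenced elements that lack a non-empty smilref."""
    referenced = set()
    for text in ET.fromstring(smil_xml).iter("text"):
        # Assumption: src looks like "rs.xml#some_id".
        filename, _, fragment = text.get("src", "").partition("#")
        if filename == text_filename and fragment:
            referenced.add(fragment)
    missing = []
    for element in ET.fromstring(text_xml).iter():
        element_id = element.get("id")
        if element_id in referenced and not element.get("smilref"):
            missing.append(element_id)
    return missing
```

The ids are returned in document order, so a production tool could report them alongside their position in the textual content file.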

4.2.2 Modular Extension of the DTD

(This section is informative.)

The DTBook DTD includes a base set of elements for use in marking up a broad range of material. Additional modules containing elements for specialized applications such as poetry, plays, dictionaries, mathematics, etc. can be "invoked" from within a DTBook document when needed, as described below.

A DTBook document is an XML application. Therefore it should begin with the XML declaration, which identifies the version of XML and, optionally, the character encoding (see Appendix 1, "DTBook DTD" for more information):

<?xml version="1.0" encoding="UTF-8" ?>

This is followed by the document type declaration:


<!DOCTYPE dtbook SYSTEM "dtbook100.dtd">

For discussion of other ways of expressing the DOCTYPE, see section 2.2 of Appendix 1, "DTBook DTD."

A book can invoke other DTDs or modules to augment the DTBook DTD by adding instructions in square brackets before the concluding ">" of the document type declaration. Such instructions in square brackets are called the "internal subset of declarations." For example:


<!DOCTYPE dtbook SYSTEM "dtbook100.dtd"
        [
            <!ENTITY % dramaModule SYSTEM "drama.dtd" >
            %dramaModule;
            <!ENTITY % externalblock "| drama">
            <!ENTITY % externalinline "| stagedir">
        ]> 

The first line of the internal subset declares an entity known as "dramaModule" and provides the URI where that module can be found. The second line invokes this entity, that is, "brings it into" the current document, just as the DOCTYPE declaration invoked the base DTD (dtbook100.dtd). The third line declares the entity "externalblock" and gives it the value "| drama". Since dtbook100.dtd contains an entity of the same name, and the internal subset overrules the base (external) DTD (dtbook100.dtd) in areas of conflict, everywhere in dtbook100.dtd where %externalblock; appears (that is, wherever block elements are allowed), "drama" is added as a further alternative. Since drama is the root element in the drama module, the full drama module can be used there. Similarly, the last line effectively allows the element stagedir to be used anywhere %externalinline; is allowed in dtbook100.dtd (wherever inline elements can be used).

More than one module may be needed and included in a book. In the following example, both a poetry module and a drama module are invoked, as well as one inline element (stagedir) from the drama module.


<!DOCTYPE dtbook SYSTEM "dtbook100.dtd"
        [
            <!ENTITY % poemModule SYSTEM "http://www.xyz.org/poem.dtd" >
            %poemModule;
            <!ENTITY % dramaModule SYSTEM "http://www.xyz.org/drama.dtd" >
            %dramaModule;
            <!ENTITY % externalblock "| poem | drama" >
            <!ENTITY % externalinline "| stagedir">
        ]>

See section 3 of Appendix 1, "DTBook DTD" for a more detailed discussion of this issue.
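
(This example is informative.)

A production tool could generate the document type declaration from a list of modules instead of writing it by hand. The following Python sketch is one way to do so; the function name and its parameters are hypothetical, and the output simply follows the pattern of the examples above (each module declared as an external parameter entity with SYSTEM, invoked, and then named in externalblock or externalinline).

```python
def build_doctype(modules, block_elements=(), inline_elements=()):
    """Return a DOCTYPE declaration that extends dtbook100.dtd.

    modules maps a parameter entity name to the URI of the module DTD.
    block_elements and inline_elements list the element names to add
    wherever block or inline elements are allowed.
    """
    indent = " " * 12
    lines = ['<!DOCTYPE dtbook SYSTEM "dtbook100.dtd"', "        ["]
    for entity_name, system_id in modules.items():
        # Declare the module as an external parameter entity, then invoke it.
        lines.append(f'{indent}<!ENTITY % {entity_name} SYSTEM "{system_id}" >')
        lines.append(f"{indent}%{entity_name};")
    if block_elements:
        added = " | ".join(block_elements)
        lines.append(f'{indent}<!ENTITY % externalblock "| {added}">')
    if inline_elements:
        added = " | ".join(inline_elements)
        lines.append(f'{indent}<!ENTITY % externalinline "| {added}">')
    lines.append("        ]>")
    return "\n".join(lines)
```

Calling build_doctype({"dramaModule": "drama.dtd"}, ["drama"], ["stagedir"]) produces a declaration equivalent to the single-module example above.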

4.3 DTBook Elements

(This section is informative.)

The element names from DTBook are listed below in alphabetical order. The description provided for each element is taken directly from the DTBook DTD.

a
contains an anchor, which is used to reference another location, within the same or another <dtbook>.
abbr
designates an abbreviation, a shortened form of a word. For example: Mr., approx., lbs., rec'd.
acronym
marks a word formed from key letters (usually initials) of a group of words. For example: UNESCO, NATO, XML.
address
contains a location at which a person or agency may be contacted. By use of <line> to contain content of the individual lines, the class attribute can be used to identify the content of that <line>. For example, class values might include: name, address, region (state, province, etc.), country, location code (zipcode, provincial code, etc.), phone, fax, email, etc.
annoref
marks a text segment that references an <annotation>. Each <annoref> is usually a word, phrase, or whole line that is part of the surrounding text (identified in the original print book by bolding, italics, etc.). It should not normally be allowed to be turned off in a DTB application.
annotation
is a comment on or explanation of a portion of a printed book. It differs from <note> in that an <annotation> is usually set in the margin or on a facing page, often with no explicit reference to it inserted in the text. Any local reference to <annotation id="xxx"> is by <annoref idref="#xxx">.
author
identifies the writer of a work other than this one. Contrast with <docauthor> which identifies the author of this work. <author> typically occurs within <blockquote> or <cite>.
bdo
is used in special cases where the automatic actions of the bi-directional algorithm would result in incorrect display.
blockquote
indicates a block of quoted content that is set off from the surrounding text by paragraph breaks. Compare with <q> which marks short, inline quotations.
bodymatter
consists of the text proper of a book, as opposed to preliminary material <frontmatter> or supplementary information <rearmatter>.
book
surrounds the actual content of the document, which is divided into <frontmatter>, <bodymatter>, and <rearmatter>. <head>, which contains metadata, precedes <book>.
br
marks a forced line break.
caption
describes a <table> or <img>. If used with <table> it must follow immediately after the <table> start tag. If used with <img> or <imggroup> it is not so constrained.
cite
marks a reference (or citation) to another document. <cite> may occur within an <a href="URL">...</a> should that other document be available in the same dtbook distribution.
code
designates a fragment of computer code.
col
is a means to apply attribute values to a column of a <table>.
colgroup
groups adjacent columns <col> that are semantically related. The <col> in a <colgroup> may inherit attribute values from it, or an enclosing parent, such as <thead>, <tfoot>, or <tbody>, or within a <table>.
dd
marks a definition of a term within a definition list.
dfn
marks the first occurrence of a word or term that is defined or explained there or elsewhere in <book>. Often <dfn> is rendered in italics, sometimes in parentheses.
div
is a generic container for subdivisions of a book. The <level1> ... <level6> hierarchy, or the <level> tag used recursively, should mark the major hierarchical structures of a book, while <div> is used in less formal circumstances or when for production purposes it is desired that a structure should be treated differently. The class attribute value identifies the actual name (e.g., part, chapter, letter) of the structure it marks. Compare with <span> which is used in inline settings.
dl
contains a definition list, usually consisting of pairs of terms <dt> and definitions <dd>. Any definition can contain another definition list.
docauthor
marks each author or editor of this work. Compare with <author>, used to mark the author of another work, within <blockquote> or <cite>.
doctitle
marks the title of the book within <frontmatter>. By convention it should appear only once, usually first. Within <head> is <title> whose contents are generally the same.
dt
marks a term in a definition list.
dtbook
is the root element in the Digital Talking Book DTD. <dtbook> contains metadata in <head> and the contents itself in <book>.
em
indicates emphasis. Usually <em> is rendered in italics. Compare with <strong>.
frontmatter
contains preliminary material enclosed in appropriate <level> or <level1>. Content may include <doctitle>, <docauthor>, copyright notice, foreword, acknowledgments, table of contents, etc. <frontmatter> serves as a guide to the content and nature of a <book>.
h1
contains the text of the heading for a <level1> structure.
h2
contains the text of the heading for a <level2> structure.
h3
contains the text of the heading for a <level3> structure.
h4
contains the text of the heading for a <level4> structure.
h5
contains the text of the heading for a <level5> structure.
h6
contains the text of the heading for a <level6> structure.
hd
marks the text of a heading in a <list> or <sidebar>.
head
contains metainformation about the book but no actual content of the book itself, which is placed in <book>. This information is consonant with the <head> information in xhtml, see [XHTML11STRICT]. Other miscellaneous elements can occur before and after the required <title>. By convention <title> should occur first.
hr
is an empty element indicating a horizontal rule. May be used to indicate a break in the text where only blank lines, a row of asterisks, a horizontal line, etc. are used in the print book.
img
marks a visual image. An <img> will generally contain a longdesc, a pointer to the related <prodnote>. The referencing is typically of the form <caption imgref="#yyy">The Caption</caption> for the printed caption of the <img id="yyy">.
imggroup
provides a container for <img> or images and associated <caption> and <prodnote>. <prodnote> may contain a description of the image. The content model allows: 1) multiple <img> if they share a caption, with the ids of each <img> in the <caption idref="id1 id2 ...">, 2) multiple <caption> if several captions refer to a single <img id="xxx"> where each caption has the same <caption idref="xxx">, 3) multiple <prodnote> if different versions are needed for different media (e.g., large print, braille, or print).
kbd
designates information that the reader is to input directly into a computer using the keyboard.
level
is an alternative tag for marking the major structures in a book. It may be used recursively, i.e., repeated indefinitely with each successive occurrence nesting within the previous. It may also be included in a subsequent higher level. Subordinate levels have greater depth. Contrast with the explicit <level1>...<level6> elements, which may not be intermixed with <level>.
level1
is the highest level container of major divisions of a book. Used in <frontmatter>, <bodymatter>, and <rearmatter> to mark the largest divisions of the book (usually parts or chapters), inside which level2 subdivisions (often sections) may nest. The class attribute identifies the actual name (e.g., part, chapter) of the structure it marks. Contrast with <level>.
level2
contains subdivisions that nest within <level1> divisions. The class attribute identifies the actual name (e.g., subpart, chapter, subsection) of the structure it marks.
level3
contains sub-subdivisions that nest within <level2> subdivisions (e.g., sub-subsections within subsections). The class attribute identifies the actual name (e.g., section, subpart, subsubsection) of the subordinate structure it marks.
level4
contains further subdivisions that nest within <level3> subdivisions. The class attribute identifies the actual name of the subordinate structure it marks.
level5
contains further subdivisions that nest within <level4> subdivisions. The class attribute identifies the actual name of the subordinate structure it marks.
level6
contains further subdivisions that nest within <level5> subdivisions. The class attribute identifies the actual name of the subordinate structure it marks.
levelhd
contains the text of a heading within <level>. Corresponds to <h1> through <h6> used in <level1> through <level6>.
li
marks each list item in a <list>. <li> content may be either inline or block and may include other nested lists. Alternatively it may contain a sequence of list item components, <lic>, that identify regularly occurring content, such as the heading and page number of each entry in a table of contents.
lic
("list item component") allows ordered substructure within a list item <li>. Used when a list item is made up of two or more components, as in a table of contents entry. The same number of <lic> should occur in each <li>. If not, correspondence of <lic> in different <li> is in order of occurrence for the current writing direction of the <li>.
line
marks a single logical line of text. Often used in conjunction with <linenum> in documents with numbered lines.
linenum
contains a line number, for example in legal text.
link
is an empty element appearing in the <head> section of a document that establishes a connection between the current document and another document. The <link> element conveys relationship information (for example, "next" and "previous") that may be rendered by user agents in a variety of ways.
list
contains some form of list, ordered or unordered. The list may have intermixed heading <hd> (generally only one, possibly with <prodnote>) and an intermixture of list items <li> and <pagenum>. If bullets and outline enumerations are part of the print content, they are expected to prefix those list items in content, rather than be implicitly generated. Note: XHTML has explicit list element types: ol for ordered, and ul for unordered.
meta
indicates metadata about the book. It is an empty element that may appear repeatedly only in <head>.
note
marks a footnote, endnote, etc. Any local reference to <note id="yyy"> is by <noteref idref="#yyy">.
noteref
marks one or more characters that reference a footnote or endnote <note>. Contrast with <annoref>. Either may be independently skippable.
notice
contains a warning, caution, or other type of admonition normally found in the margin of a book. In contrast with <sidebar> a <notice> must be presented at a specific location within the text. Its presentation is not optional.
p
contains a paragraph, which may contain subsidiary <list> or <dl>.
pagenum
contains one page number as it appears in the print document, usually inserted at the point within the file immediately preceding the first item of content on a new page.
prodnote
contains language added to the alternative-format version by the producer; commonly used to: 1) provide descriptions of one or more visual elements such as charts, graphs, etc. 2) supply operating instructions 3) describe differences between the print book and the audio version.
q
contains a short, inline quotation. Compare with <blockquote> which marks a longer quotation set off from the surrounding text.
rearmatter
contains supplementary material such as appendices, glossaries, bibliographies, and indices. It follows the <bodymatter> of the book.
samp
contains a sample of work created by the author for use as an example or template. For example, a sample business letter, resume, or computer program output, or form.
sent
marks a sentence.
sidebar
contains information supplementary to the main text and/or narrative flow and is often boxed and printed apart from the main text block on a page. It may have a heading <hd>.
span
is a generic container for use in inline settings when no specific tag exists for a given situation. The class attribute may describe the nature of the text it marks (e.g., a typographical error). May be used to mark a class of items to which styles are to be applied. Compare with <div> which is used in block settings. #PCDATA following an inline can be given an id for resumed playing by putting it in a <span>.
strong
marks stronger emphasis than <em>. Visually <strong> is usually rendered bold.
style
provides the means to include styling information that applies to the book. It may appear only in <head>. It may include CDATA sections.
sub
indicates a subscript character (printed below a character's normal baseline). Can be used recursively and/or intermixed with <sup>.
sup
marks a superscript character (printed above a character's normal baseline). Can be used recursively and/or intermixed with <sub>.
table
contains cells of tabular data arranged in rows and columns. A <table> may have a <caption>. It may have descriptions of the columns in <col>s or groupings of several <col> in <colgroup>. A simple <table> may be made up of just rows <tr>. A long table crossing several pages of the print book should have separate <pagenum> values for each of the pages containing that <table> indicated on the page where it starts. Note the logical order of optional <thead>, optional <tfoot>, then one or more of either <tbody> or just rows <tr>. This order accommodates simple or large, complex tables. The <thead> and <tfoot> information usually helps identify content of the <tbody> rows. For a multiple-page print <table> the <thead> and <tfoot> are repeated on each page, but not redundantly tagged.
tbody
marks a group of rows in the main body of a <table>. If the <table> is divided into several sections, each consisting of a number of rows, each section would be separately tagged with <tbody>. The same <thead> and <tfoot> apply to every <tbody> section.
td
indicates a table cell containing data.
tfoot
marks footer information in a <table>, consisting of one or more rows <tr>, usually of <th> cells. On multiple-page printed tables, <tfoot> rows are repeated at the bottom of the first page of the <table> and its continuation on other pages.
th
indicates a table cell containing header information.
thead
marks header information in a <table>, consisting of one or more rows <tr> of <th> cells. On multiple-page printed tables, <thead> rows are repeated at the top of the <table> and on top of its continuation on other pages.
title
contains the title of the book but is used only as metainformation in <head>. Use <doctitle> within <book> for the actual book title, which will usually be the same.
tr
marks one row of a <table> containing <th> or <td> cells. The values for %cellhalign; and %cellvalign; provide default values for <th> and <td> in the row, overriding any from <col>.
w
marks a word.
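
(This example is informative.)

The level and heading elements defined above are what allow a tool to derive a hierarchical outline from a textual content file. The following Python sketch walks a DTBook fragment and reports each structural heading with its depth. It handles both the explicit <level1> ... <level6> elements and the recursive <level> element. The function name is hypothetical, the elements are assumed to carry no XML namespace, and headings marked with <hd> (used in lists and sidebars) are deliberately ignored.

```python
import xml.etree.ElementTree as ET

# Depth implied by each explicit level element.
LEVEL_DEPTH = {f"level{n}": n for n in range(1, 7)}
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "levelhd"}


def outline(dtbook_xml):
    """Return (depth, heading text) pairs in document order."""
    entries = []

    def walk(element, depth):
        for child in element:
            if child.tag in LEVEL_DEPTH:
                walk(child, LEVEL_DEPTH[child.tag])
            elif child.tag == "level":
                # Each nested <level> lies one step deeper than its parent.
                walk(child, depth + 1)
            elif child.tag in HEADING_TAGS:
                entries.append((depth, "".join(child.itertext()).strip()))
            else:
                walk(child, depth)

    walk(ET.fromstring(dtbook_xml), 0)
    return entries
```

Given an abbreviated fragment such as <book><bodymatter><level1><h1>Chapter 1</h1><level2><h2>Section 1.1</h2></level2></level1></bodymatter></book>, the sketch yields the pairs (1, "Chapter 1") and (2, "Section 1.1").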


5. Audio File Formats

5.1 Distribution Formats

(This section is normative.)

A set of audio file formats is listed below. A compliant audio player must be capable of decoding at least one of the formats listed. It is strongly recommended that players be able to decode all listed formats. Content compliant with this standard must be delivered in one of the formats below, or any mixture of them. The file extensions shown for each format must be utilized in audio filenames in compliant DTBs. Values are not case-sensitive.

It is permissible for parts of a single book to be encoded in different audio formats. For example, a producer may choose to encode a lengthy bibliography at a lower bitrate or with a different codec than the main body of the book. Players must handle transitions between differently encoded sections smoothly. There is no restriction on the granularity of these parts; that is, a change of encoding may occur at any point in the SMIL presentation.

Support for multi-channel rendering is not required. Stereo signals must be recognized and rendered at least in monaural format.

A compliant DTB player that provides audio output should be capable of decoding the following audio formats:

While the ISO standards for MP3 and AAC require support for variable bitrate playback, players compliant with this standard are only required to support constant bitrate playback.

Players must support sample rates of 44.1, 22.05, and 11.025 kHz at a depth of 16 bits per sample. Compressed audio must be encoded such that the output sampling rate is restricted to one of the above three rates.
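
(This example is informative.)

The sample-rate and bit-depth requirement above, together with the requirement that stereo signals be rendered at least in monaural form, can be illustrated with a short Python sketch. The names are hypothetical, and averaging the two channels is only one possible way to produce a monaural signal; this standard does not prescribe a downmix method.

```python
# Output sample rates named in this section, in Hz.
ALLOWED_SAMPLE_RATES_HZ = (44100, 22050, 11025)
REQUIRED_BITS_PER_SAMPLE = 16


def is_supported_output(sample_rate_hz, bits_per_sample):
    """True if decoded audio uses a rate and depth every player must handle."""
    return (sample_rate_hz in ALLOWED_SAMPLE_RATES_HZ
            and bits_per_sample == REQUIRED_BITS_PER_SAMPLE)


def downmix_to_mono(left, right):
    """Average two channels of integer samples (an assumed downmix method)."""
    return [(l + r) // 2 for l, r in zip(left, right)]
```

A producer could run is_supported_output over the decoded parameters of each audio file before packaging, since compressed audio must decode to one of the three listed rates.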

5.2 Formats for Audio Notes

(This section is normative.)

Audio players capable of recording and exporting audio notes for bookmarks and highlights must support encoding in the following format or one of the formats specified in section 5.1. Audio players capable of importing bookmarks and highlights must support decoding of the following format.

6. Image File Formats

(This section is normative.)

Images included in DTBs must be presented in one or more of the following formats. Compliant playback devices that support image display must be capable of displaying the following image formats: JPEG (JFIF V 1.02) [JPEG] and PNG [RFC 2083]. Support for Scalable Vector Graphics [SVG] is recommended. Appendix 8 of the SVG specification addresses accessibility issues.

7. Synchronization of Media Files

7.1 Introduction

7.1.1 Background

(This section is informative.)

The Synchronized Multimedia Integration Language (SMIL 2.0) [SMIL] was developed by the World Wide Web Consortium as a standard for definition and playback of multimedia presentations over the Internet. SMIL defines the sequence of playback for one or more media objects. In the case of DTBs, the primary media objects are audio and textual content files; SMIL provides for their parallel and synchronized presentation. Any DTB constructed using SMIL, and utilizing content encoded in standard text and audio media types, is playable on any device or platform which has implemented a SMIL-conformant player of the same or later SMIL version, so long as the necessary audio and textual rendering decoders are present.

What distinguishes a DTB playback system from a basic SMIL player is the inclusion of specific navigation and presentational capabilities set out in the user requirements for DTBs ([Navigation Features]). These capabilities can utilize information from an NCX file, from the textual content, and/or from the SMIL file itself. The key to this information is the inclusion of unique identifiers within the textual content (when present) and SMIL files. Audio files are indexed by time-based positions and in themselves contain no embedded semantic structure. To provide semantic structure to audio content, it is necessary to associate time-points in the audio file with the corresponding position within the textual content. This is achieved using SMIL through the pairing of a pointer to a specific position within a textual content file (referenced by a URI) with its corresponding time position in the audio content. In the case of the DTB SMIL application, each synchronization point within the SMIL file is assigned a unique identifier. The presence of these identifiers within both the textual content and the SMIL allows navigation to occur by several different methods, as determined by the playback system.

SMIL incorporates a control structure called customTests, which allows SMIL authors to identify by class selected elements of a document (e.g., notes, page numbers, line numbers). The playback device can then expose to the user the presence of these classes and allow the user to select whether a given class of elements is to be read or skipped over during sequential playback.
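
The skipping behavior described above might be sketched as follows. This is a simplified illustration, not a conformant player: it assumes the SMIL 2.0 customAttributes and customTest element names and a customTest attribute on time containers, omits other attributes a real file may need, and uses hypothetical ids, file names, and a hypothetical function name (pars_to_play). It also assumes the SMIL elements carry no XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical SMIL fragment: two classes of skippable content are declared
# in the head and referenced from individual pars in the body.
SKIP_SAMPLE = """
<smil>
  <head>
    <customAttributes>
      <customTest id="pagenum" override="visible"/>
      <customTest id="note" override="visible"/>
    </customAttributes>
  </head>
  <body>
    <seq id="s1">
      <par id="p1"><text src="book.xml#para1"/></par>
      <par id="p2" customTest="pagenum"><text src="book.xml#page5"/></par>
      <par id="p3" customTest="note"><text src="book.xml#note1"/></par>
      <par id="p4"><text src="book.xml#para2"/></par>
    </seq>
  </body>
</smil>
"""


def declared_classes(smil_source):
    """Return the ids of the customTest classes a player could offer the user."""
    root = ET.fromstring(smil_source)
    return [test.get("id") for test in root.iter("customTest")]


def pars_to_play(smil_source, skipped_classes):
    """Return ids of pars to render, omitting classes the user chose to skip."""
    root = ET.fromstring(smil_source)
    return [
        par.get("id")
        for par in root.iter("par")
        if par.get("customTest") not in skipped_classes
    ]
```

With the sample above, a user who turns off notes but keeps page numbers would hear p1, p2, and p4 in sequence.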

The DTB producer determines the granularity of the synchronization events. Synchronization events may be limited to the primary structural elements (those indicated in the NCX) or, in books with full textual content, may be extended down to the paragraph, sentence, or even word level. Synchronization at these finer levels requires that the textual content include mark-up tags for the desired elements, and that those elements carry unique identifiers that can be referenced in the SMIL files.

The SMIL file for a DTB typically will consist of a sequence of parallel events (e.g., text and audio (and possibly image) events occurring simultaneously). SMIL represents this structure through the use of the "time containers" seq (sequence of media objects) and par (parallel time grouping in which multiple media objects play back at the same time). A simple form of DTB SMIL file would be as follows, where the three pars shown are played one after the other, and the text and audio content referenced in each par are rendered simultaneously:


<smil>
  ...
  <seq>
    <par><text.../><audio.../></par>
    <par><text.../><audio.../></par>
    <par><text.../><audio.../><img.../></par>
  </seq>
  ...
</smil>
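
To make the seq/par semantics concrete, the following Python sketch reads a filled-in version of the skeleton above and lists what is rendered at each step. The src values, clip times, ids, and the body wrapper are hypothetical, chosen only for illustration, and the sketch assumes the elements carry no XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical, filled-in version of the skeleton: three pars in one seq.
SMIL_SAMPLE = """
<smil>
  <body>
    <seq id="s1">
      <par id="p1">
        <text src="book.xml#h1"/>
        <audio src="ch1.mp3" clipBegin="0s" clipEnd="4s"/>
      </par>
      <par id="p2">
        <text src="book.xml#para1"/>
        <audio src="ch1.mp3" clipBegin="4s" clipEnd="19s"/>
      </par>
      <par id="p3">
        <text src="book.xml#para2"/>
        <audio src="ch1.mp3" clipBegin="19s" clipEnd="31s"/>
        <img src="fig1.png"/>
      </par>
    </seq>
  </body>
</smil>
"""


def playback_steps(smil_source):
    """Return one entry per par, in seq order.

    Each entry lists the (media type, src) pairs rendered simultaneously
    during that step.
    """
    root = ET.fromstring(smil_source)
    steps = []
    for seq in root.iter("seq"):
        for par in seq.findall("par"):
            steps.append([(media.tag, media.get("src")) for media in par])
    return steps
```

Calling playback_steps(SMIL_SAMPLE) yields three steps; the media objects inside each step play together, and the steps themselves play one after another.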

7.1.2 SMIL Modules

(This section is informative.)

Synchronization of media objects in this standard is based on the SMIL 2.0 Specification. Developers are requested to reference SMIL 2.0 [SMIL] for complete background and details. Only a small subset of the SMIL specification is utilized in this implementation, drawing from the following modules, which are grouped by functional area. Modules marked with asterisks are used in whole or in part in this application; the others are not utilized but are included because they are part of a core set of modules required for host language conformance under W3C modularization guidelines.

The modules mentioned above can be combined, using W3C modularization guidelines, to form a profile specific to DTB applications. Section 2 of the SMIL specification, "The SMIL 2.0 Modules," describes this process in detail.

7.2 Application of SMIL to DTBs

(This section is normative.)

To simplify validation using commonly available parsers and to lessen the complexity of determining content models and applicable attribute lists, a DTB-Specific SMIL DTD is included in this standard in Appendix 2. This DTD includes only those elements and attributes from the modules listed above that are required for the DTB application. In addition, it is more restrictive than the SMIL modules in that id attributes are often required in the DTB application when they are implied in the SMIL modules.

A compliant DTB must contain at least one SMIL file. All SMIL files included in a DTB must be valid XML documents conforming to dtbsmil100.dtd. The version attribute on the smil element must be present and contain the value drawn from the above-named DTD. Because parsers will not enforce the presence of this attribute, it must be checked by other means.

Time containers (seqs or pars) within SMIL files must contain ids. Media objects (audio, text, and img) may also contain ids, although this practice will generally be limited to single-medium DTBs. See section 7.4.11, "Text-Only DTBs."
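
Since a parser alone does not catch every one of these requirements, a producer may wish to check them separately. The following Python sketch is one possible check, under stated assumptions: the function name check_smil_basics is hypothetical, the elements are assumed to carry no XML namespace, and the version value itself is not compared because it must be taken from the DTD.

```python
import xml.etree.ElementTree as ET


def check_smil_basics(smil_source):
    """Return a list of problems: missing version, or seq/par without an id."""
    root = ET.fromstring(smil_source)
    problems = []
    # The version attribute must be present and non-empty on the root element.
    if root.tag != "smil" or not root.get("version"):
        problems.append("smil element is missing a version attribute")
    # Every time container must carry an id.
    for element in root.iter():
        if element.tag in ("seq", "par") and not element.get("id"):
            problems.append(f"<{element.tag}> time container has no id")
    return problems
```

An empty result means only that these two checks passed; full validation against the DTD is still required.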

In the textual content file, each segment to be synchronized (e.g., a heading, paragraph, or list item) must be contained within an element carrying a unique id to which the corresponding SMIL segment points. In addition, any textual content file element referenced by a SMIL file must include a smilref attribute specifying the URI of the time container or media object that references it. The smilref value shall normally be the URI of the SMIL time container containing the media object that references a given element. However, in a text-only DTB consisting of a sequence of text media objects, smilref shall contain the URI of the referencing media object itself. See section 4.2.1, "DTBook Markup Related to SMIL."

To ensure efficient player operation with DTBs containing textual content files, the smilref attribute must be present and non-empty for each element in the textual content file referenced by a SMIL file; its value follows the rule given in the preceding paragraph. The smilref attribute permits the DTB player to resume SMIL-based playback following text-based navigation, full-text searches, etc.
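
The two-way linkage between the textual content file and the SMIL file can be checked mechanically. The sketch below covers only the normal case, in which smilref names the par that contains the referencing text media object; it does not handle text-only DTBs. The file names, ids, sample markup, and the function name check_smilrefs are hypothetical, and the elements are assumed to carry no XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical pair of files: ch1.smil points into book.xml, and each
# referenced element in book.xml points back through smilref.
SMIL_DOC = """
<smil>
  <body>
    <seq id="s1">
      <par id="p1"><text src="book.xml#h1"/><audio src="ch1.mp3"/></par>
      <par id="p2"><text src="book.xml#para1"/><audio src="ch1.mp3"/></par>
    </seq>
  </body>
</smil>
"""

TEXT_DOC = """
<book>
  <h1 id="h1" smilref="ch1.smil#p1">Chapter One</h1>
  <p id="para1" smilref="ch1.smil#p2">First paragraph.</p>
</book>
"""


def check_smilrefs(smil_source, smil_name, text_source, text_name):
    """Return problems where a referenced text element lacks the expected smilref."""
    smil_root = ET.fromstring(smil_source)
    text_root = ET.fromstring(text_source)
    by_id = {el.get("id"): el for el in text_root.iter() if el.get("id")}
    problems = []
    for par in smil_root.iter("par"):
        for text in par.findall("text"):
            file_part, _, fragment = text.get("src", "").partition("#")
            if file_part != text_name:
                continue  # points into some other content file
            expected = f"{smil_name}#{par.get('id')}"
            target = by_id.get(fragment)
            if target is None:
                problems.append(f"no element with id '{fragment}' in {text_name}")
            elif target.get("smilref") != expected:
                problems.append(f"'{fragment}' should have smilref '{expected}'")
    return problems
```

Here check_smilrefs(SMIL_DOC, "ch1.smil", TEXT_DOC, "book.xml") reports no problems, because every element the SMIL file references points back to its containing par.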

It is strongly recommended that the SMIL file(s) have a level of granularity matching that of the textual content file. That is, if the textual content file is marked up to the paragraph level, the SMIL file(s) should include synchronization to the paragraph level.

All time offsets in SMIL files (and in all other applicable DTB files, e.g., NCX clipBegin/clipEnd and bookmark timeOffsets) are based on normal play speed. In order to maintain synchronization, a player must process time offsets independently of actual playback speed.
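
The arithmetic implied by this rule can be sketched as follows. The function names and the playback_rate parameter (1.0 for normal speed, 2.0 for double speed) are hypothetical; the sketch assumes a constant rate between the two points being compared.

```python
def book_offset_seconds(start_offset_s, elapsed_wall_s, playback_rate):
    """Convert elapsed real time into a normal-speed offset in the audio.

    start_offset_s is the normal-speed offset at which playback began;
    elapsed_wall_s is the real time the listener has spent since then.
    """
    if playback_rate <= 0:
        raise ValueError("playback_rate must be positive")
    return start_offset_s + elapsed_wall_s * playback_rate


def wall_clock_seconds(clip_begin_s, clip_end_s, playback_rate):
    """Real time needed to play a clip whose bounds are normal-speed offsets."""
    if playback_rate <= 0:
        raise ValueError("playback_rate must be positive")
    return (clip_end_s - clip_begin_s) / playback_rate
```

For example, a listener who starts at offset 30 s and listens for 10 s of real time at 1.5 times normal speed is at offset 45 s, and that is the value a bookmark set at that moment would record.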

7.3 SMIL Elements

(This section is informative.)

As mentioned above, the DTB application utilizes only a portion of the elements and attributes that make up the modules in the DTB SMIL Profile. Playback devices compliant with this standard need support only the following SMIL elements and attributes, which make up the DTB-Specific SMIL DTD.