W3C

XML Query (XQuery) Requirements

W3C Working Draft 27 June 2003

This version:
http://www.w3.org/TR/2003/WD-xquery-requirements-20030627
Latest version:
http://www.w3.org/TR/xquery-requirements
Previous version:
http://www.w3.org/TR/2003/WD-xquery-requirements-20030502
Editors:
Don Chamberlin, IBM Almaden Research Center <chamberlin@almaden.ibm.com>
Peter Fankhauser, Infonyte GmbH <fankhaus@infonyte.com>
Massimo Marchiori, W3C/MIT/University of Venice <massimo@w3.org>
Jonathan Robie, DataDirect <jonathan.robie@datadirect-technologies.com>

Abstract

This document specifies goals, requirements, and usage scenarios for the W3C XML Query (XQuery) data model, algebra, and query language. It also includes, for each requirement, a corresponding status, indicating the current situation of the requirement in the XML Query family of specifications.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C. This document fixes a bug in 3.2.1 Query Language Syntax of http://www.w3.org/TR/2003/WD-xquery-requirements-20030502, which repeated one requirement twice and deleted another, and modifies the status section.

Otherwise, it is the same as in the previous version: it includes, for each requirement, a corresponding status, indicating the current situation of the requirement in the XML Query family of specifications, at the beginning of the Last Call period for the XQuery 1.0 and XPath 2.0 Data Model and for the XQuery 1.0 and XPath 2.0 Functions and Operators. A future revision will be provided when all remaining open issues have been resolved and when the remaining documents are issued as Last Call working drafts. In some cases, this future revision may include changes to a requirement based on what we have learned since the original requirements were written.

This is a W3C Working Draft for review by W3C Members and other interested parties. It is a draft document and may be updated, replaced or made obsolete by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". This is work in progress and does not imply endorsement by the W3C membership.

This document has been produced as part of the W3C XML Activity, following the procedures set out for the W3C Process. The document has been written by the XML Query Working Group. The goals of the XML Query Working Group are discussed in the XML Query Working Group charter (W3C members only).

The XML Query Working Group feels that the contents of this Working Draft are relatively stable, and therefore encourages feedback on this version.

Comments on this document should be sent to the W3C mailing list public-qt-comments@w3.org (archived at http://lists.w3.org/Archives/Public/public-qt-comments/).

Patent disclosures relevant to this specification may be found on the XML Query Working Group's patent disclosure page at http://www.w3.org/2002/08/xmlquery-IPR-statements.

A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.

Table of Contents

1 Goals
2 Usage Scenarios
3 Requirements
    3.1 Terminology
    3.2 General Requirements
    3.3 XML Query Data Model
    3.4 XML Query Functionality
4 Relationship to Other Activities
5 References (non-normative)

Appendix

A Glossary


1 Goals

The goal of the XML Query Working Group is to produce a data model for XML documents, a set of query operators on that data model, and a query language based on these query operators. The data model will be based on the W3C XML Infoset, and will include support for Namespaces.

Queries operate on single documents or fixed collections of documents. They can select whole documents or subtrees of documents that match conditions defined on document content and structure, and can construct new documents based on what is selected.

2 Usage Scenarios

The following usage scenarios describe how XML queries may be used in various environments. They cover a wide range of activities and needs representative of the problem space to be addressed. They are intended to be used as design cases during the development of XML Query, and should be reviewed when critical decisions are made. These usage scenarios should also prove useful in helping non-members of the XML Query Working Group understand the intent and goals of the project.

2.1 Human-readable documents

Perform queries on structured documents and collections of documents, such as technical manuals, to retrieve individual documents, to generate tables of contents, to search for information in structures found within a document, or to generate new documents as the result of a query.

2.2 Data-oriented documents

Perform queries on the XML representation of database data, object data, or other traditional data sources to extract data from these sources, to transform data into new XML representations, or to integrate data from multiple heterogeneous data sources. The XML representation of data sources may be either physical or virtual; that is, data may be physically encoded in XML, or an XML representation of the data may be produced.

2.3 Mixed-model documents

Perform both document-oriented and data-oriented queries on documents with embedded data, such as catalogs, patient health records, employment records, or business analysis documents.

2.4 Administrative data

Perform queries on configuration files, user profiles, or administrative logs represented in XML.

2.5 Filtering streams

Perform queries on streams of XML data to process the data in a manner analogous to UNIX filters. This might be used to process logs of email messages, network packets, stock market data, newswire feeds, EDI, or weather data to filter and route messages represented in XML, to extract data from XML streams, or to transform data in XML streams.
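
As a rough illustration of the filter analogy (written in Python, not in XML Query syntax), the following sketch parses an XML stream incrementally and passes through only the messages that match a condition. The element name `message`, the attribute `priority`, and the function `filter_stream` are hypothetical and chosen for this example only.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample stream; element and attribute names are assumptions.
SAMPLE = b"""<log>
  <message priority="high">disk full</message>
  <message priority="low">heartbeat</message>
  <message priority="high">link down</message>
</log>"""

def filter_stream(stream, priority):
    """Yield the text of each <message> whose priority attribute matches."""
    # iterparse reports each element when its end tag is parsed,
    # so matches can be emitted incrementally.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "message":
            if elem.get("priority") == priority:
                yield elem.text
            elem.clear()  # drop the content of messages already handled

high = list(filter_stream(io.BytesIO(SAMPLE), "high"))
```

Here `high` collects the text of the two high-priority messages in stream order; a real filter would read from a socket or pipe rather than an in-memory buffer.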

2.6 Document Object Model (DOM)

Perform queries on DOM structures to return sets of nodes that meet the specified criteria.

2.7 Native XML repositories and web servers

Perform queries on collections of documents managed by native XML repositories or web servers.

2.8 Catalog search

Perform queries to search catalogs that describe document servers, document types, XML schemas, or documents. Such catalogs may be combined to support search among multiple servers. A document-retrieval system could use queries to allow the user to select server catalogs, represented in XML, by the information provided by the servers, by access cost, or by authorization. Once a server is selected, a retrieval system could query the kinds of documents found on the server and allow the user to query those documents.

2.9 Multiple syntactic environments

Queries may be used in many environments. For example, a query might be embedded in a URL, an XML page, or a JSP or ASP page; represented by a string in a program written in a general-purpose programming language; provided as an argument on the command-line or standard input; or supported by a protocol, such as DASL or Z39.50.

3 Requirements

3.1 Terminology

The following key words are used throughout the document to specify the extent to which an item is a requirement for the work of the XML Query Working Group:

MUST

This word means that the item is an absolute requirement.

SHOULD

This word means that there may exist valid reasons not to treat this item as a requirement, but the full implications should be understood and the case carefully weighed before discarding this item.

MAY

This word means that an item deserves attention, but further study is needed to determine whether the item should be treated as a requirement.

When the words MUST, SHOULD, or MAY are used in this technical sense, they occur as a hyperlink to these definitions. These words will also be used with their conventional English meaning, in which case there is no hyperlink. For instance, the phrase "the full implications should be understood" uses the word "should" in its conventional English sense, and therefore occurs without the hyperlink.

Each requirement also includes a status section, indicating its current situation in the XML-Query family of specifications. Three status levels are used:

"Green" status

This indicates that the requirement, according to its original formulation, has been completely met. Optional clarificatory text may follow.

"Yellow" status

This indicates that the requirement has been partially met according to its original formulation. When this happens, explanatory text is provided to better clarify the current scope of the requirement.

"Red" status

This indicates that the requirement, according to its original formulation, has not been met. If this is the case, explanatory text is provided.

3.2 General Requirements

3.2.1 Query Language Syntax

The XML Query Language MAY have more than one syntax binding.

green status Status: this requirement has been met.

The XML Query Language MUST be convenient for humans to read and write.

green status Status: this requirement has been met.

One query language syntax MUST be expressed in XML in a way that reflects the underlying structure of the query.

yellow status Status: this requirement has been partially met. XQueryX was last published in June 2001. This has been captured in issue 152 (W3C members only).

3.2.2 Declarativity

The XML Query Language MUST be declarative. Notably, it MUST not enforce a particular evaluation strategy.

green status Status: this requirement has been met.

3.2.3 Protocol Independence

The XML Query Language MUST be defined independently of any protocols with which it is used. (Relationships to some specific protocols are discussed in 4 Relationship to Other Activities.)

green status Status: this requirement has been met.

3.2.4 Error Conditions

The XML Query Language MUST define standard error conditions that can occur during the execution of a query, such as processing errors within expressions, unavailability of external functions to the query processor, or processing errors generated by external functions.

yellow status Status: this requirement has been partially met. There is still an open issue (issue 340, W3C members only) on "How to identify errors?"

3.2.5 Updates

Version 1.0 of the XML Query Language MUST not preclude the ability to add update capabilities in future versions.

green status Status: this requirement has been met.

3.2.6 Defined for Finite Instances

The XML Query Language MUST be defined for finite instances of the data model.

green status Status: this requirement has been met.

It MAY be defined for infinite instances.

red status Status: this requirement has not been met. The XQuery Data Model defines only finite sequences.

3.3 XML Query Data Model

3.3.1 Reliance on XML Information Set

The XML Query Data Model relies on information provided by XML Processors and Schema Processors, and it MUST ensure that it does not require information that is not made available by such processors.

green status Status: this requirement has been met.

For XML constructs found in XML 1.0 or the Namespaces Recommendation, the XML Query Data Model MUST show how the equivalent XML Query Data Model constructs are built from items in the XML Information Set.

green status Status: this requirement has been met.

The XML Query Data Model SHOULD represent all information items, or provide justification for any information items omitted.

green status Status: this requirement has been met. Note that some information items, such as Unparsed Entity Reference Information Item, have been omitted. These were determined not to be necessary.

For information found in the XML Schema, such as datatypes, the XML Query Working Group MUST coordinate with the XML Schema Working Group to ensure that schema processors may be relied on to provide the information needed to construct the Data Model.

green status Status: this requirement has been met.

3.3.2 Datatypes

The XML Query Data Model MUST represent both XML 1.0 character data and the simple and complex types of the XML Schema specification.

green status Status: this requirement has been met.

3.3.3 Collections

The XML Query Data Model MUST represent collections of documents and collections of simple and complex values. (Note that collections are not part of the current XML Infoset.)

green status Status: this requirement has been met.

3.3.4 References

The XML Query Data Model MUST include support for references, including both references within an XML document and references from one XML document to another.

yellow status Status: this requirement has been partially met. The doc() function provides the resolution of a URI that does not contain a fragment identifier. Support for XPointer was considered, but was not provided (the XPointer element() Scheme was just approved as a W3C Recommendation on March 25, 2003). An explicit dereference operator was considered, but ultimately dropped (see issue 134, W3C members only).

3.3.5 Schema Availability

Queries MUST be possible whether or not a schema is available (in this document, the term "schema" may refer to either an XML Schema or a DTD).

green status Status: this requirement has been met.

If a schema is available, the data model MUST represent any items that it defines for its instances, such as default attributes, entity expansions, or data types. These items will not be present if a schema is not present.

green status Status: this requirement has been met. Note that not all XML Schema components are fully supported (facets, for example, are not supported).

3.3.6 Namespace Awareness

The XML Query Language and XML Query Language Data Model MUST be namespace aware.

green status Status: this requirement has been met.

3.4 XML Query Functionality

3.4.1 Supported Operations

The XML Query Language MUST support operations on all data types represented by the XML Query Data Model (see datatypes, collections, references).

green status Status: this requirement has been met.

3.4.2 Text and Element Boundaries

Queries MUST be able to express simple conditions on text, including conditions on text that spans element boundaries.

green status Status: this requirement has been met. This requirement is satisfied by the ability of the string() function to return the text content of an element, including text within sub-elements. This requirement will be even better supported in a future version of XQuery by Full-Text operations (see the XQuery and XPath Full-Text Requirements).
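
The idea of a condition on text that spans element boundaries can be sketched as follows. This is a Python illustration of the concept, not XML Query syntax: the helper `string_value` is a hypothetical name that concatenates all descendant text in document order, roughly the behaviour the status text attributes to string(). The sample fragment is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the phrase "XML Query" is split by an <em> element.
para = ET.fromstring("<p>The XML <em>Query</em> language</p>")

def string_value(elem):
    """Concatenate all descendant text of elem in document order."""
    return "".join(elem.itertext())

# The phrase is found only when text inside sub-elements is included.
matches = "XML Query" in string_value(para)
direct_only = "XML Query" in (para.text or "")
```

The condition succeeds on the concatenated text but fails when only the text directly before the first child element is examined.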

3.4.3 Universal and Existential Quantifiers

Operations on collections MUST include support for universal and existential quantifiers.

green status Status: this requirement has been met.
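
A minimal sketch of the two quantifiers, using Python's built-in `any` and `all` as stand-ins rather than XML Query syntax. The `order`, `item`, and `price` element names and the threshold of 50 are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; element names are assumptions for this example.
order = ET.fromstring(
    "<order>"
    "<item><price>12.50</price></item>"
    "<item><price>99.00</price></item>"
    "<item><price>7.25</price></item>"
    "</order>"
)
prices = [float(p.text) for p in order.iter("price")]

# Existential: at least one item costs more than 50.
some_expensive = any(p > 50 for p in prices)
# Universal: every item costs more than 50.
all_expensive = all(p > 50 for p in prices)
```

For this collection the existential test holds and the universal test does not.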

3.4.4 Hierarchy and Sequence

Queries MUST support operations on hierarchy and sequence of document structures.

green status Status: this requirement has been met.

3.4.5 Combination

The XML Query Language MUST be able to combine related information from different parts of a given document or from multiple documents.

green status Status: this requirement has been met.
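
Combining related information from multiple documents amounts to a join on a shared value. The sketch below illustrates this in Python rather than in XML Query syntax; the two documents, their element names, and the `id`/`author` key attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that share a key (the author id).
authors = ET.fromstring(
    '<authors><author id="a1">Ann</author><author id="a2">Bob</author></authors>'
)
books = ET.fromstring(
    '<books><book author="a2">Streams</book><book author="a1">Trees</book></books>'
)

# Index one document by key, then look up each item of the other.
name_by_id = {a.get("id"): a.text for a in authors.iter("author")}
pairs = [(b.text, name_by_id[b.get("author")]) for b in books.iter("book")]
```

Each resulting pair holds a book title from one document and the matching author name from the other.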

3.4.6 Aggregation

The XML Query Language MUST be able to compute summary information from a group of related document elements (this operation is sometimes called "aggregation").

green status Status: this requirement has been met.
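
As an illustration of aggregation (in Python, not XML Query syntax), the following sketch groups related elements by an attribute value and computes a total per group. The `sales` document and its `region` attribute are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical data; element and attribute names are assumptions.
sales = ET.fromstring(
    "<sales>"
    '<sale region="east">100</sale>'
    '<sale region="west">40</sale>'
    '<sale region="east">25</sale>'
    "</sales>"
)

# Group the <sale> elements by region and sum the amounts in each group.
totals = defaultdict(int)
for sale in sales.iter("sale"):
    totals[sale.get("region")] += int(sale.text)
totals = dict(totals)
```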

3.4.7 Sorting

The XML Query Language MUST be able to sort query results.

green status Status: this requirement has been met.
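
A small sketch of sorting query results, again in Python rather than XML Query syntax. The `books` document is hypothetical, and ordering by the numeric value of `year` is a choice made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; element names are assumptions for this example.
books = ET.fromstring(
    "<books>"
    "<book><title>Trees</title><year>2001</year></book>"
    "<book><title>Streams</title><year>1999</year></book>"
    "<book><title>Graphs</title><year>2003</year></book>"
    "</books>"
)

# Sort the selected elements by the numeric value of their <year> child.
by_year = sorted(books.iter("book"), key=lambda b: int(b.findtext("year")))
titles = [b.findtext("title") for b in by_year]
```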

3.4.8 Composition of Operations

The XML Query Language MUST support expressions in which operations can be composed, including the use of queries as operands.

green status Status: this requirement has been met.

3.4.9 NULL Values

The XML Query Language MUST include support for NULL values. Therefore, all operators, including logical operators, MUST take NULL values into account.

green status Status: this requirement has been met. The closest analog for an SQL null value is XQuery's empty sequence. The xsi:nil='true' attribute is also supported.

3.4.10 Structural Preservation

Queries MUST be able to preserve the relative hierarchy and sequence of input document structures in query results.

green status Status: this requirement has been met.

3.4.11 Structural Transformation

Queries MUST be able to transform XML structures and MUST be able to create new structures.

green status Status: this requirement has been met.
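
Creating a new structure from selected input can be sketched as below, using the table-of-contents case mentioned in the usage scenarios. This is a Python illustration, not XML Query syntax; the `manual`, `section`, `title`, `toc`, and `entry` element names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document; element names are assumptions.
doc = ET.fromstring(
    "<manual>"
    "<section><title>Install</title><p>...</p></section>"
    "<section><title>Configure</title><p>...</p></section>"
    "</manual>"
)

# Construct a new <toc> structure from the titles of the input sections.
toc = ET.Element("toc")
for title in doc.iter("title"):
    entry = ET.SubElement(toc, "entry")
    entry.text = title.text

result = ET.tostring(toc, encoding="unicode")
```

The result is a new document whose shape differs from the input while its entries keep the input order.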

3.4.12 References

Queries MUST be able to traverse intra- and inter-document references.

yellow status Status: this requirement has been partially met. The doc() function provides the resolution of a URI that does not contain a fragment identifier. Support for XPointer was considered, but was not provided (the XPointer element() Scheme was just approved as a W3C Recommendation on March 25, 2003). An explicit dereference operator was considered, but ultimately dropped (see issue 134, W3C members only).
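
Traversal of an intra-document reference can be pictured as a lookup from a referring attribute to the element that carries the matching identifier. The Python sketch below is only an illustration of that idea and does not reflect any XML Query operator; the `figure`, `xref`, `id`, and `ref` names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are assumptions.
doc = ET.fromstring(
    "<doc>"
    '<figure id="f1"><caption>Data model</caption></figure>'
    '<para>See <xref ref="f1"/>.</para>'
    "</doc>"
)

# Without a DTD or schema the parser does not know which attributes are IDs,
# so this sketch simply treats "id" as the identifier by convention.
by_id = {e.get("id"): e for e in doc.iter() if e.get("id") is not None}

# Follow the reference from the <xref> element to its target.
xref = doc.find(".//xref")
target = by_id[xref.get("ref")]
caption = target.findtext("caption")
```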

3.4.13 Identity Preservation

Queries MUST be able to preserve the identity of items in the XML Query Data Model.

green status Status: this requirement has been met.

3.4.14 Operations on Literal Data

Queries SHOULD be able to operate on XML Query Data Model instances specified with the query ("literal" data).

green status Status: this requirement has been met.

3.4.15 Operations on Names

Queries MUST be able to perform simple operations on names, such as tests for equality in element names, attribute names, and processing instruction targets, and to perform simple operations on combinations of names and data.

green status Status: this requirement has been met.

Queries MAY perform more powerful operations on names.

green status Status: this requirement has been met. Computed constructors give us this capability.
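
A simple equality test on element names might look as follows. The sketch is in Python, not XML Query syntax, and relies on the way the standard ElementTree module spells a namespace-qualified name; the namespace URI `urn:example:a` and the element names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with one namespace-qualified and one unqualified <x>.
doc = ET.fromstring('<r xmlns:a="urn:example:a"><a:x/><x/><a:y/></r>')

# ElementTree spells an expanded name as "{namespace-uri}local-name".
tags = [child.tag for child in doc]

# Two names are equal only if both namespace and local name agree.
same_name = tags[0] == tags[1]
in_namespace = [t for t in tags if t.startswith("{urn:example:a}")]
```

The two `x` elements compare as different names because only one of them is in the namespace.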

3.4.16 Operations on Schemas

Queries SHOULD provide access to the XML schema or DTD for a document, if there is one.

red status Status: this requirement has not been met. Support for this requirement was felt to be too complex for XQuery 1.0. An XML Schema document can be queried independently by a user.

If the schema is represented as a DTD, a mapping to an appropriate XML Schema representation MAY be required.

red status Status: this requirement has not been met. This requirement was not supported because the previous requirement was not supported.

3.4.17 Operations on Schema PSV Infoset

Queries MUST be able to operate on information items provided by the post-schema-validation information set defined by XML Schema.

green status Status: this requirement has been met. Data Model instances are constructed from the XML Schema PSVI.

3.4.18 Extensibility

The XML Query Language SHOULD support the use of externally defined functions on all datatypes of the XML Query Data Model. The interface to such functions SHOULD be defined by the Query Language, and SHOULD distinguish these functions from functions defined in the Query Language. The implementation of externally defined functions is not part of the Query Language.

green status Status: this requirement has been met.

3.4.19 Environment Information

The XML Query Language MUST provide access to information derived from the environment in which the query is executed, such as the current date, time, locale, time zone, or user.

green status Status: this requirement has been met. XQuery supports access to the current date, time, and timezone.

3.4.20 Closure

Queries MUST be closed with respect to the XML Query Data Model.

green status Status: this requirement has been met.

Both the input to a query and the output of a query MUST be defined purely in terms of the XML Query Data Model. Non-XML sources such as traditional databases or objects may be queried if they are given an XML Query Data Model representation. Similarly, query results are defined purely in terms of the XML Query Data Model. In software systems these results may be instantiated in any convenient representation such as DOM nodes, hyperlinks, XML text, or various data formats.

green status Status: this requirement has been met.

4 Relationship to Other Activities

XML has become a strategic technology in W3C and in the global Web market. The deliverable of the XML Query Working Group MUST satisfy the dependencies from the following Working Groups before it can advance to Proposed Recommendation. Some dependencies to and from the following W3C Working Groups will require close cooperation during the development process; the requirements posed for the Query work by these Working Groups may change during the development process, which means the interdependency of the Query work with these Working Groups must be managed actively:

W3C Document Object Model Working Group (DOM)

The XML Query Language must be able to return results in a form that can be used in DOM programs, such as DOM Nodes or the Iterators and TreeWalkers defined in the Traversal specification.

XSL and Linking Working Groups

Both XSLT and XPointer use the XML Path Language (XPath), which defines a location path syntax that can be used to search for matching parts of an XML document. The XML Query work will take into consideration the expressibility and search facilities of XPath when formulating its algebra and query syntax, and where desirable try to encompass those functionalities into its query language. The XML Query WG will also take into consideration the additional functionality in the XSLT and XPointer specifications.

XML Schema Working Group

It is a goal of the XML Query work to be compatible with the work of the XML Schema Working Group, including both Structures and Datatypes.

For example, it should be possible to base query predicates on the existing DTD or XSDL definition of the content of an XML document and on the new data types being defined as part of the XDTL.

W3C XML Core Working Group

The XML Query work will define a formal data model of XML documents. This model must be based on the model of the XML Infoset. In case incompatibilities arise, requirements must be posed to the W3C XML Core Working Group. In any case, the final model used by the XML Query working group will have to be based on, and totally compatible with, the model of the XML Infoset.

There are no requirements for co-development of features with the following Working Groups, but there are points of contact between their work and that of this Working Group, and thus logical dependency between their deliverables and those of this Working Group. Requirements from these Working Groups are expected to be well suited for communication via documents:

WAI Protocols & Formats Working Group

Reuse of common constructs greatly facilitates accessibility; the WAI PF Working Group will review work on the XML query facilities to be sure cost/benefit design decisions are informed of the benefits of accessibility.

Internationalization Working Group

The XML Query Working Group will solicit feedback from the Internationalization Working Group to ensure that it satisfies W3C goals for international access to the Web.

XML Fragments Working Group

It may be necessary for the XML Query Working Group to reference the XML Fragment specification if a valid query return type is an XML fragment.

IETF DASL Working Group

XML Query must strive for smooth interaction with the IETF DASL (DAV Searching & Locating) Working Group, in such a way that the XML query language can be easily incorporated into the DASL protocol.

Formal liaison between the XML Query Working Group and other W3C working groups, including the other XML working groups and the WAI (Web Accessibility Initiative) group, as well as organizations outside of the W3C, shall be accomplished by the exchange of documents (requirements, reviews, etc.) transmitted through the XML Coordination Group.

5 References (non-normative)

The following references are some of the works considered by the WG in deriving its requirements.

QL98
Query Languages'98 (QL'98), W3C, 1998.
Maier98
Database Desiderata for an XML Query Language, David Maier, 1998. In Query Languages 98 (QL'98). (See http://www.w3.org/TandS/QL/QL98/pp/maier.html.)
CM98
Candidate Requirements for XML Query, Paul Cotton and Ashok Malhotra, 1998. In Query Languages 98 (QL'98). (See http://www.w3.org/TandS/QL/QL98/pp/queryreq.html.)
FSW99
XML Query Languages: Experiences and Exemplars, Mary Fernandez, Jerome Simeon, Philip Wadler, 1999. (See http://www.w3.org/1999/09/ql/docs/xquery.html.)
Robie99
The Tree Structure of XML Queries, Jonathan Robie (W3C members only). (See http://www.w3.org/XML/Group/1999/10/xquery-tree.html.)
XML
Extensible Markup Language (XML), Version 1.0 (second edition). W3C Recommendation, 2000. (See http://www.w3.org/TR/1998/REC-xml-19980210.)
XPath
XML Path Language (XPath), Version 1.0. W3C Recommendation, 1999. (See http://www.w3.org/TR/xpath.)
Namespaces
Namespaces in XML. W3C Recommendation, 1999. (See http://www.w3.org/TR/1999/REC-xml-names-19990114/.)
DOM
Document Object Model (DOM), Level 2 Core Specification. W3C Recommendation, 2000. (See http://www.w3.org/TR/1999/CR-DOM-Level-2-19991210.)
XSLT
XSL Transformations (XSLT), Version 1.0. W3C Recommendation, 1999. (See http://www.w3.org/TR/1999/REC-xslt-19991116.)
Infoset
XML Information Set, W3C Recommendation, 2001. (See http://www.w3.org/TR/xml-infoset/.)
XMLSchema0
XML Schema Part 0: Primer, W3C Recommendation, 2001. (See http://www.w3.org/TR/xmlschema-0/.)
XMLSchema1
XML Schema Part 1: Structures, W3C Recommendation, 2001. (See http://www.w3.org/TR/xmlschema-1/.)
XMLSchema2
XML Schema Part 2: Datatypes, W3C Recommendation, 2001. (See http://www.w3.org/TR/xmlschema-2/.)

A Glossary

Universal and Existential Quantifiers

A quantifier is a term denoting a constraint on the number of objects in a collection that satisfy an accompanying condition. The existential quantifier denotes that at least one object satisfies the condition. The universal quantifier denotes that all objects satisfy the condition.

Document

A document consists of the set of nodes and edges in the subtree descended from a Document node in the XML Query Data Model.

Inter-document references

References that refer to nodes that do not reside in the same XML document as the reference itself.

Intra-document references

References that reside in the same XML document as the nodes they reference.

Literal Data

Literal fragments of an XML document such as <name><first>Joe</first><last>Doe</last></name>, which may be used for comparison.
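
The use of a literal fragment for comparison can be sketched as follows, in Python rather than XML Query syntax. The helper `same_structure` is a hypothetical, simplified comparison written for this example (it ignores text that follows child elements) and is not an XML Query function; the `people` document is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# The literal fragment from the glossary entry, specified with the query.
literal = ET.fromstring("<name><first>Joe</first><last>Doe</last></name>")

# Hypothetical document to be searched.
people = ET.fromstring(
    "<people>"
    "<person><name><first>Ann</first><last>Lee</last></name></person>"
    "<person><name><first>Joe</first><last>Doe</last></name></person>"
    "</people>"
)

def same_structure(a, b):
    """Compare two elements by tag, attributes, text, and children in order."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and (a.text or "").strip() == (b.text or "").strip()
        and len(a) == len(b)
        and all(same_structure(x, y) for x, y in zip(a, b))
    )

# Select the persons whose <name> matches the literal fragment.
hits = [p for p in people.iter("person") if same_structure(p.find("name"), literal)]
```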