This document is also available in these non-normative formats: XML.
Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document defines serialization for the [XSLT 2.0] and [XQuery 1.0] specifications and any other specifications that reference it.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.
This is a Public Working Draft for review by W3C Members and other interested parties. It is a draft document and may be updated, replaced or made obsolete by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". This is work in progress and does not imply endorsement by the W3C membership.
This document describes how [XSLT 2.0] and [XQuery 1.0] convert an instance of the [Data Model] into a sequence of octets. This material has been moved out of the XSLT draft and into a separate document so that it can be shared by both the named specifications and possibly other specifications as well.
XSLT 2.0 and XQuery 1.0 Serialization has been defined jointly by the XSL Working Group and the XML Query Working Group (both part of the XML Activity).
Comments on this document should be sent to the W3C mailing list public-qt-comments@w3.org (archived at http://lists.w3.org/Archives/Public/public-qt-comments/).
A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.
Patent disclosures relevant to this specification may be found on the XML Query Working Group's patent disclosure page at http://www.w3.org/2002/08/xmlquery-IPR-statements and the XSL Working Group's patent disclosure page at http://www.w3.org/Style/XSL/Disclosures.html.
1 Introduction
2 Serializing Arbitrary Data Models
3 Serialization Parameters
4 XML Output Method
4.1 XML Output Method: the version Parameter
4.2 XML Output Method: the encoding Parameter
4.3 XML Output Method: the indent Parameter
4.4 XML Output Method: the cdata-section-elements Parameter
4.5 XML Output Method: the omit-xml-declaration Parameter
4.6 XML Output Method: the doctype-system and doctype-public Parameters
4.7 XML Output Method: the undeclare-namespaces Parameter
4.8 XML Output Method: Other Parameters
5 XHTML Output Method
6 HTML Output Method
6.1 HTML Output Method: Markup for Elements
6.2 HTML Output Method: Writing Attributes
6.3 HTML Output Method: Indentation
6.4 HTML Output Method: Writing Character Data
6.5 HTML Output Method: Encoding
6.6 HTML Output Method: Document Type Declaration
6.7 HTML Output Method: Other Parameters
7 Text Output Method
8 Character Maps
1 Introduction
This document defines serialization of the W3C XQuery 1.0 and XPath 2.0 Data Model, which is the data model of at least [XPath 2.0], [XSLT 2.0], and [XQuery 1.0], and any other specifications that reference it.
Ed. Note: This material has been moved out of the XSLT draft and into a separate document. The Working Groups also considered moving this material directly into the Data Model document, but elected to keep it separate for the moment, principally in order to advance the Data Model to Last Call. In the future, this material may be moved into the Data Model. The Working Groups solicit public opinion about which alternative is superior.
Serialization is the process of converting an instance of the [Data Model] into a sequence of octets. Serialization is well-defined for most data model instances.
Ed. Note: The document assumes the reader already knows generally what serialization is. A brief explanation will be added, especially to disabuse any reader who thinks it might mean Java (or .NET) serialization.
2 Serializing Arbitrary Data Models
The XQuery 1.0 and XPath 2.0 Data Model is richer and less constrained than XML. There are valid instances of the data model that have no direct analog in XML. In particular, data model instances can contain typed values, sequences, and sequences of typed values. And whereas XML deals only with "documents", data model instances can have as their root any node type, simple value, or sequence and may even be empty.
This section describes how to convert an arbitrary data model instance into one of several simplified forms, and then how those forms are serialized. This greatly simplifies the sections which follow. Implementations are not required to implement serialization of arbitrary data model instances in this way, provided that they produce the same results as this conceptual model.
If the data model instance contains any typed or untyped values, or sequences that contain typed or untyped values, convert them to strings: obtain the lexical representation of each value by casting it to an xs:string and replace the value with its string representation.
If adjacent strings occur in a sequence, replace both values with their concatenation separated by a single space.
If empty sequences occur, replace them with the empty string.
To complete the simplification, perform the following steps iteratively until a simplest form is reached:
If the data model instance has as its root an attribute or namespace node, or a QName value, or if it has as its root a sequence which contains one of these items, serialization is undefined.
If the data model instance has as its root a single document node, or an element, processing instruction, comment, or text node, or a sequence of only element, processing instruction, comment, and text nodes, it is already in its simplest form.
If the data model instance has as its root a sequence of document nodes, or a sequence which contains document nodes, replace each document node with its children in document order.
If the data model instance has as its root a string value, or a sequence which contains one or more string values, replace each string value with a text node that contains the same string.
If there are any remaining string values among the children of elements in the data model instance, replace them with text nodes that contain the same string values and merge adjacent text nodes.
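The simplification steps above can be illustrated with a short Python sketch. It is not normative and rests on several assumptions: Node is a hypothetical, minimal stand-in for data model nodes, atomic values are plain Python values, and to_lexical only approximates casting to xs:string. QName values are not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """Hypothetical minimal node. kind is one of "document", "element",
    "text", "comment", "processing-instruction", "attribute", "namespace"."""
    kind: str
    children: List["Node"] = field(default_factory=list)
    value: str = ""


def to_lexical(value: object) -> str:
    """Rough stand-in for casting an atomic value to xs:string."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def simplify(items: List[object]) -> List[Node]:
    """Reduce a top-level sequence to one of the simplified forms."""
    if not items:  # an empty sequence becomes the empty string
        items = [""]
    # Cast atomic values to strings; adjacent strings are joined by one space.
    merged: List[Union[Node, str]] = []
    for item in items:
        if isinstance(item, Node):
            merged.append(item)
        elif merged and isinstance(merged[-1], str):
            merged[-1] = merged[-1] + " " + to_lexical(item)
        else:
            merged.append(to_lexical(item))

    result: List[Node] = []
    for item in merged:
        if isinstance(item, str):
            # A string value becomes a text node containing the same string.
            result.append(Node("text", value=item))
        elif item.kind in ("attribute", "namespace"):
            raise ValueError("serialization is undefined for a top-level "
                             + item.kind + " node")
        elif item.kind == "document" and len(merged) > 1:
            # Inside a sequence, a document node is replaced by its children.
            result.extend(item.children)
        else:
            result.append(item)
    return result
```

A single document node passes through unchanged. Document nodes inside a longer sequence are replaced by their children, following the rules above.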
3 Serialization Parameters
There are a number of parameters that influence how serialization is performed. Host languages may allow users to specify any or all of these parameters, but they are not required to be able to do so.
The following serialization parameters are defined:
Ed. Note: Here and throughout the document, the distinction between "should" and "must" will be revisited. When serialization was described in the XSLT specification, use of "should" helped to clarify that the serialization process was optional. Now that it's described here in a standalone specification, many of those clauses should use "must".
encoding
specifies the preferred character encoding that the processor should use to encode sequences of characters as sequences of bytes; the value of the parameter should be treated case-insensitively; the value must contain only characters in the range #x21 to #x7E (i.e. printable ASCII characters); the value should either be a charset registered with the Internet Assigned Numbers Authority [IANA], [RFC2278] or start with X-.
If this parameter is not specified, the encoding used is implementation defined.
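A minimal check of an encoding value against the lexical rules above might look like the following. Using Python's codec registry in place of the IANA charset registry is an assumption of this sketch. The sketch also treats the "should be registered" rule as a hard check, which is stricter than the text requires.

```python
import codecs


def check_encoding_param(value: str) -> str:
    """Validate an encoding parameter value and return it upper-cased,
    since the value is compared case-insensitively."""
    if not value or any(not 0x21 <= ord(ch) <= 0x7E for ch in value):
        raise ValueError("encoding must contain only characters #x21-#x7E")
    canonical = value.upper()
    if canonical.startswith("X-"):
        return canonical  # private, unregistered names are permitted
    try:
        # Assumption: Python's codec registry approximates the IANA list.
        codecs.lookup(value)
    except LookupError:
        raise ValueError("unrecognized encoding: " + value) from None
    return canonical
```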
cdata-section-elements
specifies a list of the names of elements whose text node children should be output using CDATA sections.
If this parameter is not specified, no elements will be treated specially.
doctype-system
specifies the system identifier to be used in the document type declaration.
If this parameter is not specified, no system identifier will be generated. For the XML and XHTML output methods, no public identifier will be generated either, regardless of the setting of doctype-public.
doctype-public
specifies the public identifier to be used in the document type declaration.
If this parameter is not specified, no public identifier will be generated.
escape-uri-attributes
specifies whether the processor should escape URI-valued attributes in HTML and XHTML output using the method recommended in [RFC2396] (section 2.4.1). The value must be yes or no.
If this parameter is not specified, the value is implementation defined.
include-content-type
specifies whether the serialization process should add a meta element in HTML and XHTML output. The value must be yes or no.
If this parameter is not specified, the value is implementation defined.
indent
specifies whether the processor may add additional whitespace when outputting the data model; the value must be yes or no.
If this parameter is not specified, the value is implementation defined.
media-type
specifies the media type (MIME content type) of the data that results from outputting the data model; the charset parameter should not be specified explicitly; instead, when the top-level media type is text, a charset parameter should be added according to the character encoding actually used by the output method.
If this parameter is not specified, the media type is implementation defined.
normalize-unicode
indicates whether or not the serialization process should convert the serialized output to Unicode Normalization Form C as specified in [Unicode Normalization]. The value must be yes or no.
If this parameter is not specified, the value is implementation defined.
omit-xml-declaration
specifies whether the serialization process should output an XML declaration; the value must be yes or no.
If this parameter is not specified, the value is implementation defined.
standalone
specifies whether the processor should output a standalone document declaration and what its value should be; the value must be yes or no.
If this parameter is not specified, no standalone document declaration should be output.
undeclare-namespaces
specifies whether namespaces should be undeclared during serialization; the value must be yes or no.
If this parameter is not specified, the value is implementation defined.
This parameter only applies when the XML serialization method is used and the version is greater than 1.0.
use-character-maps
provides a list of character/string pairs that are used in serialization (see 8 Character Maps).
If this parameter is not specified, no character maps are used.
version
specifies the version of the output method.
If this parameter is not specified, the value is implementation defined.
method
identifies the overall method that should be used for serializing. The value of the method parameter must be a valid QName. If the QName is in no namespace, then it identifies a method specified in this document and must be one of xml, html, xhtml, or text. If the QName is in a namespace, then it identifies the output method; the behavior in this case is not specified by this document.
The detailed semantics of each parameter will be described separately for each output method for which it is applicable. If the semantics of a parameter are not described for an output method, then it is not applicable to that output method.
Serialization can be regarded as involving four phases of processing, carried out sequentially as follows:
Markup generation produces the representation of start and end tags for elements, and other constructs such as XML declarations, processing instructions, and so on. This is influenced by the parameters method, doctype-system, doctype-public, include-content-type, indent, omit-xml-declaration, standalone, and version.
Character expansion is concerned with the representation of characters appearing in text and attribute nodes in the data model. The substitution processes that may apply are listed below, in priority order: a character that is handled by one process in this list will be unaffected by processes appearing later in the list:
URI escaping (in the case of URI-valued attributes in the HTML and XHTML output methods), as determined by the escape-uri-attributes parameter.
Creation of CDATA sections, as determined by the cdata-section-elements parameter. Note that this is also affected by the encoding parameter, in that characters not present in the selected encoding cannot be represented in a CDATA section.
Character mapping, as determined by the use-character-maps parameter.
Escaping of special characters according to XML or HTML rules, for example replacing < by &lt;.
Unicode Normalization, if requested by the normalize-unicode parameter. Unicode normalization is applied to the character stream that results after all markup generation and character expansion has taken place.
Encoding, as controlled by the encoding parameter. This converts the character stream produced by the previous phases into a byte stream.
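The ordering of the character expansion, normalization, and encoding phases can be sketched for the content of a single text node. This is an illustrative Python sketch, not a conforming serializer. URI escaping and CDATA sections are omitted, and char_map is a hypothetical dictionary standing in for the use-character-maps parameter. Characters missing from the target encoding are written as character references.

```python
import unicodedata
from typing import Dict


def serialize_text(text: str, char_map: Dict[str, str],
                   normalize_unicode: bool, encoding: str) -> bytes:
    """Apply character expansion, normalization and encoding to the
    content of one text node."""
    expanded = []
    for ch in text:
        if ch in char_map:
            # Character mapping: the replacement string is written as-is
            # and is not subject to the escaping step that follows.
            expanded.append(char_map[ch])
        elif ch == "&":
            expanded.append("&amp;")
        elif ch == "<":
            expanded.append("&lt;")
        elif ch == ">":
            expanded.append("&gt;")
        else:
            expanded.append(ch)
    stream = "".join(expanded)
    if normalize_unicode:
        # Normalization applies to the stream produced by expansion.
        stream = unicodedata.normalize("NFC", stream)
    # Encoding: characters the encoding lacks become character references.
    return stream.encode(encoding, errors="xmlcharrefreplace")
```

For example, mapping "\u00ab" to "<<" writes a raw "<<" that is not escaped afterwards, because a character handled by character mapping is unaffected by the later escaping step.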
4 XML Output Method
The xml output method outputs the data model as an XML entity that should satisfy the rules for either a well-formed XML document entity, or a well-formed XML external general parsed entity, or both. If the document node of the data model has a single element node child and no text node children, then the serialized output should be a well-formed XML document entity conforming to the XML Namespaces Recommendation [XML Names]. If the data model does not take this form, then the serialized output should be an entity which, when referenced within a trivial XML document wrapper like this
<!DOCTYPE doc [ <!ENTITY e SYSTEM "entity-URI"> ]> <doc>&e;</doc>
where entity-URI is a URI for the entity, produces a document which should itself be a well-formed XML document conforming to the XML Namespaces Recommendation [XML Names].
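One way to approximate the document-wrapper test in Python is to inline the entity text inside a doc element instead of resolving an external entity reference. This sketch uses the standard xml.etree.ElementTree parser (the function name is hypothetical) and strips a leading text declaration. It is an approximation, not the exact procedure described above.

```python
import re
import xml.etree.ElementTree as ET

# An optional text declaration at the start of the entity, for example
# <?xml version="1.0" encoding="UTF-8"?>
TEXT_DECL = re.compile(r"^<\?xml\s[^?]*\?>")


def is_well_formed_entity(entity: str) -> bool:
    """Approximate the doc-wrapper test by inlining the entity text
    rather than resolving an external entity reference."""
    body = TEXT_DECL.sub("", entity, count=1)
    try:
        ET.fromstring("<doc>" + body + "</doc>")
    except ET.ParseError:
        return False
    return True
```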
In addition, the output should be such that if a new tree was constructed by parsing the XML document and converting it into a data model as specified in this document, then the new data model would be the same as the starting data model, with the following possible exceptions:
If the document was produced by adding a document wrapper, as described above, then it will contain an extra doc element as the document element.
The order of attribute and namespace nodes in the two trees may be different.
The base URIs of nodes in the two trees may be different.
The new tree may contain additional attributes and text nodes resulting from the expansion of default and fixed values in its DTD or schema.
The type annotations of the nodes in the two trees may be different. Type annotations in a result tree are discarded when the tree is serialized. Any new type annotations obtained by parsing the document will be derived by processing the serialized XML document against a schema, and this may result in type annotations that are either more or less precise than those in the original result tree.
A consequence of this rule is that certain whitespace characters should be output as character references, to ensure that they survive the round trip through serialization and parsing. Specifically, CR characters in text nodes should be written as &#xD; or an equivalent, while CR, NL, and TAB characters in attribute nodes should be output respectively as &#xD;, &#xA;, and &#x9;, or their equivalents.
For example, an attribute with the value "x" and "y" separated by a newline will result in the output "x&#xA;y" (or with any equivalent character reference). The XML output cannot be "x" followed by a literal newline followed by a "y", because after parsing, the attribute value would be "x y" as a consequence of the XML attribute normalization rules.
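The whitespace rule can be checked with a round trip through a parser. The following Python sketch (escape_attribute_value is a hypothetical helper) escapes an attribute value for a double-quoted attribute. It then confirms with xml.etree.ElementTree that the newline and tab survive, whereas a literal newline is normalized to a space.

```python
import xml.etree.ElementTree as ET


def escape_attribute_value(value: str) -> str:
    """Escape a value for use inside a double-quoted XML attribute."""
    replacements = {
        "&": "&amp;", "<": "&lt;", '"': "&quot;",
        # Whitespace written as character references survives
        # attribute-value normalization when the output is reparsed.
        "\r": "&#xD;", "\n": "&#xA;", "\t": "&#x9;",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


# Round trip: parse the serialized attribute back and compare.
original = "x\ny\tz"
element = ET.fromstring('<e a="' + escape_attribute_value(original) + '"/>')
assert element.get("a") == original
# A literal newline would instead be normalized to a space by the parser.
assert ET.fromstring('<e a="x\ny"/>').get("a") == "x y"
```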
Note: To anticipate the proposed changes to end-of-line handling in XML 1.1, implementations may also output the characters #x85 and #x2028 as character references. This will not affect the way they are interpreted by an XML 1.0 parser.
It is a serialization error to request the output of a document type declaration, or of a standalone parameter, if the data model contains text nodes or multiple element nodes as children of the root node. The processor may signal the error, or may recover by ignoring the request to output a document type declaration or standalone parameter.
The result of serialization using the XML output method is not guaranteed to be well-formed XML if character maps have been specified (see 8 Character Maps) or if nodes in the data model contain characters that are invalid in XML (introduced, perhaps, by calling a user-written extension function: this is an error but the processor is not required to signal it).
4.1 XML Output Method: the version Parameter
The version parameter specifies the version of XML to be used for outputting the data model. If the processor does not support this version of XML, it should use a version of XML that it does support. The version output in the XML declaration (if an XML declaration is output) should correspond to the version of XML that the processor used for outputting the data model. The value of the version parameter should match the VersionNum production of the XML Recommendation [XML].
4.2 XML Output Method: the encoding Parameter
The encoding parameter specifies the preferred encoding to use for outputting the data model. Processors are required to respect values of UTF-8 and UTF-16. If an output encoding other than UTF-8 or UTF-16 is requested and the implementation does not support that encoding, a serialization error occurs. The processor may signal the error, or may recover by using UTF-8 or UTF-16 instead. The processor must not use an encoding whose name does not match the EncName production of the XML Recommendation [XML]. If no encoding parameter is specified, then the processor should use either UTF-8 or UTF-16.
When outputting a newline character in the data model, the implementation is free to represent it using any character sequence that will be normalized to a newline character by an XML parser, unless a specific mapping for the newline character is provided in a character map: see 8 Character Maps.
When outputting any other character that is defined in the selected encoding, the character should be output using the correct representation of that character in the selected encoding.
It is possible that the data model will contain a character that cannot be represented in the encoding that the processor is using for output. In this case, if the character occurs in a context where XML recognizes character references (that is, in the value of an attribute node or text node), then the character should be output as a character reference. A serialization error occurs if such a character appears in a context where character references are not allowed (for example if the character occurs in the name of an element). The processor should signal the error.
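The distinction between contexts that allow character references and those that do not can be sketched in Python. The helper names are hypothetical. xmlcharrefreplace writes decimal rather than hexadecimal references, and either form is equivalent. Raising ValueError stands in for signalling a serialization error.

```python
def encode_text(text: str, encoding: str) -> bytes:
    """Encode text-node content; characters missing from the encoding are
    written as (decimal) character references."""
    escaped = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return escaped.encode(encoding, errors="xmlcharrefreplace")


def encode_name(name: str, encoding: str) -> bytes:
    """Encode an element or attribute name, where character references are
    not allowed; an unrepresentable character is a serialization error."""
    try:
        return name.encode(encoding)
    except UnicodeEncodeError as exc:
        raise ValueError(
            f"serialization error: {name!r} cannot be encoded in {encoding}"
        ) from exc
```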
4.3 XML Output Method: the indent Parameter
If the indent parameter has the value yes, then the xml output method may output whitespace in addition to the whitespace in the data model (possibly based on whitespace stripped from either the source document or the stylesheet) in order to indent the result nicely; if the indent parameter has the value no, it should not output any additional whitespace. The xml output method should use an algorithm to output additional whitespace that satisfies the following constraints:
Whitespace characters must not be added adjacent to a text node that contains non-whitespace characters.
Whitespace may only be added adjacent to an element node, that is, immediately before a start tag or immediately after an end tag.
The new whitespace characters may replace existing whitespace characters in the same position, for example a tab may be inserted as a replacement for existing spaces. However, existing whitespace must not be removed without such a replacement.
Whitespace characters must not be inserted in a part of the result document that is controlled by an xml:space="preserve" attribute.
Note: The effect of these rules is to ensure that whitespace may only be added in places where (a) XSLT's <xsl:strip-space> declaration could cause it to be removed, and (b) it does not affect the string value of any element node with simple content. It is usually not safe to indent document types that include elements with mixed content.
4.4 XML Output Method: the cdata-section-elements Parameter
The cdata-section-elements parameter contains a list of expanded-QNames. If the expanded-QName of the parent of a text node is a member of the list, then the text node should be output as a CDATA section.
If the text node contains the sequence of characters ]]>, then the currently open CDATA section should be closed following the ]] and a new CDATA section opened before the >.
If the text node contains characters that are not representable in the character encoding being used to output the data model, then the currently open CDATA section should be closed before such characters, the characters should be output using character references or entity references, and a new CDATA section should be opened for any further characters in the text node.
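The two splitting rules for CDATA sections can be combined in a short Python sketch. The function name is hypothetical. It assumes the text node has already been selected by cdata-section-elements, and it writes unrepresentable characters as hexadecimal character references.

```python
def cdata_sections(text: str, encoding: str) -> str:
    """Write a text node as one or more CDATA sections."""
    parts = []
    pending = []

    def close_section() -> None:
        if pending:
            content = "".join(pending)
            # "]]>" cannot occur inside a section: close after "]]" and
            # reopen before ">".
            content = content.replace("]]>", "]]]]><![CDATA[>")
            parts.append("<![CDATA[" + content + "]]>")
            pending.clear()

    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Unrepresentable character: close the section, write a
            # character reference, and continue in a new section.
            close_section()
            parts.append(f"&#x{ord(ch):X};")
        else:
            pending.append(ch)
    close_section()
    return "".join(parts)
```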
CDATA sections should not be used except where they have been explicitly requested by the user, either by using the cdata-section-elements parameter, or by using some other implementation-defined mechanism.
Note: This is phrased to permit an implementor to provide an option that attempts to preserve CDATA sections present in the source document.
4.5 XML Output Method: the omit-xml-declaration Parameter
The xml output method should output an XML declaration unless the omit-xml-declaration parameter has the value yes. The XML declaration should include both version information and an encoding declaration. If the standalone parameter is specified, it should include a standalone document declaration with the same value as the value of the standalone parameter. Otherwise, it should not include a standalone document declaration; this ensures that it is both an XML declaration (allowed at the beginning of a document entity) and a text declaration (allowed at the beginning of an external general parsed entity).
The omit-xml-declaration parameter should be ignored if the standalone parameter is present, or if the encoding parameter specifies a value other than UTF-8 or UTF-16.
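The rules for the XML declaration, including the cases where omit-xml-declaration is ignored, can be summarized in a small Python sketch. The function and argument names are illustrative assumptions, and the encoding comparison is case-insensitive.

```python
from typing import Optional


def xml_declaration(version: str, encoding: str,
                    standalone: Optional[str], omit: bool) -> str:
    """Return the XML declaration to write, or "" if it is omitted."""
    # omit-xml-declaration is ignored when standalone is present or the
    # encoding is something other than UTF-8 or UTF-16.
    if omit and standalone is None and encoding.upper() in ("UTF-8", "UTF-16"):
        return ""
    decl = f'<?xml version="{version}" encoding="{encoding}"'
    if standalone is not None:
        decl += f' standalone="{standalone}"'
    return decl + "?>"
```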
4.6 XML Output Method: the doctype-system and doctype-public Parameters
If the doctype-system parameter is specified, the xml output method should output a document type declaration immediately before the first element. The name following <!DOCTYPE should be the name of the first element. If the doctype-public parameter is also specified, then the xml output method should output PUBLIC followed by the public identifier and then the system identifier; otherwise, it should output SYSTEM followed by the system identifier. The internal subset should be empty. The doctype-public parameter should be ignored unless the doctype-system parameter is specified.
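A minimal Python sketch of the document type declaration rules follows. The function name is hypothetical. The sketch assumes the identifiers contain no double quote characters, so it makes no attempt at alternative quoting.

```python
from typing import Optional


def doctype_declaration(first_element: str, system_id: Optional[str],
                        public_id: Optional[str]) -> str:
    """Return the document type declaration to write before the first element."""
    if system_id is None:
        return ""  # doctype-public alone is ignored
    if public_id is not None:
        return f'<!DOCTYPE {first_element} PUBLIC "{public_id}" "{system_id}">'
    return f'<!DOCTYPE {first_element} SYSTEM "{system_id}">'
```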
4.7 XML Output Method: the undeclare-namespaces Parameter
The Data Model allows an element to have fewer in-scope namespaces than its parent. In XML 1.1, this can be represented most accurately by undeclaring namespaces. If undeclare-namespaces is "yes" and the output method is XML and the version is greater than 1.0, serialization should undeclare namespaces.
Consider an element x:foo with three in-scope namespaces:
<x:foo xmlns:x="http://example.org/x" xmlns:y="http://example.org/y" xmlns:z="http://example.org/z">
Suppose that it has a child element with two in-scope namespaces:
<x:bar xmlns:x="http://example.org/x" xmlns:y="http://example.org/y">...
If namespace undeclaration is in effect, it will be serialized this way:
<x:foo xmlns:x="http://example.org/x" xmlns:y="http://example.org/y" xmlns:z="http://example.org/z"> <x:bar xmlns:z="">...</x:bar> </x:foo>
In XML 1.0, namespace undeclaration is not possible.
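The difference between the two serializations can be expressed as a comparison of in-scope namespace maps. In this hypothetical Python sketch, each map goes from prefix to namespace URI, with "" standing for the default namespace. The sketch undeclares the default namespace with xmlns="" unconditionally, because Namespaces in XML 1.0 already permits that. It emits prefixed undeclarations only when undeclare is true.

```python
from typing import Dict


def namespace_declarations(parent: Dict[str, str], child: Dict[str, str],
                           undeclare: bool) -> Dict[str, str]:
    """Return the xmlns attributes to write on a child element, given the
    in-scope namespaces of the parent and of the child (prefix -> URI)."""
    decls: Dict[str, str] = {}
    for prefix, uri in child.items():
        if parent.get(prefix) != uri:
            decls["xmlns:" + prefix if prefix else "xmlns"] = uri
    for prefix in parent:
        if prefix not in child:
            if prefix == "":
                decls["xmlns"] = ""  # default namespace: allowed in XML 1.0
            elif undeclare:
                decls["xmlns:" + prefix] = ""  # prefix undeclaration: XML 1.1
    return decls
```

For the x:bar child in the example above, this returns {"xmlns:z": ""} when undeclare is true and an empty map otherwise.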
5 XHTML Output Method
The xhtml output method serializes the data model as XML, using the HTML compatibility guidelines defined in the XHTML specification.
It is entirely the responsibility of the stylesheet author to ensure that the data model conforms to the [XHTML 1.0] or [XHTML 1.1] specification. It is not an error if the data model is invalid XHTML. Equally, it is entirely under the control of the stylesheet author whether the output conforms to XHTML Strict, XHTML Transitional, XHTML Frameset, or XHTML Basic.
The serialization of the data model follows the same rules as for the xml output method, with the exceptions noted below. These differences are based on the HTML compatibility guidelines published in Appendix C of [XHTML 1.0], which are designed to ensure that as far as possible, XHTML is rendered correctly on user agents designed originally to handle HTML.
Given an empty instance of an XHTML element whose content model is not EMPTY (for example, an empty title or paragraph), the serializer should not use the minimized form. That is, it should output <p></p> and not <p />.
Given an XHTML element whose content model is EMPTY, the serializer should use the minimized tag syntax, for example <br />, as the alternative syntax <br></br> allowed by XML gives uncertain results in many existing user agents. The serializer should include a space before the trailing />, e.g. <br />, <hr /> and <img src="karen.jpg" alt="Karen" />.
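The choice between minimized and non-minimized tags can be sketched in Python. The element set is assumed to be the elements declared EMPTY in the XHTML 1.0 DTDs, which are the same names as the HTML 4.0 empty elements listed for the html output method. Attribute values and content are assumed to be escaped already, and the function name is hypothetical.

```python
from typing import Dict

# Assumed: elements declared EMPTY in the XHTML 1.0 DTDs.
XHTML_EMPTY = {"area", "base", "basefont", "br", "col", "frame", "hr",
               "img", "input", "isindex", "link", "meta", "param"}


def xhtml_element(name: str, attrs: Dict[str, str], content: str) -> str:
    """Write an element in the XHTML namespace (attribute values and
    content are assumed to be escaped already)."""
    attr_text = "".join(f' {key}="{val}"' for key, val in attrs.items())
    if name in XHTML_EMPTY and content == "":
        # Minimized form with a space before "/>".
        return f"<{name}{attr_text} />"
    # Non-EMPTY elements keep separate tags even when they are empty.
    return f"<{name}{attr_text}>{content}</{name}>"
```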
The serializer should avoid outputting line breaks and multiple whitespace characters within attribute values. These are handled inconsistently by user agents.
The serializer should avoid use of the entity reference &apos; which, although legal in XML and therefore in XHTML, is not defined in HTML and is not recognized by all HTML user agents.
The serializer should output namespace declarations in a way that is consistent with the requirements of the XHTML DTD if this is possible. The DTD requires the declaration xmlns="http://www.w3.org/1999/xhtml" to appear on the html element, and only on the html element. The serializer must output namespace declarations that are consistent with the namespace nodes present in the result tree, but it should avoid outputting redundant namespace declarations on elements where the DTD would make them invalid.
Note: Although the specification of the namespace fixup process provides no guarantees about the namespace prefixes that are allocated, implementors are encouraged to ensure that where possible, writing the literal result element
<html xmlns="http://www.w3.org/1999/xhtml"> ... </html>
places the resulting html element in the default namespace.
If the data model includes a head element in the XHTML namespace, then unless the include-content-type parameter has the value "no", the xhtml output method should add a meta element immediately after the start-tag of the head element specifying the character encoding actually used.
For example,
<head> <meta http-equiv="Content-Type" content="text/html; charset=EUC-JP"/> ...
The content type should be set to the value given for the media-type parameter; the default value for XHTML is text/html. The value application/xhtml+xml, registered in [RFC3236], may also be used.
Unless the escape-uri-attributes parameter has the value no, the xhtml output method should escape non-ASCII characters in URI attribute values using the method recommended in [RFC2396] (section 2.4.1).
Note: This escaping is deliberately confined to non-ASCII characters, because escaping of ASCII characters is not always appropriate, for example when URIs or URI fragments are interpreted locally by the HTML user agent. Even in the case of non-ASCII characters, escaping can sometimes cause problems. More precise control of URI escaping is therefore available by setting escape-uri-attributes to no, and controlling the escaping of URIs by means of the fn:escape-uri function defined in [Functions and Operators].
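The escaping of non-ASCII characters can be sketched in Python, assuming UTF-8 is used to turn each non-ASCII character into bytes and each byte is then written as a %HH escape. ASCII characters, including spaces, are left untouched, consistent with the note above. The function name is hypothetical.

```python
def escape_uri_non_ascii(uri: str) -> str:
    """Escape only non-ASCII characters: each is encoded as UTF-8 and each
    resulting byte is written as %HH."""
    out = []
    for ch in uri:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII characters are left as they are
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)
```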
Note: As with the XML output method, the XHTML output method outputs an XML declaration unless it is suppressed using the omit-xml-declaration parameter. Appendix C.1 of the XHTML specification provides advice on the consequences of including, or omitting, the XML declaration.
6 HTML Output Method
The html output method outputs the data model as HTML.
For example,
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="html"/> <xsl:template match="/"> <html> <xsl:apply-templates/> </html> </xsl:template> ... </xsl:stylesheet>
The version parameter indicates the version of the HTML. The default value is 4.0, which specifies that the result should be output as HTML conforming to the HTML 4.0 Recommendation [HTML].
The html output method should not output an element differently from the xml output method unless the expanded-QName of the element has a null namespace URI; an element whose expanded-QName has a non-null namespace URI should be output as XML. If the expanded-QName of the element has a null namespace URI, but the local part of the expanded-QName is not recognized as the name of an HTML element, the element should be output in the same way as a non-empty, inline element such as span. In particular:
If the result tree contains namespace nodes for namespaces other than the XML namespace, the HTML output method will represent these namespaces using attributes named xmlns or xmlns:prefix in the same way as the XML output method would represent them when the version parameter is set to 1.0.
If the result tree contains elements or attributes whose names have a non-null namespace URI, the HTML output method will generate namespace-prefixed QNames for these nodes in the same way as the XML output method would do when the version parameter is set to 1.0.
Where special rules are defined later in this section for serializing specific HTML elements and attributes, these rules are never applied to an element or attribute whose name has a non-null namespace URI. However, the generic rules for the HTML output method that apply to all elements and attributes, for example the rules for escaping special characters in the text and the rules for indentation, must be used also for namespaced elements and attributes.
When serializing an element whose name is not defined in the HTML specification, but that is in the null namespace, the HTML output method should apply the same rules (for example, indentation rules) as when serializing a span element. The descendants of such an element should be serialized as if they were descendants of a span element.
When serializing an element whose name is in a non-null namespace, the HTML output method should apply the same rules (for example, indentation rules) as when serializing a div element. The descendants of such an element should be serialized as if they were descendants of a div element.
The html output method should not output an end-tag for empty elements. For HTML 4.0, the empty elements are area, base, basefont, br, col, frame, hr, img, input, isindex, link, meta and param. For example, an element written as <br/> or <br></br> in an XSLT stylesheet should be output as <br>.
The html output method should recognize the names of HTML elements regardless of case. For example, elements named br, BR or Br should all be recognized as the HTML br element and output without an end-tag.
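A minimal Python sketch of the end-tag rule, using the empty element list given above and a case-insensitive comparison. It assumes the start tag has already been written. As a simplification, an element in a non-null namespace always receives an end tag here. The function name is hypothetical.

```python
# Empty elements of HTML 4.0, as listed above.
HTML_EMPTY = {"area", "base", "basefont", "br", "col", "frame", "hr",
              "img", "input", "isindex", "link", "meta", "param"}


def html_end_tag(name: str, namespace_uri: str = "") -> str:
    """Return the end tag to write for an element ("" for none)."""
    if namespace_uri == "" and name.lower() in HTML_EMPTY:
        return ""  # br, BR and Br all get no end tag
    return "</" + name + ">"
```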
The html output method should not perform escaping for the content of the script and style elements.
For example, a literal result element such as:
<script>if (a &lt; b) foo()</script>
or
<script><![CDATA[if (a < b) foo()]]></script>
should be output as
<script>if (a < b) foo()</script>
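The difference between text content in script and style and text content elsewhere can be sketched in Python. The function name is hypothetical, and only elements in the null namespace are assumed to reach it.

```python
def html_text(text: str, parent_name: str) -> str:
    """Write a text node child of an HTML element in the null namespace."""
    if parent_name.lower() in ("script", "style"):
        return text  # no escaping inside script and style
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
```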
A common requirement is to output a script element as shown in the example below:
<script type="text/javascript"> document.write ("<em>This won't work</em>") </script>
This is illegal HTML, for the reasons explained in section B.3.2 of the HTML 4.01 specification. Nevertheless, it is possible to output this fragment, using either of the following constructs:
Firstly, by use of a literal result element:
<script type="text/javascript"> document.write ("<em>This won't work</em>") </script>
Secondly, by constructing the markup from ordinary text characters:
<script type="text/javascript"> document.write ("&lt;em>This won't work&lt;/em>") </script>
As the HTML specification points out, the correct way to write this is to use the escape conventions for the specific scripting language. For JavaScript, it can be written as:
<script type="text/javascript"> document.write ("<em>This will work<\/em>") </script>
The HTML 4.01 specification also shows examples of how to write this in various other scripting languages. The escaping must be done manually; it will not be done by the serializer.
The html output method should not escape "<" characters occurring in attribute values.
If the indent parameter has the value yes, then the html output method may add or remove whitespace as it outputs the data model, so long as it does not change how an HTML user agent would render the output.
Unless the escape-uri-attributes parameter is present and has the value no, the html output method should escape non-ASCII characters in URI attribute values using the method recommended in [RFC2396] (section 2.4.1).
Note: This escaping is deliberately confined to non-ASCII characters, because escaping of ASCII characters is not always appropriate, for example when URIs or URI fragments are interpreted locally by the HTML user agent. Even in the case of non-ASCII characters, escaping can sometimes cause problems. More precise control of URI escaping is therefore available by setting escape-uri-attributes to no, and controlling the escaping of URIs by means of the fn:escape-uri function defined in [Functions and Operators].
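The following Python sketch shows one way to perform this escaping. It assumes, as Appendix B.2.1 of the HTML 4.01 specification recommends, that each non-ASCII character is first converted to its UTF-8 octets and that each octet is then written as %HH. The function name is hypothetical. ASCII characters, including spaces and existing % escapes, are left untouched, in keeping with the note above.

```python
def escape_non_ascii_uri(uri: str) -> str:
    """Percent-escape the non-ASCII characters of a URI attribute value.

    Each non-ASCII character is converted to UTF-8 and each resulting
    octet is written as %HH. ASCII characters are copied unchanged.
    """
    out = []
    for ch in uri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


print(escape_non_ascii_uri("http://example.com/caf\u00e9 menu.html"))
# http://example.com/caf%C3%A9 menu.html
```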
The html output method should output boolean attributes (that is, attributes with only a single allowed value that is equal to the name of the attribute) in minimized form.
For example, a start-tag written in the stylesheet as
<OPTION selected="selected">
should be output as
<OPTION selected>
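A Python sketch of minimization, assuming a hypothetical format_attribute helper and taking the boolean attributes to be those in the set below. The set is an assumption drawn from HTML 4.01 and should be checked against the DTD in use. Escaping of the attribute value is omitted for brevity.

```python
# Boolean attributes of HTML 4.01 (assumed list; check against the DTD in use).
BOOLEAN_ATTRIBUTES = frozenset({
    "checked", "compact", "declare", "defer", "disabled", "ismap",
    "multiple", "nohref", "noresize", "noshade", "nowrap",
    "readonly", "selected",
})


def format_attribute(name: str, value: str) -> str:
    """Format one attribute; value escaping is omitted for brevity."""
    if name.lower() in BOOLEAN_ATTRIBUTES and value.lower() == name.lower():
        # Minimized form: the attribute name on its own.
        return name
    return f'{name}="{value}"'


print(f"<OPTION {format_attribute('selected', 'selected')}>")  # <OPTION selected>
```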
The html output method should not escape a & character occurring in an attribute value immediately followed by a { character (see Section B.7.1 of the HTML 4.0 Recommendation).
For example, a start-tag written in the stylesheet as
<BODY bgcolor='&amp;{{randomrbg}};'>
should be output as
<BODY bgcolor='&{randomrbg};'>
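The following Python sketch combines this rule with the earlier rule that "<" is not escaped in attribute values. It assumes double-quoted attribute values, so '"' is written as &quot;. Its input is the attribute value as it appears in the data model, &{randomrbg};, because the doubled braces in the stylesheet are attribute value template escaping that XSLT has already processed. The function name is hypothetical.

```python
def escape_html_attribute(value: str) -> str:
    """Escape a double-quoted attribute value for the html output method.

    "<" is not escaped, and "&" is left alone when it is immediately
    followed by "{" (HTML 4.01, section B.7.1).
    """
    out = []
    for i, ch in enumerate(value):
        if ch == "&":
            # value[i + 1:i + 2] is "" at the end of the string.
            out.append("&" if value[i + 1:i + 2] == "{" else "&amp;")
        elif ch == '"':
            out.append("&quot;")
        else:
            out.append(ch)
    return "".join(out)


print(escape_html_attribute("&{randomrbg};"))  # &{randomrbg};
print(escape_html_attribute("a<b & c"))        # a<b &amp; c
```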
If the indent parameter has the value yes, then the html output method may add or remove whitespace as it outputs the data model, so long as it does not change the way that a conforming HTML user agent would render the output. The default value is yes.
Note: This rule can be satisfied by observing the following constraints:
Whitespace must only be added before or after an element, or adjacent to an existing whitespace character.
Whitespace must not be added or removed adjacent to an inline element, the inline elements being those included in the %inline category in the HTML 4.01 DTD.
Whitespace must not be added or removed inside a formatted element, the formatted elements being pre, script, style, and textarea.
Note that the HTML definition of whitespace is different from the XML definition: see section 9.1 of the HTML 4.01 specification.
The html output method may output a character using a character entity reference in preference to using a numeric character reference, if an entity is defined for the character in the version of HTML that the output method is using. Entity references and character references should be used only where the character is not present in the selected encoding, or where the visual representation of the character is unclear (as with &nbsp;, for example).
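A Python sketch of one possible policy, assuming the HTML 4.01 entity names in the standard library's html.entities.codepoint2name table. A character is written directly if the target encoding can represent it and it is not visually unclear. Otherwise it is written as a character entity reference where one exists, falling back to a decimal numeric character reference. Treating only the no-break space as visually unclear is an assumption made for the example, and char_reference is a hypothetical name.

```python
from html.entities import codepoint2name


def char_reference(ch: str, encoding: str) -> str:
    """Return how a single character might be written in HTML content."""
    cp = ord(ch)
    try:
        ch.encode(encoding)
        encodable = True
    except UnicodeEncodeError:
        encodable = False
    # Assumption: no-break space is the only "visually unclear" character.
    if encodable and cp != 0xA0:
        return ch
    name = codepoint2name.get(cp)  # e.g. 0xE9 -> "eacute"
    return f"&{name};" if name else f"&#{cp};"


print(char_reference("\u00e9", "ascii"))       # &eacute;
print(char_reference("\u00a0", "utf-8"))       # &nbsp;
print(char_reference("\u4e00", "iso-8859-1"))  # &#19968;
```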
When outputting a sequence of whitespace characters in the data model within an element where whitespace is treated normally (but not in elements such as pre and textarea), the html output method is free to represent it using any character sequence that will be treated as whitespace by an HTML user agent.
Certain characters, specifically the control characters #x7F-#x9F, are legal in XML but not in HTML. It is an error to use the HTML output method when such characters appear in the data model. The processor may signal the error, but is not required to do so. If it does not signal the error, it may copy the offending characters into the serialized output, creating invalid HTML.
The html output method should terminate processing instructions with > rather than ?>.
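For illustration, here is a hypothetical Python helper that writes a processing instruction in HTML form. It assumes that the target and data are already valid and that the data contains no ">".

```python
def serialize_pi(target: str, data: str = "") -> str:
    """Write a processing instruction ending in ">" rather than "?>"."""
    return f"<?{target} {data}>" if data else f"<?{target}>"


print(serialize_pi("my-app", "refresh"))  # <?my-app refresh>
```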
The encoding parameter specifies the preferred encoding to be used. If there is a HEAD element, then unless the include-content-type parameter is present and has the value "no", the html output method should add a META element immediately after the start-tag of the HEAD element specifying the character encoding actually used.
For example,
<HEAD> <META http-equiv="Content-Type" content="text/html; charset=EUC-JP"> ...
The content type should be set to the value given for the media-type parameter; the default value is text/html.
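A Python sketch of the META insertion, assuming a hypothetical helper that returns the serialized HEAD start-tag and takes the include-content-type parameter as a boolean. With the default media type it reproduces the EUC-JP example above. Attributes on the HEAD element itself are not handled.

```python
def head_start_tag(encoding: str,
                   media_type: str = "text/html",
                   include_content_type: bool = True) -> str:
    """Serialize the HEAD start-tag, followed by a META element naming the
    media type and the encoding actually used, unless include-content-type
    is "no" (include_content_type=False here)."""
    start = "<HEAD>"
    if not include_content_type:
        return start
    meta = (f'<META http-equiv="Content-Type" '
            f'content="{media_type}; charset={encoding}">')
    return start + meta


print(head_start_tag("EUC-JP"))
# <HEAD><META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
```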
It is possible that the data model will contain a character that cannot be represented in the encoding that the processor is using for output. In this case, if the character occurs in a context where HTML recognizes character references, then the character should be output as a character entity reference or decimal numeric character reference; otherwise (for example, in a script or style element or in a comment), the processor should signal a serialization error.
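The following Python sketch shows the context-dependent behaviour for element content. It assumes a hypothetical write_chars helper and a hypothetical SerializationError exception. Unrepresentable characters become decimal numeric character references in ordinary content and raise an error inside script or style. Comments, entity reference names, and the escaping of & and < are not covered.

```python
class SerializationError(Exception):
    """Hypothetical exception standing in for a serialization error."""


def write_chars(text: str, encoding: str, parent: str) -> str:
    """Handle characters that the output encoding cannot represent."""
    in_script_or_style = parent.lower() in ("script", "style")
    out = []
    for ch in text:
        try:
            ch.encode(encoding)
            out.append(ch)
        except UnicodeEncodeError:
            if in_script_or_style:
                # Character references are not recognized here.
                raise SerializationError(
                    f"U+{ord(ch):04X} cannot be encoded in {encoding} "
                    f"inside {parent}") from None
            # HTML recognizes character references here.
            out.append(f"&#{ord(ch)};")
    return "".join(out)


print(write_chars("caf\u00e9", "ascii", "p"))  # caf&#233;
```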
If the doctype-public or doctype-system parameters are specified, then the html output method should output a document type declaration immediately before the first element. The name following <!DOCTYPE should be HTML or html. If the doctype-public parameter is specified, then the output method should output PUBLIC followed by the specified public identifier; if the doctype-system parameter is also specified, it should also output the specified system identifier following the public identifier. If the doctype-system parameter is specified but the doctype-public parameter is not specified, then the output method should output SYSTEM followed by the specified system identifier.
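A Python sketch of how the declaration might be assembled from the two parameters. The doctype function is hypothetical and assumes that neither identifier contains a double quote. It uses the upper-case name HTML, which is one of the two permitted spellings.

```python
from typing import Optional


def doctype(public_id: Optional[str] = None,
            system_id: Optional[str] = None) -> Optional[str]:
    """Build the document type declaration, or None if neither
    doctype-public nor doctype-system is specified."""
    if public_id is not None:
        decl = f'<!DOCTYPE HTML PUBLIC "{public_id}"'
        if system_id is not None:
            # The system identifier follows the public identifier.
            decl += f' "{system_id}"'
        return decl + ">"
    if system_id is not None:
        return f'<!DOCTYPE HTML SYSTEM "{system_id}">'
    return None


print(doctype("-//W3C//DTD HTML 4.01//EN",
              "http://www.w3.org/TR/html4/strict.dtd"))
# <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
```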
The text output method outputs the data model by outputting the string-value of every text node in the data model in document order without any escaping.
A newline character in the data model may be output using any character sequence that is conventionally used to represent a line ending in the chosen system environment.
The media-type parameter is applicable for the text output method.
The encoding parameter identifies the encoding that the text output method should use to convert sequences of characters to sequences of bytes. The default is implementation-defined.
If the data model contains a character that cannot be represented in the encoding that the processor is using for output, the implementation should signal a serialization error.
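A minimal Python sketch of the text output method, using the standard library's xml.etree.ElementTree as a stand-in for the data model (an assumption: ElementTree is not the [Data Model]). Element.itertext() yields the element's text and its descendants' text and tails in document order, which is concatenated without escaping. Newlines are optionally rewritten to a caller-chosen line ending, and an unencodable character raises a hypothetical SerializationError.

```python
import xml.etree.ElementTree as ET


class SerializationError(Exception):
    """Hypothetical exception standing in for a serialization error."""


def serialize_as_text(root: ET.Element, encoding: str = "utf-8",
                      newline: str = "\n") -> bytes:
    # itertext() yields text in document order; nothing is escaped.
    text = "".join(root.itertext())
    # A newline may be written using the platform's line-ending convention.
    if newline != "\n":
        text = text.replace("\n", newline)
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        raise SerializationError(str(exc)) from exc


doc = ET.fromstring("<doc>a &lt; b<x>c</x>d</doc>")
print(serialize_as_text(doc))  # b'a < bcd'
```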
The default encoding for the text output method is implementation-defined.
The unicode-normalization parameter is applicable for the text output method.
The use-character-maps parameter is applicable for the text output method.
The use-character-maps parameter is a list of characters and corresponding string substitutions.
Character maps allow a specific character appearing in a text or attribute node in the data model to be substituted by a specified string of characters during serialization. The string that is substituted is output "as is", and the serializer performs no checks that the resulting document is well-formed. This mechanism can therefore be used to introduce arbitrary markup in the serialized output.
Character mapping is applied to the characters that actually appear in a text or attribute node in the data model, before any other serialization operations such as escaping or Unicode normalization are applied. If a character is mapped, then it is not subjected to XML or HTML escaping, nor to Unicode normalization. The string that is substituted for a character is not validated or processed in any way by the serializer, except for translation into the target encoding. In particular, it is not subjected to XML or HTML escaping, it is not subjected to Unicode normalization, and it is not subjected to further character mapping. If the string cannot be represented using the target encoding, the serializer takes the same action as it would if the offending characters appeared directly in the data model.
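The ordering described above can be sketched in Python as follows. The sketch assumes a hypothetical serialize_text_with_map helper and a character map given as a dictionary from single characters to replacement strings. Mapped characters are replaced and written as-is; only unmapped characters are escaped, and replacement strings are never mapped again. Unicode normalization and encoding checks are omitted.

```python
from typing import Mapping


def serialize_text_with_map(text: str, char_map: Mapping[str, str]) -> str:
    """Apply a character map to text-node content, then escape the rest."""
    out = []
    for ch in text:
        if ch in char_map:
            # Mapped: the replacement is written as-is. It is not escaped,
            # and it is not itself looked up in the character map.
            out.append(char_map[ch])
        elif ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical map: the guillemets stand for markup delimiters.
angle_map = {"\u00ab": "<", "\u00bb": ">"}
print(serialize_text_with_map("\u00abbr/\u00bb & <", angle_map))  # <br/> &amp; &lt;
```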
Character mapping is not applied to characters in text nodes whose parent elements are listed in the cdata-section-elements parameter, nor to characters in attribute values that are subject to the URI escaping defined for the HTML and XHTML output methods, unless URI escaping has been disabled using the escape-uri-attributes parameter in the output definition.
On serialization, occurrences in text nodes and attribute values of a character specified in the use-character-maps parameter are replaced by the corresponding string from the use-character-maps parameter.
Note: Using a character map can result in non-well-formed documents if the string contains XML-significant characters. For example, it is possible to create documents containing unmatched start and end tags, references to entities that are not declared, or attributes that contain tags or unescaped quotation marks.
Character mapping is not applied to characters for which output escaping has been disabled (disabling output escaping is an [XSLT 2.0] feature), nor to characters in text nodes whose parent elements are listed in the cdata-section-elements parameter, nor to characters in attribute values that are subject to the URI escaping defined for the HTML and XHTML output methods, unless URI escaping has been disabled using the escape-uri-attributes parameter.
If a character is mapped, then it is not subjected to XML or HTML escaping.
A serialization error occurs if character mapping causes the output of a string containing a character that cannot be represented in the encoding that the processor is using for output. The processor should signal the error.