Stylus Studio XML Editor

7.1 The Influence of Serialization Parameters upon the HTML Output Method

HTML Output Method: Markup for Elements

The html output method MUST NOT output an element differently from the xml output method unless the expanded-QName of the element has a null namespace URI; an element whose expanded-QName has a non-null namespace URI MUST be output as XML. If the expanded-QName of the element has a null namespace URI, but the local part of the expanded-QName is not recognized as the name of an HTML element, the element MUST be output in the same way as a non-empty, inline element such as span. In particular:

  1. If the result tree contains namespace nodes for namespaces other than the XML namespace, the HTML output method MUST represent these namespaces using attributes named xmlns or xmlns:prefix in the same way as the XML output method would represent them when the version parameter is set to 1.0.

  2. If the result tree contains elements or attributes whose names have a non-null namespace URI, the HTML output method MUST generate namespace-prefixed QNames for these nodes in the same way as the XML output method would do when the version parameter is set to 1.0.

  3. Where special rules are defined later in this section for serializing specific HTML elements and attributes, these rules MUST NOT be applied to an element or attribute whose name has a non-null namespace URI. However, the generic rules for the HTML output method that apply to all elements and attributes, for example the rules for escaping special characters in the text and the rules for indentation, MUST also be used for namespaced elements and attributes.

  4. When serializing an element whose name is not defined in the HTML specification, but that is in the null namespace, the HTML output method MUST apply the same rules (for example, indentation rules) as when serializing a span element. The descendants of such an element MUST be serialized as if they were descendants of a span element.

  5. When serializing an element whose name is in a non-null namespace, the HTML output method MUST apply the same rules (for example, indentation rules) as when serializing a div element. The descendants of such an element MUST be serialized as if they were descendants of a div element.
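
The rules above amount to a three-way decision on an element's expanded-QName. The following is a minimal Python sketch of that decision, not part of the specification: the helper name classify_element and its return labels are hypothetical, and KNOWN_HTML_ELEMENTS is a deliberately abbreviated, illustrative set (a real serializer would carry the full element list for its HTML version).

```python
# Illustrative subset only; a real serializer needs the full HTML element list.
KNOWN_HTML_ELEMENTS = {
    "html", "head", "body", "div", "span", "p", "br", "img", "script", "style",
}


def classify_element(namespace_uri, local_name):
    """Return a label for the serialization rules that apply to an element.

    'xml-as-div'   : non-null namespace; output as XML, laid out like div
    'html'         : recognized HTML element; HTML-specific rules apply
    'html-as-span' : null namespace but unrecognized name; treated like span
    """
    if namespace_uri:  # any non-empty URI counts as a non-null namespace
        return "xml-as-div"
    if local_name.lower() in KNOWN_HTML_ELEMENTS:  # names match regardless of case
        return "html"
    return "html-as-span"
```

For instance, an element in the SVG namespace would fall under the first branch, while an unprefixed element named widget would be handled like span.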

The html output method MUST NOT output an end-tag for empty elements. For HTML 4.0, the empty elements are area, base, basefont, br, col, frame, hr, img, input, isindex, link, meta and param. For example, an element written as <br/> or <br></br> in an XSLT stylesheet MUST be output as <br>.

The html output method MUST recognize the names of HTML elements regardless of case. For example, elements named br, BR or Br MUST all be recognized as the HTML br element and output without an end-tag.
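
A minimal Python sketch of the two rules above (no end-tag for empty elements, case-insensitive name recognition). The helper serialize_element is hypothetical and ignores attributes and escaping; keeping the element name in its original case is an assumption of this sketch, not something the text prescribes.

```python
# Empty elements of HTML 4.0, as listed in the text above.
EMPTY_ELEMENTS = {
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
}


def serialize_element(name, content=""):
    """Serialize a null-namespace element that has no attributes."""
    if name.lower() in EMPTY_ELEMENTS:
        # Start-tag only; the name is written as given (assumption).
        return "<%s>" % name
    return "<%s>%s</%s>" % (name, content, name)
```

With this sketch, br, BR and Br all produce a lone start-tag, while p produces a start-tag and an end-tag.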

The html output method MUST NOT perform escaping for the content of the script and style elements.

For example, a script element created by an XQuery direct element constructor or an XSLT literal result element, such as:

<script>if (a &lt; b) foo()</script>

or

<script><![CDATA[if (a < b) foo()]]></script>

MUST be output as

<script>if (a < b) foo()</script>
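
The behaviour in this example can be sketched in Python as follows. The helper serialize_text is hypothetical; it uses xml.sax.saxutils.escape from the standard library for ordinary text and simply passes script and style content through unchanged.

```python
from xml.sax.saxutils import escape

RAW_TEXT_ELEMENTS = {"script", "style"}


def serialize_text(parent_name, text):
    """Serialize a text node, given the name of its parent element."""
    if parent_name.lower() in RAW_TEXT_ELEMENTS:
        return text  # content of script and style is written without escaping
    return escape(text)  # escapes &, < and >
```

Both source forms shown above yield the same text node, so both serialize to the same unescaped script content.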

A common requirement is to output a script element as shown in the example below:

<script type="text/javascript">
      document.write ("<em>This won't work</em>")
</script>

This is illegal HTML, for the reasons explained in section B.3.2 of the HTML 4.01 specification. Nevertheless, it is possible to output this fragment, using either of the following constructs:

Firstly, by use of a script element created by an XQuery direct element constructor or an XSLT literal result element:

<script type="text/javascript">
      document.write ("<em>This won't work</em>")
</script>

Secondly, by constructing the markup from ordinary text characters:

<script type="text/javascript">
      document.write ("&lt;em&gt;This won't work&lt;/em&gt;")
</script>

As the HTML specification points out, the correct way to write this is to use the escape conventions for the specific scripting language. For JavaScript, it can be written as:

<script type="text/javascript">
      document.write ("&lt;em&gt;This will work&lt;\/em&gt;")
</script>

The HTML 4.01 specification also shows examples of how to write this in various other scripting languages. The escaping MUST be done manually; it will not be done by the serializer.
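
Since the serializer does not do this escaping, it has to happen before the text node is constructed. A minimal Python sketch of the JavaScript convention shown above, assuming a hypothetical helper escape_end_tag_opener that is applied to script source in which "</" occurs only inside string literals:

```python
def escape_end_tag_opener(script_source):
    """Rewrite every '</' as '<\\/' so the HTML parser sees no end-tag opener.

    Only appropriate where '</' occurs inside JavaScript string literals,
    in which the escape sequence '\\/' denotes a plain '/'.
    """
    return script_source.replace("</", "<\\/")
```

The replacement is purely textual; other scripting languages need their own escape conventions.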

HTML Output Method: Writing Attributes

The html output method MUST NOT escape "<" characters occurring in attribute values.

If the indent parameter has the value yes, then the html output method MAY add or remove whitespace as it outputs the instance of the data model, so long as it does not change how an HTML user agent would render the output.

If the escape-uri-attributes parameter has the value yes, the html output method MUST escape non-ASCII characters in URI attribute values using the method defined by Section 5.4 of [XLINK], except that relative URIs MUST NOT be absolutized.

NOTE: 

This escaping is deliberately confined to non-ASCII characters, because escaping of ASCII characters is not always appropriate, for example when URIs or URI fragments are interpreted locally by the HTML user agent. Even in the case of non-ASCII characters, escaping can sometimes cause problems. More precise control of URI escaping is therefore available by setting escape-uri-attributes to no, and controlling the escaping of URIs by means of the fn:escape-uri function defined in [FANDO].
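
As an illustration of escaping confined to non-ASCII characters, the following Python sketch percent-encodes the UTF-8 bytes of each non-ASCII character and leaves everything else alone. The helper name escape_non_ascii is hypothetical, and the sketch is an approximation of the cited method rather than a complete implementation of it.

```python
def escape_non_ascii(uri):
    """Percent-encode the UTF-8 bytes of every non-ASCII character in uri."""
    out = []
    for ch in uri:
        if ord(ch) < 128:
            out.append(ch)  # ASCII is left untouched, including '%' and '#'
        else:
            out.extend("%%%02X" % byte for byte in ch.encode("utf-8"))
    return "".join(out)
```

A relative URI stays relative: only the characters themselves are rewritten, and nothing is resolved against a base URI.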

The html output method MUST output boolean attributes (that is, attributes with only a single allowed value that is equal to the name of the attribute) in minimized form.

For example, a start-tag created using the following XQuery direct element constructor or XSLT literal result element

<OPTION selected="selected">

MUST be output as

<OPTION selected>
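
A minimal Python sketch of attribute minimization. The helper serialize_attribute is hypothetical, value escaping is left out, and BOOLEAN_ATTRIBUTES is the set of HTML 4.01 boolean attributes as understood here; it should be checked against the HTML version actually being output.

```python
# Boolean attributes of HTML 4.01 (assumed list; verify against the DTD in use).
BOOLEAN_ATTRIBUTES = {
    "checked", "compact", "declare", "defer", "disabled", "ismap", "multiple",
    "nohref", "noresize", "noshade", "nowrap", "readonly", "selected",
}


def serialize_attribute(name, value):
    """Serialize one attribute; escaping of the value is omitted in this sketch."""
    if name.lower() in BOOLEAN_ATTRIBUTES and value.lower() == name.lower():
        return name  # minimized form: the name alone
    return '%s="%s"' % (name, value)
```

Only an attribute whose value equals its own name is minimized; any other attribute is written in full.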

The html output method MUST NOT escape a & character occurring in an attribute value immediately followed by a { character (see Section B.7.1 of the HTML 4.0 Recommendation).

For example, a start-tag created using the following XQuery direct element constructor or XSLT literal result element

<BODY bgcolor='&{{randomrbg}};'>

MUST be output as

<BODY bgcolor='&{randomrbg};'>
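
The attribute escaping rules in this section (leave "<" alone, leave "&" alone when it is immediately followed by "{") can be sketched in Python as below. The helper escape_attribute_value is hypothetical; escaping the double quote is an assumption of the sketch, needed only because it writes values between double quotes.

```python
import re

# An ampersand that is NOT immediately followed by '{'.
_AMP_NOT_BEFORE_BRACE = re.compile(r"&(?!\{)")


def escape_attribute_value(value):
    """Escape an attribute value for output between double quotes."""
    value = _AMP_NOT_BEFORE_BRACE.sub("&amp;", value)
    value = value.replace('"', "&quot;")  # assumption: double-quoted output
    return value  # '<' is deliberately left as is
```

Ampersands are handled first so that the &quot; references introduced afterwards are not escaped a second time.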

HTML Output Method: Indentation

If the indent parameter has the value yes, then the html output method MAY add or remove whitespace as it outputs the result tree, so long as it does not change the way that a conforming HTML user agent would render the output. The default value is yes.

NOTE: 

This rule can be satisfied by observing the following constraints:

Whitespace MUST only be added before or after an element, or adjacent to an existing whitespace character.

Whitespace MUST NOT be added or removed adjacent to an inline element. The inline elements are those included in the %inline category of any of the HTML 4.01 DTDs, as well as the INS and DEL elements if they are used as inline elements (i.e., if they do not contain element children).

Whitespace MUST NOT be added or removed inside a formatted element, the formatted elements being pre, script, style, and textarea.

Note that the HTML definition of whitespace is different from the XML definition: see section 9.1 of the HTML 4.01 specification.
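
The constraints in the note can be approximated by a per-element check. The Python sketch below is illustrative only: the helper name may_add_whitespace_around is hypothetical, INLINE_ELEMENTS is a transcription of the %inline categories of the HTML 4.01 DTDs as understood here and should be verified against the DTDs, and the sketch looks only at the element and its ancestors, not at neighbouring inline siblings.

```python
# %inline of the HTML 4.01 Strict DTD, plus the Transitional additions
# (assumed transcription; verify against the DTDs).
INLINE_ELEMENTS = {
    "tt", "i", "b", "big", "small",
    "em", "strong", "dfn", "code", "samp", "kbd", "var", "cite", "abbr", "acronym",
    "a", "img", "object", "br", "script", "map", "q", "sub", "sup", "span", "bdo",
    "input", "select", "textarea", "label", "button",
    "u", "s", "strike", "font", "basefont", "applet", "iframe",
}
FORMATTED_ELEMENTS = {"pre", "script", "style", "textarea"}


def may_add_whitespace_around(name, ancestor_names, has_element_children=False):
    """Return True if indentation may be added before or after the element."""
    name = name.lower()
    # Nothing may be added or removed inside a formatted element.
    if any(a.lower() in FORMATTED_ELEMENTS for a in ancestor_names):
        return False
    # INS and DEL count as inline only when they have no element children.
    if name in ("ins", "del"):
        return has_element_children
    return name not in INLINE_ELEMENTS
```

A fuller implementation would also refuse to add whitespace next to an inline sibling, which this check does not see.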

HTML Output Method: Writing Character Data

The html output method MAY output a character using a character entity reference in preference to using a numeric character reference, if an entity is defined for the character in the version of HTML that the output method is using. Entity references and character references SHOULD be used only where the character is not present in the selected encoding, or where the visual representation of the character is unclear (as with &nbsp;, for example).

When outputting a sequence of whitespace characters in the instance of the data model, within an element where whitespace is treated normally (but not in elements such as pre and textarea), the html output method MAY represent it using any sequence of whitespace that will be treated as whitespace in the same way by an HTML user agent. See section 3.5 of [xhtml-mod] for some additional information on handling of whitespace by an HTML user agent.

Certain characters, specifically the control characters #x7F-#x9F, are legal in XML but not in HTML. It is a serialization error to use the HTML output method when such characters appear in the instance of the data model. The serializer MAY signal the error, but is not REQUIRED to do so. If it does not signal the error, it MAY copy the offending characters into the serialized output, creating invalid HTML.

The html output method MUST terminate processing instructions with > rather than ?>.
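
A serializer that chooses to signal this error first has to find the offending characters. A minimal Python sketch, assuming a hypothetical helper find_illegal_html_characters that reports each position and code point in the range #x7F-#x9F:

```python
import re

# Control characters #x7F-#x9F: legal in XML, not in HTML.
_HTML_ILLEGAL = re.compile("[\x7f-\x9f]")


def find_illegal_html_characters(text):
    """Return (offset, code point label) pairs for characters illegal in HTML."""
    return [
        (match.start(), "#x%02X" % ord(match.group()))
        for match in _HTML_ILLEGAL.finditer(text)
    ]
```

An empty result means the text contains none of these characters; what to do with a non-empty result (signal the error or copy the characters through) is left to the caller.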

HTML Output Method: Encoding

The encoding parameter specifies the preferred encoding to be used. Serializers are REQUIRED to support values of UTF-8 and UTF-16. A serialization error occurs if an output encoding other than UTF-8 or UTF-16 is requested and the serializer does not support that encoding. The serializer MUST signal the error.

If there is a HEAD element, and the include-content-type parameter has the value yes, the html output method MUST add a META element immediately after the start-tag of the HEAD element specifying the character encoding actually used.

For example,

<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
...

The content type MUST be set to the value given for the media-type parameter; the default value is text/html.
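
As an illustration of how the META element above is composed from the media-type and the encoding actually used, a small Python sketch follows. The helper name content_type_meta is hypothetical, and the sketch only builds the tag; locating the HEAD start-tag is left out.

```python
def content_type_meta(encoding, media_type="text/html"):
    """Build the META element announcing the content type and character encoding."""
    return '<META http-equiv="Content-Type" content="%s; charset=%s">' % (
        media_type,
        encoding,
    )
```

The default for media_type mirrors the default value of the media-type parameter stated above.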

If the instance of the data model includes a head element that has a meta element child, the serializer SHOULD replace any content attribute of the meta element, or add such an attribute, with the value as described above, rather than output a new meta element.

It is possible that the instance of the data model will contain a character that cannot be represented in the encoding that the serializer is using for output. In this case, if the character occurs in a context where HTML recognizes character references, then the character MUST be output as a character entity reference or decimal numeric character reference; otherwise (for example, in a script or style element or in a comment), the serializer MUST signal a serialization error.
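
Python's codec error handlers give a compact way to sketch this rule. In the example below the helper encode_text is hypothetical; the standard "xmlcharrefreplace" handler emits decimal numeric character references, and the UnicodeEncodeError raised by strict encoding stands in for the serialization error.

```python
def encode_text(text, encoding, in_raw_context=False):
    """Encode text for output.

    in_raw_context is True for script or style content and comments, where
    HTML does not recognize character references.
    """
    if in_raw_context:
        # Strict encoding: raises UnicodeEncodeError for an unrepresentable
        # character, standing in for the serialization error.
        return text.encode(encoding)
    # Elsewhere, fall back to decimal numeric character references.
    return text.encode(encoding, errors="xmlcharrefreplace")
```

A serializer that prefers character entity references where HTML defines one would need its own lookup table in place of the built-in handler.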

HTML Output Method: Document Type Declaration

If the doctype-public or doctype-system parameters are specified, then the html output method MUST output a document type declaration immediately before the first element. The name following <!DOCTYPE MUST be HTML or html. If the doctype-public parameter is specified, then the output method MUST output PUBLIC followed by the specified public identifier; if the doctype-system parameter is also specified, it MUST also output the specified system identifier following the public identifier. If the doctype-system parameter is specified but the doctype-public parameter is not specified, then the output method MUST output SYSTEM followed by the specified system identifier.
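
The combinations described above can be sketched in Python as follows. The helper doctype_declaration is hypothetical, and writing the identifiers between double quotes is an assumption of the sketch (an identifier that itself contains a double quote is not handled).

```python
def doctype_declaration(doctype_public=None, doctype_system=None, name="HTML"):
    """Build the document type declaration, or '' if neither parameter is set."""
    if doctype_public is None and doctype_system is None:
        return ""
    if doctype_public is not None:
        decl = '<!DOCTYPE %s PUBLIC "%s"' % (name, doctype_public)
        if doctype_system is not None:
            # The system identifier follows the public identifier.
            decl += ' "%s"' % doctype_system
    else:
        decl = '<!DOCTYPE %s SYSTEM "%s"' % (name, doctype_system)
    return decl + ">"
```

Unlike XML, a PUBLIC identifier without a system identifier is allowed here, which is why the second identifier is optional in that branch.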

HTML Output Method: Unicode Normalization

The normalization-form parameter is applicable for the html output method. The values NFC and none MUST be supported by the serializer. A serialization error results if the value of the normalization-form parameter specifies a normalization form that is not supported by the serializer; the serializer MUST signal the error.
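
A minimal Python sketch of applying the normalization-form parameter, using unicodedata.normalize from the standard library. The helper apply_normalization_form and the SerializationError class are hypothetical; supporting NFD, NFKC and NFKD in addition to the required NFC and none is a choice made for this sketch.

```python
import unicodedata

# NFC and none are required; the other three are extras this sketch supports.
_UNICODE_FORMS = ("NFC", "NFD", "NFKC", "NFKD")


class SerializationError(Exception):
    """Stand-in for the serialization error the serializer must signal."""


def apply_normalization_form(text, form):
    """Normalize text according to the normalization-form parameter."""
    if form == "none":
        return text
    if form not in _UNICODE_FORMS:
        raise SerializationError("unsupported normalization form: %s" % form)
    return unicodedata.normalize(form, text)
```

With NFC, a letter followed by a combining accent is composed into a single character where one exists; with none, the text passes through unchanged.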

HTML Output Method: Other Parameters

The media-type parameter is applicable for the html output method. See [serparam] for more information.

The use-character-maps parameter is applicable for the html output method. See [character-maps] for more information.

The byte-order-mark parameter is applicable for the html output method. See [serparam] for more information.