4.3 Parsed Entities

The Text Declaration

External parsed entities SHOULD each begin with a text declaration.

Text Declaration
The text declaration MUST be provided literally, not by reference to a parsed entity. The text declaration MUST NOT appear at any position other than the beginning of an external parsed entity. The text declaration in an external parsed entity is not considered part of its replacement text.

Well-Formed Parsed Entities

The document entity is well-formed if it matches the production labeled document. An external general parsed entity is well-formed if it matches the production labeled extParsedEnt. All external parameter entities are well-formed by definition.

Well-Formed External Parsed Entity
An internal general parsed entity is well-formed if its replacement text matches the production labeled content. All internal parameter entities are well-formed by definition.

A consequence of well-formedness in general entities is that the logical and physical structures in an XML document are properly nested; no start-tag, end-tag, empty-element tag, element, comment, processing instruction, character reference, or entity reference can begin in one entity and end in another.

Character Encoding in Entities

Each external parsed entity in an XML document MAY use a different encoding for its characters. All XML processors MUST be able to read entities in both the UTF-8 and UTF-16 encodings. The terms "UTF-8" and "UTF-16" in this specification do not apply to character encodings with any other labels, even if the encodings or labels are very similar to UTF-8 or UTF-16.

Entities encoded in UTF-16 MUST and entities encoded in UTF-8 MAY begin with the Byte Order Mark described in ISO/IEC 10646 [ISO10646] or Unicode [Unicode] (the ZERO WIDTH NO-BREAK SPACE character, #xFEFF). This is an encoding signature, not part of either the markup or the character data of the XML document. XML processors MUST be able to use this character to differentiate between UTF-8 and UTF-16 encoded documents.

Although an XML processor is required to read only entities in the UTF-8 and UTF-16 encodings, it is recognized that other encodings are used around the world, and it may be desired for XML processors to read entities that use them. In the absence of external character encoding information (such as MIME headers), parsed entities which are stored in an encoding other than UTF-8 or UTF-16 MUST begin with a text declaration (see [The Text Declaration]) containing an encoding declaration:

Encoding Declaration
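
The Byte Order Mark and encoding declaration rules above can be sketched as a small detection routine. This is an illustrative Python sketch, not a complete implementation: the names `sniff_bom`, `declared_encoding`, and `detect_encoding` are hypothetical, the regular expression only approximates the encoding declaration syntax (it assumes an encoding name is a Latin letter followed by letters, digits, periods, underscores, or hyphens), and it assumes the declaration can be read as ASCII-compatible bytes. Encodings that are not ASCII-compatible and external information such as MIME headers are deliberately left out.

```python
import codecs
import re
from typing import Optional

# Loose approximation of an encoding declaration at the very start of an
# entity. Assumption: the declaration is readable as ASCII-compatible bytes.
_ENCODING_DECL = re.compile(
    rb"""^<\?xml(?=\s)[^>]*?\sencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1"""
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Return "UTF-8" or "UTF-16" if the entity starts with that BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "UTF-16"
    return None


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding name from a leading declaration, if any."""
    match = _ENCODING_DECL.match(data)
    return match.group(2).decode("ascii") if match else None


def detect_encoding(data: bytes) -> str:
    """Prefer the BOM, then the declaration, then fall back to UTF-8."""
    return sniff_bom(data) or declared_encoding(data) or "UTF-8"


print(detect_encoding(b"<?xml encoding='EUC-JP'?><greeting/>"))
```

The order of checks is a design choice for this sketch: the Byte Order Mark is examined first because it is an encoding signature that precedes any markup, and UTF-8 is the fallback because an entity with neither signal is expected to be UTF-8.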
In the document entity, the encoding declaration is part of the XML declaration. The EncName is the name of the encoding used. In an encoding declaration, the values "

In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is a fatal error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, or for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8. Note that since ASCII is a subset of UTF-8, ordinary ASCII entities do not strictly need an encoding declaration.

It is a fatal error for a TextDecl to occur other than at the beginning of an external entity.

It is a fatal error when an XML processor encounters an entity with an encoding that it is unable to process.

It is a fatal error if an XML entity is determined (via default, encoding declaration, or higher-level protocol) to be in a certain encoding but contains byte sequences that are not legal in that encoding. Specifically, it is a fatal error if an entity encoded in UTF-8 contains any irregular code unit sequences, as defined in Unicode [Unicode]. Unless an encoding is determined by a higher-level protocol, it is also a fatal error if an XML entity contains no encoding declaration and its content is not legal UTF-8 or UTF-16.

Examples of text declarations containing encoding declarations:

<?xml encoding='UTF-8'?>
<?xml encoding='EUC-JP'?>

Version Information in Entities

Each entity, including the document entity, can be separately declared as XML 1.0 or XML 1.1. The version declaration appearing in the document entity determines the version of the document as a whole. An XML 1.1 document may invoke XML 1.0 external entities, so that otherwise duplicated versions of external entities, particularly DTD external subsets, need not be maintained.
However, in such a case the rules of XML 1.1 are applied to the entire document.

If an entity (including the document entity) is not labeled with a version number, it is treated as if labeled as version 1.0.
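
The version rules above can be illustrated with a short Python sketch. The function name `entity_version` is hypothetical, and the regular expression is only a loose approximation of the version information in a declaration (it assumes the version label, when present, comes first and has the form `1.` followed by digits). The sketch takes entity text that has already been decoded.

```python
import re

# Loose approximation of version information at the start of an entity.
_VERSION_INFO = re.compile(r"""^<\?xml\s+version\s*=\s*(["'])(1\.[0-9]+)\1""")


def entity_version(entity_text: str) -> str:
    """Return the version an entity is labeled with, or "1.0" if unlabeled."""
    match = _VERSION_INFO.match(entity_text)
    return match.group(2) if match else "1.0"


# Hypothetical document entity and external entity, for illustration only.
document = "<?xml version='1.1'?><doc>&chapter;</doc>"
chapter = "<?xml encoding='UTF-8'?><chapter/>"  # carries no version label

# The document entity's declaration decides the rules for the whole document,
# even though the external entity is treated as labeled 1.0.
rules_in_force = entity_version(document)
print("chapter is labeled", entity_version(chapter))
print("document is processed under XML", rules_in_force, "rules")
```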