Re: BOM and encodings questions
In article <B546C312A37C12438A22154026CDC7E011ED9B16@e...> you write:

>If an XML document starts with the FF FE BOM (UTF-16, little endian) but
>the encoding is set to "UTF-8" in the prolog, what is the expected
>behavior of the Parser?

The BOM says that the document is in UTF-16. If it isn't in UTF-16,
then it's broken at the encoding level, and this is a fatal error.
If it *is* in UTF-16, the encoding declaration is wrong. This is a
fatal error unless there was some external indication (e.g. from HTTP)
that the document is supposed to be in UTF-16.

>I think that the parser should respect the BOM, read the prolog assuming
>it is encoded in UTF-16 little endian and then process the remaining of
>the XML document in UTF-8 as the prolog says.

No. XML entities must be in a single encoding. (The spec doesn't say
this explicitly, but it is clear that that's what's intended.)

>Is an XML parser expected to process a document in alternating
>encodings? I mean, is there a way to signal the parser that from a
>certain point on the encoding changes to some other encoding? If so,
>how?

An XML document can be made up of multiple entities which may have
different encodings. There's no way to mix encodings in a single
entity.

>Is there a way to express the expected encoding of the XML document in
>the XML Schema? If so, how?

No, the schema is applied after parsing the document.

-- Richard

--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
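
The BOM-versus-declaration rule from the first answer above can be sketched in Python roughly as follows. This is only an illustration, not how any particular parser does it: the names `check_entity_encoding`, `EncodingMismatch` and the `external_encoding` parameter are made up for the example, it only handles entities that start with a UTF-8 or UTF-16 BOM, it ignores UTF-32 (whose little-endian BOM also begins with FF FE), and it compares encoding names by simple string equality.

```python
import codecs
import re

# BOM byte sequences and the codec each one implies (UTF-32 deliberately omitted).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Rough match for the encoding pseudo-attribute of an XML declaration.
DECL_RE = re.compile(
    r'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


class EncodingMismatch(ValueError):
    """Hypothetical stand-in for a parser's fatal encoding error."""


def check_entity_encoding(data, external_encoding=None):
    """Decode one entity using its BOM, then compare with the declaration."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            break
    else:
        raise EncodingMismatch("no BOM found; this sketch needs one")

    # The whole entity is decoded with ONE encoding; there is no switching.
    try:
        text = data[len(bom):].decode(codec)
    except UnicodeDecodeError as exc:
        raise EncodingMismatch("bytes are not valid " + codec) from exc

    match = DECL_RE.match(text)
    declared = match.group(1).lower() if match else None
    family = "utf-8" if codec == "utf-8" else "utf-16"

    if declared is not None and declared != family:
        # A wrong declaration is fatal unless external information
        # (e.g. an HTTP charset) says the entity really is in this encoding.
        if external_encoding is None or external_encoding.lower() != family:
            raise EncodingMismatch(
                "BOM indicates %s but declaration says %s" % (family, declared)
            )
    return text
```

With this sketch, a document that starts with FF FE, is genuinely UTF-16, and declares `encoding="UTF-8"` raises `EncodingMismatch`, while the same bytes passed with `external_encoding="UTF-16"` are accepted. Bytes that are not really UTF-16 may still decode to garbage rather than fail here; a real parser would then reject them as not well-formed.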