Re: expat whitespace weirdness?
* Tim Crook
|
| I was looking around to see if there might have been a particular
| reason why expat was implemented such that no leading white space is
| allowed before the standard <?xml version="1.0" ?> line.

The reason is that the XML recommendation requires it. :-)

| From my understanding of things, the Byte Order Mark is what allows
| an XML parser to determine which character set in use.

Not really. It allows a parser to determine whether UTF-16 was used,
and if so which variety of UTF-16 (BE or LE). However, if UTF-16 is
not used then the encoding can basically be anything.

| (see Appendix F, Autodetection of Character Encodings in
| http://www.w3.org/TR/REC-xml) If the Byte Order Mark is not found,
| shouldn't the starting content of the data stream be discarded until
| the Byte Order Mark is located?

If the BOM is not at the beginning of the data stream then there most
likely isn't one, for example because iso-8859-1 was used. This is
what makes it so handy that the XML declaration must appear first in
the document if it appears at all.

The rules then become something like:

 a) does the stream begin with a BOM? if yes, assume UTF-16

 b) does the stream begin with an XML declaration (in some encoding
    that the parser is able to figure out)? if yes, see what the
    encoding pseudo-attribute says.

 c) assume UTF-8

--Lars M.
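
The three rules above can be sketched roughly as follows. This is only an illustration, not how expat does it: `sniff_encoding` is a hypothetical helper name, and rule b) is simplified to declarations written in an ASCII-compatible encoding, so the UTF-16-without-BOM and EBCDIC cases that Appendix F covers are left out.

```python
import re

# Matches an XML declaration at the very first byte and captures the value of
# the encoding pseudo-attribute. Assumption: the declaration itself is written
# in an ASCII-compatible encoding (a simplification of Appendix F).
_DECL = re.compile(
    rb'<\?xml\s[^?]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_encoding(data: bytes) -> str:
    """Guess the encoding of an XML byte stream (hypothetical helper)."""
    # a) Does the stream begin with a BOM? If yes, assume UTF-16.
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"

    # b) Does the stream begin with an XML declaration? If yes, use what the
    #    encoding pseudo-attribute says. re.match anchors at offset 0, so
    #    leading whitespace means this rule does not apply.
    m = _DECL.match(data)
    if m:
        return m.group(1).decode("ascii").lower()

    # c) Otherwise assume UTF-8.
    return "utf-8"
```

For example, `sniff_encoding(b'<?xml version="1.0" encoding="iso-8859-1"?><a/>')` returns `"iso-8859-1"`, while the same bytes preceded by two spaces fall through to rule c). That is why the declaration has to sit at the very start: a detector like this never looks past the first bytes.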