Re: expat whitespace weirdness?
* Tim Crook
|
| I was looking around to see if there might have been a particular
| reason why expat was implemented such that no leading white space is
| allowed before the standard <?xml version="1.0" ?> line.
The reason is that the XML recommendation requires it. :-)
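The behaviour is easy to reproduce. The following is a minimal sketch using Python's standard `xml.parsers.expat` binding to expat; the helper name `parses` is made up for this illustration, and the exact error message may vary between expat versions.

```python
import xml.parsers.expat


def parses(data: bytes) -> bool:
    """Return True if expat accepts `data` as a complete document."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this chunk as the final one.
        parser.Parse(data, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


# The declaration is the very first thing in the stream: accepted.
ok = parses(b'<?xml version="1.0"?><a/>')

# A single leading space before the declaration: rejected.
bad = parses(b' <?xml version="1.0"?><a/>')
```

Here `ok` should be true and `bad` false, since the declaration in the second document no longer sits at the start of the entity.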
| From my understanding of things, the Byte Order Mark is what allows
| an XML parser to determine which character set is in use.
Not really. It allows a parser to determine whether UTF-16 was used,
and if so which variety of UTF-16 (BE or LE). However, if UTF-16 is
not used then the encoding can basically be anything.
| (see Appendix F, Autodetection of Character Encodings in
| http://www.w3.org/TR/REC-xml) If the Byte Order Mark is not found,
| shouldn't the starting content of the data stream be discarded until
| the Byte Order Mark is located?
If the BOM is not at the beginning of the data stream then there most
likely isn't one, for example because iso-8859-1 was used. This is
what makes it so handy that the XML declaration must appear first in
the document if it appears at all.
The rules then become something like:
a) does the stream begin with a BOM? if yes, assume UTF-16
b) does the stream begin with an XML declaration (in some encoding
   that the parser is able to figure out)? if yes, see what the
   encoding pseudo-attribute says.
c) otherwise, assume UTF-8
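
The rules above can be sketched roughly as follows. This is an illustration only, not how expat itself is implemented: the function name `sniff_encoding` is invented here, and step b) is simplified to handle only ASCII-compatible encodings, whereas a real parser would also recognise a declaration written in UTF-16 without a BOM, EBCDIC, and so on.

```python
import re

# Matches an XML declaration at the very start of the stream and captures
# the value of the encoding pseudo-attribute, if there is one.
# Simplifying assumption: the declaration is in an ASCII-compatible encoding.
_XML_DECL = re.compile(
    rb'<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_encoding(data: bytes) -> str:
    """Guess the encoding of an XML byte stream from its first bytes."""
    # a) A byte order mark at the very beginning means UTF-16.
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"

    # b) An XML declaration at the very beginning: trust its
    #    encoding pseudo-attribute, if it has one.
    if data.startswith(b"<?xml"):
        match = _XML_DECL.match(data)
        if match:
            return match.group(1).decode("ascii").lower()

    # c) Nothing else to go on: assume UTF-8.
    return "utf-8"
```

Note that both a) and b) only look at the very start of the stream. If leading whitespace were allowed, a declaration such as `  <?xml version="1.0" encoding="iso-8859-1"?>` would fall through to c) in this sketch and be treated as UTF-8, which is the problem the "must appear first" rule avoids.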
--Lars M.