  • From: Steve Rowe <sarowe@t...>
  • To: Tim Crook <tcrook@J...>, xml-dev@l...
  • Date: Fri, 14 Jul 2000 15:12:05 -0400

Tim Crook wrote:
> I was looking around to see if there might have been a
> particular reason why expat was implemented such that no leading
> white space is allowed before the standard <?xml version="1.0" ?>
> line. You get the error XML_ERROR_MISPLACED_XML_PI if there are any
> leading carriage returns, line feeds, spaces or tabs.

From the XML Rec [1]:

  [22] prolog  ::= XMLDecl? Misc* (doctypedecl Misc*)?
  [23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'

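I.e., if the XML declaration is in the stream, it must occupy the
first characters of the stream passed to the parser. Leading white
space is Misc (S), and Misc may only follow the declaration. Once any
white space has been consumed, a later '<?xml ...?>' can only be read
as a processing instruction, and PI targets matching 'xml' in any
case are reserved (production [17] PITarget). That is why expat
reports a misplaced XML declaration rather than simply skipping the
white space.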
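To see this with expat itself, Python's standard-library binding
(xml.parsers.expat) is convenient. Using Python here is an assumption
for illustration; the C API reports the same condition as
XML_ERROR_MISPLACED_XML_PI. parse_ok is a hypothetical helper name.

```python
import xml.parsers.expat
from xml.parsers.expat import errors


def parse_ok(data: bytes) -> bool:
    """Hypothetical helper: True if expat accepts the document,
    False if it rejects it for a misplaced XML declaration."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as exc:
        # errors.codes maps expat's message strings to numeric codes
        if exc.code == errors.codes[errors.XML_ERROR_MISPLACED_XML_PI]:
            return False
        raise  # any other error is unexpected here
    return True


print(parse_ok(b'<?xml version="1.0"?><doc/>'))     # True
print(parse_ok(b'\n  <?xml version="1.0"?><doc/>'))  # False
```

Without the declaration, the same leading white space is accepted,
since S is allowed in Misc*.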

> From my understanding of things, the Byte Order Mark is
> what allows an XML parser to determine which character set is in use.
> (see Appendix F, Autodetection of Character Encodings in
> http://www.w3.org/TR/REC-xml) If the Byte Order Mark is not found,
> shouldn't the starting content of the data stream be discarded
> until the Byte Order Mark is located?

Yes.  But by the application (or other parser user), not the parser.
Note also that Appendix F is NON-normative -- compliant parsers are
not required to produce results consistent with it.
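If the application does take on that cleanup, a minimal sketch might
look like the following. strip_leading_junk and parse are hypothetical
names, and the signature list only loosely follows Appendix F: it
recognises a UTF-8 or UTF-16 byte order mark, or '<?xml' in an
ASCII-compatible encoding, and leaves BOM-less UTF-16 input untouched
(an assumption made to keep the byte search unambiguous).

```python
import xml.parsers.expat

# Assumed signatures, loosely after the non-normative Appendix F.
SIGNATURES = (
    b"\xef\xbb\xbf",  # UTF-8 byte order mark
    b"\xfe\xff",      # UTF-16 big-endian byte order mark
    b"\xff\xfe",      # UTF-16 little-endian byte order mark
    b"<?xml",         # XML declaration, ASCII-compatible encoding
)


def strip_leading_junk(data: bytes) -> bytes:
    """Hypothetical helper: drop every byte before the earliest
    recognised signature; return the input unchanged if none is found."""
    offsets = [i for i in (data.find(sig) for sig in SIGNATURES) if i >= 0]
    return data[min(offsets):] if offsets else data


def parse(data: bytes) -> str:
    """Clean up in the application, then hand the bytes to expat.
    Returns the name of the first (root) element."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(strip_leading_junk(data), True)
    return names[0]


print(parse(b'\r\n\t <?xml version="1.0"?><doc/>'))  # doc
```

The parser itself stays strict; deciding that leading bytes are
disposable is left to the code that knows where the stream came from,
since silently discarding data could also hide genuine corruption.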

Steve Rowe
MNIS-TextWise Labs

[1] http://www.w3.org/TR/REC-xml
