> An XML Parser will make an initial "guess" of the encoding based upon
> the presence or absence of a Byte Order Mark (BOM). The XML parser then
> interprets the bit strings using that guess up to the first ">"
> character (the end of the XML declaration).

If the encoding isn't known in advance then (in theory) you don't know where the first > is (as you don't know how > is encoded).

> Now that it knows the "real" encoding it interprets the rest of the
> document using the encoding it found in the XML declaration.

That still makes it sound as if the encoding declaration is read using a different encoding from the rest of the document. Once an encoding has been determined, the encoding declaration line itself must be consistent with that encoding. You can't use one-byte-per-character ASCII for

<?xml version="1.0" encoding="utf-16"?>

and then read the rest of the file using two (or four) bytes per character.

Suppose I have an encoding "my-encoding" that's the same as ASCII except that > and < are swapped round. Then the following is a well-formed document:

>?xml version="1.0" encoding="my-encoding"?<
>foo<hello>/foo<

The parser knows it's been handed an XML file and can tell that it's not going to parse as UTF-8, so there must be an XML declaration, so the first few bytes must encode "<?xml". It looks at the bytes it actually has, and the only encoding it knows about in which that sequence encodes "<?xml" is "my-encoding", so it proceeds on that basis, which means it successfully finds encoding="my-encoding" and knows all is well...

David
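To make the argument above concrete, here is a rough Python sketch of that detection step. It is only an illustration, not how any particular parser does it: the signature table is a small subset of the cases a real parser would check, the function name `read_declared_encoding` is made up for this example, and "my-encoding" is the hypothetical swapped-bracket encoding from the message, emulated here by decoding as ASCII and swapping < and >.

```python
import re

# Hypothetical "my-encoding": ASCII with '<' and '>' swapped.
SWAP = str.maketrans("<>", "><")

# (leading bytes of the file, codec used to read the declaration).
# Each entry is what a BOM or the characters "<?xm" look like in that family.
SIGNATURES = [
    (b"\xef\xbb\xbf", "utf-8-sig"),   # UTF-8 BOM
    (b"\xff\xfe", "utf-16"),          # UTF-16 BOM, little-endian
    (b"\xfe\xff", "utf-16"),          # UTF-16 BOM, big-endian
    (b"<\x00?\x00", "utf-16-le"),     # UTF-16 without BOM
    (b"\x00<\x00?", "utf-16-be"),
    (b"<?xm", "ascii"),               # any ASCII-compatible encoding
    (b">?xm", "my-encoding"),         # the swapped-bracket encoding
]

DECL = re.compile(
    r"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)

def read_declared_encoding(data):
    """Return (family guessed from the first bytes, declared encoding or None)."""
    for prefix, family in SIGNATURES:
        if data.startswith(prefix):
            break
    else:
        # No BOM and no recognisable "<?xml": treat as UTF-8, no declaration.
        return ("utf-8", None)
    # The declaration is decoded with the SAME family that was guessed
    # from the first bytes, never with a different one.
    if family == "my-encoding":
        text = data.decode("ascii", errors="replace").translate(SWAP)
    else:
        text = data.decode(family, errors="replace")
    match = DECL.match(text)
    return (family, match.group(1) if match else None)
```

Feeding it the bytes of the swapped-bracket document returns ("my-encoding", "my-encoding"), so guess and declaration agree. Feeding it the one-byte-per-character declaration that claims utf-16 returns ("ascii", "utf-16"), which is exactly the inconsistency described above: the declaration could only be read because the bytes were ASCII-compatible, so the claim of utf-16 cannot be right.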