Re: [Summary] Why is Encoding Metadata (e.g. encoding="UTF
Philippe Poulard said:
>
> I guess some parsers have additional heuristics for reading successfully
> the sequence <?xml encoding="blah-blah"?> ; maybe some try-catch to
> apply with the set of charset they know ?

I hope they don't, unless they are specific tools for repairing broken documents.

Guessing encoding is the *opposite* of the XML approach and should be strongly resisted. The XML approach is based on explicit labeling as the only approach that is reliable (which is not the same as not-stuff-up-able, of course).

There are many problems with guessing:

* most platforms provide hundreds of character sets

* most character sets belong to families which are ASCII or EBCDIC supersets, so there is not enough redundant (in the engineering-theoretic sense) information or orthogonality to know which specific sets are actually being used

* most transcoders don't actually generate exceptions when an unknown byte sequence is found: older ones just ignored the sequence, others replace it with "?" or some other character, and some more recent transcoders are a little better, so you cannot know whether the decoding really succeeded

* detecting encoding from statistical patterns in the text relies on the document conforming to the corpus, to a certain extent, and may even be skewed by the use of native language markup

* guessing prevents error detection

* guessing can corrupt the database

So the XML system is then based on solving the problem "How do we read that label reliably?" The UTF-8 default is just low-hanging fruit, because it also accepts ISO646-US (ASCII), but again it is not in any sense guessed.

A system that guesses encoding is unsuitable for critical data. In a hospital record, you don't want your name to be rejected because it has some Hungarian character but you are in a German hospital, etc.

Cheers
Rick Jelliffe
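
A minimal Python sketch of the points above about charset families and silent transcoders. It is only an illustration: the sample name, the list of candidate charsets, and the helper `decodings` are assumptions made for the example and do not come from any particular parser. A Hungarian name is written out as ISO-8859-2 bytes, then a "try each charset I know" loop is run over it. Every candidate accepts the bytes without error, yet the candidates disagree about what the text says, so try-catch cannot pick the right one.

```python
# Hypothetical sample: a Hungarian name containing U+0151 (o with double acute).
name = "Erdős"
raw = name.encode("iso-8859-2")  # the bytes as the sender actually wrote them

# Hypothetical list of charsets a guessing parser might try, all ASCII supersets.
candidates = ["iso-8859-1", "iso-8859-2", "iso-8859-15", "cp1252", "cp1250"]


def decodings(data, encodings):
    """Decode data strictly with each encoding; None marks a decoding error."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


results = decodings(raw, candidates)
for enc, text in results.items():
    print(f"{enc:12} -> {text!r}")

# No candidate raised, so "first charset that decodes" would simply pick
# whichever one happens to be tried first.
print("candidates that failed:", [e for e, t in results.items() if t is None])
print("distinct readings:", sorted(set(results.values())))

# A lenient transcoder hides the problem instead of reporting it:
# the byte that is invalid in UTF-8 is replaced by U+FFFD, with no exception.
lenient = raw.decode("utf-8", errors="replace")
print("lenient UTF-8:", repr(lenient))

# A strict UTF-8 decode, by contrast, does report the mislabeled data.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
print("strict UTF-8 rejected the bytes:", strict_failed)
```

Only the strict decode against an explicitly stated encoding gives an error that can be acted on; the guessing loop and the lenient decode both return a string, and two of the three readings are wrong.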