  • From: Philippe Poulard <philippe.poulard@s...>
  • To: Rick Jelliffe <rjelliffe@a...>
  • Date: Fri, 21 Sep 2007 09:31:08 +0200

Rick Jelliffe wrote:
> Philippe Poulard said:
>> I guess some parsers have additional heuristics for successfully reading
>> the sequence <?xml encoding="blah-blah"?>; maybe some try-catch over
>> the set of charsets they know?
> 
> I hope they don't, unless they are specific tools for repairing broken
> documents.
> 
> Guessing encoding is the *opposite* of the XML approach and should be
> strongly resisted. The XML approach is based on explicit labeling as the
> only approach that is reliable (which is not the same as not-stuff-up-able
> of course).

That is not what I meant.

XML documents are either in UTF-8 (or UTF-16, which requires a byte order
mark), or in the encoding specified by <?xml encoding="blah-blah"?>.
What I meant is that a parser must first guess enough to read what is
specified, and then switch to what is specified. This is exactly what
parsers do with ASCII (possibly encoded in 1, 2, or 4 bytes per
character), which fortunately is compatible with many widely used
encodings (UTF-8, UCS-2, the ISO-8859 family, etc.): they rely on ASCII
(1, 2, or 4 bytes wide according to the BOM, if any), or on EBCDIC, to
work out what the encoding is.
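This two-step detection is roughly what the non-normative Appendix F of the XML 1.0 recommendation describes. Below is a minimal Python sketch of the idea, not any real parser's code: it looks at the first bytes for a BOM or for "<?xm" in a few byte layouts (including EBCDIC, assumed here to be code page cp500), decodes just enough with that provisional encoding to read the declaration, and returns the declared name if there is one. The names `sniff_encoding`, `_BOMS`, `_NO_BOM`, and `_DECL` are hypothetical, and many cases (other EBCDIC code pages, unusual UCS-4 byte orders) are ignored.

```python
import re

# Byte order marks; UTF-32-LE must be tested before UTF-16-LE (same prefix).
_BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# First four bytes of "<?xm" (or of "<" alone for UTF-32) with no BOM.
_NO_BOM = [
    (b"\x00\x00\x00\x3c", "utf-32-be"),
    (b"\x3c\x00\x00\x00", "utf-32-le"),
    (b"\x00\x3c\x00\x3f", "utf-16-be"),
    (b"\x3c\x00\x3f\x00", "utf-16-le"),
    (b"\x3c\x3f\x78\x6d", "utf-8"),   # any ASCII-compatible encoding
    (b"\x4c\x6f\xa7\x94", "cp500"),   # EBCDIC; cp500 is an assumed code page
]

# encoding="..." inside the XML declaration at the very start of the text.
_DECL = re.compile(r'<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def sniff_encoding(data: bytes) -> str:
    """Return the encoding a parser would switch to (sketch only)."""
    for bom, enc in _BOMS:
        if data.startswith(bom):
            provisional, body = enc, data[len(bom):]
            break
    else:
        # No BOM: default to UTF-8 unless "<?xm" appears in another layout.
        provisional, body = "utf-8", data
        for magic, enc in _NO_BOM:
            if data.startswith(magic):
                provisional = enc
                break
    # Decode only the head with the provisional encoding to read the declaration.
    head = body[:1024].decode(provisional, errors="ignore")
    match = _DECL.match(head)
    return match.group(1) if match else provisional
```

For example, `sniff_encoding('<?xml version="1.0" encoding="UTF-16"?><a/>'.encode("utf-16-le"))` reads the declaration through a provisional UTF-16-LE decoding and returns the declared `"UTF-16"`.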

Consider this declaration:
<?kzy rapbqvat="EBG-13"?>

The decoded form of this declaration is:
<?xml encoding="ROT-13"?>
I can read it only if I try "ROT-13" on it. Although ROT-13 is not,
strictly speaking, an encoding, a parser that supported "ROT-13" would
be able to decode this declaration only by trying it, or by recognizing
the magic ASCII string "<?kzy" (or through whatever "guessing" heuristic
it uses).
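To make the ROT-13 example concrete, here is a small Python sketch of the two options just mentioned: recognizing the magic prefix "<?kzy", or trying a candidate transformation and checking whether the result starts with "<?xml". It is purely illustrative (no real XML parser is assumed to support a "ROT-13" encoding); it relies on Python's built-in `rot13` text codec via `codecs.decode`, and the names `MAGIC`, `read_declaration`, and `decode_by_trial` are hypothetical.

```python
import codecs

# Hypothetical table of "magic" prefixes: "<?kzy" is "<?xml" after ROT-13.
MAGIC = {
    "<?xml": None,     # already readable, no transformation needed
    "<?kzy": "rot13",  # Python text codec that undoes ROT-13
}


def read_declaration(text: str) -> str:
    """Option 1: recognize a known magic prefix, then decode."""
    for prefix, codec in MAGIC.items():
        if text.startswith(prefix):
            return text if codec is None else codecs.decode(text, codec)
    raise ValueError("unrecognized prefix: cannot guess the transformation")


def decode_by_trial(text: str, candidates=("rot13",)) -> str:
    """Option 2: try each candidate until the result looks like '<?xml'."""
    for codec in candidates:
        decoded = codecs.decode(text, codec)
        if decoded.startswith("<?xml"):
            return decoded
    raise ValueError("no candidate produced an XML declaration")
```

Both `read_declaration('<?kzy rapbqvat="EBG-13"?>')` and `decode_by_trial('<?kzy rapbqvat="EBG-13"?>')` return `'<?xml encoding="ROT-13"?>'`; neither can succeed without either prior knowledge of the prefix or a list of candidates to try.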

-- 
Regards,

               ///
              (. .)
  --------ooO--(_)--Ooo--------
|      Philippe Poulard       |
  -----------------------------
  http://reflex.gforge.inria.fr/
        Have the RefleX !