
  • From: Philippe Poulard <philippe.poulard@s...>
  • To: Eric Bréchemier <eric.brechemier@g...>
  • Date: Fri, 21 Sep 2007 11:45:03 +0200

Eric Bréchemier wrote:
>   Hello Philippe,
> 
> On 9/21/07, Philippe Poulard wrote:
>> Consider this declaration:
>> <?kzy rapbqvat="EBG-13"?>
>>
>> The decoded form of this declaration is:
>> <?xml encoding="ROT-13"?>
>> I can get it only if I test "ROT-13" on it; although it is not, strictly
>> speaking, an encoding, a parser that supported "ROT-13" would be able
>> to decode it only if it tested it, or if it recognized the magic ASCII
>> string "<?kzy" or whatever the "guess" heuristic is.
>>
> 
> I think you found an interesting example of ambiguous encoding,
> falling in the category of "Character encodings such as UTF-7 that
> make overloaded usage of ASCII-valued bytes" which "may fail to be
> reliably detected." as mentioned in
> http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info
> 
> In this example, without external information, there is no way IMHO to
> know for sure with limited input whether this is a ROT-13 encoding, or
> a UTF-8 document starting with the "kzy" processing instruction.

Yes, it is a special case among the special cases: this silly encoding 
is fully compatible with ASCII-based encodings, so a parser that doesn't 
know this encoding will still parse this document without error (though 
what it reads is the rotated text, starting with a "kzy" processing 
instruction). In the same way, you can specify <?xml encoding="UTF-8"?> 
while your document is actually encoded in ISO-8859-1: if your document 
doesn't use àéèê or any other characters outside 7-bit ASCII, it will be 
decoded correctly.
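
A minimal Python sketch of that point. decode_as_declared() is a hypothetical helper that simply trusts the declared label: ASCII-only bytes written as ISO-8859-1 decode identically as UTF-8, while the accented bytes in this particular example are rejected (other non-ASCII byte sequences could instead decode silently into different characters).

```python
from typing import Optional


def decode_as_declared(data: bytes, declared: str) -> Optional[str]:
    """Decode `data` with the declared encoding; return None if that fails."""
    try:
        return data.decode(declared)
    except UnicodeDecodeError:
        return None


# Both documents are really ISO-8859-1, but we pretend they claim UTF-8.
ascii_only = "<doc>hello</doc>".encode("iso-8859-1")
accented = "<doc>àéèê</doc>".encode("iso-8859-1")

print(decode_as_declared(ascii_only, "utf-8"))  # <doc>hello</doc>
print(decode_as_declared(accented, "utf-8"))    # None: 0xE0 0xE9 is invalid UTF-8
```
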
Back to the ROT-13 example: a parser that supports this encoding could 
try it (for instance, by looking for "<?kzy") before falling back to UTF-8.
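
Such a parser might look like the following Python sketch. guess_encoding() and decode_document() are hypothetical names, and "ROT-13" is assumed to be applied on top of plain ASCII/UTF-8 text. Note that the guesser resolves the ambiguity by preferring ROT-13 whenever it sees "<?kzy"; that is a policy choice, since the same bytes are also a valid UTF-8 document starting with a "kzy" processing instruction.

```python
import codecs

XML_DECL_START = "<?xml"


def guess_encoding(data: bytes) -> str:
    """Hypothetical guesser: plain ASCII first, then ROT-13, else UTF-8.

    ROT-13 only rotates ASCII letters, so the test can be done on an
    ASCII reading of the first five bytes.
    """
    prefix = data[:5].decode("ascii", errors="replace")
    if prefix == XML_DECL_START:
        return "utf-8"    # ASCII-compatible; the declaration names the rest
    if codecs.decode(prefix, "rot_13") == XML_DECL_START:
        return "rot-13"   # "<?kzy" is "<?xml" rotated by 13
    return "utf-8"        # the XML default when nothing else matches


def decode_document(data: bytes) -> str:
    """Decode `data` as UTF-8, then undo ROT-13 if the guesser picked it."""
    text = data.decode("utf-8")
    if guess_encoding(data) == "rot-13":
        text = codecs.decode(text, "rot_13")
    return text


print(decode_document(b'<?kzy rapbqvat="EBG-13"?>'))  # <?xml encoding="ROT-13"?>
```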

The tip for parsers: when a strategy (the BOM and the other cases 
specified in the spec, or some other riskier heuristic) leads to <?xml 
encoding="XXX"?>, then XXX can be applied safely.
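
As an illustration of the tip, here is a Python sketch covering the BOMs and a subset of the "no BOM" patterns from Appendix F.1 of the XML 1.0 spec (UCS-4 and EBCDIC are left out). declared_encoding() is a hypothetical helper: it reads the declaration with the family suggested by the first bytes, then returns the declared name, which the parser can apply to the whole document.

```python
import codecs
import re
from typing import Optional

# (leading bytes, codec used to read the declaration, bytes to skip).
# A subset of the cases listed in Appendix F.1 of the XML 1.0 spec.
_SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8", 3),
    (codecs.BOM_UTF16_BE, "utf-16-be", 2),
    (codecs.BOM_UTF16_LE, "utf-16-le", 2),
    (b"\x00<\x00?", "utf-16-be", 0),   # UTF-16BE without a BOM
    (b"<\x00?\x00", "utf-16-le", 0),   # UTF-16LE without a BOM
    (b"<?xm", "ascii", 0),             # some ASCII-compatible encoding
]

# The encoding pseudo-attribute of an XML declaration.
_ENC = re.compile(r'<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the declaration, if a guess exposes one."""
    for signature, family, skip in _SIGNATURES:
        if data.startswith(signature):
            # Only a short prefix is read; 'replace' keeps truncated or
            # non-ASCII bytes from raising while the declaration is scanned.
            prefix = data[skip:skip + 200].decode(family, errors="replace")
            match = _ENC.match(prefix)
            return match.group(1) if match else None
    return None  # no signature matched: fall back to UTF-8


doc = '<?xml version="1.0" encoding="ISO-8859-1"?><a>é</a>'.encode("iso-8859-1")
print(declared_encoding(doc))  # ISO-8859-1
```

When the helper returns None, the parser simply keeps the family it guessed (or UTF-8 when nothing matched).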

-- 
Regards,

               ///
              (. .)
  --------ooO--(_)--Ooo--------
|      Philippe Poulard       |
  -----------------------------
  http://reflex.gforge.inria.fr/
        Have the RefleX !

