  • From: "Eric Bréchemier" <eric.brechemier@g...>
  • To: "Philippe Poulard" <philippe.poulard@s...>
  • Date: Fri, 21 Sep 2007 11:10:30 +0200

  Hello Philippe,

On 9/21/07, Philippe Poulard wrote:
> Consider this declaration:
> <?kzy rapbqvat="EBG-13"?>
>
> The decoded form of this declaration is:
> <?xml encoding="ROT-13"?>
> I can get it only if I test "ROT-13" on it; although it is not, strictly
> speaking, an encoding, a parser that supported "ROT-13" would be able
> to decode it only if it tested for it or recognized the magic ASCII
> string "<?kzy", or whatever the "guess" heuristic is.
>

I think you have found an interesting example of an ambiguous encoding,
one that falls into the category of "Character encodings such as UTF-7
that make overloaded usage of ASCII-valued bytes", which "may fail to be
reliably detected", as mentioned in
http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info

In this example, without external information, there is IMHO no way to
know for sure from limited input whether this is a ROT-13 encoded
document or a UTF-8 document starting with a "kzy" processing instruction.
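
To make the ambiguity concrete, here is a minimal Python sketch. It is only
an illustration under my own assumptions: the names ROT13_MAGIC,
looks_like_rot13_xml and decode_rot13 are hypothetical, and "rot_13" is
Python's built-in text transform, not an encoding any XML parser is
required to support.

```python
import codecs

# Hypothetical "magic string": "<?xml" as it appears after ROT-13.
ROT13_MAGIC = "<?kzy"


def looks_like_rot13_xml(prefix: str) -> bool:
    """Guess heuristic: does the input start with the ROT-13 form of '<?xml'?"""
    return prefix.startswith(ROT13_MAGIC)


def decode_rot13(text: str) -> str:
    """Apply ROT-13 to the ASCII letters; digits and punctuation are unchanged."""
    return codecs.decode(text, "rot_13")


declaration = '<?kzy rapbqvat="EBG-13"?>'

# Reading 1: a ROT-13 document, recognized by its magic string.
decoded = decode_rot13(declaration)

# Reading 2: the very same bytes are plain ASCII, hence also valid UTF-8,
# and can be read as a processing instruction whose target is "kzy".
as_utf8 = declaration.encode("ascii").decode("utf-8")

print(looks_like_rot13_xml(declaration), decoded, as_utf8, sep="\n")
```

Both readings succeed on the same input, so the heuristic alone cannot say
which one the author intended.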

Best Regards,

Eric Bréchemier

