  • From: "Pete Kirkham" <mach.elf@g...>
  • To: xml-dev@l...
  • Date: Mon, 3 Sep 2007 01:07:19 +0100

On 02/09/07, G. Ken Holman <gkholman@c...> wrote:
> >Notepad doesn't understand UTF-8 encoded files.
>
> False ... I just opened Notepad and wrote out a file using UTF-8 and
> opened it up again and it was preserved.  An XML processor read the
> file and didn't complain about the encoding.  I'm running XP.
If you save as UTF-8 from Notepad, it adds a BOM (EF BB BF). The BOM
lets Notepad recognise the file as UTF-8 in future, but it isn't
recognised by some XML parsers, such as Crimson, the default one
shipped with Java 1.4. See
http://lists.xml.org/archives/xml-dev/200106/msg00358.html
for discussion of whether XML should be changed to make such files
legal XML.

If you save as UTF-8 from other editors, they often don't add the BOM,
and if you open such files in Notepad it doesn't deduce that they are
UTF-8 (which there isn't an easy way to do).

So Notepad isn't able to produce files which can be processed by some
UTF-8 compliant applications, including spec-compliant XML parsers,
and isn't able to process UTF-8 encoded files created by some other
applications. The same applies to the UTF-8 encoding used by the .NET
XML writer: it adds a BOM, which confuses applications expecting UTF-8
encoded XML to start with '<' or whitespace.
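
For anyone who has to feed such files to a parser that rejects the BOM,
here is a rough Python sketch of the two checks discussed above. The
function names strip_utf8_bom and looks_like_utf8 are made up for
illustration and are not from any of the tools mentioned. The second
function is only a heuristic: plain ASCII passes it too, and a file in
another encoding can occasionally pass by coincidence.

```python
import codecs

# The three bytes EF BB BF that Notepad writes at the start of the file.
UTF8_BOM = codecs.BOM_UTF8


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def looks_like_utf8(data: bytes) -> bool:
    """Heuristic guess for BOM-less input: does it decode as strict UTF-8?"""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A file as Notepad would save it, and the same file from another editor.
notepad_style = UTF8_BOM + b"<doc/>"
other_editor_style = b"<doc/>"

cleaned = strip_utf8_bom(notepad_style)
```

After stripping, the data starts with '<' as a BOM-intolerant parser
expects, and input that never had a BOM is returned unchanged.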

I got the codepoint wrong for the curly quotes.


Pete

