On 02/09/07, G. Ken Holman <gkholman@c...> wrote:
> >Notepad doesn't understand UTF-8 encoded files.
>
> False ... I just opened Notepad and wrote out a file using UTF-8 and
> opened it up again and it was preserved. An XML processor read the
> file and didn't complain about the encoding. I'm running XP.

If you save as UTF-8 from Notepad, it adds a BOM (EF BB BF), which lets it recognise the file as UTF-8 in future, but which isn't recognised by some XML parsers, such as the default one shipped with Java 1.4 (Crimson). See http://lists.xml.org/archives/xml-dev/200106/msg00358.html for discussion of whether XML should be changed to make such files legal XML.

If you save as UTF-8 from other editors, they often don't add the BOM, and if you open such UTF-8 files in Notepad it doesn't deduce that they are UTF-8 (which there isn't an easy way to do).

So Notepad isn't able to produce files which can be processed by some UTF-8 compliant applications, including spec-compliant XML parsers, and is not able to process UTF-8 encoded files created by some other applications.

The same applies to the UTF-8 encoding used by the .NET XML writer: it adds a BOM, which confuses applications expecting UTF-8 encoded XML to start with '<' or whitespace.

I got the codepoint wrong for the curly quotes.

Pete
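P.S. For anyone stuck feeding Notepad-saved files to a parser that chokes on the BOM, one workaround is to drop the three bytes EF BB BF before parsing. Below is a rough Python sketch of that idea, not anything Notepad or Crimson does itself; the helper name strip_utf8_bom and the sample bytes are made up for illustration.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    # codecs.BOM_UTF8 is the three-byte sequence b"\xef\xbb\xbf".
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical sample: the bytes of a tiny XML file saved with a leading BOM.
notepad_bytes = codecs.BOM_UTF8 + b"<doc/>"

# After stripping, the data starts with '<', as a BOM-unaware parser expects.
clean = strip_utf8_bom(notepad_bytes)

# Alternatively, the "utf-8-sig" codec skips a leading BOM while decoding.
text = notepad_bytes.decode("utf-8-sig")
```

Files that never had a BOM pass through unchanged, so the same step should be safe for output from editors that don't write one.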