Re: Question about UTF-8
Gustaf Liljegren wrote:

> But the question is about general
> ('non-XML-aware') text editors. A general editor has no idea of the
> encoding detection mechanism in XML, so I wonder how it knows that the
> octets C3 A4 should be written 'ä' and not 'Ã¤' (or something else).

It really has no way of knowing, in theory or in practice. This is a big hairy problem. If you're living in a heterogeneous environment where there are multiple encodings, this is a good reason to insist on XML.

> Many users who see 'Ã¤' when they open a UTF-8 encoded XML document in a
> text editor, prefer to use ISO 8859-1 to avoid this effect.

That only works until you need to use a character that isn't in 8859-1, such as those used by about two thirds of the world's population.

> Maybe the answer is to stay in ISO 8859-1 (or whatever default encoding the
> editor has), but I was hoping it was possible to recommend using UTF-8 all
> the time (for European scripts).

The notion that you can count on never seeing non-European characters is a recipe for disaster in today's world. Good solutions are: (a) as you suggest, use UTF-8 all the time, or (b) use XML for interchange.

--
Cheers, Tim Bray
(ongoing fragmented essay: http://www.tbray.org/ongoing/)
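
For anyone who wants to see the ambiguity discussed above for themselves, here is a minimal Python sketch (an illustration added to this thread, not something from the original messages). It decodes the octets C3 A4 under both encodings and shows that nothing in the octets themselves says which reading is right. The names `octets`, `as_utf8`, `as_latin1`, and `fits_in_latin1` are made up for this example.

```python
# The two octets from the question: C3 A4.
octets = bytes([0xC3, 0xA4])

# Read as UTF-8, the pair forms a single character.
as_utf8 = octets.decode("utf-8")

# Read as ISO 8859-1, each octet is a character of its own.
as_latin1 = octets.decode("iso-8859-1")

print(as_utf8)    # ä
print(as_latin1)  # Ã¤


def fits_in_latin1(text: str) -> bool:
    """Return True if every character in text exists in ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# Staying in ISO 8859-1 works only as long as the text stays inside
# that character set.
print(fits_in_latin1("ä"))   # True
print(fits_in_latin1("日"))  # False
```

Both decodes succeed without error, which is the point: an editor with no outside hint (an XML declaration, a BOM, a user setting) can only guess.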