RE: Unix/Java design issues (Was: Re: Is CDATA "structure"?)
From: Arthur Rother [mailto:arthur.rother@o...]

> For sure the LF vs CRLF and CR in theory (the spec) and for viewing in
> Notepad is all correctly debated or noted, but pragmatically, does this
> really provide a problem? The encoding for XML is UTF-8. So in almost any
> text viewer/editor, in normal(?) circumstances it will show strange in
> these applications, since they do not understand UTF-8 (in windows).
> The API on XML, for example DOM, is also UTF-8, which most applications
> may treat as 7-bit ASCII, but for encoding generic applications this
> should be treated as UTF-8. Windows is not UTF-8 aware, so it has to be
> converted to Unicode anyway.

<snip/>

Actually, the XML spec states in section 2.2 that "A character is an atomic
unit of text as specified by ISO/IEC 10646" - in other words, Unicode. Since
there are different ways of storing Unicode characters, XML processors are
allowed to accept Unicode in any of these formats, and the spec even goes on
to state that "all XML processors <em>must</em> accept the UTF-8 and UTF-16
encodings of 10646" (emphasis added), since [I believe] UTF-8 and UTF-16 are
the most common ways to store Unicode characters.

<aside>XML processors are also allowed to <em>accept</em> data in any other
encoding they want, as long as the data is converted to Unicode. At least, I
believe that's how Microsoft reads the spec, because I had to study this crap
at great length and talk to Microsoft many times for my multi-lingual
application. :-) </aside>

Windows NT is perfectly Unicode aware, and I routinely view XML documents in
Notepad on my NT box. All of the characters are fine, with the only problem
being the LF-CRLF-CR problem that started this thread in the first place.
I am 87% sure that Windows 95 uses the windows-1250 or windows-1252
character set internally, although it may also have some level of Unicode
awareness. (I'm not sure about that.)
And I haven't the faintest idea what character set Windows 98 uses natively,
although I'd like to hope that it's Unicode.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
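
To make the encoding point above concrete, here is a rough sketch (not
anything from the spec itself) showing that the same document stored as UTF-8
or as UTF-16 comes out of a parser as the same Unicode characters. It assumes
Python's standard xml.etree.ElementTree parser; the helper names element_text
and normalize_line_endings, and the sample document, are made up for
illustration. The second helper hand-rolls the LF-CRLF-CR clean-up this thread
started with, translating CRLF and lone CR to LF.

```python
import xml.etree.ElementTree as ET


def element_text(document: bytes) -> str:
    """Parse an XML document given as raw bytes and return the root's text.

    The parser works out the encoding from the byte order mark and/or the
    encoding declaration, so the caller only ever sees Unicode characters.
    """
    return ET.fromstring(document).text or ""


def normalize_line_endings(text: str) -> str:
    """Translate CRLF pairs and lone CRs to a single LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Hypothetical sample text with non-ASCII characters, written with escapes
# so this source file itself stays plain ASCII.
SAMPLE = "Gr\u00fc\u00dfe, \u4e16\u754c"

# The same logical document, stored two different ways.
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    f"<greeting>{SAMPLE}</greeting>"
).encode("utf-8")
utf16_doc = (
    '<?xml version="1.0" encoding="UTF-16"?>'
    f"<greeting>{SAMPLE}</greeting>"
).encode("utf-16")  # Python's "utf-16" codec prepends a byte order mark

if __name__ == "__main__":
    # The bytes on disk differ, the characters handed to the application don't.
    print(utf8_doc == utf16_doc)
    print(element_text(utf8_doc) == element_text(utf16_doc))
    print(repr(normalize_line_endings("one\r\ntwo\rthree\nfour")))
```

The takeaway is that the application never needs to care which storage format
the file used, as long as the processor converts it to Unicode on the way in.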