On 2010 Dec 9, at 17:19, Norman Gray wrote:

> I could imagine a standard declaring that an XML parser shall process a stream of Unicode codepoints. The standard might note that this does imply that there's some sort of shim between the XML and the file I/O, but declare that what's in this shim is none of its concern.
>
> The obvious content of that shim would of course be nothing more than the platform's UTF-8 reading support, but if someone wanted to be funky and support something else, in a context where all the necessary information was available (for example, from an HTTP header), then the XML standard isn't about to stop them.

I can put this more compactly, I think.

* The XML spec is at present largely defined in terms of codepoints.
* Thus it effectively bakes a UTF-8 to codepoint shim into the standard, even though it doesn't _really_ seem to need to do so.
* The main place in the current XML standard where UTF-8 is mentioned repeatedly is the discussion of entities (including, obviously, the document entity).

That seems ripe for simplification along the lines above, especially if there's talk of entities being simplified away.

Norman

-- 
Norman Gray : http://nxg.me.uk
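
The split between shim and parser described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `codepoints_from_bytes` and `count_start_tags` are hypothetical names, and the "parser" is a toy stand-in that counts start tags, not anything taken from the XML spec. The shim uses the standard library's incremental decoders, and the encoding is assumed to be supplied from outside (for example, from an HTTP header).

```python
import codecs
from typing import Iterable, Iterator


def codepoints_from_bytes(chunks: Iterable[bytes],
                          encoding: str = "utf-8") -> Iterator[int]:
    """The shim (hypothetical name): bytes in, Unicode codepoints out.

    The encoding is supplied by the caller; the consumer never sees it.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        # The incremental decoder buffers sequences split across chunks.
        for ch in decoder.decode(chunk):
            yield ord(ch)
    # Flush; raises UnicodeDecodeError if the input ended mid-character.
    for ch in decoder.decode(b"", final=True):
        yield ord(ch)


def count_start_tags(codepoints: Iterable[int]) -> int:
    """Toy stand-in for a parser: it works on codepoints only."""
    not_a_start_tag = {ord("/"), ord("!"), ord("?")}
    count = 0
    prev = None
    for cp in codepoints:
        if prev == ord("<") and cp not in not_a_start_tag:
            count += 1
        prev = cp
    return count


text = "<a>\u00e9<b/></a>"

# Same document, two encodings, one codepoint-level consumer.
from_utf8 = count_start_tags(codepoints_from_bytes([text.encode("utf-8")]))
from_utf16 = count_start_tags(
    codepoints_from_bytes([text.encode("utf-16")], encoding="utf-16")
)
```

The consumer gives the same answer for both inputs because everything encoding-specific lives in the shim, which is the separation the bullets above argue the standard could leave unspecified.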