Re: Supporting Unicode (was Some comments on the 1.1 draft)
On Thu, Dec 20, 2001 at 04:59:55PM +1100, Rob Griffin wrote:
> Wasn't one of the design goals of XML to be human readable?
>
> How do I do that? I display the document on my screen,
> or I print it out. Surely having the least number of control
> characters in the document makes that more readily achievable.
> I don't want to have to use a hex editor to see the 'real'
> contents of a document. Nor have my printer go ballistic
> or print blocks in place of control characters.

One of the arguments put forward concerned higher-level applications that have string fields somewhere in them but must be able to store the *occasional* control character (for good or bad reasons, or simply because something was wrong originally).

For example, I have an application where a human can type in a string from a keyboard. Somehow they accidentally typed in a ^T. The application they were using did not detect it as an error, so the string contains an ugly ^T.

It would appear logical to encode the application string as PCDATA. It makes the document more readable, and almost all of the data is normal text. However, because the original application did not enforce exactly the same constraints as XML on which characters are legal, it's not safe to put the string in an XML document without always base64 encoding it.

This potentially means any automatically constructed XML document should use base64 encoding for text. If it does not, then it may fail to capture the original content or abort with an error. Some people think this is good. I think it is bad. (Note that if the bogus character was a tab, it would be accepted happily, so XML is only providing partial protection anyway.)

Why bad? The main argument that I have heard for the restriction (and it has real merit, by the way) is that other systems might not be able to handle the control characters or might do something weird with them.

To me, excluding selected values from an XML document is XML doing *exactly* the same thing to everyone else who wants to use XML. It's putting limits on them, not because *XML* cannot handle it, but because XML *chooses* to limit them. The result is that a core and fundamental standard imposes limits on all the layers that want to be put on top.

So any data that did not originally come from XML almost certainly needs to be base64 encoded if it's to be put into an XML document. It could be a book title, part number, phone number, etc. If this did not occur, then strange, unexpected and cryptic errors would be reported by subsystems that tried to use, for example, SOAP to send a phone number.

I do agree that putting control characters directly into an XML document is bad form and is likely to break other tools. OK, it's bad practice. Don't do it (use &#n; instead). But I feel that not allowing all code points in an XML document will *increase* the overall problem for other systems. I think we want to reduce the number of standards, not increase them, so encouraging people to invent another standard instead of 'muddying' XML is, I think, a mistake.

I think there is going to be layer upon layer of systems, protocols, information etc. with XML at the core. Having the core impose limits can cause problems for every layer on top. And (I claim) the limits are not there because XML needs limits, but because XML thinks it's doing the rest of the world a service by limiting them.

And that's where it gets hard. Philosophically, is it better to stop people from doing things that might be wrong, or better to allow people to do more things and wear the responsibility if it was wrong?

Ummmm.....

Alan
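
For anyone who wants to try the ^T case described above, here is a minimal sketch. It assumes Python's standard-library xml.etree.ElementTree (an XML 1.0 parser) as a stand-in for "an XML tool"; the `field` element name, the `parses` helper, and the sample string are made up for illustration.

```python
import base64
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A string field in which the user accidentally typed ^T (U+0014).
# The value is hypothetical, chosen only to illustrate the problem.
title = "Part\x14Number"

# Placed directly in PCDATA, the control character makes the document
# not well-formed under XML 1.0, so the parser rejects it.
raw_ok = parses(f"<field>{title}</field>")

# A tab (U+0009) is also a control character, but XML 1.0 allows it.
tab_ok = parses("<field>Part\tNumber</field>")

# Base64 survives the round trip, at the cost of readability.
encoded = base64.b64encode(title.encode("utf-8")).decode("ascii")
b64_ok = parses(f"<field>{encoded}</field>")
element = ET.fromstring(f"<field>{encoded}</field>")
decoded = base64.b64decode(element.text).decode("utf-8")

print(raw_ok, tab_ok, b64_ok, decoded == title)
```

The first check fails while the tab and base64 variants parse, which is the "partial protection" point: the same application string is either rejected or has to be made unreadable to get through.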