XML-DEV Mailing List Archive
RE: Some comments on the 1.1 draft
Rick,

I don't have a strong opinion on the name encoding (since our products and SQLX already use an encoding that is a valid 1.0 name).

I don't understand your encoding issues though. I am mainly talking about the Unicode code points. If somebody uses an encoding where U+0085 is not a valid character, then it should error. If it is a valid character but not the intended Unicode character, then it is an error that a parser may not be able to detect (we certainly can get this even in XML 1.0).

I can assure you that the database community has even more encoding support than many XML processors (look up collations).

Best regards
Michael

> -----Original Message-----
> From: Rick Jelliffe [mailto:ricko@a...]
> Sent: Tuesday, December 18, 2001 23:03 PM
> To: xml-dev@l...
> Subject: Re: Some comments on the 1.1 draft
>
> From: "Michael Rys" <mrys@m...>
>
> > Well, that may have been the original XML 1.0 use, but looking at where
> > XML is currently having the most traction (SOAP, Messaging, WebDav,
> > database serialization etc), this has changed.
>
> One big advantage of disallowing control characters from XML documents
> and silly characters from XML names is that it catches most common
> encoding errors.
>
> For example, the very common problem of data labelled ISO 8859-1
> containing a 0x85 byte (for the Euro character).
>
> At the moment XML provides the only disciplined point in the processing
> chain: when data is in XML one *must* have the encoding correct. This may
> cause some consternation to us programmers, who perhaps have lived in a
> fool's paradise where encoding does not matter, but it is a fundamental
> point of Quality Control for XML documents and exposes data corruption at
> the point where it can be corrected.
>
> To allow control characters would make us sink back into the horrible mess
> that everyone familiar with working in multi-character set environments
> without XML is well aware of (or, at least, becomes well aware of when
> everything comes crashing down).
>
> Most DBMS systems do not perform any checking of encoding. So you
> can store almost anything in, say, a DBMS expecting ISO 8859-1. With
> a world full of data incorrectly labelled, there is no chance of good
> interoperability without some basic checking. And those basic checks
> are what XML's data character and naming rules provide.
>
> Without them, sure XML would be "simpler" and we could attempt to transmit
> arbitrary strings around. But then encoding detection or repair would be
> the problem of the recipient and not the sender: a responsible recipient
> can have no faith that their non-ASCII data has not been corrupted.
>
> And that lies at the heart of the matter: if we allow control characters
> and silly name characters, we won't actually increase the number of
> characters that can be reliably sent: we will just make non-ASCII
> characters suspect and unreliable.
>
> Cheers
> Rick Jelliffe
>
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>
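
The mislabelling case debated in this thread can be illustrated with a small sketch. It decodes the single byte 0x85 under two encoding labels and reports the resulting Unicode code point and general category. The helper name `describe_byte` is hypothetical, and the choice of Windows-1252 as the "actual" encoding of the mislabelled data is an assumption made for illustration; the thread does not name it.

```python
import unicodedata


def describe_byte(byte_value, encoding):
    """Decode one byte under the given encoding label (hypothetical helper).

    Returns the code point in U+XXXX notation and its Unicode
    general category (for example 'Cc' for a control character).
    """
    ch = bytes([byte_value]).decode(encoding)
    return f"U+{ord(ch):04X}", unicodedata.category(ch)


# The same byte yields different characters depending on the label.
for label in ("iso-8859-1", "cp1252"):
    code_point, category = describe_byte(0x85, label)
    print(f"{label}: {code_point} (category {category})")
```

Under the ISO 8859-1 label the byte decodes without error to U+0085, a C1 control character, while under Windows-1252 it decodes to a printable punctuation character. Decoding alone therefore does not flag the mislabelling; only a later check on which code points are acceptable can.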