RE: MSXML DOM Special Chars Less Than 32
Hmmm ....

On Sat, 2002-03-23 at 10:59, Julian Reschke wrote:
> > From: Michael Kay [mailto:michael.h.kay@n...]
> > Sent: Saturday, March 23, 2002 3:19 PM
> > To: 'Joshua Allen'; 'Rick Jelliffe'; xml-dev@l...
> > Subject: RE: MSXML DOM Special Chars Less Than 32
> >
> > > Why would someone want to use XML if they need to transmit illegal
> > > characters?
> >
> > A: "I want to replicate my WebDAV configuration. I want to do
> > this by encoding all the WebDAV properties in an XML file and
> > transmitting that over the network".
> >
> > B: "You can't represent WebDAV properties in XML, because they
> > can contain characters that XML doesn't support"
>
> Actually, I'd rephrase that as:
>
> When defined as "plain text", a WebDAV property *by definition* can't have
> values outside the allowed XML character range. Where we (WebDAV server
> developers) get in trouble is when in reality, the WebDAV server is just a
> protocol adapter to some kind of back end system, which is NOT XML-based.
>
> Inevitably, we'll have to find an escaping format which is XML 1.0
> compliant, cheap and generally accepted. As this problem happens with
> XML-RPC and SOAP as well, it would be nice to have a single, widely accepted
> solution...
>
> Some of the requirements are:
>
> - the format of strings that *can* be represented as XML characters doesn't
> change
> - non-XML characters must be ignored by implementations not knowing the
> escaping mechanism

There is some point in this, in terms of using XML for transport of unpredictably reliable legacy data. Neither of the XML escape mechanisms can carry this information.

I think this is probably a good thing.

Note that it isn't only XML that makes such a restriction: most text-oriented network protocols over TCP carry headers, which are typically defined per SMTP or SMTP+MIME, meaning that the header names have an even more restricted set of characters (subset of US-ASCII), and so do header values (larger subset, still US-ASCII). Header values, though, permit at least two forms of escaping worth investigation: quoted-printable and encoded-word.

Now, is it an XML problem or an application problem? If it is regarded as an XML problem, then XML could define a form of escape, similar to one of the above, perhaps, which would allow such encoding. Since the unicode escape mechanism already exists, and would simply have to be required to carry C0, that could be used.

In my opinion, it's a bad idea. I ought to be able to treat text as text; XML is text (anyone else old enough to have gotten a VT100 escape codes mail bomb?).

In short, the C0 characters have no universal interpretation; interpretation depends upon the application. It seems reasonable, then, that the application can encode the bloody things too. Can't use XML mechanisms. Base64, the usual suggestion, incurs an immense overhead.

So, define an empty-restricted xsd:string type, app:quoted-printable or app:encoded-word. Adopt and adapt existing algorithms for those encodings. If you're not using schemata, adopt the usage of xsi:type="app:quoted-printable". That doesn't help for attribute values, but it does address elements. Encoded-word seems somehow more appropriate for attribute values anyway.

Application encodes and decodes, using a set of characters even more strongly limited than XML's, and indicating need via schema or the in-line xsi:type indicator or prior agreement per-element and per-attribute.

No?

Amy!
-- 
Amelia A. Lewis    amyzing@t...    alicorn@m...
"How does one hate a country, or love one? ... What is love of one's country; is it hate of one's uncountry? Then it's not a good thing. Is it simply self-love? That's a good thing, but one mustn't make a virtue of it, or a profession."
    -- Therem Harth rem ir Estraven
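
For illustration only, the application-level encoding proposed above might look roughly like the following Python sketch. It is not full RFC 2045 quoted-printable: it only escapes "=", the C0 controls and DEL as "=XX", and it does no soft line breaks. The names qp_escape, qp_unescape and to_xml, the element name "prop" and the namespace URI bound to the "app" prefix are all hypothetical; only the xsi:type="app:quoted-printable" marker comes from the message.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XSI = "http://www.w3.org/2001/XMLSchema-instance"
APP = "urn:example:app"  # hypothetical namespace for the "app" prefix


def qp_escape(text: str) -> str:
    """Escape '=', C0 controls (below U+0020) and DEL as =XX (uppercase hex)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "=" or cp < 0x20 or cp == 0x7F:
            out.append("=%02X" % cp)
        else:
            out.append(ch)
    return "".join(out)


def qp_unescape(text: str) -> str:
    """Reverse qp_escape: every =XX sequence becomes the character U+00XX."""
    return re.sub(r"=([0-9A-F]{2})", lambda m: chr(int(m.group(1), 16)), text)


def to_xml(tag: str, value: str) -> str:
    """Wrap an escaped value in an element flagged with the in-line xsi:type."""
    return (
        '<%s xmlns:xsi="%s" xmlns:app="%s" xsi:type="app:quoted-printable">%s</%s>'
        % (tag, XSI, APP, escape(qp_escape(value)), tag)
    )


if __name__ == "__main__":
    raw = "bell\x07 esc\x1b[0m x=1\r\n"   # legacy data containing C0 characters
    doc = to_xml("prop", raw)             # well-formed XML 1.0, no C0 inside
    elem = ET.fromstring(doc)
    if elem.get("{%s}type" % XSI) == "app:quoted-printable":
        assert qp_unescape(elem.text) == raw
```

Because "=" itself is always escaped, strings with no "=" and no control characters pass through unchanged, and a receiver that does not know the convention still sees only legal XML characters.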