RE: CDATA Section
Any ideas?

Well, sort of...

We have two or perhaps three problems when trying to transfer binary data within XML elements:

1) Transfer should not break XML's character set restrictions
2) Transfer should be economic of bandwidth
3) The encoding/decoding should be cognisant of the character encoding in use

1 is a given; it's why this discussion got started. 2 and 3 are less so, and more of a desire on my part to see an elegant and economic solution that lies within XML itself.

For instance, BASE64 is _OK_ when encoding binary data into 8-bit characters, but what happens when we have 16-bit Unicode characters? The (in)efficiency goes from 133% to 266%. If the encoder knew the character set in use, it could send data more efficiently without relying on a compression algorithm. In fact, XML declares only 29 of the 256 8-bit characters to be illegal, so we could pass the 227 legal ones straight through and only escape the remaining 29. Under Unicode we have 63457 legal and 2079 illegal characters.

Burying this inside of the XML parser (which does have full knowledge of the character encoding) would make it invisible to end-users. Attributes in the XML namespace could control the encode/decode behavior. In fact, this corresponds with the concept outlined in the latest XML Schemas specification (Part 2, Section 3.2.9) where they talk about an encoding facet**.

Assuming a totally random input stream, the inefficiencies drop to 111% for 8-bit characters and only 109% for 16-bit characters, ignoring any overhead in controlling XML attributes and assuming a 1 : 2 expansion ratio for illegal characters. With this sort of overhead, it now makes economic sense to compress large files, then apply this sort of encoding without worrying too much about the gain of the compression being lost in the encoding. Seen XMLZIP?
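The character counts and expansion ratios above can be checked with a short calculation. The following is only a sketch, assuming the XML 1.0 Char ranges (TAB, LF, CR, 0x20-0xD7FF, 0xE000-0xFFFD), a uniformly random input, and a flat 1 : 2 cost for every escaped unit. The function names are illustrative and do not come from any library.

```python
def is_xml_char(cp):
    """True if code point cp falls in the XML 1.0 Char ranges (BMP only)."""
    return cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD


def count_legal(width_bits):
    """Return (legal, illegal) counts over all values of the given bit width."""
    total = 1 << width_bits
    legal = sum(1 for cp in range(total) if is_xml_char(cp))
    return legal, total - legal


def expansion(legal, illegal, escape_cost=2):
    """Average output units per input unit for random input.

    Assumption: legal units pass through at cost 1, illegal ones cost
    escape_cost units; the cost of escaping the escape marker is ignored.
    """
    return (legal + escape_cost * illegal) / (legal + illegal)


legal8, illegal8 = count_legal(8)      # (227, 29)
legal16, illegal16 = count_legal(16)   # (63457, 2079)

ratio8 = expansion(legal8, illegal8)       # about 1.11
ratio16 = expansion(legal16, illegal16)    # about 1.03
base64_8bit = 4 / 3                        # 3 bytes become 4 one-byte characters
base64_16bit = 2 * base64_8bit             # same 4 characters, two bytes each

print(f"8-bit escape scheme:  {ratio8:.1%}")
print(f"16-bit escape scheme: {ratio16:.1%}")
print(f"base64, 8-bit chars:  {base64_8bit:.1%}")
print(f"base64, 16-bit chars: {base64_16bit:.1%}")
```

The counts and the 8-bit figure match the numbers quoted. Under this particular accounting the 16-bit figure comes out nearer 103% than 109%, so the quoted estimate may rest on a different assumption about escape cost.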
I'd put in more but I don't want to post too much in one go and besides, I'm sure this sort of thing must've been proposed before (but then, why are we still stuck with using base64 or &#xx; to send binary data?)

** Couldn't this be the URL of a translation service? Just send it the stream to en/decode or request the applet - a Web-centric application as proposed by Tim O'Reilly recently?

Regards,
Simon Gordon
Systems Engineer, Systems Integration,
Galileo International, Denver, USA.

-----Original Message-----
From: John Evdemon [mailto:JohnE01@x...]
Sent: Wednesday, October 27, 1999 11:15
To: Gordon, Simon; xml-dev@i...
Subject: RE: CDATA Section

I've run into a similar issue with CDATA, although we were transferring mainframe reports within XML, not binary data. The suggested workaround was base64 -- I would love to see something more elegant.

Any ideas?

John Evdemon
Architect
XML Solutions
http://www.xmls.com

-----Original Message-----
From: owner-xml-dev@i... [mailto:owner-xml-dev@i...] On Behalf Of Gordon, Simon
Sent: Wednesday, October 27, 1999 11:48 AM
To: xml-dev@i...
Subject: RE: CDATA Section

Thanks for the info. I guess I was trying to mix the ATTLIST and ELEMENT syntax. The !ATTLIST declaration allows CDATA but then doesn't seem to use it in the same way as when you specify <!ELEMENT x (#PCDATA)> then use <![CDATA[...]]> in the XML. Most confusing.

Now for the next question; binary data and CDATA sections. According to Tim Bray's annotated XML spec., I can use a CDATA section to send binary data yet I can't get it to work. [The annotation link is the last in the first paragraph of section 2.7 CDATA Sections]. This (IMHO) seems to contradict the XML Spec., where it defines the CDATA data to consist of chars, which are further defined as consisting of TAB, CR, LF and 0x20-..etc. That is, not all possible binary values. I've checked this using the RXP parser (good, very good) and it does reject any values outside the defined ranges. Big disappointment.
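The rejection described above, and the base64 workaround mentioned earlier in the thread, can be reproduced with any conforming parser. This is a minimal sketch using Python's standard-library ElementTree (backed by expat) rather than RXP. The sample payload, the element name `data` and the `encoding` attribute are hypothetical choices for illustration, not part of any specification.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical binary payload: control characters outside the XML Char ranges.
payload = b"\x01\x02\x03\x1f"

# Attempt 1: drop the raw bytes into a CDATA section.
raw_doc = b"<data><![CDATA[" + payload + b"]]></data>"
try:
    ET.fromstring(raw_doc)
    raw_accepted = True
except ET.ParseError:
    # CDATA only suppresses markup recognition; it does not widen the
    # set of legal characters, so the parser reports a well-formedness error.
    raw_accepted = False

# Attempt 2: base64-encode first, so only legal ASCII characters are sent.
elem = ET.Element("data", {"encoding": "base64"})  # attribute name is an assumption
elem.text = base64.b64encode(payload).decode("ascii")
encoded_doc = ET.tostring(elem)

# The receiver parses the document and reverses the encoding.
decoded = base64.b64decode(ET.fromstring(encoded_doc).text)

print("raw CDATA accepted:", raw_accepted)
print("round trip intact: ", decoded == payload)
```

Both sides still have to agree out of band that the element content is base64, which is the extra work the message goes on to complain about.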
Now I'll have to look at using base64 to exchange binary data with our vendors; we'll all have to implement an encoding/decoding scheme, probably involving attributes and NOTATIONs and a load more work. Unless anyone has a better idea?

Regards,
Simon Gordon

-----Original Message-----
From: John Cowan [mailto:cowan@l...]
Sent: Wednesday, October 20, 1999 13:23
To: Simon.Gordon@s...
Cc: xml-dev@i...
Subject: Re: CDATA Section

Gordon, Simon scripsit:

> > <?xml version='1.0'?>
> > <!DOCTYPE TEST [
> > <!ELEMENT TEST (CDATA)>

The problem is with this line, which says that TEST elements must contain a single *element* whose name is "CDATA". This has nothing to do with CDATA sections.

> > Warning: CDATA section not allowed here
> > in unnamed entity at line 5 char 16 of file:test.xml

Quite right; your document is invalid because your TEST element contains character data instead of an element named CDATA.

> > PS. Just tried <!ELEMENT TEST (#PCDATA)> and removing the DOCTYPE section
> > altogether. The former passes the validation test and the latter passes
> > the well-formedness test (rxp -xs test.xml)!

And rightly so. You cannot *compel* the content of an element to be a CDATA section or not by using DTD-based validation. Elements declared with #PCDATA content may express that content with or without CDATA sections.

--
John Cowan cowan@c...
I am a member of a civilization. --David Brin

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@i... the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
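John Cowan's closing point, that #PCDATA content may be written with or without a CDATA section, can be seen from the application side: both spellings yield the same character data. This is a minimal sketch using Python's standard-library ElementTree, which checks well-formedness only and performs no DTD validation. The sample text is hypothetical.

```python
import xml.etree.ElementTree as ET

# The same character data, once inside a CDATA section and once with
# ordinary entity escaping.
with_cdata = ET.fromstring("<TEST><![CDATA[a < b & c]]></TEST>")
escaped = ET.fromstring("<TEST>a &lt; b &amp; c</TEST>")

# After parsing, the application cannot tell which form was used.
same_text = with_cdata.text == escaped.text

print(repr(with_cdata.text))
print(repr(escaped.text))
print("identical content:", same_text)
```

A CDATA section is therefore a convenience for authors who want to avoid escaping, not a separate content type that a DTD can require or forbid.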