Re: Binary-encoding of XML for communication
I looked at the WAP spec, and the subsequent comments on this list, and have concluded:

1) Yeah, a binary spec for XML is a cool idea after all. If nothing else, we can probably parse a binary rep faster than we can parse text.

2) The WAP spec does not seem to have any guiding principle for making the transition from text to binary. In particular, tossing out comments with the bathwater is a strange choice. Also, giving themselves a special set of enumerations for DTDs is politically curious.

3) I wouldn't be surprised if a document encoded in their binary format ended up *bigger* than a text XML doc rammed thru zlib (their use of octets and 32-bit integers is going to lead to lots and lots of 0 bits). Is LZ decompression a problem in embedded devices or something?

4) This spec is a lot closer to a network protocol than it is to the XML spec, and, IMHO, it should be an IETF RFC, not a W3C Rec. Anyone agree?

I propose we small-fry developers could do the following:

A) Decide *why* we want a binary XML spec, including rationale for that decision
B) Produce an elegant spec and a reference implementation in C and Java
C) Use IETF or a similarly open forum to promulgate it

I'm willing to step up to take the lead on this, although I'd happily back off and let someone else take the reins. I think this can help with both download size and startup time issues with my company's product, so I'm motivated to work on it.

With your permission, I'll take a crack at step A (using my best approximation of the funny language of specs):

<Preamble>
The binary XML format specification, hereafter referred to as XML-bin, is required to reduce the transmission size of XML documents, to speed processing of those documents, and to reduce the size and complexity of XML parser software. (For purposes of this specification, the existing XML specification will be referred to as XML-text.)

The XML-bin format specification shall be a lossless encoding of a textual XML document.
That is, a document can be translated from XML-text to XML-bin and back an arbitrary number of times, and no information content will be lost. Information content, in this sense, excludes those properties of the text which are defined as "insignificant white space" in the XML specification [anything else we need to exclude here?].

<Rationale>
The motivation for adjusting the machine representation of XML should be expressed in the terms of computing machinery. Allowing this effort to attempt to change the rules of what should be in an XML document (e.g., the WAP attempt to banish comments), or to fix some bigger issues (e.g., allowing more expressive DTDs) would doubtless interfere with acceptance of this specification as a standard.
</Rationale>
</Preamble>

How's that?

The obvious (to me, anyway) way to implement that is to choose a reasonable binary representation of a parse tree -- the way many programming language compilers store data between their front-end and back-end processes. Maybe a string table followed by a binary dump of a heap (a tree stored in a vector, for those of you who never took a data structures course), all rammed thru zlib to compress out common patterns.

But before we decide on the implementation, we need to reach consensus on the motivation. Did I capture it?

-Joshua Smith

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
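
The layout floated in the message (a string table, then the parse tree flattened into a vector, then zlib over the whole thing) might look roughly like the sketch below. Nothing here comes from the WAP spec or from any agreed XML-bin format: the record layout, the little-endian 32-bit fields, and the names `encode` and `decode` are all assumptions made for illustration. It is also knowingly weaker than the proposed preamble requires, because Python's `xml.etree.ElementTree` parser discards comments and processing instructions by default, so this version is not lossless for documents that contain them.

```python
import struct
import zlib
import xml.etree.ElementTree as ET


def encode(xml_text: str) -> bytes:
    """Hypothetical XML-bin sketch: string table + preorder node vector, zlib-compressed."""
    root = ET.fromstring(xml_text)  # note: drops comments and PIs by default
    strings: list[str] = []
    index: dict[str, int] = {}

    def intern(s: str) -> int:
        # Store each distinct string once and refer to it by table index.
        if s not in index:
            index[s] = len(strings)
            strings.append(s)
        return index[s]

    nodes: list[list[int]] = []

    def walk(elem: ET.Element) -> None:
        # One record per element, in preorder:
        # tag, text, tail, attribute count, child count, then (key, value) pairs.
        rec = [intern(elem.tag), intern(elem.text or ""), intern(elem.tail or ""),
               len(elem.attrib), len(elem)]
        for key, value in elem.attrib.items():
            rec += [intern(key), intern(value)]
        nodes.append(rec)
        for child in elem:
            walk(child)

    walk(root)

    out = bytearray(struct.pack("<I", len(strings)))
    for s in strings:
        raw = s.encode("utf-8")
        out += struct.pack("<I", len(raw)) + raw
    out += struct.pack("<I", len(nodes))
    for rec in nodes:
        out += struct.pack(f"<{len(rec)}I", *rec)
    return zlib.compress(bytes(out))


def decode(blob: bytes) -> str:
    """Inverse of encode(): rebuild the element tree and serialize it as text."""
    data = zlib.decompress(blob)
    pos = 0

    def u32() -> int:
        nonlocal pos
        (value,) = struct.unpack_from("<I", data, pos)
        pos += 4
        return value

    strings = []
    for _ in range(u32()):
        length = u32()
        strings.append(data[pos:pos + length].decode("utf-8"))
        pos += length

    u32()  # node count; this simple reader relies on child counts instead

    def read_elem() -> ET.Element:
        tag, text, tail = strings[u32()], strings[u32()], strings[u32()]
        n_attr, n_child = u32(), u32()
        elem = ET.Element(tag)
        elem.text = text or None
        elem.tail = tail or None
        for _ in range(n_attr):
            key = strings[u32()]
            elem.set(key, strings[u32()])
        for _ in range(n_child):
            elem.append(read_elem())
        return elem

    return ET.tostring(read_elem(), encoding="unicode")
```

For a comment-free document, `decode(encode(doc))` should reproduce the text that ElementTree itself would serialize for `doc`. Comparing `len(encode(doc))` against `len(zlib.compress(doc.encode("utf-8")))` on real documents is one way to test the suspicion in point 3 empirically; fixed 32-bit fields are used here only for simplicity, and they are exactly the kind of zero-heavy choice the message worries about.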