Re: "Binary XML" proposals
On Tue, 10 Apr 2001, Tim Bray wrote:

> So, Sean may have used strong language, but in point of fact
> he was correct, so it's forgivable. Get some data on how
> much space and time a binary representation will save, then
> you'll be able to make intelligent quantitative decisions
> on where it's worthwhile deploying it.

Well, the encoding I am considering will fit a document into a number of bytes that can be calculated as follows:

1) Count the number of distinct namespace URIs, attribute names, PI targets and element names in the document. The same element name under two different namespaces counts as the *same* element name for this purpose. Add the number of bytes (UTF-8) in all of these names (don't include namespace prefixes on names), plus two per name (one for the byte tag saying "this is a symbol def", one for the NUL terminator).

2) Count the number of processing instructions. For each PI, allocate seven bytes (tag + 16-bit symbol number for the PI target name + 32-bit content length) plus the number of bytes required to encode the string inside the PI.

3) Count the number of start-elements. Allocate five bytes each (1-byte tag, 16-bit namespace symbol ID, 16-bit element name ID).

4) Count the number of end-elements. Allocate one byte each.

5) Count the number of spans of CDATA, including whitespace (for now we'll assume all whitespace is significant rather than looking in DTDs of DSLs). Allocate five bytes (tag byte + 32-bit length) plus the length of the data (expand all character entity references to UTF-8!) per CDATA span.

6) Count the number of attributes, and allocate nine bytes for each (1-byte tag, 16-bit namespace ID, 16-bit name ID, 32-bit length), plus the size of the value string in UTF-8.

I won't bother with the rules for entities for now...

> Until then, it's just amusing speculation.
> -Tim

Everything has to start with speculation :-)

But as things stand there are numerous proprietary or domain-specific binary XML hacks appearing, presumably because people feel that text-encoded XML is not efficient enough. Even if they are wrong, it would be good to offer a lightning conductor for that wrongness in a standardised binary encoding with a decent and widely available set of tools, rather than having it proliferate behind the skirting boards, no?

ABS

--
Alaric B. Snell
http://www.alaric-snell.com/ http://RFC.net/ http://www.warhead.org.uk/
Any sufficiently advanced technology can be emulated in software
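As a rough check on the arithmetic, the six sizing rules above can be turned into a small calculator. This is a minimal Python sketch using the standard library's SAX parser, not the actual encoder; the names `SizeEstimator` and `estimate_size` are hypothetical. It assumes one symbol table shared by namespace URIs, element names, attribute names and PI targets (so a string used in two roles is defined once), treats each run of adjacent character data, whitespace included, as one span, and ignores comments and the entity rules the message leaves out.

```python
from __future__ import annotations

import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

# Per-item byte costs taken from the six rules (assumed layout, as described).
SYMBOL_OVERHEAD = 1 + 1             # "symbol def" tag byte + NUL terminator
PI_OVERHEAD = 1 + 2 + 4             # tag + 16-bit target ID + 32-bit length
START_ELEMENT_SIZE = 1 + 2 + 2      # tag + 16-bit namespace ID + 16-bit name ID
END_ELEMENT_SIZE = 1                # tag byte only
CDATA_OVERHEAD = 1 + 4              # tag + 32-bit length
ATTRIBUTE_OVERHEAD = 1 + 2 + 2 + 4  # tag + namespace ID + name ID + 32-bit length


def utf8_len(text: str) -> int:
    return len(text.encode("utf-8"))


class SizeEstimator(ContentHandler):
    """Accumulates the estimated encoded size as SAX walks a document."""

    def __init__(self) -> None:
        super().__init__()
        self.symbols: set[str] = set()  # assumption: one table shared by all name kinds
        self.body = 0                   # bytes for everything except symbol defs
        self._text: list[str] = []      # SAX may split one text span into chunks

    def _add_symbol(self, name: str | None) -> None:
        if name:  # elements/attributes with no namespace report uri as None
            self.symbols.add(name)

    def _flush_text(self) -> None:
        # Adjacent character chunks are costed as a single span (rule 5).
        if self._text:
            self.body += CDATA_OVERHEAD + utf8_len("".join(self._text))
            self._text = []

    def startElementNS(self, name, qname, attrs):
        self._flush_text()
        uri, local = name  # prefixes are dropped; only URI and local name count
        self._add_symbol(uri)
        self._add_symbol(local)
        self.body += START_ELEMENT_SIZE
        for attr_name in attrs.getNames():
            attr_uri, attr_local = attr_name
            self._add_symbol(attr_uri)
            self._add_symbol(attr_local)
            self.body += ATTRIBUTE_OVERHEAD + utf8_len(attrs.getValue(attr_name))

    def endElementNS(self, name, qname):
        self._flush_text()
        self.body += END_ELEMENT_SIZE

    def characters(self, content):
        self._text.append(content)  # entity references arrive already expanded

    def processingInstruction(self, target, data):
        self._flush_text()
        self._add_symbol(target)
        self.body += PI_OVERHEAD + utf8_len(data)

    def total(self) -> int:
        self._flush_text()
        symbol_bytes = sum(utf8_len(s) + SYMBOL_OVERHEAD for s in self.symbols)
        return symbol_bytes + self.body


def estimate_size(xml_bytes: bytes) -> int:
    """Estimated size in bytes of xml_bytes under the proposed encoding."""
    handler = SizeEstimator()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # report (uri, localname) pairs
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.total()
```

For example, `estimate_size(b"<r/>")` returns 9: three bytes to define the symbol `r`, five for the start-element and one for the end-element. Because namespace declarations are not reported as attributes in namespace mode, `xmlns` attributes cost nothing beyond the URI symbol they introduce, which matches the rule that prefixes are not stored.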