Re: "Binary XML" proposals

  • From: Al Snell <alaric@a...>
  • To: Tim Bray <tbray@t...>
  • Date: Tue, 10 Apr 2001 17:14:07 +0100 (BST)

On Tue, 10 Apr 2001, Tim Bray wrote:

> So, Sean may have used strong language, but in point of fact
> he was correct, so it's forgivable.  Get some data on how
> much space and time a binary representation will save, then
> you'll be able to make intelligent quantitative decisions 
> on where it's worthwhile deploying it.

Well, the encoding I am considering will fit a document into a number of
bytes that can be calculated thus:

1) Count the number of discrete namespace URIs, attribute names, PI
targets, and element names in the document. The same element name under
two different namespaces counts as the *same* element name for this
purpose. Add the number of bytes (UTF-8) in all of these names (don't
include namespace prefixes on names), plus two per name (one for the byte
tag saying "this is a symbol def", one for the NUL terminator).

2) Count the number of processing instructions. For each PI, allocate
seven bytes (tag + 16 bit symbol number for PI target name + 32 bit
content length) plus the number of bytes required to encode the string
inside the PI.

3) Count the number of start-elements. Allocate five bytes each (1 byte
tag, 16 bit namespace symbol ID, 16 bit element name ID).

4) Count the number of end-elements. Allocate a byte each.

5) Count the number of spans of CDATA, including whitespace (for now we'll
assume all whitespace is significant rather than looking in DTDs of
DSLs). Allocate five bytes (tag byte + 32 bit length) plus the length of
the data (expand all character entity references to UTF-8!) per CDATA.

6) Count the number of attributes. Allocate nine bytes each (1 byte tag,
16 bit namespace ID, 16 bit name ID, 32 bit length) plus the size of the
attribute value string in UTF-8.

I won't bother with the rules for entities for now...
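
For anyone who wants actual numbers, the six rules above can be sketched as
a small SAX-based estimator. This is only a rough sketch under several
assumptions that the rules do not settle: all names and namespace URIs
share one symbol table, names with no namespace need no symbol definition
for the namespace slot, adjacent character data is merged into one span,
and comments, DOCTYPEs and entity declarations are ignored. The names
`SizeEstimator` and `estimate_size` are made up for illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class SizeEstimator(ContentHandler):
    """Adds up the byte counts from rules 1-6 while the document is parsed."""

    def __init__(self):
        super().__init__()
        self.symbols = set()   # rule 1: distinct URIs and names (assumed shared table)
        self.total = 0
        self._text = []        # pending character data, merged into one span

    def _flush_text(self):
        # Rule 5: tag byte + 32 bit length + UTF-8 data per span.
        if self._text:
            data = "".join(self._text).encode("utf-8")
            self.total += 5 + len(data)
            self._text = []

    def _intern(self, uri, local):
        # Assumption: "no namespace" uses a reserved ID and needs no definition.
        if uri:
            self.symbols.add(uri)
        self.symbols.add(local)

    def startElementNS(self, name, qname, attrs):
        self._flush_text()
        self._intern(*name)
        self.total += 5                      # rule 3
        for (uri, local), value in attrs.items():
            self._intern(uri, local)
            self.total += 9 + len(value.encode("utf-8"))   # rule 6

    def endElementNS(self, name, qname):
        self._flush_text()
        self.total += 1                      # rule 4

    def characters(self, content):
        self._text.append(content)

    def processingInstruction(self, target, data):
        self._flush_text()
        self.symbols.add(target)
        self.total += 7 + len(data.encode("utf-8"))        # rule 2

    def endDocument(self):
        self._flush_text()
        # Rule 1: name bytes plus two per symbol (def tag + NUL terminator).
        self.total += sum(len(s.encode("utf-8")) + 2 for s in self.symbols)


def estimate_size(xml_bytes):
    """Return the estimated encoded size, in bytes, of an XML document."""
    handler = SizeEstimator()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.total


if __name__ == "__main__":
    doc = b'<a xmlns="urn:x" id="1"><b>hi</b><?go now?></a>'
    print(len(doc), "bytes as text,", estimate_size(doc), "bytes estimated")
```

Running it over a representative corpus and comparing against the size of
the text form would give the kind of data Tim is asking for, at least for
space; it says nothing about parse time.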

> Until then, it's just amusing speculation.  -Tim

Everything has to start with speculation :-) But as things stand there are
numerous proprietary or domain-specific binary XML hacks appearing,
presumably because people feel that text-encoded XML is not efficient
enough. Even if they are wrong, it would be good to offer a lightning
conductor for that wrongness in a standardised binary encoding with a
decent and widely available set of tools rather than having it proliferate
behind the skirting boards, no?

ABS

-- 
                               Alaric B. Snell
 http://www.alaric-snell.com/  http://RFC.net/  http://www.warhead.org.uk/
   Any sufficiently advanced technology can be emulated in software  

