
Re: Parsing efficiency? - why not 'compile'????


Tahir Hashmi
On Thu, 27 Feb 2003 08:53:47 +0000
Alaric Snell wrote:

> On Wednesday 26 February 2003 09:52, Tahir Hashmi wrote:
> 
> > # Tight coupling between schema revisions:
> >
> >   XML is quite resilient to changes in the schema as long as the
> >   changes are done smartly enough to allow old documents to pass
> >   validation through the new schema. The more the binary encoding
> >   depends on the schema, the more this flexibility is restricted.
> 
> That's not a problem in practice, I think. Say we have a format that works by 
> storing a dictionary of element and attribute names at the beginning of the 
> document (or distributed through it, whenever the name is first encountered, 
> or whatever) and that stores element and attribute text content as a compact 
> binary representation of the type declared in the schema, including a few 
> bits of type declaration in the header for each value.

That's alright, but a per-document data dictionary wouldn't suit a
server dishing out large numbers of very small documents, because the
dictionary would take up a large share of each document. Secondly, the
encoder/decoder would have to build a lookup table in memory for every
document. A long-running application loses the opportunity to cache the
lookup table in some high-speed memory and has to build and tear down
lookup tables frequently. That's why I prefer data dictionaries per
_document_type_: an instance of an application often deals with only a
limited set of document types.
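To make the trade-off concrete, here is a minimal Python sketch of name tokenization under a hypothetical wire format. The tag bytes `NEW_NAME`/`NAME_REF`, the one-byte indices and lengths, and the functions `encode_names`/`decode_names` are illustrative assumptions, not any existing standard. Without a shared table, each document defines its names inline the first time they appear; with a per-document-type table (assumed to use contiguous indices 0..n-1 and to be held by both sides in advance), known names cost only a two-byte reference.

```python
from typing import Dict, List, Optional

# Hypothetical tag bytes for this sketch only.
NEW_NAME = 0x01  # NEW_NAME <len:1 byte> <utf-8 bytes>: defines the next index
NAME_REF = 0x02  # NAME_REF <index:1 byte>: refers to an already-known name


def encode_names(names: List[str], shared: Optional[Dict[str, int]] = None) -> bytes:
    """Encode element/attribute names, defining unknown ones inline.

    `shared` is an optional per-document-type dictionary (indices 0..n-1)
    that encoder and decoder both hold in advance.
    """
    shared = shared or {}
    local: Dict[str, int] = {}  # names first seen in this document
    out = bytearray()
    for name in names:
        index = shared.get(name, local.get(name))
        if index is not None:
            out += bytes([NAME_REF, index])
            continue
        raw = name.encode("utf-8")
        new_index = len(shared) + len(local)
        if len(raw) > 255 or new_index > 255:
            raise ValueError("sketch limits names and indices to one byte")
        out += bytes([NEW_NAME, len(raw)]) + raw
        local[name] = new_index
    return bytes(out)


def decode_names(data: bytes, shared: Optional[Dict[str, int]] = None) -> List[str]:
    """Decode a name stream produced by encode_names with the same `shared`."""
    # Invert the shared table (a long-running decoder could cache this list).
    by_index = [name for name, _ in sorted((shared or {}).items(), key=lambda kv: kv[1])]
    out: List[str] = []
    i = 0
    while i < len(data):
        tag = data[i]
        if tag == NAME_REF:
            out.append(by_index[data[i + 1]])
            i += 2
        elif tag == NEW_NAME:
            length = data[i + 1]
            name = data[i + 2 : i + 2 + length].decode("utf-8")
            by_index.append(name)  # takes the next index, matching the encoder
            out.append(name)
            i += 2 + length
        else:
            raise ValueError(f"unknown tag {tag:#04x} at offset {i}")
    return out
```

For the six names `order, id, qty, item, id, qty`, the per-document form takes 26 bytes, while a shared table covering all four names brings it down to 12; on very small documents the inline definitions dominate. A long-running process would build `shared` (and its inverse) once per document type and keep it in memory.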

> And in this scheme, the encoder is just using the schema as hints on what 
> information it can discard for efficiency. If the schema says that 
> something's an integer, it can drop all aspects of it apart from the integer 
> value by encoding it as a binary number. But if the schema's constriction
> widens that integer field into an arbitrary string, then it can start 
> encoding as arbitrary strings.

... and the decoder recognizes some fundamental data types which it
can read without referring to the schema - I like this approach :-)
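The arrangement described above, where the encoder treats the schema as a hint and every value carries its own small type tag, might look roughly like this in Python. The tag values, the 64-bit integer width and the schema type name `"integer"` are assumptions made for the sketch (a whole byte is spent on the tag for simplicity, where a real format might use a few bits); the point is only that `decode_value` never consults a schema.

```python
import struct
from typing import Tuple, Union

# Hypothetical one-byte type tags for this sketch.
TYPE_INT = 0x00  # followed by a signed 64-bit big-endian integer
TYPE_STR = 0x01  # followed by a 4-byte big-endian length and UTF-8 bytes


def encode_value(text: str, schema_type: str) -> bytes:
    """Encode element/attribute text, using the schema type only as a hint."""
    if schema_type == "integer":
        value = int(text)  # keeps only the numeric value, e.g. "007" -> 7
        if -2**63 <= value < 2**63:
            return bytes([TYPE_INT]) + struct.pack(">q", value)
    # Anything else (or an out-of-range integer) is stored as a plain string.
    raw = text.encode("utf-8")
    return bytes([TYPE_STR]) + struct.pack(">I", len(raw)) + raw


def decode_value(data: bytes, offset: int = 0) -> Tuple[Union[int, str], int]:
    """Decode one value without any schema; returns (value, next offset)."""
    tag = data[offset]
    if tag == TYPE_INT:
        (value,) = struct.unpack_from(">q", data, offset + 1)
        return value, offset + 9
    if tag == TYPE_STR:
        (length,) = struct.unpack_from(">I", data, offset + 1)
        start = offset + 5
        return data[start : start + length].decode("utf-8"), start + length
    raise ValueError(f"unknown type tag {tag:#04x} at offset {offset}")
```

If a later schema revision widens the field to a free-form string, the encoder simply starts emitting `TYPE_STR` values, and an existing decoder still reads them because each value says what it is.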

> >   With schema-based compaction done in all the aggressiveness
> >   possible, how much would be gained against a simple markup
> >   binarization scheme? Perhaps a compaction factor of, say, 5 over
> >   XML. Would this really be significant when compared to a factor of,
> >   say, 4 compaction achieved by markup binarization? This is an
> >   optimization issue - the smaller the binary scheme, the more
> >   computation required to extract information out of it. I'm not
> >   totally against a type-aware encoding but for a standard binary
> >   encoding to evolve, it would have to be in a "sweet spot" on the
> >   size vs. computation vs. generality plane.
> 
> Robin was quoting better numbers than these factors of 4 or 5... But even 
> then, I think a bandwidth-limited company would be happy to do a relatively 
> zero-cost upgrade away from textual XML in order to get a fivefold increase 
> in capacity :-)

Exactly! That's what I want to emphasize. The numbers 4 and 5 are not
significant in themselves; what matters is how small the difference
between them is. I'd favour a slightly sub-optimal encoding that's
(ideally) as flexible as XML over one that becomes inflexible just to
improve a little more on what's already a significant improvement.
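The arithmetic behind this, taking the thread's illustrative factors at face value (they are examples, not measurements): for an XML document of size $S$, markup binarization gives $S/4$ and aggressive schema-based compaction gives $S/5$, so

```latex
\frac{S}{4} - \frac{S}{5} = \frac{S}{20} = 0.05\,S,
\qquad
\frac{S/4 - S/5}{S/4} = \frac{1}{5} = 20\%.
```

That is, the schema-aware step removes a further 5% of the original size (20% of the binarized size), compared with the 75% already saved by binarizing the markup.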

--
Tahir Hashmi (VSE, NCST)
http://staff.ncst.ernet.in/tahir
tahir AT ncst DOT ernet DOT in

We, the rest of humanity, wish GNU luck and Godspeed
