Re: Parsing efficiency? - why not 'compile'????
On Thu, 27 Feb 2003 08:53:47 +0000 Alaric Snell wrote:

> On Wednesday 26 February 2003 09:52, Tahir Hashmi wrote:
>
> > # Tight coupling between schema revisions:
> >
> > XML is quite resilient to changes in the schema as long as the
> > changes are made carefully enough to allow old documents to pass
> > validation under the new schema. This flexibility would be reduced
> > the more the binary encoding depends on the schema.
>
> That's not a problem in practice, I think. Say we have a format that
> works by storing a dictionary of element and attribute names at the
> beginning of the document (or distributed through it, wherever a name
> is first encountered, or whatever), and that stores element and
> attribute text content as a compact binary representation of the type
> declared in the schema, including a few bits of type declaration in
> the header for each value.

That's alright, but a per-document data dictionary wouldn't be suitable
for a server dishing out large numbers of very small documents, due to
the space overhead. Secondly, the encoder/decoder would have to build a
lookup table in memory for every document. A long-running application
loses the opportunity to cache the lookup table in some high-speed
memory and has to go through the process of building and tearing down
lookup tables frequently. That's why I prefer data dictionaries per
_document_type_, since an instance of an application often deals with a
limited set of document types.

> And in this scheme, the encoder is just using the schema as hints on
> what information it can discard for efficiency. If the schema says
> that something's an integer, it can drop all aspects of it apart from
> the integer value by encoding it as a binary number. But if a schema
> revision widens that integer field into an arbitrary string, then it
> can start encoding arbitrary strings.

... and the decoder recognizes some fundamental data types which it can
read without referring to the schema - I like this approach :-)

> > With schema-based compaction done as aggressively as possible, how
> > much would be gained over a simple markup binarization scheme?
> > Perhaps a compaction factor of, say, 5 over XML. Would this really
> > be significant compared to a compaction factor of, say, 4 achieved
> > by markup binarization? This is an optimization issue - the smaller
> > the binary scheme, the more computation is required to extract
> > information out of it. I'm not totally against a type-aware
> > encoding, but for a standard binary encoding to evolve, it would
> > have to be in a "sweet spot" on the size vs. computation vs.
> > generality plane.
>
> Robin was quoting better numbers than these factors of 4 or 5... But
> even then, I think a bandwidth-limited company would be happy to do a
> relatively zero-cost upgrade away from textual XML in order to get a
> fivefold increase in capacity :-)

Exactly! That's what I want to emphasize. The numbers 4 and 5 are not
significant in themselves; what's significant is how small the
difference between them is. I'd favour a slightly sub-optimal encoding
that's (ideally) as flexible as XML over one that becomes inflexible
just to improve a little more on what's already a significant
improvement.

--
Tahir Hashmi (VSE, NCST)
http://staff.ncst.ernet.in/tahir
tahir AT ncst DOT ernet DOT in

We, the rest of humanity, wish GNU luck and Godspeed
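[Editor's note: a minimal sketch, not from the original thread, of the
two ideas discussed above - a name dictionary cached per document type
rather than per document, and values carried with a small type tag so a
decoder can read fundamental types without consulting the schema. All
names in it (NAME_DICTS, encode_value, decode_value, the tag values)
are hypothetical.]

    import struct

    # Per-document-type name dictionaries, built once from the schema
    # and cached for the lifetime of a long-running application. The
    # index into the list replaces the element/attribute name on the
    # wire.
    NAME_DICTS = {
        "invoice-v1": ["invoice", "line-item", "quantity", "description"],
    }

    # One-byte type tags: the decoder recognizes these fundamental
    # types directly, so a schema revision (e.g. an integer widened to
    # a string) only changes which tag the encoder emits.
    TAG_INT, TAG_STR = 0x01, 0x02

    def encode_value(name_index: int, value) -> bytes:
        """Encode one (name, value) pair as: name index, type tag, payload."""
        if isinstance(value, int):
            return struct.pack(">BBq", name_index, TAG_INT, value)
        data = str(value).encode("utf-8")
        return struct.pack(">BBH", name_index, TAG_STR, len(data)) + data

    def decode_value(buf: bytes, offset: int, names: list):
        """Decode one pair; returns (name, value, new offset)."""
        name_index, tag = struct.unpack_from(">BB", buf, offset)
        offset += 2
        if tag == TAG_INT:
            (value,) = struct.unpack_from(">q", buf, offset)
            return names[name_index], value, offset + 8
        (length,) = struct.unpack_from(">H", buf, offset)
        offset += 2
        value = buf[offset:offset + length].decode("utf-8")
        return names[name_index], value, offset + length

    if __name__ == "__main__":
        names = NAME_DICTS["invoice-v1"]
        # "quantity" encoded as an integer under the old schema...
        blob = encode_value(names.index("quantity"), 12)
        print(decode_value(blob, 0, names))   # ('quantity', 12, 10)
        # ...and as a string after the schema widens the field; the
        # decoder needs no schema knowledge, since the tag travels
        # with the value.
        blob = encode_value(names.index("quantity"), "a dozen")
        print(decode_value(blob, 0, names))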