
Re: Parsing efficiency? - why not 'compile'????


Tahir Hashmi wrote:
> Robin Berjon wrote:
> In the first group, there could be a subgroup that doesn't need binary
> markup but may use it simply because it can, without affecting the way
> its applications work. That's the group that doesn't need human
> read/write-ability for its XML docs - the group of WYSIWYG Office
> suites, XML-based instant messaging protocols and so on.

I would quite seriously oppose using binary infosets when you don't need them. 
It adds to the complexity of the system and removes a variety of features of 
XML. Office suites can (and in fact do) use zip (if only because it doubles as a 
packaging format, which is very convenient for attached files such as images). XML 
IM either needs binary infosets for performance reasons, or doesn't and 
shouldn't use it.

> Consider this: the application is only interested in strings for date
> but the schema designer specified a date type because it is the Right
> Thing(TM) for a date (so that the schema need not be changed if at some
> point of time the same application or another application does get
> interested in the value).
> 
> In a binary representation, the processor will decode the variable
> length binary value to arrive at the number of seconds since epoch,
> then re-construct a string for the application. Note that the
> processor will be *synthesizing* a string that could be read straight
> off the document.
> 
> This approach would be better only if the benefits of saved bandwidth
> are greater than the cost of synthesizing the date string. And we
> can't assume that limited bandwidth is *always* going to be the
> motivating factor for using binary markup.

That's why in BinXML you can specify how you encode your data. In the case you 
cite one would simply ask that the xs:fooDate type use the UTF-8 codec.
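
To make the trade-off concrete, here is a minimal sketch of per-type codec selection. It is not BinXML's actual API: the codec names ("utf8", "epoch"), the CODECS table and the roundtrip helper are all hypothetical, invented for illustration. The "utf8" codec passes the lexical form through untouched, while the "epoch" codec packs the date into 8 bytes and has to synthesize the string again on decode.

```python
import struct
from datetime import datetime, timezone

# Hypothetical codecs for illustration only; not BinXML's real interface.

def encode_utf8(text):
    # Keep the lexical form as-is: nothing to synthesize on decode.
    return text.encode("utf-8")

def decode_utf8(data):
    return data.decode("utf-8")

def encode_epoch(text):
    # Parse an ISO 8601 date-time with an explicit offset and pack it
    # as a signed 64-bit big-endian count of seconds since the epoch.
    moment = datetime.fromisoformat(text)
    return struct.pack(">q", int(moment.timestamp()))

def decode_epoch(data):
    # The decoder must rebuild a string from the integer (always in UTC here).
    (seconds,) = struct.unpack(">q", data)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()

# Assumed mapping from a codec name to its (encode, decode) pair.
CODECS = {
    "utf8": (encode_utf8, decode_utf8),
    "epoch": (encode_epoch, decode_epoch),
}

def roundtrip(value, codec_name):
    """Encode then decode a value, returning (payload, decoded_text)."""
    encode, decode = CODECS[codec_name]
    payload = encode(value)
    return payload, decode(payload)
```

With a value such as "2004-02-25T10:30:00+00:00", the "utf8" codec costs more bytes but hands the application the original string, whereas the "epoch" codec is smaller on the wire but pays for parsing and re-formatting.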

> The particular example I gave is illustrative only and as stated
> earlier, I'm not against type-awareness. I'm simply being wary of how
> much flexibility might possibly be lost, and in some cases
> computation be wasted, in the quest of a super-optimized binary
> encoding.

Again, if you don't want something encoded just ask the application to not touch 
it :)

>>As for your remark on the speed of decompaction, note that you may be right for 
>>a naive implementation of the same thing but there's compsci literature out 
>>there on making such tasks fast.
> 
> Well yes, naivete may lead to bad design. The point is that the more
> logic goes into decoding a format, the higher the bar for small
> devices is raised. While one can have small non-validating SAX parsers
> for XML, the size of a binary format parser may go up since it would
> have to know about synthesizing dates from integers, deducing document
> structure from the schema etc, besides the indispensable passing of
> strings around. The encoding scheme should require least possible
> context information and minimal parsing logic to be accessible
> there. Hope I'm able to explain myself better this time!

It all depends on what you need. I totally agree that there is no 
one-size-fits-all but I do believe that it is very much possible to produce a 
flexible format that can be configured in a variety of ways, without it losing 
internal coherence. If you want a tiny and ultra fast decoder you can drop 
support for encoding of the more complex types, if you want a slightly larger 
decoder but the smallest possible payload you add codecs to encode the content 
optimally.
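
As a rough sketch of what such configurability could look like (the record layout, the codec ids and the profile names below are assumptions made up for this example, not any real format), a decoder can be built from a table of codecs, and a small device simply ships a smaller table:

```python
import struct

# Hypothetical codec ids, used only in this sketch.
UTF8, INT32 = 0, 1

def decode_utf8(data):
    return data.decode("utf-8")

def decode_int32(data):
    # Synthesize the decimal string from a 32-bit big-endian integer.
    return str(struct.unpack(">i", data)[0])

# A tiny decoder knows only strings; a larger one adds typed codecs.
MINIMAL_PROFILE = {UTF8: decode_utf8}
FULL_PROFILE = {UTF8: decode_utf8, INT32: decode_int32}

def encode_record(codec_id, payload):
    # Assumed layout: one byte codec id, one byte length, then the payload.
    return bytes([codec_id, len(payload)]) + payload

def decode_stream(data, profile):
    """Decode every record in data using only the codecs in profile."""
    values = []
    pos = 0
    while pos < len(data):
        codec_id, length = data[pos], data[pos + 1]
        payload = data[pos + 2 : pos + 2 + length]
        if codec_id not in profile:
            raise ValueError(f"codec {codec_id} not supported by this profile")
        values.append(profile[codec_id](payload))
        pos += 2 + length
    return values
```

A producer targeting the minimal profile would encode everything with the UTF8 codec; one targeting the full profile can use INT32 where it shrinks the payload. The format stays the same, only the set of codecs in play changes.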

-- 
Robin Berjon <robin.berjon@e...>
Research Engineer, Expway        http://expway.fr/
7FC0 6F5F D864 EFB8 08CE  8E74 58E6 D5DB 4889 2488

