
RE: Is it time for the binary XML permathread to start up again?

  • From: Alessandro Triglia <sandro@m...>
  • To: xml-dev@l..., noah_mendelsohn@u...
  • Date: Fri, 20 Jul 2007 20:25:10 +0200


> -----Original Message-----
> From: noah_mendelsohn@u... [mailto:xml-dev@l...] 
> Sent: Friday, July 20, 2007 12:44
> To: Alexander Philippou
> Cc: 'Costello, Roger L.'; xml-dev@l...
> Subject: RE:  Is it time for the binary XML permathread to start up again?
> 
> Alexander Philippou writes:
> 
> > And since the processing penalty of compression is proportional to
> > doc size,
> 
> Yes, typically, at least to a first approximation (actually, some
> compression algorithms do a bit better on large documents, to the
> extent that the overhead of building dictionaries of commonly used
> terms gets done toward the beginning, and leveraged throughout).
> 
> > using FI instead of text makes sense even when doing http+gzip.
> 
> To the extent FI itself compresses, that's surprising.  I'm not
> disagreeing that gzip might run faster on the FI form than on the
> larger text form;  I'm surprised that size(gzip(FI)) << size(FI).
> You wouldn't expect compression systems like gzip to do well on
> things that are already tightly coded.


Fast Infoset doesn't try to be extremely tightly coded.  We tried to find a good balance between ease of implementation, encoding/decoding speed, and compactness.  So there is still room for gzip to remove some of the residual redundancy.
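
A rough way to see the effect is the sketch below. It does not use Fast Infoset at all: the one-byte token table is a hypothetical stand-in for "moderately compact binary coding", and the sample document, the `TOKENS` table and the `gz_size` helper are all invented for illustration. It compares gzip on redundant XML-like text, on the toy tokenized form, and on data that has already been through gzip once.

```python
import gzip
import random

def gz_size(data: bytes) -> int:
    """Size in bytes of data after gzip compression."""
    return len(gzip.compress(data))

# Redundant XML-like text: tag names dominate the byte count.
# Prices are pseudo-random (fixed seed) so the data is not perfectly regular.
rng = random.Random(0)
text = b"".join(
    b"<item><name>widget%d</name><price>%d</price></item>"
    % (i, rng.randrange(10000))
    for i in range(2000)
)

# Hypothetical toy encoding (NOT Fast Infoset): each tag becomes a
# one-byte token, while character data is left untouched.
TOKENS = {
    b"</item>": b"\x02", b"<item>": b"\x01",
    b"</name>": b"\x04", b"<name>": b"\x03",
    b"</price>": b"\x06", b"<price>": b"\x05",
}
toy = text
for tag, token in TOKENS.items():
    toy = toy.replace(tag, token)

# Data that is already tightly coded: the gzip output itself.
once = gzip.compress(text)

print("text:          ", len(text), "->", gz_size(text))
print("toy binary:    ", len(toy), "->", gz_size(toy))
print("already gzipped:", len(once), "->", gz_size(once))
```

In this sketch the toy binary form is smaller than the text yet still shrinks under gzip, because the repeated "widget" strings and token sequences remain. The already-gzipped data barely changes in size on a second pass. How much gzip actually gains on a real Fast Infoset document depends on the document and on the encoder options, so this says nothing about actual ratios.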

Alessandro Triglia


> On the contrary, many compression algorithms will actually somewhat
> expand things that are already compressed using other algorithms.
> Basically, compression algorithms take a gamble that they can
> recognize some form(s) of redundancy and get them out.  If the input
> doesn't have redundancy in such forms, then you tend to wind up at
> best restating the input, plus a bit of overhead for the compression
> framework itself.
> 
> If gzip is going to make the FI form larger, or not much smaller,
> then it's a bad use of time to run it, even if the time to gzip the
> FI is indeed much lower than the time to gzip the original text.
> 
> --------------------------------------
> Noah Mendelsohn 
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
> 


