RE: Is it time for the binary XML permathread to start up again?
> -----Original Message-----
> From: noah_mendelsohn@u... [mailto:xml-dev@l...]
> Sent: Friday, July 20, 2007 12:44
> To: Alexander Philippou
> Cc: 'Costello, Roger L.'; xml-dev@l...
> Subject: RE: Is it time for the binary XML permathread to start up again?
>
> Alexander Philippou writes:
>
> > And since the processing penalty of compression is proportional to doc size,
>
> Yes, typically, at least to a first approximation (actually, some
> compression algorithms do a bit better on large documents, to the extent
> that the overhead of building dictionaries of commonly used terms gets
> done toward the beginning, and leveraged throughout).
>
> > using FI instead of text makes sense even when doing http+gzip.
>
> To the extent FI itself compresses, that's surprising. I'm not
> disagreeing that gzip might run faster on the FI form than on the larger
> text form; I'm surprised that size(gzip(FI)) << size(FI). You wouldn't
> expect compression systems like gzip to do well on things that are already
> tightly coded.

Fast Infoset doesn't try to be extremely tightly coded. We tried to find a
good balance between ease of implementation, encoding/decoding speed, and
compactness. So there is still room for gzip to remove some of the residual
redundancy.

Alessandro Triglia

> On the contrary, many compression algorithms will actually
> somewhat expand things that are already compressed using other algorithms.
> Basically, compression algorithms take a gamble that they can recognize
> some form(s) of redundancy and get them out. If the input doesn't have
> redundancy in such forms, then you tend to wind up at best restating the
> input, plus a bit of overhead for the compression framework itself.
>
> If gzip is going to make the FI form larger, or not much smaller, then
> it's a bad use of time to run it, even if the time to gzip the FI is
> indeed much lower than the time to gzip the original text.
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
>
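
For anyone who wants to check the general point about gzip and dense input, here is a minimal Python sketch. It does not use Fast Infoset at all: the repeated markup is a made-up sample, and the seeded pseudo-random bytes are only a stand-in for input with no redundancy gzip can find. Since Fast Infoset is described above as not being extremely tightly coded, a real FI document would presumably land somewhere between the two ratios; the sketch only shows the two extremes.

```python
import gzip
import random


def gzip_ratio(data: bytes) -> float:
    """Return len(gzip(data)) / len(data); a value above 1.0 means gzip expanded the input."""
    return len(gzip.compress(data)) / len(data)


# Highly redundant, text-XML-like input (hypothetical sample markup).
redundant_text = b"<item><name>widget</name><price>9.99</price></item>\n" * 2000

# Stand-in for "already tightly coded" input: seeded pseudo-random bytes
# of the same length. This is an assumption for illustration, not FI output.
rng = random.Random(0)
dense_blob = bytes(rng.getrandbits(8) for _ in range(len(redundant_text)))

text_ratio = gzip_ratio(redundant_text)
dense_ratio = gzip_ratio(dense_blob)

print(f"redundant text: compressed/original = {text_ratio:.3f}")
print(f"dense bytes:    compressed/original = {dense_ratio:.3f}")
```

Running the same `gzip_ratio` helper on an actual Fast Infoset encoding of your own documents would be the way to measure how much residual redundancy gzip still finds there.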