RE: Request: Techniques for reducing the size of XML instances
> From: Al Snell [mailto:alaric@a...]
>> Seconded. gzip is simple, fast, ubiquitous, standard, and gives you far
>> better compression than any binary substitution scheme ever will.
>
> Whoa! Crazy incorrect statements alert!

My statements were neither crazy nor incorrect. You, sirrah, are a fanatic.

gzip is simple, because it's included as a standard library in almost every language and platform these days. In Python or Java, for instance, you can open a gzip stream and treat it like any other stream. Nobody normally reimplements it from scratch, though any programmer worth the name should be able to do it from the spec; it's not rocket science, just a data structures and pattern matching problem.

It's faster than almost any other compression scheme that gives anywhere near the same compression ratio. There are a few close contenders, but they fall down in the other factors, especially ubiquity. You can't beat "already installed" for ease of use.

I did not state that it was cheap on memory; it's not, but it's also not a significant factor these days. RAM is dirt cheap by the MB now. These are not the '80s any more. Maybe it's time to upgrade your Apple IIe...

> 2) gzip won't necessarily give "far better compression". A trivial way of
> getting better compression than gzipping XML is to gzip binary XML.
> deflate is great at compressing the text of content, and if it doesn't
> have redundant whitespace and tag syntax to have to represent it can
> shave off a good few percent extra. Even better would be to take

"deflate is great at compressing the text of content" - BUT MARKUP IS TEXT! ROTFL!

Not to mention that you're suggesting doing TWO passes of encoding, which you somehow think will be faster than ONE pass: first replace all the tags with binary markers, then gzip, instead of just letting gzip find the repeated sequences on its own... *Which is what it's designed for*.
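This disagreement is easy to test for yourself. The following is a minimal Python sketch that assumes a toy, highly repetitive document. It uses a hypothetical tag-tokenizing pre-pass (`tokenize_tags`) as a stand-in for binary XML in general; it is not any real binary XML format. The script prints the sizes for one-pass gzip and for tokenize-then-gzip without declaring a winner, because the outcome depends heavily on the document. It also illustrates the "just another stream" point: Python's standard `gzip` module gives the XML parser an ordinary file object.

```python
import gzip
import io
import re
import xml.etree.ElementTree as ET

# Matches only attribute-free start/end tags such as <item> and </item>;
# anything else (attributes, comments, PIs) is left untouched as text.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")


def sample_document(n: int = 500) -> str:
    """Toy, highly repetitive document; real data will behave differently."""
    rows = "".join(
        f"<item><id>{i}</id><name>widget {i}</name></item>\n" for i in range(n)
    )
    return f"<catalog>\n{rows}</catalog>\n"


def tokenize_tags(xml_text: str) -> bytes:
    """Hypothetical 'binary XML' pre-pass: each tag becomes a 2-byte token
    (0x01 = start, 0x02 = end, followed by a tag-name index). The name table
    itself is not emitted, which slightly flatters the tokenized size."""
    names: dict[str, int] = {}
    out = bytearray()
    pos = 0
    for m in TAG.finditer(xml_text):
        out += xml_text[pos:m.start()].encode("utf-8")  # text between tags, unchanged
        idx = names.setdefault(m.group(2), len(names))  # assumes < 256 distinct names
        out += bytes([0x02 if m.group(1) else 0x01, idx])
        pos = m.end()
    out += xml_text[pos:].encode("utf-8")
    return bytes(out)


def report(doc: str) -> dict[str, int]:
    """Byte counts for the raw text, one-pass gzip, and tokenize-then-gzip."""
    raw = doc.encode("utf-8")
    return {
        "raw XML": len(raw),
        "gzip(XML), one pass": len(gzip.compress(raw)),
        "gzip(tokenized), two passes": len(gzip.compress(tokenize_tags(doc))),
    }


if __name__ == "__main__":
    doc = sample_document()
    for label, size in report(doc).items():
        print(f"{label:28} {size:7d} bytes")

    # gzip as "just another stream": the parser reads from the decompressing
    # file object exactly as it would from an uncompressed file.
    compressed = gzip.compress(doc.encode("utf-8"))
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as stream:
        root = ET.parse(stream).getroot()
    print("items parsed from gzip stream:", len(root))
```

To get a meaningful comparison, replace `sample_document()` with your own documents and count the tag-name table that a real encoding would have to ship.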
I've done that very experiment, and got marginally better results out of gzip than I did from pre-encoding the XML. Binary XML is a dead end, obsolete before it was even started. Please stop wasting people's time on it.

> Not that deflate is a bad algorithm or zlib a bad implementation, but the
> bit twiddling and block searching required for LZ77 and Huffman encoding
> are awkward operations on von Neumann architectures that happen to have
> byte or word aligned memory access... decoding is better on the LZ77
> front (it's a heavily read-optimised algorithm), but the Huffman stuff
> still requires bit shuffling because a Huffman data stream is inherently
> bit-oriented!

<shrug> Until recently, I'd never met any programmer who couldn't do bit-twiddling as instinctively as they did addition. Some of the latest generation don't have any practice at it, which is a shame, but they also don't have the inappropriately conservative memory and CPU habits that we old fogies have. But it's still very much not rocket science, and it's only an issue if you're reimplementing it. Why would you be reimplementing it?

> But is it worth the bother?

That's my fundamental question about binary XML, and IMO the answer is: almost never. Obviously you disagree, but I think the whole idea is solving yesterday's problems instead of focusing on tomorrow's.

> The applications where binary XML is desired - high throughput
> transaction processing systems, low bandwidth communications, and
> low-power processors with small memory - are not generally places where
> complex compression algorithms are worthwhile, with the exception of the
> low bandwidth comms if both ends have CPU cycles to spare. For an
> embedded processor, the costs of parsing textual XML and the costs of
> handling gzip can both be out of the question compared to dealing with a
> very simple binary SAX stream; the storage requirements of this over
> gzipped data are often less important than the speed of processing!
Again, this is an area where you're looking back to Elder Days... Most new embedded systems now have pretty decent CPUs, certainly fast enough to run gzip if they have to trade off a little speed and RAM to make up for a slow network pipe.

-- 
Mark Hughes <http://kuoi.asui.uidaho.edu/~kamikaze/>
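On the bit-orientation point raised in the thread: deflate (RFC 1951) packs header fields and extra bits into bytes starting from the least-significant bit. Huffman codes themselves are packed starting from the code's most-significant bit. Below is a minimal Python sketch of an LSB-first bit reader. The `BitReader` class and the `deflate_block_header` helper are hypothetical and are not part of zlib. The reader decodes the 3-bit BFINAL/BTYPE header at the start of a raw deflate stream produced by the standard `zlib` module. As the reply says, this only matters if you reimplement deflate, because zlib already does this work for you.

```python
import zlib


class BitReader:
    """Hypothetical reader that pulls bit fields LSB-first from a byte string,
    the order RFC 1951 uses for header fields and extra bits."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.bitpos = 0  # absolute bit offset into data

    def read(self, n: int) -> int:
        value = 0
        for i in range(n):
            byte = self.data[self.bitpos >> 3]     # which byte holds this bit
            bit = (byte >> (self.bitpos & 7)) & 1  # which bit within that byte
            value |= bit << i                      # first bit read becomes the result's LSB
            self.bitpos += 1
        return value


def deflate_block_header(payload: bytes) -> tuple[int, int]:
    """Compress with raw deflate (no zlib/gzip wrapper) and decode the first
    block header: BFINAL (1 bit), then BTYPE (2 bits)."""
    comp = zlib.compressobj(level=9, wbits=-15)  # negative wbits selects raw deflate
    raw = comp.compress(payload) + comp.flush()
    reader = BitReader(raw)
    bfinal = reader.read(1)
    btype = reader.read(2)  # 0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman
    return bfinal, btype


if __name__ == "__main__":
    r = BitReader(bytes([0b10110100]))
    print(r.read(3), r.read(5))  # prints: 4 22
    print(deflate_block_header(b"<item><id>1</id></item>" * 20))
```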