
Re: almost four years ago....

  • From: Elliotte Rusty Harold <elharo@m...>
  • To: The Deviants <xml-dev@l...>
  • Date: Tue, 19 Jun 2001 08:45:54 -0400

At 4:09 PM +0100 6/16/01, Alaric Snell wrote:
>This is easy to do. GZIP is massively crippled by having no information about
>the structure of the file - it's just a string of bytes that it has to make
>some assumptions about the probable structure of with regards to frequency
>distributions that won't even apply very well to XML; it's trivial to write
>something that compresses better, especially if you use gzip for what it's
>best at (the CDATA) and handle the <> bits yourself.
>

I've heard that one before too. In practice, it isn't nearly as easy 
as people think it is. After a great deal of effort, you may be 
able to shrink 1% or 2% more on some files. However, most people who 
try this end up producing something that is noticeably larger than 
gzip.
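
For anyone who wants to test the quoted idea rather than argue about it, here is a minimal sketch in Python. It is not anybody's actual compressor: the regex tokenizer and the names split_streams, pack, and unpack are hypothetical, and it assumes well-formed XML, where a NUL byte can never occur and so is safe as a separator. It pulls the markup and the character data into separate streams, gzips each one, and rebuilds the original bytes exactly.

```python
import gzip
import re

# Naive tokenizer (an assumption, not a real XML parser): a token is either
# something that looks like markup, or a run of bytes containing no "<".
TOKEN = re.compile(rb"<[^>]*>|[^<]+")


def split_streams(xml):
    """Separate markup from character data, recording the original order."""
    tags, texts, order = [], [], bytearray()
    for tok in TOKEN.findall(xml):
        if tok.startswith(b"<"):
            tags.append(tok)
            order.append(0)  # 0 = next token comes from the tag stream
        else:
            texts.append(tok)
            order.append(1)  # 1 = next token comes from the text stream
    return tags, texts, bytes(order)


def pack(xml):
    """Compress the tag, text, and order streams independently with gzip."""
    tags, texts, order = split_streams(xml)
    return (
        gzip.compress(b"\0".join(tags), compresslevel=9),
        gzip.compress(b"\0".join(texts), compresslevel=9),
        gzip.compress(order, compresslevel=9),
    )


def unpack(packed):
    """Rebuild the original bytes by interleaving the two streams."""
    tags_blob, texts_blob, order = (gzip.decompress(p) for p in packed)
    tags = iter(tags_blob.split(b"\0"))
    texts = iter(texts_blob.split(b"\0"))
    return b"".join(next(tags) if flag == 0 else next(texts) for flag in order)


if __name__ == "__main__":
    doc = b"<list>" + b"".join(
        b"<item>entry %d</item>" % i for i in range(1000)
    ) + b"</list>"
    packed = pack(doc)
    assert unpack(packed) == doc
    print("split streams:", sum(len(p) for p in packed), "bytes")
    print("plain gzip:   ", len(gzip.compress(doc, compresslevel=9)), "bytes")
```

Run it on your own files and compare the two totals. Bear in mind that each gzip stream carries its own header and trailer, so on small documents the three streams start out behind plain gzip before the data is even considered.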

Of course you could use a better general purpose compression 
algorithm. bzip can grab you 5% or so a lot of the time, though it 
isn't as widely supported. Frankly, if you can't provide at least a 
10% improvement then it's not worth my time to worry about.
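
The gzip versus bzip comparison is easy to check on your own data. The sketch below uses Python's standard gzip and bz2 modules (bz2 implements bzip2). The sample document produced by make_sample_xml is synthetic and purely an assumption for illustration; real files will behave differently, so substitute your own bytes before drawing conclusions.

```python
import bz2
import gzip


def make_sample_xml(records=2000):
    """Build a synthetic, repetitive XML document (hypothetical test data)."""
    rows = [
        f'<item id="{i}"><name>Widget {i}</name><price>{i % 97}.99</price></item>'
        for i in range(records)
    ]
    return ("<catalog>" + "".join(rows) + "</catalog>").encode("utf-8")


def compare(data):
    """Return the raw size and the gzip and bzip2 sizes, all in bytes."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
    }


if __name__ == "__main__":
    sizes = compare(make_sample_xml())
    for name, size in sizes.items():
        print(f"{name:>5}: {size:>8} bytes ({size / sizes['raw']:.1%} of raw)")
    # Positive means bzip2 produced the smaller output on this input.
    saving = 1 - sizes["bzip2"] / sizes["gzip"]
    print(f"bzip2 vs gzip: {saving:+.1%}")
```

The last line printed is the number that matters for the 10% threshold: how much smaller (or larger) the bzip2 output is relative to gzip on that particular input.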

Better than 10% smaller, I don't think you can do without a lossy 
algorithm. You simply run into the limits of information theory.

>>  3. Human legible/human editable data doesn't matter.
>
>Indeed, we must never use image files, filesystems, or gzip - they'll never
>take off :-)
>

This is a canard. Nobody uses XML for this stuff anyway.

>>  All three beliefs have been empirically proven false time and time
>>  again.
>
>Chuckle!
>

Hey, don't let me stop you from trying! I could be wrong, in which 
case we can all benefit from your efforts. But I think that if you're 
really smart and try really hard and devote months of your life to 
this problem, you aren't even going to get a 10% improvement over 
gzip. (You might not get any improvement at all.) And even if you do 
get that 10% improvement, I suspect you'll discover your system is 
so inconvenient compared to plain or gzipped XML that nobody will use 
it. But after all, it's your life. If you've got the time to spend on 
this, feel free to try. I'm just afraid you'll get the same results 
as the last two dozen people who tried this.
-- 

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@m... | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|                  The XML Bible (IDG Books, 1999)                   |
|              http://metalab.unc.edu/xml/books/bible/               |
|   http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |
+----------------------------------+---------------------------------+
