Whitespace handling when compressing XML with WinZip
I have an XML file with most of the data at a tag nesting level of 5-7, and I am compressing it with WinZip 8. Without touching the markup, I've found that the whitespace used to indent the tags has a big impact on the achievable compression ratio.

Using 4 spaces per nesting level, the file is 6.7 MB and compresses to 154 KB.
Using 1 tab per nesting level, the file is 4.2 MB and compresses to 98 KB.

The change in the input file size is what I'd anticipated, but I would have expected the zip files to be roughly the same size. Wouldn't a repeated run of the same character be replaced with a single token, whether it is made of spaces or tabs?

Am I missing something obvious? Does anyone understand the internals of WinZip well enough to explain the discrepancy? The ratio of the compressed file sizes suggests it's encoding the multiple spaces with multiple tokens rather than a single token.

Thanks,
Michael

P.S. I reran the tests with XMill. It achieved a compressed file size of 60 KB regardless of the change in whitespace. My content data has a high degree of redundancy within identically named elements, which would help it.
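The comparison described above can be reproduced on a small scale. The following is a minimal sketch, assuming the archive uses the Deflate method (the usual ZIP method, available in Python as `zlib`); Deflate codes a repeated run as length/distance matches that still cost bits, not as one fixed token. The document generator `build_xml`, its element names, and the record count are hypothetical stand-ins for the real file, so the numbers it prints will not match the sizes quoted in the message.

```python
import random
import zlib


def build_xml(indent_unit, records=2000, depth=5, seed=0):
    """Build a synthetic nested XML document (hypothetical data),
    indented with `indent_unit` repeated once per nesting level."""
    rng = random.Random(seed)  # fixed seed so both variants carry the same values
    lines = ["<root>"]
    for _ in range(records):
        # Open `depth` nested elements, one per line.
        for level in range(1, depth + 1):
            lines.append(indent_unit * level + f"<level{level}>")
        # One leaf element with a pseudo-random value.
        lines.append(indent_unit * (depth + 1) + f"<value>{rng.randint(0, 9999)}</value>")
        # Close the nested elements in reverse order.
        for level in range(depth, 0, -1):
            lines.append(indent_unit * level + f"</level{level}>")
    lines.append("</root>")
    return "\n".join(lines).encode("utf-8")


def deflate_size(data, level=9):
    """Return the size in bytes of `data` after zlib (Deflate) compression."""
    return len(zlib.compress(data, level))


def compare():
    """Return {indent style: (raw size, deflated size)} for both styles."""
    results = {}
    for name, unit in (("4 spaces", "    "), ("1 tab", "\t")):
        raw = build_xml(unit)
        results[name] = (len(raw), deflate_size(raw))
    return results


if __name__ == "__main__":
    for name, (raw, packed) in compare().items():
        print(f"{name}: raw={raw} bytes, deflated={packed} bytes")
```

Running the script prints the raw and deflated size for each indentation style, which makes it easy to check whether the gap seen with WinZip also appears with plain Deflate on a given document.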