RE: Binary XML
Quoting from the page at http://www.research.att.com/sw/tools/xmill/

"XMill is a new tool for compressing XML data efficiently. It is based on a regrouping strategy that leverages the effect of highly-efficient compression techniques in compressors such as gzip. XMill groups XML text strings with respect to their meaning and exploits similarities between those text strings for compression. Hence, XMill typically achieves much better compression rates than conventional compressors such as gzip. XML files are typically much larger than the same data represented in some reasonably efficient domain-specific data format. One of the most intriguing results of XMill is that the conversion of proprietary data formats into XML will in fact improve the compression - i.e. the compressed XML file is (up to twice) smaller than the compressed original file! And this astonishing compression improvement is achieved at about the same compression speed."

Those are interesting results. Conventional wisdom is that the compression of GZIP is sufficient for most text based formats. If I understand this page, they say that is almost right except that where there are regular patterns, one can add a compression based on regrouping that substantially improves that without loss of speed.

Note that in files where the ratio of plain text nodes to markup is high (lots of text nodes, less markup), the XMill strategies are less effective.

So, is it the case that this kind of compression is a big helper where the user analyses the file in advance and applies a custom compression per document type?

Len Bullard
Intergraph Public Safety
clbullar@i...
http://www.mp3.com/LenBullard

Ekam sat.h, Vipraah bahudhaa vadanti.
Daamyata. Datta. Dayadhvam.h

-----Original Message-----
From: Joshua Allen [mailto:joshuaa@m...]
Sent: Tuesday, September 26, 2000 8:04 PM
To: 'Mike Sharp'; xml-dev@x...
Subject: RE: Binary XML

The format used for binary tokenisation in WAP is WBXML:
http://www.w3.org/TR/wbxml/

Dan Suciu and Hartmut Liefke built a compressor specifically for XML that uses information about the tags to get better compression than normal text-oriented compression (such as gzip)
http://www.research.att.com/sw/tools/xmill/

-J

> -----Original Message-----
> From: Mike Sharp [mailto:msharp@l...]
> Sent: Tuesday, September 26, 2000 3:40 PM
> To: xml-dev@x...
> Subject: Re: Binary XML
>
> A WAP gateway does a binary tokenizing compression bit on the original WML, that
> results in astonishing compression. Don't know how that applies to your
> comments, but anecdotally, I've seen (and heard about) pretty good compression
> simply by using HTTP 1.1 and turning compression on. Obviously, this doesn't
> help if the XML transport isn't over HTTP (semaphore, anyone?).
>
> I'd be curious what people think about it--without, as you say, involving the
> wire protocol. Is it really necessary to map a specific token to a specific
> element (for example)? I suppose that it would allow a user to de-tokenize the
> document, returning it to some semblance of readability. But this could be done
> in a particular implementation, if needed, by referencing some external document
> map, couldn't it?
>
> Of course, the tokenized XML gets tricky if there are external schemas, DTD's or
> other XML,..how do you map the elements in the schema to the same elements in
> the XML, after they've been tokenized? Or did I miss the point...?
>
> Curiously,
> Mike Sharp
>
>
> "Bullard, Claude L (Len)" <clbullar@i...> on 09/26/2000 01:01:00 PM
>
> To: xml-dev@x...
> cc: (bcc: Mike Sharp/Lante)
>
> Subject: Binary XML
>
> Raising an old horse, possibly dead:
>
> Has a standard XML binary token set,
> possibly based on the InfoSet to
> enable application to different
> XML vocabularies been created?
>
> Or is the thinking still that
> this side of the wireless protocols,
> zipping/unzipping is still sufficient
> given modem support?
>
> Len Bullard
> Intergraph Public Safety
> clbullar@i...
> http://www.mp3.com/LenBullard
>
> Ekam sat.h, Vipraah bahudhaa vadanti.
> Daamyata. Datta. Dayadhvam.h
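To make the regrouping idea from the XMill quote at the top of this message concrete, here is a rough Python sketch. It is not XMill's actual container format or algorithm: the function names (regroup, compressed_sizes) and the sample "calls" vocabulary are invented for illustration, zlib stands in for gzip, and the regrouped stream is not reversible as written, since it only records enough to compare sizes. The idea shown is simply to pull the tag sequence into one stream and the text values into one container per tag, so that similar strings sit next to each other before the general-purpose compressor sees them.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def regroup(xml_text):
    """Split a document into a tag-name stream and per-tag text containers."""
    root = ET.fromstring(xml_text)
    structure = []                   # tag names in document order
    containers = defaultdict(list)   # tag name -> text values seen under it
    for elem in root.iter():
        structure.append(elem.tag)
        text = (elem.text or "").strip()
        if text:
            containers[elem.tag].append(text)
    return structure, dict(containers)


def compressed_sizes(xml_text):
    """Return (bytes for zlib on raw XML, bytes for zlib on regrouped streams)."""
    structure, containers = regroup(xml_text)
    plain = len(zlib.compress(xml_text.encode("utf-8"), 9))
    # One stream for structure, then one per tag, so like values are adjacent.
    parts = ["\n".join(structure)]
    parts += ["\n".join(values) for _, values in sorted(containers.items())]
    regrouped = len(zlib.compress("\x00".join(parts).encode("utf-8"), 9))
    return plain, regrouped


# Hypothetical, highly regular sample data (assumption, not from the thread).
records = "".join(
    f"<call><id>{i}</id><unit>E{i % 7}</unit>"
    f"<time>2000-09-26T{i % 24:02d}:00</time></call>"
    for i in range(200)
)
sample = f"<calls>{records}</calls>"

plain, regrouped = compressed_sizes(sample)
print("zlib on raw XML:          ", plain, "bytes")
print("zlib on regrouped streams:", regrouped, "bytes")
```

Running it on documents with different mixes of markup and free text is one way to probe the question raised above, namely whether the gain depends on how regular the document type is. The numbers from a toy like this should not be read as XMill's results.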