Re: XML / HTML Transport size
[sorry if this gets through twice, I got weird messages from the mail server]

John Cowan wrote:
> Robin Berjon scripsit:
>>>A variety of
>>>small-scale studies have shown that general-purpose compression is generally
>>>as good as, or better than, some scheme that knows it's compressing XML.
>>
>>Err, quite the opposite. XMill beats gzip.
>
> This one is news to me, but I'm looking into it now.

You may also wish to take a look at Box (http://box.sf.net/). I don't remember how well it compares to gzip in compression ratio, but it's fast to decode (the website is down today, along with all the other SF sites, so I can't look it up right now).

>>BiM/BiX requires a schema,
>
> Yes: by "knows it's compressing XML" I meant to imply "and doesn't know
> anything more than that".

I know, and that obviously makes things a little more complicated. However, in most non-pathological cases it is possible to apply machine-learning techniques to deduce schema information. (It also works on pathological cases, i.e. instances for which the only fathomable pattern is the instance itself, but it's rather useless there.) That's something we're seriously investigating in order to support xs:any and xs:anyAttribute efficiently, for instance. There is also a fair number of cases in which there is no schema per se, but one can usefully be inferred from other metadata such as a WSDL document, an XQuery...

>>but there are many ways in which a schema can be deduced, even with just
>>a raw document (and it can be done more intelligently than most tools
>>that deduce schema information from instances I've seen out there do
>>it).
>
> Pointer(s)?

The schema deducers I was referring to are the one included in Castor and the one on gotdotnet.com:

  http://www.castor.org/
  http://gotdotnet.com/team/xmltools/xsdinference/

Those tools are probably useful in cases where you just need a schema and don't care whether it is the simplest schema for the given instance or set of instances.
They tend to produce schemata that are pretty much snapshots of the instance and mirror it more or less exactly. The schema inferencer we're developing tries hard to find the simplest schema, because we need it to produce a schema that strikes the right balance between generality and concision. Obviously, if you are going to send a decoder update (as decoder bytecode) in the stream, you want that extra information to pay for itself: the additional data it lets you decode, and the better encoding of that data, should save more than it costs to send the decoder itself. I expect to have something to show in that area early next year.

--
Robin Berjon <robin.berjon@e...>
Research Engineer, Expway
7FC0 6F5F D864 EFB8 08CE 8E74 58E6 D5DB 4889 2488
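The contrast drawn above between snapshot-style inference and a search for the simplest schema, and the argument that a decoder update must save more than it costs, can be illustrated with a toy Python sketch. It is not Expway's inferencer, nor the Castor or gotdotnet one. The function names, the DTD-like notation, the "mark any element that ever repeats consecutively with +" heuristic, and the use of string length as a stand-in for encoded size are all hypothetical assumptions made for illustration.

```python
from itertools import groupby


def render(alternatives):
    """Render a set of child-name tuples as a DTD-like choice, e.g. '(a, b+) | (c)'."""
    return " | ".join("(" + ", ".join(alt) + ")" for alt in sorted(alternatives))


def snapshot_model(instances):
    """Snapshot-style inference: one alternative per distinct child sequence seen,
    so the inferred model more or less mirrors the instances."""
    return render({tuple(seq) for seq in instances})


def generalized_model(instances):
    """Generalizing inference (hypothetical toy heuristic): an element name that
    ever occurs in a consecutive run gets the '+' (one-or-more) indicator
    everywhere, and alternatives that then become identical are merged."""
    repeating = {
        name
        for seq in instances
        for name, run in groupby(seq)
        if len(list(run)) > 1
    }
    collapsed = {
        tuple(f"{name}+" if name in repeating else name for name, _ in groupby(seq))
        for seq in instances
    }
    return render(collapsed)


def worth_sending(update_cost, saving_per_document, expected_documents):
    """Hypothetical cost check: ship a decoder/schema update only if the savings
    on the data it helps encode exceed the cost of sending the update itself
    (all quantities in the same size unit)."""
    return saving_per_document * expected_documents > update_cost


if __name__ == "__main__":
    # Hypothetical child sequences observed under one parent element.
    instances = [
        ["title", "item"],
        ["title", "item", "item"],
        ["title", "item", "item", "item"],
    ]
    print(snapshot_model(instances))     # (title, item) | (title, item, item) | (title, item, item, item)
    print(generalized_model(instances))  # (title, item+)
```

On these three toy instances the snapshot model needs three alternatives while the generalized one collapses to the single `(title, item+)`. String length is only a crude proxy for what an encoder would actually pay to transmit either model; a real inferencer would weigh the model's encoded size against the bits it saves on the data, which is the balance `worth_sending` caricatures.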