Re: Is there anyone working on a binary version of XML?
From: David Megginson <david@m...>
>Rick Jelliffe writes:
>
> > One trivial way to minimise file sizes for transmission is to
> > collapse white-space inside markup (e.g. [ \t\n\r]+ becomes
> > [\n]),
>
>Yes, that might be helpful (but only minimally in most cases).

The reason I suggest it is this: at several stages in a network there is liable to be some point-to-point compression, in particular at the receiver's modem (well, at most receiving ends). XML's verbosity can be partially justified by the existence of this compression.

Attempting to compress already-compressed data does not always lead to increased benefits: in fact, it can easily lead to larger files, which is why many compression systems first check that they have made any gains before writing out the compressed block. (And if you are going through 7-bit mail systems, then you can increase your transmission size by compressing data, if the data is ASCII, since the compressed output has to be re-encoded as 7-bit text.)

When judging an XML compression scheme, it is important to judge its effect after the data has been recompressed by the kind of compression that is found in modems (i.e., at the bottleneck): the simple, fastest deflate found in gzip can be useful for this.

Furthermore, it is important to recognise that, because of the slow-start algorithm in TCP/IP and the WWW having quite long ACK delays, a compression of 2:1 is not the same thing as a doubling in arrival speed: more data will arrive earlier in each packet group, but the number of packet groups may be the same.

In the case of the binary version of XML being mentioned, it would be interesting to see the four-way comparison (raw XML, binary "XML", compressed XML, compressed binary).

One interesting result of my tests on the interaction of short-referencing and compression was that collapsing white-space was (for my independently-produced RDF test files) just as effective as short-referencing.
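The measurement described above might be sketched roughly as follows. This is only an illustration, not the test setup used for the RDF files: it assumes Python, uses the standard zlib module at level 1 as a stand-in for "the simple, fastest deflate found in gzip", and the names collapse_markup_whitespace and sizes are hypothetical. It also assumes the input has no comments, CDATA sections or processing instructions containing tag-like text, and it leaves quoted attribute values alone, since white-space inside them may be significant.

```python
import re
import zlib

# An element start, end or empty tag. Quoted attribute values may contain '>'.
# Assumption: no comments, CDATA sections or PIs that contain tag-like text.
TAG = re.compile(r"""</?[A-Za-z_:][^<>"']*(?:(?:"[^"]*"|'[^']*')[^<>"']*)*>""")

# Inside a tag: a quoted attribute value (kept as-is) or a white-space run.
PIECE = re.compile(r"""("[^"]*"|'[^']*')|[ \t\r\n]+""")


def collapse_markup_whitespace(xml: str) -> str:
    """Collapse each white-space run inside a tag to a single newline."""
    def fix_tag(tag_match):
        # Keep quoted values verbatim; replace every other run with "\n".
        return PIECE.sub(lambda p: p.group(1) or "\n", tag_match.group(0))
    return TAG.sub(fix_tag, xml)


def sizes(xml: str) -> dict:
    """Byte counts before and after collapsing, raw and deflated (level 1)."""
    raw = xml.encode("utf-8")
    collapsed = collapse_markup_whitespace(xml).encode("utf-8")
    return {
        "raw": len(raw),
        "collapsed": len(collapsed),
        "raw+deflate": len(zlib.compress(raw, 1)),
        "collapsed+deflate": len(zlib.compress(collapsed, 1)),
    }


if __name__ == "__main__":
    sample = '<doc   id="a  b"\n\t  class=\'x\' >text   here</doc  >'
    for name, size in sizes(sample).items():
        print(f"{name}: {size} bytes")
```

The numbers worth comparing are the two deflated sizes, since those approximate what crosses the bottleneck. On very small inputs the deflate overhead dominates, so realistic test files are needed before drawing any conclusion.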
(One reason might be that many compression algorithms only have a certain dictionary size and a certain match-string size: reducing unneeded white-space may free up dictionary entries and allow more useful match-strings, especially for on-the-fly compression such as in modems.) I was surprised, because I thought that white-space was fairly insignificant: but I was wrong, for the data I was using (some data would fare better, I would hope, but some may be worse).

So developers should pay attention to letting users keep their file sizes down: a 10 percent reduction in file size may not seem much, but if, at an extreme, all the packets together amount to just over the size of the first packet group and the ACK latency is greater than the packet transmission time, it can result in the files completing in half the time. Given the smaller file sizes of XML, and the trend to linking to external stylesheets and so on, reducing the crap in headers is quite important. In fact, I would think that it was good policy to have no unnecessary whitespace in header data in XML documents.

>> and to minimize whitespace in data: (removing trailing spaces, [ \t]+\n
>> becomes [\n], is a safe transformation, for example.)

>No. It might be a safe transformation for specific XML formats, but
>not for XML in general, because you don't know what people might be
>using that whitespace for.

Of course. But in practice text editors and some kinds of processing systems will often strip out trailing whitespace on opening or closing. So I should have said something like "It is not prudent to generate '[\t\s]+\n' where the whitespace is significant unless you are sure how software which uses that data treats trailing white-space."

In any case, I was trying to say that one good way to reduce file sizes is to not generate unneeded characters in the first place: I was not proposing an external compression mechanism based on white-space collapsing.

Rick

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)