I think there are several more projects out there, including one of mine, "GREN", which is a BinaryML based on groves, which are basically XML information items.

A few points on speed:

* Most BinML should be faster than any TextML, since there are usually fewer parsing steps involved. Examples: string-to-int conversion is not necessary, and string sizes are known.
* In write-once/read-many use cases it is possible to further increase speed and reduce space requirements with multipass compression techniques.
* Parsers written in C or C++ are even faster due to fast memory access.

Space:

* In most cases BinML is smaller, but not always:
  - a text representation of an array of doubles is smaller when the array contains small integers
* I have used several compression techniques, and in use cases where the DTD is "well-known" to both the writer and the reader, the size can be significantly reduced.
* There exist many classes of BinML parsers that are significantly smaller than corresponding TextML ones.
* Large documents with many different tags are usually smaller when using BinML.

Currently I'm working on version 2 and implementing a prototype (when I have spare time) with the goal of creating a BinML version of JAR files (Java .class files). I've just discovered that it is possible to save 3-5 MB (out of 13 MB) by removing redundant information, so it should be possible to "add markup and reduce size"! Hopefully this remains true when the prototype is done :)

/anders

Anders W. Tell
Financial Toolsmiths AB

Murali Mani wrote:
> I am aware of a few research projects that used a binary representation of
> XML. I think the main things with binary XML are compressed data exchange
> and storage, and I think they reported faster processing than ordinary DOM.
> I shall list the 3 that I have heard of. The last I heard of them was
> almost a year ago.
>
> 1. PDOM - Persistent DOM - They do not do any extra compression other than
> the compression due to binarization.
> The main point here is the DOM is not fully memory resident - it does
> partial loading.
> 2. Millau - They separate structure and content and do compression. Mainly
> it is for exchange, and I think they also got to a very primitive DOM
> support. This was in WWW9, and was a research project at IBM.
> 3. XMill - This is from AT&T and the University of Pennsylvania. I think
> this achieved the most compression. The idea was similar to Millau's, but in
> addition, for values they used something called "column-based compression"
> -- I think compression and decompression overhead as well as header
> overhead is larger here.
>
> I also think the space and time savings by the different projects were
> something like:
> Millau - claimed storage decreased 5 times; also claimed something like
> 20% savings in processing the DOM (recollections).
> XMill - claimed storage decreased only for large documents - documents
> should be at least 64 kB (recollections).
> But I am sure Millau claimed storage and processing got better.
>
> regards - murali.
>
> On Tue, 10 Apr 2001, Al Snell wrote:
>
> > On Tue, 10 Apr 2001, Tim Bray wrote:
> >
> > > So, Sean may have used strong language, but in point of fact
> > > he was correct, so it's forgivable. Get some data on how
> > > much space and time a binary representation will save, then
> > > you'll be able to make intelligent quantitative decisions
> > > on where it's worthwhile deploying it.
>
> ------------------------------------------------------------------
> The xml-dev list is sponsored by XML.org, an initiative of OASIS
> <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To unsubscribe from this elist send a message with the single word
> "unsubscribe" in the body to: xml-dev-request@l...
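[Editorial addendum] The speed claim in the post above -- that a binary encoding skips delimiter scanning and string-to-int conversion because field sizes are known in advance -- can be sketched in a few lines of Java. This is an illustrative comparison only, not code from GREN, PDOM, Millau, or XMill; all class and method names here are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class BinVsText {

    // Text form: the reader must scan for a field delimiter and then
    // convert the digit string to an integer (the "StringToInt" step).
    static int readTextInt(String buf) {
        int end = buf.indexOf(' ');                      // scan for boundary
        return Integer.parseInt(buf.substring(0, end));  // digit conversion
    }

    // Binary form: the field size is fixed (4 bytes, big-endian),
    // so there is no scanning and no conversion from digits.
    static int readBinaryInt(byte[] buf) throws IOException {
        return new DataInputStream(new ByteArrayInputStream(buf)).readInt();
    }

    public static void main(String[] args) throws IOException {
        // Text encoding of 123456: 7 bytes including the delimiter,
        // and the length varies with the magnitude of the value.
        String text = "123456 ";

        // Binary encoding of the same value: always exactly 4 bytes.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new DataOutputStream(bos).writeInt(123456);
        byte[] bin = bos.toByteArray();

        System.out.println(readTextInt(text));   // 123456
        System.out.println(readBinaryInt(bin));  // 123456
        System.out.println(bin.length);          // 4
    }
}
```

Note this also illustrates the post's caveat about space: for small values the text form can win (the digit "1" plus a delimiter is 2 bytes versus a fixed 4-byte binary int, or 8 bytes for a binary double).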