
RE: Microsoft FUD on binary XML...


Jeff Lowery wrote:
> What can be achieved by binary XML that can't 
> be similarly achieved using well-known text 
> compression algorithms?
	Basically, the binary encodings sometimes allow you to get
more compact data. They can also allow you to have faster encoders and
decoders. Frequently, of course, if you are really fanatical about
saving every bit, you can get one degree of "compression" by
converting to binary and then get even more by using something like
zip to compress your binary stuff. The results of compressing the
binary representation will often be better than compressing the
original XML.
	Let me give a really trivial example that shows some of the
benefits.
	Let's say that you had the following XML: (warning,
instructive but highly contrived example follows...)

<dataset>
 <x>27</x>
 <y>12</y>
</dataset>

There are about 45 bytes there if you count CRLF and the
pretty-printing spaces. Now, a binary encoding that was schema driven
would look at this and might replace the tags with index numbers which
would be stored as bytes. Thus, dataset becomes "1", x becomes "2" and
y becomes "3." The encoder, if it knew that the values of x and y were
supposed to be integers, might also replace the two-character strings
with one-byte integers. If the encoding was of the "tag-length-value"
family, it would first write a tag number to the stream, then a length
and then the value. This would give you the following sequence of 6
bytes for the XML above:
1	Dataset Tag
4	length of data included in Dataset
2	"x" tag
27	value of x
3	"y" tag
12	value of y
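
As a rough illustration of the six-byte listing above, here is a minimal
Python sketch. The tag constants and the function names encode_tagged and
decode_tagged are made up for this example. It assumes x and y each fit in
one byte and, like the listing, it skips the per-value length bytes that a
real BER encoder would write.

```python
# Hypothetical tag numbers, taken from the listing above.
TAG_DATASET, TAG_X, TAG_Y = 1, 2, 3


def encode_tagged(x: int, y: int) -> bytes:
    """Write the dataset tag, the body length, then tag/value pairs."""
    if not (0 <= x <= 255 and 0 <= y <= 255):
        raise ValueError("this sketch only handles one-byte values")
    body = bytes([TAG_X, x, TAG_Y, y])
    return bytes([TAG_DATASET, len(body)]) + body


def decode_tagged(data: bytes) -> dict:
    """Read back what encode_tagged wrote."""
    if data[0] != TAG_DATASET:
        raise ValueError("not a dataset")
    length = data[1]
    body = data[2:2 + length]
    names = {TAG_X: "x", TAG_Y: "y"}
    # Walk the body two bytes at a time: a tag, then its one-byte value.
    return {names[body[i]]: body[i + 1] for i in range(0, len(body), 2)}


# The XML from the example, with CRLF between lines and none at the end.
xml_text = "\r\n".join(["<dataset>", " <x>27</x>", " <y>12</y>", "</dataset>"])
encoded = encode_tagged(27, 12)

print(len(xml_text.encode("ascii")), list(encoded))
# prints: 45 [1, 4, 2, 27, 3, 12]
```

Because each value still carries its tag, the decoder in this sketch does
not depend on x coming before y.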

The encoding above would be roughly equivalent to what is done in the
ASN.1 defined "Basic Encoding Rules" or BER. Of course, you can go
further -- which is what ASN.1's PER "Packed Encoding Rules" do. A PER
encoder would realize (by reading the schema) that x *always* comes
before y and that both are required elements. Thus, there isn't really
any information provided by including the tags for the fields. The
encoder might also realize that there must always be two bytes of
data; thus, it doesn't do any good to say how long the data field is.
So, you could encode dataset in 3 bytes:
1	Dataset Tag
27	value of x
12	value of y
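
The three-byte form can be sketched the same way. This is only a toy in
the spirit of the listing above, not an implementation of PER, which has
many more rules. The names PACKED, encode_packed and decode_packed are
invented for the example, and the "schema" is simply hard-coded as: one
tag byte, then x, then y, each an unsigned byte.

```python
import struct

TAG_DATASET = 1  # hypothetical tag number from the listing above

# "B" is an unsigned byte. The field order (tag, x, y) is fixed by the
# schema, so no per-field tags or lengths are written.
PACKED = struct.Struct("BBB")


def encode_packed(x: int, y: int) -> bytes:
    """Pack the dataset tag and the two values into three bytes."""
    return PACKED.pack(TAG_DATASET, x, y)


def decode_packed(data: bytes) -> dict:
    """Unpack three bytes, relying on the agreed field order."""
    tag, x, y = PACKED.unpack(data)
    if tag != TAG_DATASET:
        raise ValueError("not a dataset")
    return {"x": x, "y": y}


encoded = encode_packed(27, 12)
print(list(encoded), decode_packed(encoded))
# prints: [1, 27, 12] {'x': 27, 'y': 12}
```

The price of dropping the tags is that sender and receiver must agree on
the schema ahead of time; the bytes alone no longer say which value is x.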

So, from 45 bytes to 3 bytes... Now, given the above, you would be
free to zip that and (assuming the file was a bit bigger) you would
probably get better compression than if you had just zipped the
original text file. Then again, you might be happy with the
compression you got from converting to binary and not even bother with
the zip compression.
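
If you want to check this on a bigger file, something like the following
Python sketch would do. It assumes zlib (DEFLATE, the same family of
compression that zip uses) as a stand-in for zip, and it uses made-up
random records with one-byte values. The helpers as_xml, as_packed and
compare_sizes are invented for the example. No numbers are quoted here
because they depend entirely on the data you feed it.

```python
import random
import zlib


def as_xml(records) -> bytes:
    """Render each (x, y) pair as the pretty-printed XML from the example."""
    lines = []
    for x, y in records:
        lines += ["<dataset>", f" <x>{x}</x>", f" <y>{y}</y>", "</dataset>"]
    return "\r\n".join(lines).encode("ascii")


def as_packed(records) -> bytes:
    """Render each (x, y) pair as three bytes: dataset tag 1, x, y."""
    out = bytearray()
    for x, y in records:
        out += bytes([1, x, y])
    return bytes(out)


def compare_sizes(records) -> dict:
    """Return raw and zlib-compressed sizes for both representations."""
    text, packed = as_xml(records), as_packed(records)
    return {
        "xml": len(text),
        "xml+zlib": len(zlib.compress(text)),
        "packed": len(packed),
        "packed+zlib": len(zlib.compress(packed)),
    }


# Assumption: random one-byte values. Substitute your own data here.
rng = random.Random(0)
records = [(rng.randrange(256), rng.randrange(256)) for _ in range(1000)]
for name, size in compare_sizes(records).items():
    print(f"{name:>12}: {size} bytes")
```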

While the compression benefits of the binary encoding should be obvious,
you can probably also see that deserializing these three bytes into an
in-memory structure is likely to be much faster than decoding
the 45 bytes of XML text.
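
Rather than take the speed claim on faith, you can time it. The sketch
below is one way to do that in Python, using the standard library's
ElementTree parser for the text and struct for the three bytes. The
function names decode_xml and decode_packed are invented for the example,
and the timings you get will depend on your machine, your parser and your
data, so treat the result as a measurement of this one toy case only.

```python
import struct
import timeit
import xml.etree.ElementTree as ET

XML_TEXT = b"<dataset>\r\n <x>27</x>\r\n <y>12</y>\r\n</dataset>"
PACKED = bytes([1, 27, 12])  # dataset tag, x, y


def decode_xml(data: bytes) -> dict:
    """Parse the XML text and convert the two element values to integers."""
    root = ET.fromstring(data)
    return {"x": int(root.findtext("x")), "y": int(root.findtext("y"))}


def decode_packed(data: bytes) -> dict:
    """Unpack three unsigned bytes; the tag is read but not used here."""
    _tag, x, y = struct.unpack("BBB", data)
    return {"x": x, "y": y}


# Both decoders should produce the same in-memory structure.
assert decode_xml(XML_TEXT) == decode_packed(PACKED)

for fn, data in ((decode_xml, XML_TEXT), (decode_packed, PACKED)):
    seconds = timeit.timeit(lambda: fn(data), number=20_000)
    print(f"{fn.__name__}: {seconds:.3f} s for 20,000 decodes")
```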

But, be careful when considering examples like the one above. It is
very easy to provide examples where binary formats compress amazingly
well. In real life, a binary format *will not always* compress data
well enough to be worth the trouble and you can't really make the
claim that a binary format will always be faster to encode and decode. In
general cases, you should probably prefer the text encoding and only
move to binary if you *know* that it will be useful for your specific
datasets and processing requirements.

		bob wyman

