Binary XML

How do I get my binary data into XML?
How do I get my binary data out of XML?

Ah, the age-old problem (well, at least as old as XML itself): how do you embed binary data in XML? Never mind why; metaphysics is outside the scope of this document. We're going to tell you how to put it in, and how to get it out.

XML is not the ideal carrier for binary data. It is a text format, and as such doesn't cope well with raw bits. But if the binary data is properly encoded, using something like the W3C XML Schema types base64Binary or hexBinary, then reading and writing binary files from XSLT and/or XQuery with the XML Converters becomes a snap.

base64Binary, Base-64 and XML

We're going to use the Base-64 encoding format for this demonstration, since it packs tighter than hex encoding. hexBinary gets a quick treatment at the end.

Binary to XML





First, let's encode binary as XML. We'll take the shirt images that we used in the XML Report demonstration. Our source material will then be:

  • The shirt images
  • An XML file listing the names of the files
  • An XSLT transform that combines the two

The XML file will be very simple:

<?xml version="1.0" encoding="US-ASCII"?>
<list>
    <item>shirt-004.gif</item>
    <item>shirt-076.gif</item>
    <item>shirt-148.gif</item>
    <item>shirt-220.gif</item>
    <item>shirt-292.gif</item>
</list>

And the XSLT not much bigger:

<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="US-ASCII"/>
    <xsl:template match="/">
        <list>
            <xsl:apply-templates select="list/item"/>
        </list>
    </xsl:template>
    <xsl:template match="item">
        <file name="{.}">
            <xsl:value-of select="document(concat('adapter:Base-64?http://www.stylusstudio.com/images/publish/', .))"/>
        </file>
    </xsl:template>
</xsl:stylesheet>

So, how does it work? For each <item>, the stylesheet creates a new <file> element and attaches an attribute named "name" holding the input file name. The content comes from the shirt file, read through the Base-64 Deployment Adapter. That adapter takes any raw data and returns it as Base-64 encoded data, which is entirely compatible with XML. (Also note that I've made the references to the files absolute, pointing to our server. If they were local, you could just say <xsl:value-of select="document(concat('adapter:Base-64?', .))"/> without all of the pathiness.)
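If you'd like to see the same idea outside of Stylus Studio, here is a minimal Python sketch. It is a stand-in built on the standard library, not the Deployment Adapter itself, and the function name files_to_xml and the six-byte sample are made up for illustration.

```python
import base64
import xml.etree.ElementTree as ET

def files_to_xml(files):
    """Build <list><file name="...">BASE64</file>...</list> from a dict of name -> bytes."""
    root = ET.Element("list")
    for name, data in files.items():
        item = ET.SubElement(root, "file", name=name)
        # b64encode returns ASCII bytes; decode to str for use as element text
        item.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample: just the six-byte GIF89a signature, not a real shirt image
sample = {"shirt-004.gif": b"GIF89a"}
xml_text = files_to_xml(sample)
print(xml_text)
# prints <list><file name="shirt-004.gif">R0lGODlh</file></list>
```

To encode real files, you would fill the dictionary from disk, for example with pathlib.Path(name).read_bytes().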

This makes the resultant XML document look like this:

<?xml version='1.0' encoding='US-ASCII' ?>
<list><file name="shirt-004.gif">
R0lGODlhZABqALMAAFrMYr/BvlKOVJKOg2xZUKmenMfDw8tgWJpVUbaxsPb19v///+bm5tfX1wAA
AAAAACwAAAAAZABqAAAE/3DJSau9mCrGWhkFIopF2TSMkq1s674rk4x0TRB1ETQq7P/AS6gmOhiP
Rxuu0Ag6nyzGCEmtVmkFqHa7MByK1rCV8OWagwUjTsw2FhG9s5ylQCRxtIF+gNjTvmtfDHOEGQ12
....lots more base-64 stuff....</file><file name="shirt-076.gif">
....lots of base-64 stuff....</file><file name="shirt-148.gif">
....lots of base-64 stuff....</file><file name="shirt-220.gif">
....lots of base-64 stuff....</file><file name="shirt-292.gif">
....lots of base-64 stuff....</file></list>

That R0lGODlhZABqAL stuff is the Base-64 equivalent of the binary data. Any encoding of binary data into a restricted character set causes some expansion. Base-64-encoded binary is about 33% larger than the raw data (for every three bytes in, four characters go out), plus a little extra for padding and line breaks. That's pretty good; hex encoding doubles the size (for every one byte in, two characters go out).
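The arithmetic is easy to check for yourself. This small Python sketch (the helper name encoded_sizes is hypothetical) measures both encodings for a given number of raw bytes, ignoring any line breaks an encoder might add:

```python
import base64

def encoded_sizes(n):
    """Return (base64_length, hex_length) in characters for n raw bytes."""
    data = bytes(n)  # n zero bytes; the content does not affect the length
    return len(base64.b64encode(data)), len(data.hex())

print(encoded_sizes(300))  # (400, 600)
print(encoded_sizes(1))    # (4, 2)
```

Note the second result: Base-64 always pads its output to a multiple of four characters, so for very small inputs hex can actually come out shorter.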

XML to Binary

Now we've got it in; how do we get it out? Supplying that same document we just created as input to this next XSLT document will give us the five shirts' worth of GIF files again. (And we've added a little bit to write out status messages while it is working so we know what we've created.)

<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <xsl:apply-templates select="list/file"/>
    </xsl:template>
    <xsl:template match="file">
        <xsl:text>Writing </xsl:text>
        <xsl:value-of select="@name"/>
        <xsl:text>&#10;</xsl:text>
        <xsl:result-document href="{concat('adapter:Base-64?',@name)}">
            <xsl:copy-of select="."/>
        </xsl:result-document>
    </xsl:template>
</xsl:stylesheet>

The key here is the xsl:result-document section. It opens a new document and copies our <file> element, Base-64 content and all, into it. Now, without the Base-64 Deployment Adapter, this would just get copied out to that file as a bit of XML. But since we're now writing through this bi-directional adapter, it catches the XML, takes all of the Base-64 it sees inside, and turns it back into binary. So what persists on disk in the end is the actual GIF file.
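For comparison, the reverse trip can be sketched in Python as well. Again, this is an illustration with the standard library rather than the adapter's own code, and xml_to_files is a made-up name. It returns the decoded bytes in a dictionary instead of writing them to disk.

```python
import base64
import xml.etree.ElementTree as ET

def xml_to_files(xml_text):
    """Decode every <file name="..."> child of <list> back into raw bytes."""
    root = ET.fromstring(xml_text)
    files = {}
    for item in root.findall("file"):
        # By default b64decode discards characters outside the Base-64
        # alphabet, so line breaks inside the element are harmless.
        files[item.get("name")] = base64.b64decode(item.text or "")
    return files

doc = '<list><file name="shirt-004.gif">\nR0lG\nODlh\n</file></list>'
decoded = xml_to_files(doc)
print(decoded)
# prints {'shirt-004.gif': b'GIF89a'}
```

Writing each entry out with pathlib.Path(name).write_bytes(data) would then put the binary files on disk.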

hexBinary, Hex and XML

Exactly the same results can be achieved with hex-encoded data. Just use the Binary Deployment Adapter, which by default uses base-16 for encoding, just what the W3C XML Schema data type hexBinary expects.
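As a rough illustration of what hex encoding looks like, here is the same six-byte GIF signature pushed through Python's built-in hex conversion. The helper names are hypothetical, and this is not the Binary Deployment Adapter itself.

```python
def to_hex(data):
    """Encode bytes as uppercase hex, two characters per byte."""
    return data.hex().upper()

def from_hex(text):
    """Decode hex text back to bytes; bytes.fromhex skips whitespace between byte pairs."""
    return bytes.fromhex(text)

encoded = to_hex(b"GIF89a")
print(encoded)            # 474946383961
print(from_hex(encoded))  # b'GIF89a'
```

Uppercase digits are used here because that is the canonical form of hexBinary, though the type accepts lowercase too.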

Read and Write Binary XML

There isn't really any such thing as "binary XML", but even that won't stop you from mixing binary and text inside XML thanks to the Base-64 and Binary Converters. The target format doesn't even have to be a file; the adapter could be used in a web serving environment to feed images right from source XML. Since you control the code, there are no limits. Examine the adapters and the rest of Stylus Studio® X16 XML Enterprise Suite by downloading an evaluation copy and trying these samples today!

 