Writing Custom XML Adapters

By: Tony Lavinio, Software Architect, Principal, The Stylus Studio® Team

Overview: Accessing CSV Data as XML Using a Custom On-the-fly Converter

There are cases where the input format is too complicated for Convert to XML, or where the file format needs to be both read and written. For those cases, Stylus Studio® XML Enterprise Edition allows you to write and hook in your own adapters.

This section will take you through the process of writing a medium-sized adapter in Java.

By the end of this, you should have a basic knowledge of how to write a bidirectional adapter, along with a copy of the source code for a CSV-to-XML-and-back adapter.

Creating the CSV to XML Converter Class Stubs

All adapters in Stylus Studio® are derived from a base class com.stylusstudio.adapter.AdapterBase. Here is our class, with the necessary stub methods filled in:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import com.stylusstudio.adapter.AdapterBase;
import com.stylusstudio.adapter.InvalidFormatException;

public class CSV extends AdapterBase
{
    public String getExtensions() {
        return "csv";
    }
    public String getUrlName() {
        return "Comma";
    }
    public String getDescription() {
        return "Comma Separated Values";
    }
    static public final char m_comma = ',';
    // Just change m_comma to '\t' to turn it into a Tab Separated Value adapter
    //   or ':' for a Colon Separated Value adapter, or whatever you desire.

    public void toXML(InputStream in, OutputStream out)
        throws IOException, InvalidFormatException {
    }
    public void fromXML(InputStream in, OutputStream out)
        throws IOException, InvalidFormatException {
    }
    private class CommaHandler extends AdapterHandler {
    }
}

If this were a read-only adapter, we could omit the fromXML() method, and when a user tried to write through the adapter, the framework would automatically catch the problem and inform the user. Similarly, we support write-only adapters, which are implemented by omitting the toXML() method.

  • The getExtensions() method returns a comma-separated list (ironically enough) of extensions that are likely to hold this type of file.
  • getUrlName() is the name of the class as it will appear in the special adapter: url.
  • The method getDescription() returns the longer name of the class as it will appear in the File>Open dialog.
  • m_comma is the separator character we're going to use. We could also make this configurable through the adapter UI, as the built-in CSV class does.
  • toXML(InputStream, OutputStream) takes an input stream in the native format, which in our case will be CSV, and writes out XML.
  • fromXML(InputStream, OutputStream) does the opposite. In order to support this, we're going to use a SAX parser, which we've subclassed and called CommaHandler.

Looking at the Input and Output Formats

Our CSV adapter will support the following features:

  • Commas separate the fields.
  • A field may be quoted with " or ' if it contains commas.
  • Any character within a line may be escaped with \.
  • Inside of a quoted string, a doubled " or ' will be interpreted as an escaped quote.
  • All spaces are significant and kept.
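
To make these rules concrete, here is a minimal sketch of how a single line could be split into fields. It is not the code from the shipped adapter; the class name CsvLineSplitter and its split() method are hypothetical names chosen for illustration, and the handling of corner cases (such as a quote appearing in the middle of an unquoted field) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the shipped adapter code: splits one CSV line
// according to the five rules listed above.
class CsvLineSplitter {
    static final char SEPARATOR = ',';

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        char quote = 0;                           // 0 means "not inside quotes"
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                field.append(line.charAt(++i));   // backslash escapes the next character
            } else if (quote != 0) {
                if (c != quote) {
                    field.append(c);              // ordinary character inside quotes
                } else if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    field.append(quote);          // doubled quote becomes one literal quote
                    i++;
                } else {
                    quote = 0;                    // closing quote
                }
            } else if (c == '"' || c == '\'') {
                quote = c;                        // opening quote
            } else if (c == SEPARATOR) {
                fields.add(field.toString());     // separator ends the current field
                field.setLength(0);
            } else {
                field.append(c);                  // spaces are kept exactly as they are
            }
        }
        fields.add(field.toString());             // the last field has no trailing separator
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("bmw,2004,14274"));
        System.out.println(split("\"Smith, John\",'it''s',a\\,b"));
    }
}
```

Tracking the active quote character in a single variable lets the same loop handle both " and ' without a separate state for each.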

Our CSV file is going to look like this:

bmw,2004,14274
kawasaki,1996,60234
ducati,1997,24000

And our generated XML is going to use the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<table>
    <row>
        <column>bmw</column>
        <column>2004</column>
        <column>14274</column>
    </row>
    <row>
        <column>kawasaki</column>
        <column>1996</column>
        <column>60234</column>
    </row>
    <row>
        <column>ducati</column>
        <column>1997</column>
        <column>24000</column>
    </row>
</table>

Reading the CSV File (the toXML() method)

The code for doing this can be seen here, but the pseudocode for it is this:

  1. Set up a DOM document with a root element.
  2. Read a line from the input stream.
  3. Create a DOM element for the row and attach it to the root.
  4. As each field is seen, create a DOM element for it.
  5. Create a Text DOM object holding the field's value, and attach it to the field element.
  6. Attach the field element to the current row element.
  7. If there is more input, go back to the second step.
  8. Serialize the DOM as XML to the output stream.
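
The steps above can be sketched with nothing but the JDK's own DOM and transformer classes. This is an illustration under stated assumptions, not the adapter's real toXML() method: the class name CsvToXmlSketch and the convert() helper are hypothetical, it does not extend AdapterBase, and it splits fields with a plain String.split() instead of honoring quotes and escapes.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical stand-alone sketch; the real adapter extends AdapterBase.
class CsvToXmlSketch {
    static void toXML(InputStream in, OutputStream out)
            throws IOException, ParserConfigurationException, TransformerException {
        // Step 1: set up a DOM document with a <table> root.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element table = doc.createElement("table");
        doc.appendChild(table);

        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {            // steps 2 and 7
            Element row = doc.createElement("row");             // step 3
            // Simplification: a plain split, with no quote or escape handling.
            for (String field : line.split(",", -1)) {
                Element column = doc.createElement("column");   // step 4
                column.appendChild(doc.createTextNode(field));  // step 5
                row.appendChild(column);                        // step 6
            }
            table.appendChild(row);
        }

        // Step 8: serialize the DOM with the JDK's identity transformer.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.transform(new DOMSource(doc), new StreamResult(out));
    }

    // Convenience wrapper for trying the sketch on an in-memory string.
    static String convert(String csv) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            toXML(new ByteArrayInputStream(csv.getBytes(StandardCharsets.UTF_8)), out);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(convert("bmw,2004,14274\nkawasaki,1996,60234\nducati,1997,24000\n"));
    }
}
```

Building the whole DOM before serializing keeps the code short, at the cost of holding the entire file in memory; for very large inputs a streaming writer would be a reasonable alternative.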

If you look carefully at the real code, you will also notice another convenience function, getEndOfLine(), which returns the current end-of-line setting based on the user preference. This is a default property of all adapters; how to let your adapter present its own set of options to the user interface will be documented elsewhere.

Writing the CSV File (the fromXML() method)

In this case, we need only a small stub to call our SAX content handler class. Most of the activity will happen there.

The only important thing to remember is not to ignore any exceptions. They can be passed back up to the user interface using the exception(Exception) method that we provide as part of the AdapterBase class.

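import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public void fromXML(InputStream in, OutputStream out)
    throws IOException, InvalidFormatException
{
    SAXParserFactory factory = SAXParserFactory.newInstance();
    try {
        SAXParser parser = factory.newSAXParser();
        // CommaHandler receives the SAX events and writes CSV to out
        parser.parse(in, new CommaHandler(out));
    }
    catch (Exception e) {
        // Pass the failure up to the user interface instead of swallowing it
        exception(e);
    }
}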

Our CommaHandler class is derived from AdapterHandler. That class in turn is a subclass of org.xml.sax.helpers.DefaultHandler with no overridden methods, combined with empty methods implementing org.xml.sax.ext.LexicalHandler. This is useful when you want to intercept and parse additional information in the incoming XML file, such as DTD references.
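
For readers who have not combined these two SAX interfaces before, the following sketch shows the general shape just described, using only JDK classes. LexicalAwareHandler and CommentCounter are hypothetical names, and the real AdapterHandler may differ in its details. The property URI passed to setProperty() is the standard SAX2 name for registering a LexicalHandler.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the shape described above: DefaultHandler plus
// empty LexicalHandler methods, so subclasses override only what they need.
class LexicalAwareHandler extends DefaultHandler implements LexicalHandler {
    public void startDTD(String name, String publicId, String systemId) {}
    public void endDTD() {}
    public void startEntity(String name) {}
    public void endEntity(String name) {}
    public void startCDATA() {}
    public void endCDATA() {}
    public void comment(char[] ch, int start, int length) {}
}

// Example subclass: counts the comments in a document.
class CommentCounter extends LexicalAwareHandler {
    int comments;

    @Override
    public void comment(char[] ch, int start, int length) {
        comments++;
    }

    static int count(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            CommentCounter handler = new CommentCounter();
            // Lexical events are delivered only if the handler is registered
            // through this standard SAX2 property.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.comments;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("<a><!-- one --><b/><!-- two --></a>"));
    }
}
```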

After keeping a local reference of the output stream passed to it, here is how CommaHandler deals with the various SAX events that come in:

CSV to XML SAX startElement

When startElement is called, we keep track of how deeply nested we are. Any time we start a new second-level element, we create a new row. Any time we see a third-level element start, we start a field. This means we don't really care whether our XML looks like <a><b><c>one</c><c>two</c></b></a> or <root><row><field>one</field><field>two</field></row></root> or <table><tr><td>one</td><td>two</td></tr></table>; in each case, the output is going to look like one,two.

CSV to XML SAX endElement

When endElement is called for the end of a second-level element, we put out a line feed to finish the line we've been writing.

When it is called for the end of a third-level element, we check the text we've accumulated under the characters method. If it doesn't contain a comma or quote character, we can just emit it. Otherwise, we've got to quote it and escape any embedded quotes. We also have to remember that if this is not the first field in the row, we must put out a comma first to separate it from the previous field.
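
The quoting decision can be sketched as a small helper. This is an assumption-laden illustration rather than the adapter's own code: CsvFieldWriter, quoteIfNeeded() and row() are hypothetical names, the sketch always wraps with double quotes and doubles any embedded double quote, and it does not treat backslashes specially.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the quoting rule described above.
class CsvFieldWriter {
    static final char SEPARATOR = ',';

    static String quoteIfNeeded(String text) {
        boolean plain = text.indexOf(SEPARATOR) < 0
                && text.indexOf('"') < 0
                && text.indexOf('\'') < 0;
        if (plain) {
            return text;                          // nothing special: emit as-is
        }
        // Assumption: wrap in double quotes and double any embedded double quote.
        return '"' + text.replace("\"", "\"\"") + '"';
    }

    static String row(List<String> fields) {
        StringBuilder line = new StringBuilder();
        boolean first = true;
        for (String field : fields) {
            if (!first) {
                line.append(SEPARATOR);           // separate from the previous field
            }
            line.append(quoteIfNeeded(field));
            first = false;
        }
        return line.toString();
    }

    public static void main(String[] args) {
        System.out.println(row(Arrays.asList("bmw", "2004", "14274")));
        System.out.println(row(Arrays.asList("Smith, John", "say \"hi\"")));
    }
}
```

A boolean flag is used for the "first field" test rather than checking whether the line is still empty, because an empty first field would otherwise cause the following comma to be dropped.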

CSV to XML SAX characters

In the characters method, we append the characters we see to the current m_text text accumulator. This is set when we start a third-level node, and reset when we leave it. If it is null, it means we're not in the context of a third-level (or deeper) node, so we don't record anything.

CSV to XML SAX endDocument

We can speed things up slightly by flushing the stream only when we are done with it, that is, when we see the endDocument event.
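
Putting the four events together, a depth-tracking handler might look like the sketch below. It extends the JDK's DefaultHandler rather than AdapterHandler so that it compiles on its own, and DepthTrackingHandler and convert() are hypothetical names. To keep the focus on depth tracking, quoting of fields that contain commas or quotes is left out, and a fixed line feed is written where the real adapter would consult getEndOfLine().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of the event handling described above.
class DepthTrackingHandler extends DefaultHandler {
    private final OutputStream out;
    private int depth;              // 1 = root, 2 = row, 3 = field
    private boolean firstField;     // true until a field is written in this row
    private StringBuilder text;     // null while outside a third-level element

    DepthTrackingHandler(OutputStream out) {
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        if (depth == 2) {
            firstField = true;              // a new row begins
        } else if (depth == 3) {
            text = new StringBuilder();     // a new field begins
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length); // only record text inside a field
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        try {
            if (depth == 3) {
                if (!firstField) {
                    out.write(',');         // separate from the previous field
                }
                out.write(text.toString().getBytes(StandardCharsets.UTF_8));
                firstField = false;
                text = null;
            } else if (depth == 2) {
                out.write('\n');            // finish the current line
            }
        } catch (IOException e) {
            throw new SAXException(e);
        }
        depth--;
    }

    @Override
    public void endDocument() throws SAXException {
        try {
            out.flush();                    // flush once, at the very end
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    // Convenience wrapper for trying the sketch on an in-memory string.
    static String convert(String xml) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DepthTrackingHandler(out));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(convert("<table><tr><td>one</td><td>two</td></tr></table>"));
    }
}
```

Because only the nesting depth is inspected, element names never appear in the handler, which is what makes the differently named input documents interchangeable.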

The Completed CSV to XML Conversion Adapter Class Java Source Code

At this point, we've completed the code for our adapter. The full source code can be seen here: CSV.java.

We just need to compile it and tell Stylus Studio® XML Enterprise Edition that it is available.

To compile it, we'll assume that Stylus Studio® is installed in the usual place, C:\Program Files\Stylus Studio XML Enterprise Edition, and that your javac.exe Java compiler is in your system PATH. Of course, all this could be done within your favorite Java IDE as well.

The JARs we need on the classpath are all in the bin directory under that, namely:

  • CustomFileSystem.jar
  • AdapterFileSystem.jar
  • xercesImpl.jar
  • xml-apis.jar

So, this little compile.bat file will build our class for us:

@echo off
set STYLUS=C:\Program Files\Stylus Studio XML Enterprise Edition
set CP=%STYLUS%\bin\CustomFileSystem.jar
set CP=%CP%;%STYLUS%\bin\AdapterFileSystem.jar
set CP=%CP%;%STYLUS%\bin\xercesImpl.jar
set CP=%CP%;%STYLUS%\bin\xml-apis.jar
@echo on
javac -classpath "%CP%" CSV.java

Assuming we put those files into C:\csv, we go into Tools > Options > XML Converters and add that to our path.


Stylus Studio® will search the path and list all of the adapters found, so that your screen will look like this, with our newly-added adapter Comma showing clearly. ("CSV" is a similar class that we ship as part of Stylus Studio®.)


Using the Newly-Created CSV to XML Adapter with Stylus Studio®

To open a sample document with our new adapter, we use the File > Open dialog and build a URL just like for any other adapter:

  1. Select your CSV file from your file system,
  2. Check the box "Convert to XML using adapter",
  3. And then select your adapter.

Now using this URL, you can read and write CSV files as if they were XML. And more importantly, you may use them as input to XSLT or XQuery, or output from those transformations, as easily as any XML document.

Using the Adapter as a Java CSV to XML Conversion Tool

Since Stylus Studio® exposes the Java API to the StylusFile class, this newly-created class is now callable from Java code through the same interface as any other Stylus Studio® adapter: an adapter: URL. See Invoking an Adapter Programmatically in the documentation for more details, but a simple example showing the CSV adapter we just wrote is right here:

// Java Source Code for a Stylus Studio® CSV to XML Converter

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.InputStream;
import java.io.IOException;

import com.exln.stylus.io.StylusFile;
import com.exln.stylus.io.StylusFileFactory;

public class demo {
    public static void main(String args[]) {
        StylusFileFactory sff = StylusFileFactory.getFactory();
        try {
            StylusFile in = sff.createStylusFile("adapter:///Comma?file://c:\\temp\\zero.csv");
            File out = new File("zero.xml");
            // try-with-resources closes both streams, even if the copy fails
            try (InputStream is = in.getInputStream();
                 OutputStream os = new FileOutputStream(out)) {
                copy(is, os);
            }
            System.out.println("conversion succeeded: zero.csv -> zero.xml");
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }

    // Copies everything from is to os; the caller owns and closes both streams
    static void copy(InputStream is, OutputStream os) throws IOException {
        byte buffer[] = new byte[8192];
        int bytesRead;
        while ((bytesRead = is.read(buffer)) != -1)
            os.write(buffer, 0, bytesRead);
        os.flush();
    }
}
