# Wednesday, 28 May 2014


Before build 1928b, Stylus Studio could not run XSLT transformations through external processors when the output was very large.  The wiring was not designed to scale, and the scenario output URL was not taken into consideration.

Our long-time customer Yitzhak Khabinsky was working on a project which required transforming a 500 MB XML file into 650 XML output.  Their deployment is Microsoft .NET, therefore XslCompiledTransform is the XSLT processor used in production. They tried to test the transformation in Stylus Studio without success.

Yitzhak's team sent us a test case and asked if we could make it work. After a few days of hard work we came up with a solution which provides great benefits to any customer running command line processors.

1)  We have wired the transformation output file (either the output URL or the temporary file) into the Preview window. As a result, Stylus Studio can now load a large amount of data in the Text Preview with minimal memory consumption, thanks to our custom memory manager.

2)  We introduced a dialog which shows the progress and allows canceling the operation.

 

In the following screenshot Stylus Studio is loading a 1.3 GB XML output file in the Preview window; notice in the Task Manager that the memory allocation peaks at no more than 90 MB.




posted on Wednesday, 28 May 2014 21:04:50 (Eastern Daylight Time, UTC-04:00)
# Sunday, 10 November 2013



Stylus Studio X15 build 1910m takes an additional step toward supporting XML Schema 1.1. Here is an example that shows how to use XML Schema 1.1 in Stylus Studio.

The following schema makes use of assertions, a feature introduced in XML Schema 1.1 that lets you assert XPath 2.0 expressions against the content model. Here we assert that the text content of the element "root" must have a length greater than 0.









Here we attempt to validate a document against the schema; it is correctly reported as not valid.







But what if we want to validate and transform in a single step, taking advantage of Saxon Schema Aware?

The following screenshot shows the Saxon processor settings in the XSLT Editor scenario dialog, which now features an additional combo-box for picking the validation mode.








If you do not want to link all your XML documents to the schema but still want to run validation, you can make use of the schema cache, which can be associated with a Stylus Studio project folder.










Now we just need to add our XSLT transformation to the project folder and, on the next execution, we can see the processor loading the schema automatically and flagging the validation error.







In addition to validating the XML input document, we can also validate the transformation output. Here you can see the Post Validation settings in the XSLT scenario dialog, which starting with build 1910m allow you to select Saxonica Validator XSD 1.1.

Post validation can also take advantage of the project folder schema cache discussed above.




posted on Sunday, 10 November 2013 15:37:21 (Eastern Standard Time, UTC-05:00)
# Thursday, 20 December 2012

Now that Stylus Studio X15 is finally out, we can talk about what we have been working on in recent months.

The Stylus Studio XSLT Editor has been neglected in the last few major releases and user requests were starting to pile up, so it was time to take action. Here you can find a variety of improvements implemented in the Stylus Studio X15 XSLT editor.

Only suggests XSLT instructions based on the context

In the following example, the Auto-complete list shows only the instructions which can be nested inside the xsl:for-each.


An attribute should not be suggested if it’s already defined

We wanted to implement this feature without compromising the XSLT editor's scalability, so we decided to look ahead no further than 1,000 characters, which covers 99% of the use cases. In the following example the suggestion list shows only namespace because the attribute “name” is already defined.


Creating XSLT instruction skeleton

Some XSLT instructions, like xsl:choose, are very verbose, so it is quite handy to be able to create the instruction skeleton and then fill in the blanks. If you hold CTRL while hitting the TAB key you get exactly that.



Language Nesting

An XSLT transformation can be used to generate common XML grammars such as XSL-FO or HTML. The Stylus Studio Auto-complete now handles multiple grammars, each based on its own context. In the following example we see a suggestion list with XSLT instructions and XSL-FO tags, each driven by its context, that is, its parent tag.


The Odd Green tags

In previous Stylus Studio releases, if a tag name matched an HTML tag name, the XSLT editor rendered it green, even if the transformation wasn’t designed to generate HTML at all.

In Stylus Studio X15, the output method has to be set to html (or xhtml for XSLT 2) in order to get HTML tag auto-complete and syntax coloring.



posted on Thursday, 20 December 2012 16:06:36 (Eastern Standard Time, UTC-05:00)
# Friday, 28 September 2012

We are pleased to announce a new blog on XML technologies and data processing.

Stylus Studio - XML EDITOR BLOG

We look forward to providing the XML and Stylus Studio community with more great content, tutorials and insights.


posted on Friday, 28 September 2012 20:45:38 (Eastern Daylight Time, UTC-04:00)
# Thursday, 16 August 2012

Extending XSLT with Java and C#

The world is not perfect. If it were, all the data you have to process would be in XML and the only transformation language you would have to learn would be XSLT. Because the world is not perfect, sometimes you have to find ways to bridge different systems that were not designed to work together.

The most popular XSLT processors have been designed, from the very first release, to take advantage of the framework on which they run; for example Apache Xalan-J and Saxon allow calling Java functions. In this article, we will explore a variety of techniques for invoking native code from XSLT to extend the language beyond its capabilities.

The first example demonstrates how to leverage the date and time formatting capabilities available in the Java platform. Imagine you have a list of dates in an XML file that you have to display in an HTML page using different formats. The following screenshot shows a simple XML file with three repeating elements called “date”; each has three attributes: “year”, “month” and “day”.




A separate XML document holds the date formats. Each element “entry” has an attribute “format” containing the “picture string”, which describes which parts of the date should be displayed and how.




Our goal is to merge the information from the two XML documents into a simple HTML page which will display each date in multiple formats.

XSLT 1.0 lacks date and time formatting functions, but Java provides two classes, java.util.Calendar and java.text.SimpleDateFormat, which solve our problem. We just need to create a Java class with a single public static method that will be called from our XSLT transformation. In the following screenshot, we see the Stylus Studio Java extension editor, which features syntax coloring, background syntax checking and integrated Java compiler invocation.
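As a rough illustration of what such a class might look like, here is a minimal sketch. The class name DateFormatter and the choice of four String parameters are assumptions made for this example; only the method name printDate comes from the project discussed in this article. The class is shown without the public modifier so the snippet can be pasted next to other code; a processor that loads it by name would normally need it declared public in its own file.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

/** Hypothetical extension class; the name DateFormatter is an assumption. */
final class DateFormatter {

    private DateFormatter() { }

    /**
     * Formats a date given as separate year, month and day strings.
     * The format argument is a java.text.SimpleDateFormat pattern,
     * for example "yyyy-MM-dd".
     */
    public static String printDate(String year, String month, String day, String format) {
        // Calendar months are zero-based, so January is 0.
        Calendar calendar = new GregorianCalendar(
                Integer.parseInt(year.trim()),
                Integer.parseInt(month.trim()) - 1,
                Integer.parseInt(day.trim()));
        // A fixed locale keeps month and day names stable across machines.
        SimpleDateFormat formatter = new SimpleDateFormat(format, Locale.ENGLISH);
        return formatter.format(calendar.getTime());
    }
}
```

For instance, DateFormatter.printDate("2012", "8", "16", "EEEE, d MMMM yyyy") returns Thursday, 16 August 2012. How a given processor converts attribute nodes to these parameter types depends on its own type mapping.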




When designing Java extension functions for XSLT, it is important to remember that the function parameter type has to be compatible with the processor type mapping. Apache XalanJ defines the following type mapping between XSLT and Java.


| XSLT Type | Java Type |
| --- | --- |
| Node-Set | org.w3c.dom.traversal.NodeIterator |
| String | java.lang.String |
| Boolean | java.lang.Boolean |
| Number | java.lang.Double |
| Result Tree Fragment | org.w3c.dom.DocumentFragment |
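To make the Node-Set row of the mapping concrete, the sketch below shows an extension function that receives a node-set as a NodeIterator. The class name NodeSetFunctions, the method joinText and the helper descendantElements are hypothetical. The helper exists only so the function can be tried without an XSLT processor, and it assumes the JDK's built-in DOM implementation, whose documents implement DocumentTraversal.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

/** Hypothetical extension class illustrating the Node-Set mapping. */
final class NodeSetFunctions {

    private NodeSetFunctions() { }

    /** Joins the text content of every node in the node-set with ", ". */
    public static String joinText(NodeIterator nodes) {
        StringBuilder out = new StringBuilder();
        for (Node node = nodes.nextNode(); node != null; node = nodes.nextNode()) {
            if (out.length() > 0) {
                out.append(", ");
            }
            out.append(node.getTextContent());
        }
        return out.toString();
    }

    /** Test helper: iterates over the elements below the root of the given XML. */
    static NodeIterator descendantElements(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node root = doc.getDocumentElement();
            // Skip the root itself so only the elements below it are returned.
            NodeFilter skipRoot = n ->
                    n == root ? NodeFilter.FILTER_SKIP : NodeFilter.FILTER_ACCEPT;
            // Assumption: the DOM implementation supports the Traversal module.
            return ((DocumentTraversal) doc).createNodeIterator(
                    root, NodeFilter.SHOW_ELEMENT, skipRoot, true);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Keeping the parameter type aligned with the mapping matters because the processor picks the Java method by matching the XSLT argument types against the method signature.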


Extension function support is implemented differently on each XSLT processor which makes it difficult to port XSLT code from one processor to another.

In order to run a transformation that makes use of Java code, you have to ensure that the compiled Java code (.class) is reachable from the CLASSPATH. This is a pesky setting which requires changing the environment variable called CLASSPATH. Fortunately Stylus Studio provides a flexible mechanism to include the Java compiled code (directories or Jar files) at the project level and, if the code is located under the project, Stylus Studio saves the path using a relative form. Therefore, you can move your project to a different location without fear of breaking the link between your XSLT and your Java code.




In the following screenshot, we see how to bind a Java class using XalanJ. The Xalan Java namespace declaration is at line 4, and the function invocation is at line 19. Notice that the function name is formed from the prefix java:, the full Java class name, a “.”, and the function name. The Preview shows the transformation result.




Running the same transformation with Saxon requires a small change. The Java class binding is at line 4; the namespace URI is composed of the prefix “java:” and the full Java class name. The function invocation uses the namespace prefix “date:” and the Java method name. The same result is generated in the Preview window.




One major advantage in testing the code with Saxon in Stylus Studio is the ability to run the transformation in the XSLT debugger and step into the Java code to debug the Java extension, which is unique to Stylus Studio. In the following screenshot we see the execution suspended inside the extension function “printDate”. The Call-stack window shows from which XSLT template we came in and which parameter values were passed. The Variables window shows all variables in scope with their values. This is an unparalleled experience for the developers who usually have to write hundreds of trace messages in a log file in order to debug their code.

A side note: notice the second item from the top, in the Call-stack window. The new Saxon Just in Time XSLT to Java compiler generates Java code on the fly!




The Stylus Studio mapping tool also provides full support for Java extension functions. In the following screenshot you see how to register a Java extension class by browsing the project CLASSPATH.




Once a Java extension class is registered, all of its functions are exposed. To add a new Java function call, simply click on the Java Functions menu item.




With a few additional links, the mapping is complete. The XSLT visual mapping tool allows XSLT developers to take advantage of Java libraries developed by others without the need to know the underlying technical details.




Java is not the only language that can be employed for designing extension functions. If you are developing on the Microsoft .NET Framework and use the XslCompiledTransform XSLT processor, you have access to the entire framework API. The following screenshot shows how to implement the date formatter in C#, but we could have used JScript as well. The inline code embedded in the XSLT transformation is compiled into MSIL (Microsoft Intermediate Language) by the Just in Time C# compiler.

Inline extension functions have several logistical benefits: you don’t need to compile a separate module and you don’t need to maintain your logic in a different file.




If you need to debug such a transformation, Stylus Studio comes to the rescue. Just switch the processor to XslTransform in the XSLT editor scenario dialog and you will be able to debug your code step by step.





In the following screenshot you see the execution suspended inside the XSLT match template “date”. The Call-stack window shows the current stack, and the Variables window shows all variables in scope with their values, along with the XSLT context, which represents the XML node currently being processed.




The msxsl:script block allows you to import third-party .NET libraries, which opens the door to virtually infinite possibilities. In the following XSLT code fragment, an extension function called “fromEDI” makes use of XML Converters for .NET to parse an EDI file and return an instance of XPathNavigator, which can be manipulated in the XSLT as an XML node.

<msxsl:script implements-prefix='ut' language='C#'>
    <msxsl:assembly href="c:\Program Files (x86)\XML Converters for .NET\bin\XmlConverters.dll"/>
    <msxsl:using namespace="DDTek.XmlConverter" />
    <msxsl:using namespace="System.IO" />
<![CDATA[
    // Parses the EDI file at ediPath and returns it as a navigable XML tree.
    public XPathNavigator fromEDI(string ediPath)
    {
        ConverterFactory factory = new ConverterFactory();
        // The "converter:" URL is handled by the resolver created from the factory.
        string url = "converter:EDI?" + ediPath;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.XmlResolver = factory.CreateResolver();
        XmlReader reader = XmlReader.Create(url, settings);
        XPathDocument doc = new XPathDocument(reader);
        return doc.CreateNavigator();
    }
]]>
</msxsl:script>

 <xsl:template match="/">
    <xsl:variable name="EDIXML" select="ut:fromEDI($EDI)"/>




We hope you enjoyed reading this article. If you have any questions, do not hesitate to contact us.

You can download the Project Zip file by clicking here.

- Stylus Studio Team


 

posted on Thursday, 16 August 2012 15:07:37 (Eastern Daylight Time, UTC-04:00)
# Tuesday, 27 March 2012

Introduction to XSLT 3.0

While many W3C specifications take years to reach the Recommendation stage, XSLT has evolved quickly and deterministically, thanks in no small part to the great talent and sobriety of its spec chair and a dedicated committee.

The Stylus Studio team decided to be on the cutting edge and introduced support for the current XSLT 3.0 working draft in version X14, in order to give the community a chance to start developing with the new edition of the language.

A variety of exciting new features have been introduced to make the language modern and to allow implementers to take advantage of modern hardware for transforming large data sets.

Support for Streaming

The need to process XML in streaming fashion, in other words without loading the entire input document in memory, has grown over the years.  Several use cases require processing very large streams of XML events, for example stock tickers or social media user streams.

Here is how the specification formally defines streaming:

"A processor that claims conformance with the streaming option offers a guarantee that ... an algorithm will be adopted ... allowing documents to be processed that are orders-of-magnitude larger than the physical memory available."

In 2007, a team of XML experts came up with a dedicated language called STX, Streaming Transformation for XML, to tackle the problem.  Even if the language did not gain significant popularity, it was a valuable exercise to identify use cases and come up with a declarative approach. Such experience has been an important inspiration for introducing the streaming feature in XSLT 3.0.

XSLT 3.0 introduces new constructs (xsl:stream, xsl:mode streamable="yes") that explicitly request streamed execution of the instruction body.  In streaming mode, there are a number of restrictions to be aware of:

· You have access only to the current element's attributes and namespace declarations.

· Sibling nodes and the siblings of ancestors are not reachable.

· You can visit child nodes only once.
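The forward-only model behind these restrictions can be illustrated outside XSLT. The following Java sketch uses the StAX pull parser (javax.xml.stream) rather than an XSLT 3.0 processor, so it is an analogy and not an implementation of xsl:stream. The class name ForwardOnlyScan and the id attribute on book elements are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper showing a single forward pass over an XML stream. */
final class ForwardOnlyScan {

    private ForwardOnlyScan() { }

    /** Collects the id attribute of every book element in one forward pass. */
    static List<String> bookIds(String xml) {
        List<String> ids = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // Each event is seen exactly once; there is no way to go back.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "book".equals(reader.getLocalName())) {
                    // Only the current element's own attributes are available here.
                    ids.add(reader.getAttributeValue(null, "id"));
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return ids;
    }
}
```

Because the reader never builds a tree, memory use does not grow with the size of the document, which is the same property the streaming option is meant to guarantee.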


 

The following diagram illustrates which nodes are accessible while processing an XML document that contains a list of books.


Here is an example of how to split a very large document into small fragments:

<?xml version="1.0"?>
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="/">
        <xsl:stream href="books.xml">
            <xsl:iterate select="/books/book">
                <xsl:result-document href="{concat('book', position(),'.xml')}">
                    <xsl:copy-of select="."/>
                </xsl:result-document>
                <xsl:next-iteration/>
            </xsl:iterate>
        </xsl:stream>
    </xsl:template>

</xsl:stylesheet>

 

Also of interest is the new instruction xsl:fork which declares that an XSLT block can be executed independently, during a single pass of a streamed input document.

Unfortunately, Saxon does not implement declarative streaming at the time of this writing.


 

Higher-Order Functions

Higher order functions are functions that either take functions as parameters or return a function.

XPath 3.0 introduces the ability to define anonymous functions and the XDM has been extended with the function item type. Such changes open the door to meta-programming using lambda expressions.

Let us start with an example: here is a lambda expression that squares two numbers and sums the results.
(x, y) → x*x + y*y

Such an expression can be reworked into an equivalent function that accepts a single input and returns another function, which in turn accepts a single input.
x → (y → x*x + y*y)

In the following stylesheet, the variable f1 is assigned an anonymous function that takes an integer and returns a function that takes an integer and returns an integer.

<?xml version='1.0'?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xsl:template match="/">
        <xsl:variable name="f1" select="
            function($x as xs:integer) as (function(xs:integer) as xs:integer) {
                function($y as xs:integer) as xs:integer {
                    $x * $x + $y * $y
                }
            }"/>
        <xsl:value-of select="$f1(2)(3)"/>
    </xsl:template>
</xsl:stylesheet>
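For readers more at home in Java, the same curried function can be sketched with java.util.function.Function. This is only an analogy to the XPath 3.0 expression, not something an XSLT processor runs; the class name Currying is an assumption.

```java
import java.util.function.Function;

/** Hypothetical holder for a curried sum-of-squares function. */
final class Currying {

    private Currying() { }

    /** Takes x and returns a function that takes y and returns x*x + y*y. */
    static final Function<Integer, Function<Integer, Integer>> F1 =
            x -> y -> x * x + y * y;
}
```

Calling Currying.F1.apply(2).apply(3) evaluates to 13, the same value as $f1(2)(3) in the stylesheet.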

 

XPath 3.0 provides built-in support for common lambda patterns such as map, filter, fold-left, fold-right, and map-pairs. Here is an example of a fold that sums only the positive numbers in a list:

<?xml version="1.0"?>
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:variable name="list" select="(10,-20,30,-40)"/>

    <xsl:template match="/">
        <xsl:variable name="f1" select="
            function($accumulator as item()*, $nextItem as item()) as item()*
            {
                if ($nextItem &gt; 0) then
                    $accumulator + $nextItem
                else
                    $accumulator
            }"/>

        <xsl:value-of select="fold-left($f1, 0, $list)"/>
    </xsl:template>
</xsl:stylesheet>
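The folding step can be traced by hand with a small Java sketch that mirrors the accumulator function. The class name LeftFold and the method sumPositives are hypothetical, and the explicit loop stands in for fold-left as an illustration rather than a description of how a processor evaluates it.

```java
import java.util.List;
import java.util.function.BinaryOperator;

/** Hypothetical illustration of a left fold that sums only positive items. */
final class LeftFold {

    private LeftFold() { }

    static int sumPositives(List<Integer> items) {
        // Same logic as $f1: add the next item only when it is positive.
        BinaryOperator<Integer> f1 =
                (accumulator, nextItem) -> nextItem > 0 ? accumulator + nextItem : accumulator;
        int result = 0; // the initial value passed to the fold
        for (Integer item : items) {
            // Left fold: combine the running result with each item in order.
            result = f1.apply(result, item);
        }
        return result;
    }
}
```

For the list (10, -20, 30, -40) the running result goes 0, 10, 10, 40, 40, so the fold returns 40.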

Text Manipulations

The language designers have always considered text manipulation an important feature, starting with XSLT 1.0. Functions for formatting numbers, dates and times played an important role in building HTML content and eventually were moved to XPath in order to be shared with XQuery. XPath 2.0 introduced a large number of functions for manipulating strings: tokenize, matches, replace, string-join, upper-case, and lower-case.

Version 3.0 introduces a variety of new built-in functions for manipulating text, such as unparsed-text-lines and unparsed-text-available, which are very useful when dealing with CSV data.

The following example shows how to implement a simple CSV to XML converter:

<?xml version="1.0"?>
<xsl:stylesheet version="3.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:hd="urn:header">

    <xsl:param name="csv" select="'one.csv'"/>
    <xsl:param name="sep" select="','"/>
    <xsl:param name="rootElement" select="'root'"/>
    <xsl:param name="rowElement" select="'row'"/>
    <xsl:param name="firstRow" select="true()"/>

    <xsl:variable name="header" select="tokenize(unparsed-text-lines($csv)[1], $sep)"/>

    <xsl:function name="hd:header" as="xs:string">
        <xsl:param name="col"/>
        <xsl:choose>
            <xsl:when test="$firstRow">
                <xsl:value-of select="$header[$col]"/>
            </xsl:when>
            <xsl:otherwise>item</xsl:otherwise>
        </xsl:choose>
    </xsl:function>

    <xsl:template match="/">
        <xsl:element name="{$rootElement}">
            <xsl:for-each select="unparsed-text-lines($csv)[position() &gt; 1]">
                <xsl:element name="{$rowElement}">
                    <xsl:for-each select="tokenize(., $sep)">
                        <xsl:variable name="pos" select="position()"/>
                        <xsl:element name="{hd:header($pos)}">
                            <xsl:value-of select="."/>
                        </xsl:element>
                    </xsl:for-each>
                </xsl:element>
            </xsl:for-each>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>

 

Given an input file like the following:

make,model,year,mileage
BMW,R1150RS,2004,14274
Kawasaki,GPz1100,1996,60234
Ducati,ST2,1997,24000
Moto Guzzi,LeMans,2001,12393
BMW,R1150R,2002,17439
Ducati,Monster,2000,15682
Aprilia,Futura,2001,17320

 

The transformation produces this output:

<?xml version='1.0' ?>
<root>
  <row>
    <make>BMW</make>
    <model>R1150RS</model>
    <year>2004</year>
    <mileage>14274</mileage>
  </row>

...
</root>

 

Conclusions

As you can see, there are many changes to look forward to in the upcoming XSLT 3.0 version. The specification is still under discussion and has not been finalized.  The Stylus Studio Team will follow this closely and will release intermediate builds to provide a reference implementation in order to be prepared when version 3.0 goes live.
 
 
 
 

posted on Tuesday, 27 March 2012 13:37:06 (Eastern Standard Time, UTC-05:00)