Subject: Processing milestoned XML leads to many preceding:: calls and horrible performance
From: Matěj Cepl <mcepl@xxxxxxxxxx>
Date: Tue, 21 Feb 2012 10:07:36 +0100
Hi,

I am again working on an XSLT stylesheet to convert a Czech Bible translation from a home-brew schema to OSIS, and I have run into some performance problems.

The whole stylesheet is at https://gitorious.org/sword/czekms-csp_bible/blobs/master/CEP2OSIS.xsl (and the git repo can be cloned from ...), but I believe the relevant parts are

    <xsl:template name="genRef">
        <xsl:variable name="refKniha" select="//kniha[1]/@jmeno"/>
        <xsl:variable name="refKapitola" select="preceding::kap[1]/@n"/>
        <xsl:value-of select="concat($refKniha,'.',$refKapitola,'.')"/>
    </xsl:template>

    <xsl:template name="endVerse">
        <xsl:param name="rBase" />
        <xsl:element name="verse">
            <xsl:variable name="prevVerseID">
                <xsl:value-of select="./preceding::vers[1]/@n" />
            </xsl:variable>
            <xsl:attribute name="eID">
                <xsl:value-of select="concat($rBase,$prevVerseID)" />
            </xsl:attribute>
        </xsl:element>
    </xsl:template>

<!-- ... -->

    <xsl:template match="vers">
        <xsl:variable name="refBase">
            <xsl:call-template name="genRef" />
        </xsl:variable>
        <xsl:variable name="refID" select="concat($refBase,./@n)" />
        <!-- Find out whether this is a first verse in a chapter; notice that <kap/> element is milestoned as well,
             so we have to count a distance in <verse/> elements from it, rather than use plain count() -->
        <xsl:variable name="curPos"
            select="count(./preceding::kap[1]/following::*[not(count(preceding-sibling::vers|current()) = count(preceding-sibling::vers))])" />
        <xsl:if test="not($curPos=1)">
            <xsl:call-template name="endVerse">
                <xsl:with-param name="rBase">
                    <xsl:value-of select="$refBase" />
                </xsl:with-param>
            </xsl:call-template>
        </xsl:if>
        <xsl:element name="verse">
            <xsl:attribute name="sID">
                <xsl:value-of select="$refID" />
            </xsl:attribute>
            <xsl:attribute name="osisID">
                <xsl:value-of select="$refID" />
            </xsl:attribute>
        </xsl:element>
    </xsl:template>


This works (at least as far as I was able to test it, given the circumstances), but the performance is absolutely dreadful. The book of Genesis alone took almost an hour to process (with one core of my dual-core CPU constantly at 100%).

The culprit is obviously <xsl:variable name="curPos"/>, and I have read all over the Internet about how horribly inefficient the preceding* axes are, but I haven't figured out any other way to do what I am doing, and most laments about the preceding* axes don't offer many hints either.

The problem is (I think) that both <vers/> (that's "verse" in Czech) and <kap/> (an abbreviation of "chapter") are just milestones, so I have to go through all the verses in the whole book every time (yes, this is http://www.joelonsoftware.com/articles/fog0000000319.html all over again).
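
To make the milestone structure concrete, here is a rough sketch of the kind of input I mean. Only the element and attribute names (kniha/@jmeno, kap/@n, vers/@n) come from the stylesheet above; the value "Gen", the placeholder text and the exact nesting are made up for illustration, and the real files may be laid out differently:

    <!-- hypothetical fragment: names from the stylesheet, values invented -->
    <kniha jmeno="Gen">
        <kap n="1"/>
        <vers n="1"/>text of the first verse ...
        <vers n="2"/>text of the second verse ...
        <kap n="2"/>
        <vers n="1"/>text of the next chapter ...
    </kniha>

Since neither <kap/> nor <vers/> wraps its content, a verse can only find its chapter by looking backwards through the document (preceding::kap[1]), and the stylesheet repeats that lookup for every single <vers/>.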

Any ideas? Would an XSLT processor other than the xsltproc I am using (libxml 20706, libxslt 10126 and libexslt 815) be able to optimize this somehow?

Thanks a lot,

Matěj

--
http://www.ceplovi.cz/matej/, Jabber: mcepl<at>ceplovi.cz
GPG Finger: 89EF 4BC6 288A BF43 1BAB  25C3 E09F EF25 D964 84AC

В чужой монастырь со своим уставом не ходят.
    -- Russian proverb (this time actually checked by a native
       Russian)
