
Re: Performance of predicate-based patterns

Subject: Re: Performance of predicate-based patterns
From: "Eliot Kimber ekimber@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 23 Jan 2015 19:52:46 -0000
I don't think anyone at all familiar with normal DITA XSLT practice would
use anything other than [contains(@class, ' foo/bar ')] or the DITA
Community df:class() function:

<xsl:function name="df:class" as="xs:boolean">
    <xsl:param name="elem" as="element()"/>
    <xsl:param name="classSpec" as="xs:string"/>

    <!-- The '$' alternative in the regex is a workaround for a bug in
         MarkLogic 3.x and for a common user error, where the trailing
         space in the class= attribute is dropped.
      -->
    <xsl:variable name="normalizedClassSpec" as="xs:string"
       select="normalize-space($classSpec)"/>
    <xsl:variable name="result"
       select="matches($elem/@class,
                       concat(' ', $normalizedClassSpec, ' | ',
                              $normalizedClassSpec, '$'))"
       as="xs:boolean"/>

    <xsl:sequence select="$result"/>
</xsl:function>

The df:class() function also matches when the @class value is missing its
required trailing space (a problem that MarkLogic used to cause but that
was fixed in ML 4, I think).
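
As a minimal sketch of how a function like this might be used in a match
pattern (the namespace URI bound to df: and the <item> output element are
assumptions for illustration, not taken from any particular stylesheet):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:df="http://dita2indesign.org/dita/functions"
    exclude-result-prefixes="xs df">

  <!-- Same logic as the function quoted above, condensed. The namespace
       URI bound to df: is an assumption; use whatever your copy declares. -->
  <xsl:function name="df:class" as="xs:boolean">
    <xsl:param name="elem" as="element()"/>
    <xsl:param name="classSpec" as="xs:string"/>
    <xsl:variable name="normalizedClassSpec" as="xs:string"
                  select="normalize-space($classSpec)"/>
    <xsl:sequence select="matches($elem/@class,
        concat(' ', $normalizedClassSpec, ' | ', $normalizedClassSpec, '$'))"/>
  </xsl:function>

  <!-- For 'topic/li' the generated regex is " topic/li | topic/li$".
       It matches class="- topic/li " and also class="- topic/li"
       (trailing space lost), but not class="- topic/lines ". -->
  <xsl:template match="*[df:class(., 'topic/li')]">
    <!-- <item> is a hypothetical output element -->
    <item><xsl:apply-templates/></item>
  </xsl:template>

</xsl:stylesheet>
```

Note that the class spec is interpolated into the regex as-is, so this
sketch assumes the spec contains no regex metacharacters.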


If there's a more efficient way to match values in the @class attribute,
I'd certainly like to know about it.

Cheers,

E.
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com




On 1/23/15, 8:19 AM, "Graydon graydon@xxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

>On Fri, Jan 23, 2015 at 11:28:31AM -0000, Michael Kay mike@xxxxxxxxxxxx
>scripsit:
>> We've started doing some performance work in Saxon on the DITA
>> stylesheets, which use large numbers of match patterns in the form
>>
>> <xsl:template match="*[contains(@class, ' token ')]">
>
>If anybody ever starts using XSLT 2.0 for DITA processing, there are
>going to be things like
>
><xsl:template match="*[(tokenize(@class,'\p{Zs}+')[normalize-space()])[2]
>eq 'topic/li']">
>
>showing up.  ("some $x in tokenize(@class,...."  seems pretty likely,
>too.)
>
>> Currently these require a very inefficient sequential search to find
>> the matching rule for each element.
>>
>> Does anyone know of any other commonly-used stylesheets (or even,
>> uncommonly used ones) which show similar characteristics, that is,
>> large numbers of match patterns using predicate matching only, with no
>> explicit element names? We'd like any optimizations we implement to be
>> as general-purpose as possible.
>
>I've done some conversion work on legal documents where the goal was to
>get everything back on a single schema after a couple decades of
>evolution in the element names of various DTDs.  Matches of the form
>
><xsl:template match="*[name() = ('P','NP','PARA')]">
>
>showed up a fair bit to match on the abstract "that's a paragraph"
>across the range of evolved element names.
>
>There was also a fair bit of
>
><xsl:template match="*[not(name() = ('PARA','LIST','TABLE'))]">
>
>used as general "we don't think there's anything but those in the data
>but let's not make rash assumptions" surprise handler templates.
>
>-- Graydon
