Subject: Re: Why is the variable and regex slow in saxon and fast in regex Buddy?
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Tue, 28 Sep 2010 18:43:44 +0200
Two comments, which may not shed any light on the non-termination, but
anyway:

First, the pattern "\([^\)]*\)" is supposed to remove any parenthesized
text, but there's no point in writing "[^\)]": the set "any character
except ')'" is simply denoted by "[^)]", because a parenthesis is not a
meta-character within brackets.

Second, to remove all characters of a kind (a single character or a
class), it's better form to use a repetition, e.g., "\d+" rather than
just "\d", so that each run is matched once instead of character by
character.
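
For illustration, here is a minimal sketch of both changes applied to two
of the quoted variables, assuming an XSLT 2.0 processor such as Saxon 9.2.
The sample title and the template name "main" are invented for the
example, and the mh: functions are left out because their definitions
were not posted. It only demonstrates the two patterns and is not
expected to explain the non-termination.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical sample title; not taken from the original input file. -->
  <xsl:param name="title"
      select="'Annual Report (draft 2) 2010 edition'"/>

  <!-- Runs on any input document, or as the initial template "main". -->
  <xsl:template name="main" match="/">

    <!-- [^)] : inside brackets the parenthesis needs no backslash. -->
    <xsl:variable name="titleBraketedTextRemoved"
        select="replace($title, '\([^)]*\)', '')"/>

    <!-- \d+ : each run of digits is removed in a single replacement. -->
    <xsl:variable name="titleNumberRemoved"
        select="replace($titleBraketedTextRemoved, '\d+', '')"/>

    <!-- Collapse the whitespace left behind by the two removals. -->
    <xsl:value-of select="normalize-space($titleNumberRemoved)"/>
  </xsl:template>

</xsl:stylesheet>
```

With the sample title above, the output should be "Annual Report edition".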

-W


On 28 September 2010 14:44, Alex Muir <alex.g.muir@xxxxxxxxx> wrote:
> Hi,
>
> I found something quite interesting which may help further understand
> the issue.
>
> Independently, none of the following variables takes long to process:
> when I no longer chain the variables together, but run the template
> with only one variable and comment out the others, the time to run is
> short.
>
>   <xsl:variable name="title"
>       select="mh:stripTextNewline(normalize-space(.))"/>
>
>     <xsl:variable name="titleBraketedTextRemoved"
>       select="replace($title,'\([^\)]*\)','')"/>
>
>     <xsl:variable name="titleNumberRemoved"
>       select="replace($titleBraketedTextRemoved,'\d','')"/>
>
>     <xsl:variable name="titleStripPunctuation"
>       select="mh:stripPunctuation($titleNumberRemoved)"/>
>
>     <xsl:variable name="titleStopWordsRemoved"
>       select="normalize-space(mh:removeStopwords($titleStripPunctuation,$stopwords))"/>
>
> As the variables are combined, they take more and more time to
> execute, and with all of them together the run never finishes.
>
> So initially I was wrong to suggest that the titleBraketedTextRemoved
> variable was causing the problem. It's just that the problem is
> exacerbated when I finally add that variable into the chain of
> variables.
>
> I reduced the size of the input file so that $title contains one
> small line of text, in order to get an idea from profiling; however,
> the processing still does not complete.
>
> I'll have to talk to my client later today before posting the full code.
>
> Thanks
> Alex
>
> On Mon, Sep 27, 2010 at 7:54 PM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>> I don't know - they are both, I think, using the Java regular
>> expression engine underneath. It may be a function of how you are
>> measuring it. It could be that the cost is dominated not by the cost
>> of evaluating the regex, but by the cost of checking that it conforms
>> to the XPath rules. Did you run a Java profile to determine where the
>> time is being spent?
>>
>> Michael Kay
>> Saxonica
>>
>> On 27/09/2010 7:21 PM, Alex Muir wrote:
>>>
>>> Hi,
>>>
>>> I'm unable to figure out why this regex is so time-consuming that it
>>> never finishes in oXygen, yet works quickly in RegexBuddy on the
>>> same content.
>>>
>>>     <xsl:variable name="BraketedTextRemoved"
>>>        select="replace($title,'\([^\)]*\)','')"/>
>>>
>>> I'm just trying to remove content with brackets ( dfd234**#*$#*$#fdfd )
>>>
>>> Running on vendor="SAXON 9.2.0.6 from Saxonica" version="2.0"
>>>
>>> Any ideas?
>>>
>>> Thanks
>>> Alex
