Re: Why is the variable and regex slow in saxon and fa
Two comments, which may not shed any light on the non-termination, but anyway:

First, the pattern "\([^\)]*\)" is supposed to remove any parenthesized text, but there's no point in using "[^\)]", since the set of "any character except ')'" is simply denoted by "[^)]", because a parenthesis is not a meta-character within brackets.

Second, to remove all characters of a kind (single character or class) it's better form to use a repetition, e.g., "\d+" rather than just "\d".

-W

On 28 September 2010 14:44, Alex Muir <alex.g.muir@xxxxxxxxx> wrote:
> Hi,
>
> I found something quite interesting which may help further understand the issue.
>
> Independently, none of the following variables takes long to process:
> when I no longer chain the variables together but just run
> the template calling only one variable, with the others commented out, the
> time to run is short.
>
> <xsl:variable name="title"
>     select="mh:stripTextNewline(normalize-space(.))"/>
>
> <xsl:variable name="titleBraketedTextRemoved"
>     select="replace($title,'\([^\)]*\)','')"/>
>
> <xsl:variable name="titleNumberRemoved"
>     select="replace($titleBraketedTextRemoved,'\d','')"/>
>
> <xsl:variable name="titleStripPunctuation"
>     select="mh:stripPunctuation($titleNumberRemoved)"/>
>
> <xsl:variable name="titleStopWordsRemoved"
>     select="normalize-space(mh:removeStopwords($titleStripPunctuation,$stopwords))"/>
>
> As the variables are combined, they take more and more time to
> execute, and with all of them together the run does not finish.
>
> So initially I was wrong to suggest that the titleBraketedTextRemoved
> variable was causing the problem. It's just that the problem is
> exacerbated when I finally add that variable into the chain of
> variables.
>
> I reduced the size of the input file so that $title contains one
> small line of text, in order to get an idea from the profiling; however,
> the processing still does not complete.
>
> I'll have to talk to my client later today before posting the full code.
>
> Thanks
> Alex
>
> On Mon, Sep 27, 2010 at 7:54 PM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>> I don't know - they are both, I think, using the Java regular expression
>> engine underneath. It may be a function of how you are measuring it. It
>> could be that the cost is dominated not by the cost of evaluating the regex,
>> but by the cost of checking that it conforms to the XPath rules. Did you run
>> a Java profile to determine where the time is being spent?
>>
>> Michael Kay
>> Saxonica
>>
>> On 27/09/2010 7:21 PM, Alex Muir wrote:
>>>
>>> Hi,
>>>
>>> I'm unable to figure out why this regex is so time-consuming that
>>> it never finishes in oXygen, yet works quickly in RegexBuddy on the
>>> same content.
>>>
>>> <xsl:variable name="BraketedTextRemoved"
>>>     select="replace($title,'\([^\)]*\)','')"/>
>>>
>>> I'm just trying to remove content with brackets ( dfd234**#*$#*$#fdfd )
>>>
>>> Running on vendor="SAXON 9.2.0.6 from Saxonica" version="2.0"
>>>
>>> Any ideas?
>>>
>>> Thanks
>>> Alex
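
The two regex points made at the top of this reply can be checked with a small sketch. It is written in Java on the assumption, suggested by Michael Kay's quoted remark, that the pattern ends up in the Java regex engine. The class and method names are hypothetical and do not come from the thread. XPath's replace() has its own regex dialect, so this only illustrates that, for these particular patterns, the posted and suggested forms remove the same text.

```java
// Hypothetical helper for illustration; not part of the stylesheet in the thread.
final class TitleRegexSketch {

    // Pattern as posted: ')' is escaped inside the character class.
    static String stripBracketedAsPosted(String s) {
        return s.replaceAll("\\([^\\)]*\\)", "");
    }

    // Suggested form: ')' needs no escape inside brackets.
    static String stripBracketed(String s) {
        return s.replaceAll("\\([^)]*\\)", "");
    }

    // Pattern as posted: one match (and one replacement) per digit.
    static String stripDigitsAsPosted(String s) {
        return s.replaceAll("\\d", "");
    }

    // Suggested form: one match per run of consecutive digits.
    static String stripDigits(String s) {
        return s.replaceAll("\\d+", "");
    }

    public static void main(String[] args) {
        // Made-up title containing the bracketed sample from the original question.
        String title = "Annual report ( dfd234**#*$#*$#fdfd ) 2010 edition";

        // Each pair of forms should produce identical output.
        System.out.println(stripBracketed(title).equals(stripBracketedAsPosted(title)));
        System.out.println(stripDigits(title).equals(stripDigitsAsPosted(title)));

        // Chained the same way as the variables in the stylesheet.
        System.out.println(stripDigits(stripBracketed(title)));
    }
}
```

Each pair gives the same result; the repetition form simply gets there with fewer matches. As the reply says, neither change is expected to explain the non-termination on its own.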