RE: Processing two documents, which order?

Subject: RE: Processing two documents, which order?
From: Tony Nassar <tnassar@xxxxxxxxxxxx>
Date: Thu, 7 Apr 2011 14:52:23 -0700

Is there any reason to use a regex for *static* keywords from the shorter
list? Wouldn't you simply hash that list using xsl:key, then use the
key() function to test whether regex-group(1) is one of the keywords?
Presumably you'd lower-case everything first, etc. In fact, putting all
the keywords in a single regex is likely to perform badly. I don't know
whether a regex engine is going to optimize an "or" with 300 words in
it... it's probably going to assume that you wouldn't be using a regex to
check for equality with 1 of 300 words!
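
To make the idea concrete outside XSLT, here is a rough sketch in Python
rather than a stylesheet. It assumes a "word" is simply a run of word
characters, and the keyword set, the WORD pattern and the mark_up function
are all hypothetical names invented for illustration. The set plays the
role of the xsl:key index, and the single static regex only finds word
boundaries; it never contains the keywords themselves.

```python
import re

# Hypothetical stand-in for the ~300 words of document 1, already lower-cased.
KEYWORDS = {"template", "stylesheet", "key"}

# One static regex that only tokenises; the keywords are not part of it.
WORD = re.compile(r"\w+")


def mark_up(text, keywords=KEYWORDS):
    """Wrap each keyword occurrence in <property>...</property> in one pass."""

    def wrap(match):
        word = match.group(0)
        # Hash lookup on the lower-cased token, analogous to calling key()
        # on regex-group(1) inside xsl:analyze-string.
        if word.lower() in keywords:
            return "<property>" + word + "</property>"
        return word

    return WORD.sub(wrap, text)


if __name__ == "__main__":
    print(mark_up("The Key function and keys"))
```

The long document is scanned once, and each token costs one hash lookup,
so the work grows with the size of the document rather than with the
number of keywords times the size of the document.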

________________________________________
From: Dave Pawson [davep@xxxxxxxxxxxxx]
Sent: Thursday, April 07, 2011 10:57 AM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Cc: mike@xxxxxxxxxxxx
Subject: Re: Processing two documents, which order?

On Thu, 07 Apr 2011 15:25:55 +0100
Michael Kay <mike@xxxxxxxxxxxx> wrote:

> On 07/04/2011 14:25, Dave Pawson wrote:
> >
> > I have two xml documents.
> > The first is a list of marked up words (1),
> > the second a 'normal' xml document (2)
> >
> > For each occurrence in 2 of a word from 1
> > I need to mark up the word with <property> </property>
> >
> > Which order is anywhere near optimum?
> > Document 1 has about 300 words,
> > Document 2 is 33,000 lines.
> I'm having trouble seeing how this description of the problem relates
> to the code given below.
>
> From first principles, if you do a nested loop then you're doing
> either 300*33000 operations or 33000*300 - it's not a big difference
> either way. On the other hand if you use keys, then you are basically
> doing 300+33000 operations either way - but the key will be smaller
> if you build it on the smaller document, so that's what I would do.
>
> Using regex matching with a dynamically computed regex looks like bad
> news - or is it really a regex in the source document? Saxon
> precompiles the regex if it's known statically, but if not there's no
> caching or anything - it gets compiled on each use. From this
> viewpoint, using each regex once (in a single analyze-string call) is
> going to be better.
>
> Michael Kay
> Saxonica

The regex is required, as I see it, to determine starting and ending
conditions for the 300 'words'? I don't see how one...
Could I build and hold 300 regexen for later use? Is that what
you were thinking, Mike?

I'm still unsure of the approach though.
1. Build the keys on the smaller list of words
2. ??? build the sequence of regexen?
3. then....
   AFAICT I'm still going to have to process the entire long document
with each regex in the sequence?

Confused of Chorley.

--

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk
