[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

Re: Find the number of elements that are prior to the

Subject: Re: Find the number of elements that are prior to the series of elements that match a string?
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 12 Mar 2019 21:46:28 -0000
Re:  Find the number of elements that are prior to the
On Tue, Mar 12, 2019 at 1:31 PM Michael Kay mike@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> In performance terms, of course, a solution using keys incurs a one-time cost to build indexes, which is amortized over a number of searches; so the performance benefits depend on whether the search is done once, or repeatedly.
>
> If there are "hundreds of thousands" of Byte elements, the space used by the index can become a significant factor.
>
> Also, a solution using keys will almost inevitably process the whole input sequence. Other solutions are likely to stop reading the input sequence when the first match is found.

Wouldn't "hundreds of thousands" of Byte elements be also a (both
performance and) space problem when using index-of() -- especially if
a starting subsequence of the searched string happens regularly in the
document? Also RegEx matching becomes not so efficient with large
strings, too.

It would be nice if a particular XSLT processor implements lazily the
index-of() function, allowing its partial results to be immediately
accessible to the evaluator of the whole XPath expression -- in this
case only, the solution using index-of() will stop immediately after
the successful index value is produced -- otherwise the evaluator will
need to perform on the complete index.

Of course, exactly the same reasoning applies to the implementation of
<xsl:key> and the key() function, if a smart enough XSLT processor
allows the key() function to operate on a partially-built index, and
when a solution is found the building of the index is halted.

So, **depending on implementation details**, the other solutions
**may** have a better average performance for a single search. With
multiple searches though, using keys will be more performant. Also,
depending on where the sought substring is located for the first time
in the document, if it isn't found at all or is at the end of the
sequence of Byte elements, then exhaustive string scanning will be
performed with these other solutions too.

If we accept that an "average" search will involve scanning half of
the string, then if 3 or more searches are performed, the key-based
solution will have incurred the cost of processing the document just
once, while the accumulated string scanning of the other solutions
will be over greater (accumulated) length of sequences of Byte
elements.

Maybe we need a performance comparison of the different solutions,
over different source XML documents that cover these cases?

Cheers,
Dimitre
>
> Michael Kay
> Saxonica
>
> On 12 Mar 2019, at 20:20, Dimitre Novatchev dnovatchev@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Here is an XSLT 2.0 solution using keys. In its current form it
> returns the wanted offset (length) as a hexadecimal number. To convert
> this to a decimal number, have a look at the archives where AFAIR
> there is such code -- could be even posted by me ... This conversion
> to decimal can certainly be written elegantly as fold-left() in
> XSLT/XPath 3
>
>
> XSL-List info and archive
> EasyUnsubscribe (by email)



-- 
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
To achieve the impossible dream, try going to sleep.
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
Typing monkeys will write all Shakespeare's works in 200yrs.Will they
write all patents, too? :)
-------------------------------------
Sanity is madness put to good use.
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.

Current Thread

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.