[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

Re: Using memory addressing to retrieve a value vice

Subject: Re: Using memory addressing to retrieve a value vice using a software string library to retrieve a value
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 20 Nov 2015 16:56:41 -0000
Re:  Using memory addressing to retrieve a value vice
> I am aware of at least two very efficient algorithms:

While the two described algorithms (of contains() and index-of())
aren't directly related to the question, what I wanted to say was that
just by saying "string search algorithms" this shouldn't imply in
general that any such algorithm is inefficient.


Cheers,
Dimitre

On Fri, Nov 20, 2015 at 8:38 AM, Dimitre Novatchev
dnovatchev@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> In addition to the replies by Dr. Kay and David Carlisle:
>
> It depends on the concrete case and the concrete XML document and the
> string itself.
>
> 1. Imagine a string starting with a space and then having the wanted
> substring (that is not too-long). This may be faster than scanning
> sequentially through thousands of nodes -- as in general an XPath
> engine may use a linked-list and not a vector.
>
> 2. On the other side, imagine an XPath engine that has the sequence of
> nodes in a vector (self-expandable array structure). Then the wanted
> node can be obtained in O(1). Imagine at the same time that you have a
> sting long many Megs, snd the first space happens only at the end of
> the string, and the wanted substring is also very long. In this case
> searching through the sting using the specified expression can be
> orders of magnitude slower.
>
> A general remark about finding the first/all  occurrence of a given
> string as a substring of another given string, or whether a string1
> contains a string2.
>
> I am aware of at least two very efficient algorithms:
>
>   1. Using *suffix-trees*  -- all possible suffixes of the given
> string are organized as a trie.  The search (same as the contains()
> function) is very fast.
>
>   2. Using the hash of the search-string and scanning the given string
> and sequentially computing the hash of every next substring with the
> same length as the search string. Because computing the hash of the
> moving "window" can be done incrementally, this algorithm is O(N)  --
> much better than the naC/ve O(N*M) algorithm. More about this -- in the
> book of Steven Skiena "The Algorithm Design Manual",
>
http://www.amazon.com/Algorithm-Design-Manual-Steven-Skiena/dp/1849967202/ref
=sr_1_1?s=books&ie=UTF8&qid=1448037467&sr=1-1&keywords=skiena&pebp=1448037471
640&perid=0D1Z0T6Y90MCMDPDA330
>
>
> Cheers,
> Dimitre
>
>
> To summarize:  All depends.
>
>
>
> On Fri, Nov 20, 2015 at 4:13 AM, Costello, Roger L. costello@xxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> Hi Folks,
>>
>> I want to retrieve "west".
>>
>> Which of these is faster?
>>
>> -------------------
>> Approach #1
>> -------------------
>> The <edge> element contains text:
>>
>>         <edge>garden west door</edge>
>>
>> "west" is retrieved using this XPath:
>>
>>         substring-before(substring-after(., ' '), ' ')
>>
>> Note: assume that <edge> is the context node.
>>
>> -------------------
>> Approach #2
>> -------------------
>> The <edge> element contains markup:
>>
>>         <edge>
>>             <garden/>
>>             <west/>
>>             <door/>
>>         </edge>
>>
>> "west" is retrieved using this XPath:
>>
>>         *[2]/name()
>>
>>
>> I believe that Approach #2 is much, much faster. Do you agree?
>>
>> As I see it, Approach #2 is simply a memory addressing task (which
computers do very well) to obtain <west/> and then output the symbols at that
memory address. Approach #1, on the other hand, requires the use of software
string libraries, which, I imagine, results in hundreds or thousands of
machine instructions. In fact, I would imagine that Approach #2 is hundreds or
thousands of times faster than Approach #1. Do you agree?
>>
>> /Roger
>>
>
>
>
> --
> Cheers,
> Dimitre Novatchev
> ---------------------------------------
> Truly great madness cannot be achieved without significant intelligence.
> ---------------------------------------
> To invent, you need a good imagination and a pile of junk
> -------------------------------------
> Never fight an inanimate object
> -------------------------------------
> To avoid situations in which you might make mistakes may be the
> biggest mistake of all
> ------------------------------------
> Quality means doing it right when no one is looking.
> -------------------------------------
> You've achieved success in your field when you don't know whether what
> you're doing is work or play
> -------------------------------------
> To achieve the impossible dream, try going to sleep.
> -------------------------------------
> Facts do not cease to exist because they are ignored.
> -------------------------------------
> Typing monkeys will write all Shakespeare's works in 200yrs.Will they
> write all patents, too? :)
> -------------------------------------
> Sanity is madness put to good use.
> -------------------------------------
> I finally figured out the only reason to be alive is to enjoy it.
>



--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
To achieve the impossible dream, try going to sleep.
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
Typing monkeys will write all Shakespeare's works in 200yrs.Will they
write all patents, too? :)
-------------------------------------
Sanity is madness put to good use.
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.

Current Thread

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.