[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

Re: How the other half live

Subject: Re: How the other half live
From: "Dimitre Novatchev" <dnovatchev@xxxxxxxxx>
Date: Tue, 18 Nov 2008 09:28:52 -0800
Re:  How the other half live
> It's actually O(n*m) where n is the number of values and m the number of
> distinct values. So it's O(n^2) in any case where the number of distinct
> values is proportional to the size of the population, which means in effect
> in any "open-ended" population.


Probably this could be a challenge to Saxon:

When dealing with the following expression:

   $vSeq[index-of($vSeq,.)[$i]]

it should be possible for a good optimizer to treat it in the following way:

 1. If $i  >  count($vSeq)    ==>   Do nothing.   Otherwise:

 2. Deduce the fact that the indices of all items will be needed.

 3. Decide that from the above the indices of the distinct items will suffice

 4. Decide that the needed indices can be built within a single pass
over the sequence. Do not index any non-first occurence of an item.

 5. Decide that the distinct items together with their indices can be
found while doing the same single pass over the sequence.

 6. While finding each distinct item, record the number of its
occurences (the length of its index-of($vSeq, .)   ). In this process
record each distinct item that has more than $i occurences. Stop
producing the index after the $ith occurence.

 7. Generate code to produce every item, recorded in step 6.


It seems to me that the time complexity of this algorithm is probably
less than N*Log(N) (as no sorting is needed and hashing techniques can
be used).

Also, this algorithm can be used well in streaming mode.


-- 
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play



On Tue, Nov 18, 2008 at 7:58 AM, Michael Kay <mike@xxxxxxxxxxxx> wrote:
>> > They are both O(n^2).
>>
>> Only in the worst case though isn't it, which is a list of
>> unique values?
>
> It's actually O(n*m) where n is the number of values and m the number of
> distinct values. So it's O(n^2) in any case where the number of distinct
> values is proportional to the size of the population, which means in effect
> in any "open-ended" population.
>
> Michael Kay
> http://www.saxonica.com/

Current Thread

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.