[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

Re: Building a Trie

Subject: Re: Building a Trie
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 22 Jun 2023 07:44:41 -0000
Re:  Building a Trie
Saxon (-PE and higher) has extensions that might help.

First, on xsl:key you can declare saxon:range-key="yes" (which for some reason
is not mentioned at
https://www.saxonica.com/documentation12/index.html#!extensions/attributes but
only at
https://www.saxonica.com/documentation12/index.html#!functions/saxon/key-map ;
the documentation also fails to mention that it doesn't work on SaxonCS.)

You can then view any part of the key as a sorted map using saxon:key-map(),
and within that sorted map, keys() delivers the keys in sorted order.

It's a bit convoluted, but to find words starting with "PQR" you could define
a key-map with a range from "PQR" to "PQR&#x10FFFF;"; and then use keys() on
that map. (Note that key-map doesn't build a new index, it just creates a view
of a subset of the existing index).

Having got the data structure, I'm sure I could find a better way of
exploiting it....

Meanwhile I'm going to add an issue on the qt4 list to add sorted maps.

Michael Kay
Saxonica

> On 22 Jun 2023, at 07:29, Michael Mueller-Hillebrand
michael.mueller-hillebrand@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
wrote:
>
> Dear community,
>
> Now I learned about Tries https://en.wikipedia.org/wiki/Trie
<https://en.wikipedia.org/wiki/Trie> which is possibly a helpful method to
tackle the left-over problem with my DemoJam contribution at Markup UK (thanks
a lot for the applause!).
>
> Without spending too much time at the details, but at the heart of the
performance problem was this function:
>
> <xsl:function name="my:wordMatches" as="xs:string*">
>   <xsl:param name="start" as="xs:string"/>
>   <xsl:sequence select="$words[starts-with(., $start)]"/>
> </xsl:function>
>
> With $words containing 25,000 entries, the performance was sufficient, but
with 160,000 entries it is just bad.
>
> I already improved the performance by building a map, where the keys are the
first two letters of each word, and the value are all matching words.
>
> <xsl:function name="my:wordMatches" as="xs:string*">
>   <xsl:param name="start" as="xs:string"/>
>   <xsl:variable name="key" select="substring($start, 1, 2)"
as="xs:string"/>
>   <xsl:variable name="words" select="map:get($wordsMap, $key)"
as="xs:string*"/>
>   <xsl:sequence select="$words[starts-with(., $start)]"/>
> </xsl:function>
>
> Building that map takes some noticeable time, but then performance is great.
But only because I reduced the length of $words by factor 200b&300.
>
> Before In look deeper myself, has anyone in the community already created a
Trie-like structure (map of maps?) or can share suggestions how to tackle this
in XSLT/XPath? (Because those are the hammers I plan to use for this
challenge/nail.)
>
> Thanks,
> - Michael
>
> Michael MC<ller-Hillebrand
> Senior Consultant
> Phone +49 951-20859-752
> Mobil +49 172-819 34 13
> michael.mueller-hillebrand@xxxxxxxxx
<mailto:michael.mueller-hillebrand@xxxxxxxxx>
> www.docufy.de <https://www.docufy.de/> | DOCUFY@LinkedIN
<https://www.linkedin.com/company/3845358/>
> Datenschutz <https://www.docufy.de/datenschutz/>
> DOCUFY GmbH | KirschC$ckerstr. 27 | 96052 Bamberg | Deutschland
> CEO: Nadine Prill | Amtsgericht Bamberg HRB10571
>
> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
> EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/293509> (by
email <>)

Current Thread

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.