[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message] Re: Building a Trie
Saxon (-PE and higher) has extensions that might help. First, on xsl:key you can declare saxon:range-key="yes" (which for some reason is not mentioned at https://www.saxonica.com/documentation12/index.html#!extensions/attributes but only at https://www.saxonica.com/documentation12/index.html#!functions/saxon/key-map ; the documentation also fails to mention that it doesn't work on SaxonCS.) You can then view any part of the key as a sorted map using saxon:key-map(), and within that sorted map, keys() delivers the keys in sorted order. It's a bit convoluted, but to find words starting with "PQR" you could define a key-map with a range from "PQR" to "PQR"; and then use keys() on that map. (Note that key-map doesn't build a new index, it just creates a view of a subset of the existing index). Having got the data structure, I'm sure I could find a better way of exploiting it.... Meanwhile I'm going to add an issue on the qt4 list to add sorted maps. Michael Kay Saxonica > On 22 Jun 2023, at 07:29, Michael Mueller-Hillebrand michael.mueller-hillebrand@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote: > > Dear community, > > Now I learned about Tries https://en.wikipedia.org/wiki/Trie <https://en.wikipedia.org/wiki/Trie> which is possibly a helpful method to tackle the left-over problem with my DemoJam contribution at Markup UK (thanks a lot for the applause!). > > Without spending too much time at the details, but at the heart of the performance problem was this function: > > <xsl:function name="my:wordMatches" as="xs:string*"> > <xsl:param name="start" as="xs:string"/> > <xsl:sequence select="$words[starts-with(., $start)]"/> > </xsl:function> > > With $words containing 25,000 entries, the performance was sufficient, but with 160,000 entries it is just bad. > > I already improved the performance by building a map, where the keys are the first two letters of each word, and the value are all matching words. > > <xsl:function name="my:wordMatches" as="xs:string*"> > <xsl:param name="start" as="xs:string"/> > <xsl:variable name="key" select="substring($start, 1, 2)" as="xs:string"/> > <xsl:variable name="words" select="map:get($wordsMap, $key)" as="xs:string*"/> > <xsl:sequence select="$words[starts-with(., $start)]"/> > </xsl:function> > > Building that map takes some noticeable time, but then performance is great. But only because I reduced the length of $words by factor 200b&300. > > Before In look deeper myself, has anyone in the community already created a Trie-like structure (map of maps?) or can share suggestions how to tackle this in XSLT/XPath? (Because those are the hammers I plan to use for this challenge/nail.) > > Thanks, > - Michael > > Michael MC<ller-Hillebrand > Senior Consultant > Phone +49 951-20859-752 > Mobil +49 172-819 34 13 > michael.mueller-hillebrand@xxxxxxxxx <mailto:michael.mueller-hillebrand@xxxxxxxxx> > www.docufy.de <https://www.docufy.de/> | DOCUFY@LinkedIN <https://www.linkedin.com/company/3845358/> > Datenschutz <https://www.docufy.de/datenschutz/> > DOCUFY GmbH | KirschC$ckerstr. 27 | 96052 Bamberg | Deutschland > CEO: Nadine Prill | Amtsgericht Bamberg HRB10571 > > XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list> > EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/293509> (by email <>)
|
PURCHASE STYLUS STUDIO ONLINE TODAY!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Download The World's Best XML IDE!Accelerate XML development with our award-winning XML IDE - Download a free trial today! Subscribe in XML format
|