
Techniques for Sorting and Reducing Maps in XSLT 3/XPath 3?

Subject: Techniques for Sorting and Reducing Maps in XSLT 3/XPath 3?
From: "Eliot Kimber ekimber@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 5 Jul 2018 20:50:32 -0000
I need to process a set of documents organized into directories, where a given parent directory may contain any number of subdirectories representing versions of the same logical artifact and each subdirectory name reflects the version, e.g.:

/A/B/C/en/1.0/foo.xml
/A/B/C/en/1.2/foo.xml
/A/B/C/fr/1.0/foo.xml
/A/B/C/fr/1.2/foo.xml
/A/B/C/fr/1.3/foo.xml
/A/B/D/en/1.0/foo.xml
/A/B/D/en/1.2/foo.xml
/A/B/D/en/1.3/foo.xml
/A/B/D/en/1.4/foo.xml
/A/B/D/fr/1.0/foo.xml
/A/B/D/fr/1.2/foo.xml
/A/B/D/fr/1.3/foo.xml

I need to process only those foo.xml files that are the latest version under a given common ancestor (i.e., the latest version for each language, where the /A/B/C path represents a single course in this case).

I'm doing this entirely within XSLT 3 (rather than using e.g., a bash shell to determine the set of files to process), mostly because I'm tasked with inserting an XSLT transform into an existing system where adding anything other than an XSLT is problematic.

But I think this also serves as a useful exercise in general XSLT/XPath map manipulation, at least as I've initially gone about trying to solve this problem.

Given the list of URLs for all of these foo.xml files, I want to reduce it to just /A/B/C/en/1.2/foo.xml, /A/B/C/fr/1.3/foo.xml, /A/B/D/en/1.4/foo.xml, and /A/B/D/fr/1.3/foo.xml.

That is, for each locale in each course, get the latest version.

In addition, I want to group the files by the 3rd directory ("C" or "D"), which serves as a "course ID".

Maps seem like an obvious way to do this:

1. Use Saxon's collection() function with the metadata=yes option to get a set of maps, one for each file, that includes the full path to the file (this avoids loading a bunch of files I don't actually want and gives me maps as a starting point).

2. Using these maps, add the version, locale, and 3rd-level directory name as separate entries in each map, creating a more complete set of "descriptor" maps that make it easy to access the fields I care about.

3. Create a new map where the keys are the 3rd directory name ("course ID") and the values are the descriptor maps for a given course ID/locale pair with the highest version.

My question: How best to implement step 3?

Step 2 is simple data processing: pull apart each URL and create the maps.
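As a rough illustration of step 2, a descriptor map could be built by tokenizing the path. This is only a sketch: the function name local:descriptor, the urn:example:local namespace, and the 'uri' key are hypothetical, and it assumes the version, locale, and course ID are always the last three directories before the file name (counted from the end so that any URL prefix does not matter).

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:example:local"
    exclude-result-prefixes="#all">

  <!-- Hypothetical helper: turns a path such as /A/B/C/en/1.0/foo.xml
       into a descriptor map. Segments are counted from the end of the
       path (an assumption about the directory layout). -->
  <xsl:function name="local:descriptor" as="map(xs:string, xs:string)">
    <xsl:param name="uri" as="xs:string"/>
    <xsl:variable name="segments" as="xs:string*" select="tokenize($uri, '/')"/>
    <xsl:variable name="n" as="xs:integer" select="count($segments)"/>
    <!-- $segments[$n] is the file name; the three segments before it
         are version, locale, and course ID respectively. -->
    <xsl:sequence select="map{
        'uri'       : $uri,
        'version'   : $segments[$n - 1],
        'locale'    : $segments[$n - 2],
        'course-id' : $segments[$n - 3]
      }"/>
  </xsl:function>

</xsl:stylesheet>
```

Called as local:descriptor('/A/B/C/en/1.0/foo.xml'), this would give a map whose course-id, locale, and version entries are 'C', 'en', and '1.0'. A path with fewer than four segments would fail the declared return type rather than produce a partial map.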

Step 3 is less obvious because you have to compare entries based on both the course ID and version values.

My initial solution for step 3 is to use xsl:iterate to construct a result map:

    <!-- Reduce the descriptor maps to one entry per course ID,
         keeping the descriptor with the highest version seen. -->
    <xsl:variable name="courses-by-id" as="map(xs:string, map(*)*)">
      <xsl:iterate select="$configs-to-use">
        <!-- Accumulator: course ID -> best descriptor so far -->
        <xsl:param name="result-map" as="map(xs:string, map(*))" select="map{}"/>
        <xsl:on-completion>
          <xsl:sequence select="$result-map"/>
        </xsl:on-completion>
        
        <xsl:variable name="this-version" as="xs:double" select="xs:double(.?version)"/>
        <xsl:variable name="previous-course-entry" as="map(*)?"
          select="map:get($result-map, .?course-id)"
        />
        <!-- Version already recorded for this course, or 0.0 if none yet -->
        <xsl:variable name="test-version" as="xs:double"
          select="
            if (exists($previous-course-entry))
            then xs:double($previous-course-entry?version)
            else 0.0
          "
        />
        <!-- Note: the key is the course ID alone, so only one descriptor
             per course survives; locale is not part of the comparison. -->
        <xsl:next-iteration>
          <xsl:with-param name="result-map" as="map(xs:string, map(*))"
            select="
                    if ($this-version gt $test-version) 
                    then map:put($result-map, .?course-id, .)
                    else $result-map"
          />
        </xsl:next-iteration>
      </xsl:iterate>
    </xsl:variable>

This works (or at least appears to in my initial small tests) but it feels like there ought to be a less verbose way to do this same kind of operation.
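For comparison, one grouping-based formulation I can imagine is sketched below. It is untested against the real data and rests on assumptions: the sample $configs-to-use descriptors and their 'uri' key are made up for illustration, the name latest-by-course is hypothetical, and versions are compared as xs:double just as in the xsl:iterate version. It groups by course ID, then by locale, and keeps the descriptor with the highest version in each inner group.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="#all">

  <xsl:output method="text"/>

  <!-- Made-up sample descriptors standing in for the output of step 2. -->
  <xsl:variable name="configs-to-use" as="map(*)*" select="
      map{'course-id': 'C', 'locale': 'en', 'version': '1.0', 'uri': '/A/B/C/en/1.0/foo.xml'},
      map{'course-id': 'C', 'locale': 'en', 'version': '1.2', 'uri': '/A/B/C/en/1.2/foo.xml'},
      map{'course-id': 'C', 'locale': 'fr', 'version': '1.3', 'uri': '/A/B/C/fr/1.3/foo.xml'},
      map{'course-id': 'D', 'locale': 'en', 'version': '1.4', 'uri': '/A/B/D/en/1.4/foo.xml'}"/>

  <!-- Course ID -> one descriptor per locale, each the highest version. -->
  <xsl:variable name="latest-by-course" as="map(xs:string, map(*)*)">
    <xsl:map>
      <xsl:for-each-group select="$configs-to-use" group-by="string(.?course-id)">
        <xsl:map-entry key="current-grouping-key()">
          <xsl:for-each-group select="current-group()" group-by="string(.?locale)">
            <!-- Highest version within this course/locale group -->
            <xsl:variable name="newest" as="xs:double"
              select="max(current-group() ! xs:double(.?version))"/>
            <xsl:sequence
              select="current-group()[xs:double(.?version) eq $newest][1]"/>
          </xsl:for-each-group>
        </xsl:map-entry>
      </xsl:for-each-group>
    </xsl:map>
  </xsl:variable>

  <!-- Lists the surviving URIs per course, one course per line. -->
  <xsl:template name="xsl:initial-template">
    <xsl:for-each select="map:keys($latest-by-course)">
      <xsl:sort select="."/>
      <xsl:value-of select="concat(., ': ',
          string-join($latest-by-course(.)?uri, ', '), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The grouping keeps each comparison local to one course/locale pair, so no accumulator has to be threaded through an iteration. Whether that counts as less verbose is debatable, and treating versions as doubles would misorder something like 1.10 against 1.9.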

What is the better way to do this kind of "find the map entries that meet a specific requirement relative to other members of the map" processing?

Thanks,

Eliot
--
Eliot Kimber
http://contrext.com
