Re: tokenizing and counting with xsl:analyze-string

Subject: Re: tokenizing and counting with xsl:analyze-string
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Sat, 17 Oct 2020 09:39:05 -0000
If you're really keen to avoid putting temporary results in memory, then with
Saxon, I think you can do:

     <xsl:variable name="temp_result" as="xs:boolean*">
          <xsl:analyze-string select="'abhello1cdehello2fghijklhello3hello4mhello5nhello6'"
                              regex="hello[1-9]">
             <xsl:matching-substring>
                <xsl:sequence select="true()"/>
             </xsl:matching-substring>
             <xsl:non-matching-substring>
                <xsl:sequence select="false()"/>
             </xsl:non-matching-substring>
          </xsl:analyze-string>
      </xsl:variable>
      <xsl:iterate select="$temp_result">
          <xsl:param name="m" select="0" as="xs:integer"/>
          <xsl:param name="n" select="0" as="xs:integer"/>
          <xsl:on-completion>
             <result>
                 <yes count="{$m}"/>
                 <no count="{$n}"/>
             </result>
          </xsl:on-completion>
          <xsl:next-iteration>
              <xsl:with-param name="m" select="$m + xs:integer(.)"/>
              <xsl:with-param name="n" select="$n + xs:integer(not(.))"/>
          </xsl:next-iteration>
      </xsl:iterate>

This relies on the fact that Saxon will always try to inline a variable that's
only referenced once; and if the variable is a sequence, this means that the
value will be pipelined rather than being materialized in memory. For a
sequence containing a few dozen booleans, that's not going to give any
bottom-line savings. But if the sequence contains millions of items, it
might.

The `xsl:iterate` could also be replaced with a fold:

<xsl:variable name="counts"
              select="fold-left($temp_result,
                                map{true():0, false():0},
                                function($val, $next){map:put($val, $next, $val($next)+1)})"
              as="map(xs:boolean, xs:integer)"/>
<result>
    <yes count="{$counts(true())}"/>
    <no count="{$counts(false())}"/>
</result>
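
For anyone wanting to try the fold variant on its own, here is a minimal sketch of a complete stylesheet. It makes a few assumptions that are not in the fragments above: the `map` prefix is bound to the standard XPath map-functions namespace, the processor supports higher-order functions (inline functions and `fold-left`), and the transformation is started at the XSLT 3.0 default initial template rather than against a source document. The parameter names `$acc` and `$b` are only illustrative.

```xml
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:map="http://www.w3.org/2005/xpath-functions/map"
                exclude-result-prefixes="xs map">

   <xsl:output method="xml" indent="yes"/>

   <!-- Assumption: invoked via the default initial template, so no input document is needed. -->
   <xsl:template name="xsl:initial-template">

      <!-- One boolean per substring: true() for a regex match, false() for text between matches. -->
      <xsl:variable name="temp_result" as="xs:boolean*">
         <xsl:analyze-string select="'abhello1cdehello2fghijklhello3hello4mhello5nhello6'"
                             regex="hello[1-9]">
            <xsl:matching-substring>
               <xsl:sequence select="true()"/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
               <xsl:sequence select="false()"/>
            </xsl:non-matching-substring>
         </xsl:analyze-string>
      </xsl:variable>

      <!-- Fold the booleans into a map keyed by boolean, incrementing the matching entry each step. -->
      <xsl:variable name="counts" as="map(xs:boolean, xs:integer)"
                    select="fold-left($temp_result,
                                      map{true(): 0, false(): 0},
                                      function($acc, $b) { map:put($acc, $b, $acc($b) + 1) })"/>

      <result>
         <yes count="{$counts(true())}"/>
         <no count="{$counts(false())}"/>
      </result>
   </xsl:template>

</xsl:stylesheet>
```

With the sample string there are six matches (hello1 to hello6) and five non-empty stretches between them (ab, cde, fghijkl, m, n), so this should report a yes count of 6 and a no count of 5. Adjacent matches such as hello3hello4 produce no non-matching substring, which is why the second count is not 6.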

> On 17 Oct 2020, at 10:14, Michael Kay mike@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> You can construct a sequence of booleans, in which case you should use
<xsl:sequence select="true()"/> in place of <xsl:value-of select="1"/>, and
then you can use `count($temp_result[.])` and `count($temp_result[not(.)])` to
count the number of true and false items respectively.
>
> If you want to construct the variable as a single string, you can use
xsl:value-of as I suggested, but then you must declare the variable
as="xs:string". But using a sequence of booleans is probably better.
>
> Michael Kay
> Saxonica
>
>
>
>> On 17 Oct 2020, at 10:04, Mukul Gandhi gandhi.mukul@xxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> On Sat, Oct 17, 2020 at 1:22 PM Michael Kay mike@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> With xsl:analyze-string you would still need a variable, but it could be a
simpler variable: for example it might just contain a "1" for a match, and a
"0" for a non-match; at the end you then need to count the ones and zeros
which you can do with string-length(translate(...)).
>>
>> With your suggestion, below mentioned is my new XSLT stylesheet,
>>
>> <xsl:stylesheet version="3.0"
>>                 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>>                 xmlns:xs="http://www.w3.org/2001/XMLSchema"
>>                 exclude-result-prefixes="xs">
>>
>>    <xsl:output method="xml" indent="yes"/>
>>
>>    <xsl:template match="/">
>>       <xsl:variable name="temp_result" as="xs:boolean*">
>>           <xsl:analyze-string select="'abhello1cdehello2fghijklhello3hello4mhello5nhello6'"
>>                               regex="hello[1-9]">
>>              <xsl:matching-substring>
>>                 <xsl:value-of select="1"/>
>>              </xsl:matching-substring>
>>              <xsl:non-matching-substring>
>> 	        <xsl:value-of select="0"/>
>>              </xsl:non-matching-substring>
>>           </xsl:analyze-string>
>>       </xsl:variable>
>>       <result>
>>          <yes count="{count(index-of($temp_result, true()))}"/>
>>          <no count="{count(index-of($temp_result, false()))}"/>
>>       </result>
>>    </xsl:template>
>>
>> </xsl:stylesheet>
>>
>> The above stylesheet gives me the desired result.
>>
>> But the above mentioned XSLT stylesheet, doesn't do exactly what you've
suggested.
>>
>> I would preferably, wish to declare my XSLT variable as follows,
>>
>> <xsl:variable name="temp_result" as="xs:string">
>>     <xsl:analyze-string ...
>> </xsl:variable>
>>
>> with an expectation that, content of this new kind of variable would be a
string (i.e., an atomic xs:string value) of 1s and 0s, on which I
could do string-length(translate(...)). Is this doable?
>>
>>
>>
>> --
>> Regards,
>> Mukul Gandhi
