Re: faster complicated counting

Subject: Re: faster complicated counting
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Thu, 1 Mar 2012 10:02:13 +0100
Can't you run a three-level for-each so that you can compute all three
numbers in one go?
-W

2012/3/1 Emmanuel Bigui <medusis@xxxxxxxxx>
>
> One way is to compute the respective position in variables, and then
> look them up with keys, so that each position is only computed once.
>
> For example, for the global position, you can add to the root of the
> stylesheet:
>
> <xsl:key name="l" match="l" use="@id"/>
>
> <xsl:variable name="global">
>        <xsl:for-each select="//l">
>                <l pos="{position()}" id="{generate-id(.)}"/>
>        </xsl:for-each>
> </xsl:variable>
>
> and then, in each l element, look up the value of wwp:num-global like
> this:
>
> <xsl:attribute name="wwp:num-global" select="key('l', generate-id(.), $global)/@pos"/>
>
> Regards,
> EB
>
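For illustration, the lookup-table idea above could be fleshed out into a complete stylesheet roughly as follows. This is only a sketch under assumptions, not EB's actual code: the `tmp` namespace URI, the `tmp:entry` element, and the key name `pos-by-id` are made up for the example. The table entries sit in their own namespace so that `xpath-default-namespace` does not interfere with the key's match pattern. The sketch also restricts the table to lines with no @part or with @part='I', as in the original post; lines with any other @part value get no attribute here, which differs from what the original <xsl:number> does for them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0"
    xmlns:tmp="urn:x-example:lookup"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tmp">

  <!-- Indexes the entries of the lookup table (not the source lines).
       tmp:entry and its namespace are hypothetical names for this sketch. -->
  <xsl:key name="pos-by-id" match="tmp:entry" use="@id"/>

  <!-- Built once per transformation: one entry per counted line.
       position() within the for-each is the global number. -->
  <xsl:variable name="global">
    <xsl:for-each select="//l[not(@part) or @part = 'I']">
      <tmp:entry id="{generate-id()}" pos="{position()}"/>
    </xsl:for-each>
  </xsl:variable>

  <!-- Identity transform for everything that is not a line. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="l">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Look the current line up in the table; the third argument
           makes key() search the temporary tree, not the source. -->
      <xsl:variable name="hit"
        select="key('pos-by-id', generate-id(), $global)"/>
      <xsl:if test="$hit">
        <xsl:attribute name="wwp:num-global" select="$hit/@pos"/>
      </xsl:if>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The three-argument form of key() needs XSLT 2.0, which the toy stylesheet quoted below already declares.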
> 2012/2/29 Syd Bauman <Syd_Bauman@xxxxxxxxx>:
> > I am working with a relatively small dataset (~ 1 MiB) which uses a
> > TEI encoding. In TEI, a line of verse is encoded with an <l> element
> > (of which I have just about 306,000), which are grouped into groups
> > (like poems or stanzas) using <lg> (for "line group").
> >
> > In the output of the particular process I am working on now, I'd like
> > to adorn each <l> element with three new attributes that indicate the
> > count of the current <l> element in various contexts:
> >  wwp:num-global   = with respect to the entire document
> >  wwp:num-local    = with respect to the current stanza or other
> >                     small unit of poetry
> >  wwp:num-regional = with respect to the current poem or other
> >                     large unit of poetry
> >
> > So, as a toy example, see tiny.in.xml and tiny.out.xml, below.
> >
> > I have worked out code that gets me the desired counts. My problem is
> > that all the tree-walking it does slows down my process by well over
> > an order of magnitude. I am betting there is a much better way to do
> > this, probably using keys or <xsl:number>, but have not been able to
> > wrap my mind around it.
> >
> > The English-like pseudo-code for @num-local is "the count in the
> > context of the closest ancestor <lg> that itself has > 4 metrical
> > lines".
> >
> > The English-like pseudo-code for @num-regional is "the count in the
> > context of the closest ancestor <lg> that has a @type that contains
> > "poem" or whose first descendant <l> has n='1'".
> >
> > Here's what I have (note that we are only counting those <l> elements
> > that have an @part of 'I' or do not have a @part attribute at all):
> >
> >  <xsl:attribute name="wwp:num-global">
> >    <xsl:number count="l[not(@part)]|l[@part='I']" level="any"/>
> >  </xsl:attribute>
> >  <xsl:attribute name="wwp:num-regional">
> >    <xsl:variable name="region"
> >     select="(ancestor::lg[contains( @type,'poem') ]|ancestor::lg[ descendant::l[ @n eq '1'] ])[last()]"/>
> >    <xsl:value-of
> >     select="count((preceding::l[not(@part)]|preceding::l[@part='I'])[ancestor::lg/generate-id() = $region/generate-id() ] ) +1"/>
> >  </xsl:attribute>
> >  <xsl:attribute name="wwp:num-local">
> >    <xsl:variable name="region"
> >     select="ancestor::lg[count( descendant::l[not(@part) or @part='I'] ) > 4 ][1]"/>
> >    <xsl:value-of
> >     select="count((preceding::l[not(@part)]|preceding::l[@part='I'])[ancestor::lg/generate-id() = $region/generate-id() ] ) +1"/>
> >  </xsl:attribute>
> >
> > Thoughts appreciated.
> >
> > Notes
> > -----
> > * Yes, I realize that the test above is for *any* descendant <l> with
> >  n='1', not the first. We simply don't have any that aren't the
> >  first, so I didn't worry about it.
> >
> > * It's pretty likely we'll change the definition of what is
> >  "regional" in the near future, but it probably won't affect the
> >  basic problem I'm having. I.e., I'm hoping that if someone shows me
> >  how to do this "regional" better, I'll be able to do any future
> >  version on my own. Cross your fingers :-)
> >
> >
> > toy input
> > --- -----
> > <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> > <TEI xmlns="http://www.tei-c.org/ns/1.0"
> >     xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0">
> >  <teiHeader>
> >    <!-- blah, blah, blah -->
> >  </teiHeader>
> >  <text>
> >    <body>
> >      <lg type="superStructure">
> >        <lg type="poem.duck">
> >          <l>one</l>
> >          <l>two</l>
> >          <l>three</l>
> >          <l>four</l>
> >          <l>five</l>
> >          <l>six</l>
> >          <l>seven</l>
> >          <l>eight</l>
> >          <l>nine</l>
> >          <l>ten</l>
> >        </lg>
> >        <lg type="poem.duck">
> >          <l>one</l>
> >          <l>two</l>
> >          <l>three</l>
> >          <l>four</l>
> >          <lg type="tercet">
> >            <l>five</l>
> >            <l>six</l>
> >            <l>seven</l>
> >          </lg>
> >          <l>eight</l>
> >          <l>nine</l>
> >          <l>ten</l>
> >        </lg>
> >        <lg type="poem.duck">
> >          <lg type="stanza">
> >            <l>one</l>
> >            <l>two</l>
> >            <l>three</l>
> >            <l>four</l>
> >            <l>five</l>
> >            <l>six</l>
> >            <l>seven</l>
> >            <l>eight</l>
> >          </lg>
> >          <lg type="stanza">
> >            <l>nine</l>
> >            <l>ten</l>
> >            <l>eleven</l>
> >            <l>twelve</l>
> >            <l>thirteen</l>
> >            <l>fourteen</l>
> >            <l>fifteen</l>
> >            <l>sixteen</l>
> >          </lg>
> >          <lg type="stanza">
> >            <l>seventeen</l>
> >            <l>eighteen</l>
> >            <l>nineteen</l>
> >            <l>twenty</l>
> >            <l>twentyone</l>
> >            <l>twentytwo</l>
> >            <l>twentythree</l>
> >            <l>twentyfour</l>
> >          </lg>
> >        </lg>
> >      </lg>
> >    </body>
> >  </text>
> > </TEI>
> >
> > toy code
> > --- ----
> > <?xml version="1.0" encoding="UTF-8"?>
> > <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> >  xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0"
> > xmlns="http://www.tei-c.org/ns/1.0"
> >  xpath-default-namespace="http://www.tei-c.org/ns/1.0" version="2.0">
> >
> >  <xsl:template match="/">
> >    <xsl:text>&#x0A;</xsl:text>
> >    <xsl:apply-templates/>
> >  </xsl:template>
> >  <xsl:template match="@*|text()|processing-instruction()|comment()">
> >    <xsl:copy/>
> >  </xsl:template>
> >  <xsl:template match="*">
> >    <xsl:copy>
> >      <xsl:apply-templates select="@*|node()"/>
> >    </xsl:copy>
> >  </xsl:template>
> >
> >  <xsl:template match="l">
> >    <xsl:copy>
> >      <xsl:attribute name="wwp:num-global">
> >        <xsl:number count="l[not(@part)]|l[@part='I']" level="any"/>
> >      </xsl:attribute>
> >      <xsl:attribute name="wwp:num-regional">
> >        <xsl:variable name="region"
> >          select="(ancestor::lg[ contains( @type,'poem') ]|ancestor::lg[ descendant::l[ @n eq '1'] ])[last()]"/>
> >        <xsl:value-of
> >          select="count((preceding::l[not(@part)]|preceding::l[@part='I'])[ancestor::lg/generate-id() = $region/generate-id() ] ) +1"
> >        />
> >      </xsl:attribute>
> >      <xsl:attribute name="wwp:num-local">
> >        <xsl:variable name="region"
> >          select="ancestor::lg[count( descendant::l[not(@part) or @part='I'] ) > 4 ][1]"/>
> >        <xsl:value-of
> >          select="count((preceding::l[not(@part)]|preceding::l[@part='I'])[ancestor::lg/generate-id() = $region/generate-id() ] ) +1"
> >        />
> >      </xsl:attribute>
> >      <xsl:apply-templates select="@*|node()"/>
> >    </xsl:copy>
> >  </xsl:template>
> >
> > </xsl:stylesheet>
> >
> > toy output
> > --- ------
> > <?xml version="1.0" encoding="UTF-8"?>
> > <TEI xmlns="http://www.tei-c.org/ns/1.0"
> > xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0">
> >  <teiHeader>
> >    <!-- blah, blah, blah -->
> >  </teiHeader>
> >  <text>
> >    <body>
> >      <lg type="superStructure">
> >        <lg type="poem.duck">
> >          <l wwp:num-global="1" wwp:num-regional="1"
> > wwp:num-local="1">one</l>
> >          <l wwp:num-global="2" wwp:num-regional="2"
> > wwp:num-local="2">two</l>
> >          <l wwp:num-global="3" wwp:num-regional="3"
> > wwp:num-local="3">three</l>
> >          <l wwp:num-global="4" wwp:num-regional="4"
> > wwp:num-local="4">four</l>
> >          <l wwp:num-global="5" wwp:num-regional="5"
> > wwp:num-local="5">five</l>
> >          <l wwp:num-global="6" wwp:num-regional="6"
> > wwp:num-local="6">six</l>
> >          <l wwp:num-global="7" wwp:num-regional="7"
> > wwp:num-local="7">seven</l>
> >          <l wwp:num-global="8" wwp:num-regional="8"
> > wwp:num-local="8">eight</l>
> >          <l wwp:num-global="9" wwp:num-regional="9"
> > wwp:num-local="9">nine</l>
> >          <l wwp:num-global="10" wwp:num-regional="10"
> > wwp:num-local="10">ten</l>
> >        </lg>
> >        <lg type="poem.duck">
> >          <l wwp:num-global="11" wwp:num-regional="1"
> > wwp:num-local="1">one</l>
> >          <l wwp:num-global="12" wwp:num-regional="2"
> > wwp:num-local="2">two</l>
> >          <l wwp:num-global="13" wwp:num-regional="3"
> > wwp:num-local="3">three</l>
> >          <l wwp:num-global="14" wwp:num-regional="4"
> > wwp:num-local="4">four</l>
> >          <lg type="tercet">
> >            <l wwp:num-global="15" wwp:num-regional="5"
> > wwp:num-local="5">five</l>
> >            <l wwp:num-global="16" wwp:num-regional="6"
> > wwp:num-local="6">six</l>
> >            <l wwp:num-global="17" wwp:num-regional="7"
> > wwp:num-local="7">seven</l>
> >          </lg>
> >          <l wwp:num-global="18" wwp:num-regional="8"
> > wwp:num-local="8">eight</l>
> >          <l wwp:num-global="19" wwp:num-regional="9"
> > wwp:num-local="9">nine</l>
> >          <l wwp:num-global="20" wwp:num-regional="10"
> > wwp:num-local="10">ten</l>
> >        </lg>
> >        <lg type="poem.duck">
> >          <lg type="stanza">
> >            <l wwp:num-global="21" wwp:num-regional="1"
> > wwp:num-local="1">one</l>
> >            <l wwp:num-global="22" wwp:num-regional="2"
> > wwp:num-local="2">two</l>
> >            <l wwp:num-global="23" wwp:num-regional="3"
> > wwp:num-local="3">three</l>
> >            <l wwp:num-global="24" wwp:num-regional="4"
> > wwp:num-local="4">four</l>
> >            <l wwp:num-global="25" wwp:num-regional="5"
> > wwp:num-local="5">five</l>
> >            <l wwp:num-global="26" wwp:num-regional="6"
> > wwp:num-local="6">six</l>
> >            <l wwp:num-global="27" wwp:num-regional="7"
> > wwp:num-local="7">seven</l>
> >            <l wwp:num-global="28" wwp:num-regional="8"
> > wwp:num-local="8">eight</l>
> >          </lg>
> >          <lg type="stanza">
> >            <l wwp:num-global="29" wwp:num-regional="9"
> > wwp:num-local="1">nine</l>
> >            <l wwp:num-global="30" wwp:num-regional="10"
> > wwp:num-local="2">ten</l>
> >            <l wwp:num-global="31" wwp:num-regional="11"
> > wwp:num-local="3">eleven</l>
> >            <l wwp:num-global="32" wwp:num-regional="12"
> > wwp:num-local="4">twelve</l>
> >            <l wwp:num-global="33" wwp:num-regional="13"
> > wwp:num-local="5">thirteen</l>
> >            <l wwp:num-global="34" wwp:num-regional="14"
> > wwp:num-local="6">fourteen</l>
> >            <l wwp:num-global="35" wwp:num-regional="15"
> > wwp:num-local="7">fifteen</l>
> >            <l wwp:num-global="36" wwp:num-regional="16"
> > wwp:num-local="8">sixteen</l>
> >          </lg>
> >          <lg type="stanza">
> >            <l wwp:num-global="37" wwp:num-regional="17"
> > wwp:num-local="1">seventeen</l>
> >            <l wwp:num-global="38" wwp:num-regional="18"
> > wwp:num-local="2">eighteen</l>
> >            <l wwp:num-global="39" wwp:num-regional="19"
> > wwp:num-local="3">nineteen</l>
> >            <l wwp:num-global="40" wwp:num-regional="20"
> > wwp:num-local="4">twenty</l>
> >            <l wwp:num-global="41" wwp:num-regional="21"
> > wwp:num-local="5">twentyone</l>
> >            <l wwp:num-global="42" wwp:num-regional="22"
> > wwp:num-local="6">twentytwo</l>
> >            <l wwp:num-global="43" wwp:num-regional="23"
> > wwp:num-local="7">twentythree</l>
> >            <l wwp:num-global="44" wwp:num-regional="24"
> > wwp:num-local="8">twentyfour</l>
> >          </lg>
> >        </lg>
> >      </lg>
> >    </body>
> >  </text>
> > </TEI>
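
One further possibility for the regional and local numbers, offered as an untested sketch rather than a measured fix: instead of scanning every preceding <l> in the document and filtering by ancestor, count only the descendants of the region that come before the current line, using the XPath 2.0 node-order operator `<<`. The region definitions are copied from the toy code above; the variable names `this`, `regional`, and `local` are made up for the example, and the sketch assumes that <l> elements never nest inside one another. The global number is left out here and could still come from <xsl:number> or a lookup table.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:wwp="http://www.wwp.brown.edu/ns/textbase/storage/1.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <!-- Identity transform for everything that is not a line. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="l">
    <xsl:variable name="this" select="."/>
    <!-- Nearest ancestor <lg> that counts as a poem
         (same definition as in the toy code). -->
    <xsl:variable name="regional"
      select="(ancestor::lg[contains(@type, 'poem')]
               | ancestor::lg[descendant::l[@n eq '1']])[last()]"/>
    <!-- Nearest ancestor <lg> with more than 4 counted lines. -->
    <xsl:variable name="local"
      select="ancestor::lg[count(descendant::l[not(@part) or @part = 'I']) > 4][1]"/>
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Counted lines inside the region that precede this one, plus 1.
           If no region is found the count is 0 and the result is 1,
           as in the toy code. -->
      <xsl:attribute name="wwp:num-regional"
        select="count($regional/descendant::l[not(@part) or @part = 'I']
                      [. &lt;&lt; $this]) + 1"/>
      <xsl:attribute name="wwp:num-local"
        select="count($local/descendant::l[not(@part) or @part = 'I']
                      [. &lt;&lt; $this]) + 1"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

On the toy input this should give the same regional and local numbers as the toy output above, for example 9 and 1 for the line "nine" in the third poem. Whether it is fast enough on the full data depends on how large the regions are and on how the processor evaluates the ancestor predicates.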
