
Subject: Re: Muenchian method on nodes with two or more items for indexing
From: larry_hayashi@xxxxxxxxxxx
Date: Fri, 20 Sep 2002 11:56:59 -0600
Sorry about the mismatch between input and output; I oversimplified. Here is
a file that I actually ran through the XSL, with its output and what I would
LIKE to see. As for my source document, it has approximately 6000 entries
with another 6000 index items.

Thanks for any help!

Larry


Input:

<?xml-stylesheet type="text/xsl"
href="C:\LL2XML\TransXML2HTML\xml2ReverseIndex2.xsl"?>
<LexicalDatabase>
  <minor>
    <base>'wah 'nabuuysk</base>
    <sense num=" 1">
      <index enc="ENG">unexpected</index>
    </sense>
  </minor>
  <minor>
    <base>'wah wil&#226;ontk</base>
  </minor>
  <major>
    <base>'w&#224;hamaniits'&#224;</base>
    <sense num=" 1">
      <pos>v</pos>
      <def enc="ENG">careless</def>
      <index enc="ENG">careless</index>
    </sense>
  </major>
  <major>
    <base>xbimooksk</base>
    <sense num=" 1">
      <pos>n</pos>
      <def enc="ENG">half-white </def>
      <index enc="ENG">metis</index>
      <index enc="ENG">half-white</index>
      <sense num="1.1">
        <pos>n</pos>
        <def enc="ENG">test</def>
        <index enc="ENG">test</index>
      </sense>
    </sense>
  </major>
  <major>
    <base>xbismsg&#232;&#232;</base>
    <sense num=" 1">
      <pos>v</pos>
      <index enc="ENG">bow your head</index>
      <index enc="ENG">bend down</index>
    </sense>
  </major>
</LexicalDatabase>


XSL:

<xsl:stylesheet version="1.1"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:saxon="http://icl.com/saxon">
  <xsl:output method="xml" encoding="ISO-8859-1"/>
  <xsl:key name="BaseForm" match="LexicalDatabase/*"
use="concat(base,baseHom)"/>
  <xsl:key name="entries-by-index" match="//LexicalDatabase/*"
use=".//index"/>
  <xsl:template match="/">
    <ReverseEntries>
      <xsl:apply-templates/>
    </ReverseEntries>
  </xsl:template>
  <xsl:template match="LexicalDatabase">
    <xsl:for-each
select="//LexicalDatabase/*[generate-id(.)=generate-id(key('entries-by-index',.//index))]">
      <xsl:sort select=".//index" order="ascending" />
      <IndexItem>
        <xsl:attribute name="value"><xsl:value-of
select=".//index"/></xsl:attribute>
        <xsl:for-each select="key('entries-by-index', .//index)">
          <!--xsl:sort select="base"/ Should be presorted coming from
LinguaLinks to account for multigraphs-->
          <entry>
            <xsl:attribute name="base"><xsl:value-of
select="base"/></xsl:attribute>
          </entry>
        </xsl:for-each>
      </IndexItem>
    </xsl:for-each>
  </xsl:template>
  <xsl:template match="text()"/>
</xsl:stylesheet>

Actual Output:

<?xml version="1.0" encoding="ISO-8859-1"?>
<ReverseEntries xmlns:saxon="http://icl.com/saxon">
  <IndexItem value="bow your head">
    <entry base="xbismsgèè"/>
  </IndexItem>
  <IndexItem value="careless">
    <entry base="'wàhamaniits'à"/>
  </IndexItem>
  <IndexItem value="metis">
    <entry base="xbimooksk"/>
  </IndexItem>
  <IndexItem value="unexpected">
    <entry base="'wah 'nabuuysk"/>
  </IndexItem>
</ReverseEntries>

Desired Output:

<?xml version="1.0" encoding="ISO-8859-1"?>
<ReverseEntries xmlns:saxon="http://icl.com/saxon">
  <IndexItem value="bend down">  <!-- This is missing above. -->
    <entry base="xbismsgèè"/>
  </IndexItem>
  <IndexItem value="bow your head">
    <entry base="xbismsgèè"/>
  </IndexItem>
  <IndexItem value="careless">
    <entry base="'wàhamaniits'à"/>
  </IndexItem>
  <IndexItem value="half-white">  <!-- This is missing above. -->
    <entry base="xbimooksk"/>
  </IndexItem>
  <IndexItem value="metis">
    <entry base="xbimooksk"/>
  </IndexItem>
  <IndexItem value="test">  <!-- This is missing above. Comes from a sense within another sense. -->
    <entry base="xbimooksk"/>
  </IndexItem>
  <IndexItem value="unexpected">
    <entry base="'wah 'nabuuysk"/>
  </IndexItem>

</ReverseEntries>
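
A note on why the actual output drops items, and one way the desired output
above might be produced. In the stylesheet above, the outer xsl:for-each walks
the entries (the children of LexicalDatabase), so each entry can yield at most
one IndexItem, and xsl:value-of select=".//index" writes only the first index
of that entry. The sketch below instead keys on the index elements themselves,
as the quoted reply further down does, and climbs from each index to its
top-level entry so that an index inside a nested sense is also picked up. It is
a minimal sketch under assumptions, not a tested fix for the full 6000-entry
file: the key name index-by-value is made up for illustration, version 1.0 is
used rather than 1.1, and every major or minor entry is assumed to be a direct
child of LexicalDatabase.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/>

  <!-- Key on each index element itself, not on the entry that contains it.
       The key name "index-by-value" is hypothetical. -->
  <xsl:key name="index-by-value" match="index" use="."/>

  <xsl:template match="/">
    <ReverseEntries>
      <!-- Visit only the first index element for each distinct value -->
      <xsl:for-each
          select="//index[generate-id() = generate-id(key('index-by-value', .)[1])]">
        <xsl:sort select="."/>
        <IndexItem value="{.}">
          <!-- From every index with this value, climb to the top-level
               entry (assumed to be a direct child of LexicalDatabase) -->
          <xsl:for-each
              select="key('index-by-value', .)/ancestor::*[parent::LexicalDatabase]">
            <entry base="{base}"/>
          </xsl:for-each>
        </IndexItem>
      </xsl:for-each>
    </ReverseEntries>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample input above, this should give one IndexItem per distinct
index value, including "bend down", "half-white" and "test". Because the inner
select is a path expression, an entry that carries the same index value twice
is listed only once, and the entries come out in document order.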


----- Original Message -----
From: <Jarno.Elovirta@xxxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Friday, September 20, 2002 12:20 AM
Subject: RE:  Muenchian method on nodes with two or more items for
indexing


> Hi,
>
> > I just tried using an axes method with this problem and it took more
> > than 15 minutes to crunch through on a 2 GHz Pentium with lots of RAM.
> > I need to
>
> How big was your source document?
>
> > I have data of the following sort. You will note that minor or major
> > elements and their senses can have one or more index elements.
> >
> > <LexicalDatabase>
> > <minor>
> > <base>'wah 'nabuuysk</base>
> > <sense num=" 1">
> > <index enc="ENG">unexpected</index>
> > </sense>
> > </minor>
> > <minor>
> > <base>'wah wil&#226;ontk</base>
> > </minor>
> > <major>
> > <base>'w&#224;hamaniits'&#224;</base>
> > <sense num=" 1">
> > <pos>v</pos>
> > <def enc="ENG">careless</def>
> > <index enc="ENG">careless</index>
> > </sense>
> > </major>
> > <major>
> > <base>xbimooksk</base>
> > <sense num=" 1">
> > <pos>n</pos>
> > <def enc="ENG">half-white </def>
> > <index enc="ENG">metis</index>
> > <index enc="ENG">half-white</index>
> > </sense>
> > </major>
> > <major>
> > <base>xbismsg&#232;&#232;</base>
> > <sense num=" 1">
> > <pos>v</pos>
> > <index enc="ENG">bow your head</index>
> > <index enc="ENG">bend down</index>
> > </sense>
> > </major>
> > </LexicalDatabase>
> >
> > What I would like to do is output a file that has index elements
> > containing their major or minor entries. It is similar to grouping by
> > last name or city, except that each person could have one, two, or more
> > of these. Perhaps "Schools attended" would be a good example. Anyhow,
> > here is a sample of what I would like to output.
> >
> > <IndexList>
> > <IndexItem value="metis">
> > <entry base="xbimooksk" baseHom="" />
> > </IndexItem>
> > <IndexItem value="microwave">
> > <entry base="âànuut" baseHom="2"/>
> > </IndexItem>
> > <IndexItem value="midday">
> > <entry base="nsèèlga sah" baseHom=""/>
> > <entry base="sèèlgyàxsk" baseHom=""/>
> > </IndexItem>
> > <IndexItem value="middle (in the _)">
> > <entry base="lusèèlk" baseHom=""/>
> > <entry base="xts'a" baseHom=""/>
> > </IndexItem>
> > </IndexList>
>
> Your source and desired output don't match (e.g. no "microwave" in
> source), so it's a bit hard to see how it should work.
>
> <xsl:key name="entries-by-index" match="index" use="."/>
>
> <xsl:template match="LexicalDatabase">
>   <IndexList>
>     <xsl:for-each select="*/sense/index[generate-id() = generate-id(key('entries-by-index', .))]">
>       <xsl:sort select="." data-type="text"/>
>       <IndexItem value="{.}">
>         <xsl:for-each select="key('entries-by-index', .)/../../base">
>           <entry base="{.}" baseHom=""/>
>         </xsl:for-each>
>       </IndexItem>
>     </xsl:for-each>
>   </IndexList>
> </xsl:template>
>
> Will get you somewhere, but I didn't understand where the value of baseHom
> comes from.
>
> J - Wumpscut: Deliverance (Alternative Club Mix)
>
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>
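
On the baseHom question in the quoted reply: the BaseForm key in the stylesheet
above uses concat(base,baseHom), which suggests that baseHom is an optional
child element of each major or minor entry, although none of the sample entries
contain one. Assuming that is the case (it is an assumption, not something the
sample data confirms), a fragment along these lines could carry it into the
output. The wrapper element name entries is made up for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Hypothetical: assumes baseHom is an optional child of major/minor -->
  <xsl:template match="/">
    <entries>
      <xsl:for-each select="LexicalDatabase/*">
        <!-- {baseHom} evaluates to "" when the entry has no baseHom child -->
        <entry base="{base}" baseHom="{baseHom}"/>
      </xsl:for-each>
    </entries>
  </xsl:template>
</xsl:stylesheet>
```

An entry without a baseHom child then gets baseHom="", which matches the empty
attributes in the sample IndexList quoted above.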
