Re: Muenchian keys ... plus a bit?

Subject: Re: Muenchian keys ... plus a bit?
From: "Thomas B. Passin" <tpassin@xxxxxxxxxxxx>
Date: Wed, 22 Aug 2001 11:05:46 -0400
Dave, here's what I would try.

1) Create a list of all items and assign it to a variable "all-items".

2) Create a list of all unique items, based on their PCDATA.  That is, all
<item>content</item>
elements get represented by one element in this list.  Assign it to a
variable "unique-items".  This is the "Muenchian" part, of course.

3) Do a for-each on $unique-items.  At each iteration, output that item's
header (e.g., "content"), then find all the item nodes with the same content.
Every element here is named "item", so compare string values, not name():
<xsl:variable name='this-items-name' select='string(.)'/>
<xsl:variable name='these-items' select='$all-items[. = $this-items-name]'/>

4) Do a for-each over $these-items.  You could sort them, too.  This is
where you output the pages, i.e. the <pge> siblings of each item.
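
Steps 1 to 4 could be sketched roughly as follows.  This is an untested-in-anger
sketch, not a finished index stylesheet: it assumes the <idx>/<ent>/<item>/<pge>
structure from Dave's sample, writes plain text, and picks out the unique items
with a preceding-sibling comparison rather than keys.  The letter headings,
anchors and bold main entries are left out.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Step 1: every item in the index -->
  <xsl:variable name="all-items" select="/idx/ent/item"/>

  <!-- Step 2: the first item for each distinct text value.
       Assumption: plain sibling comparison, no keys yet. -->
  <xsl:variable name="unique-items"
      select="$all-items[not(. = ../preceding-sibling::ent/item)]"/>

  <xsl:template match="/">
    <!-- Step 3: one heading per distinct item, in alphabetical order -->
    <xsl:for-each select="$unique-items">
      <xsl:sort select="."/>
      <xsl:variable name="this-items-name" select="string(.)"/>
      <xsl:variable name="these-items"
          select="$all-items[. = $this-items-name]"/>
      <xsl:value-of select="$this-items-name"/>
      <xsl:text>, page </xsl:text>
      <!-- Step 4: pages from every entry that shares this heading -->
      <xsl:for-each select="$these-items/../pge">
        <xsl:sort select="." data-type="number"/>
        <xsl:value-of select="."/>
        <xsl:if test="position() != last()">,</xsl:if>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Against the sample file this should give one line per distinct item, with the
two "content" entries merged into "content, page 98,108,110".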

Once this is working, you could create some keys if your files are big and
you need some speed-up action.
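
A key-based version of the same grouping might look like the sketch below.
The key name "ent-by-item" is made up for illustration, and the plain-text
output is again an assumption.  The generate-id() comparison keeps only the
first <ent> in each group, and key() then fetches every <ent> with the same
item text, so no page numbers are lost.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical key name: index every ent by the text of its item -->
  <xsl:key name="ent-by-item" match="ent" use="item"/>

  <xsl:template match="/idx">
    <!-- Keep only the first ent of each group of identical items -->
    <xsl:for-each
        select="ent[generate-id() = generate-id(key('ent-by-item', item)[1])]">
      <xsl:sort select="item"/>
      <xsl:value-of select="item"/>
      <xsl:text>, page </xsl:text>
      <!-- Collect the pages of every ent in this group -->
      <xsl:for-each select="key('ent-by-item', item)/pge">
        <xsl:sort select="." data-type="number"/>
        <xsl:value-of select="."/>
        <xsl:if test="position() != last()">,</xsl:if>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```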

I didn't try this so some details may need tuning up, but it should work
nicely.

Cheers,

Tom P

[<DPawson@xxxxxxxxxxx>]
> Given
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE idx [
> <!ELEMENT idx (ent+)>
> <!ELEMENT ent (item, pge+)>
> <!ELEMENT item (#PCDATA)>
> <!ELEMENT pge  (#PCDATA)>
> <!ATTLIST pge key (t|f) 'f'>
>
>
> ]>
>
> <idx>
>  <ent>
>   <item>content</item>
>   <pge key="f">98</pge>
>  </ent>
>  <ent>
>   <item>content</item>
>   <pge key="f">108</pge>
>   <pge>110</pge>
>  </ent>
>  <ent>
>   <item>another</item>
>   <pge key="f">100</pge>
>  </ent>
>  <ent>
>   <item>zero</item>
>   <pge key="t">210</pge>
>  </ent>
> </idx>
>
>
> And indexing DTD.
>
> I want to present it as
>
> A  B  C .... Z
> (each hotlinked to the start of that letter).
>
> Then
>
> A  (the anchor)
>
> aardvark, page 1,67,79
>   (say with page 67
> -------------------
> B
>
> bathtub, page 3,5,7
>
> ------------------
>
> Z
>
> zero, page 210
>    (210 in bold, it's the main entry)
> etc.
>
> Two pass solution, first sorting, to make data entry easy.
> Being lazy, I don't always remember that I've already made
> an entry for a particular element, so there are duplicates.
> The <item> is duplicated, but the page numbers are not,
> hence the 'remove duplicates' approach of keys only partially works.
> Hence the Muenchian plus (I think :-).
>
> Question, how to remove the duplicate entries without losing
> the page numbers associated with the duplicate?
>
> I found this quite an interesting stylesheet, till I couldn't
> figure out the key definitions/usage, then I was stopped.
>
> I have everything except the 'remove duplicates' bit.
>



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

