Subject: Re: finding a minimum set of references
From: Wolfgang Laun <wolfgang.laun@xxxxxxxxx>
Date: Wed, 9 Jan 2013 09:02:54 +0100
|
If I've understood the problem correctly...
You can use a key to map a location (@area,@cite) to a sequence of
element(tree). Iterating over the tree elements, do not copy a loc
unless it is first in the sequence returned by that key: This
guarantees that all locs are covered. It may happen that this
eliminates all locs from a tree: if this must not happen, copy an
arbitrary loc.
(If the pass over all tree elements is written as a recursive
function, you may pass down a set of loc's that has been used, and
select one from this set if it can be used to avoid the empty set
within a tree.)
This may not produce the smalles set, but it should at least deal with
all necessities.
-W
On 09/01/2013, Graydon <graydon@xxxxxxxxx> wrote:
> So I have a lot (~600) tree elements that look like:
>
> <tree parent="attribution">
> <count count="1"/>
> <locations>
> <loc area="loiprop" cite="2010-07-prop10"/>
> </locations>
> </tree>
> <tree parent="block">
> <count count="10"/>
> <locations>
> <loc area="loiprop" cite="2004-09-bud2004"/>
> <loc area="loiprop" cite="2010-07-prop10"/>
> <loc area="loiprop" cite="2010-08-allprop"/>
> </locations>
> </tree>
> <tree parent="block">
> <key>section</key>
> <count count="6689"/>
> <locations>
> <loc area="CRCc945-en" cite="2606"/>
> <loc area="CRCc945-en" cite="402"/>
> <loc area="CRCc945-en" cite="6804"/>
> <loc area="CRCc945-en" cite="8301"/>
> <loc area="CRCc945-en" cite="8308.1"/>
> ....
> </locations>
> <child>section</child>
> </tree>
>
> The @area,@cite pairs represent documents with the locations of particular
> patterns (via the @parent and the child elements of the tree elements) of
> documentation content, one tree element per pattern. (So there are lots of
> tree/@parent="block" elements, etc.)
>
> What's wanted is the minimum number of documents that contain _all_ the
> patterns, for output testing purposes.
>
> So I need to produce the, or at least _a_, shortest list of <loc/> elements
> so that every (all ~600) tree element contains at least one loc element on
> the list.
>
> This has turned out to be much less straightforward than I thought. I can't
> shake the feeling that there's a grouping solution, but I'm not seeing it if
> there is. So I'd appreciate any algorithm hints anybody has got;
> unfortunately, efficiency is a concern.
>
> Demonstration that this is really an NP-complete problem also grateful
> accepted!
>
> Thanks!
> Graydon
|