Locating arbitrary duplicate structure

Subject: Locating arbitrary duplicate structure
From: "Daniel Bowen" <dbowen2@xxxxxxxxx>
Date: Wed, 10 Jan 2001 12:28:50 -0700
(Thanks for the replies on the ends-with, and variables/parameters in match
of xsl:key)

Here's another issue that I'm facing.  I'm hoping for some input on possible
approaches I can take.


Let's say I have XML with a well-defined schema but an arbitrary hierarchy.
Using XSLT, I need to identify branches that are identical, or that differ
by only a set number of attributes. I don't mind if the approach depends on
extension scripts.

As a simple example (although it doesn't necessarily demonstrate the
arbitrary hierarchy), let's say I have the XML:

 <LinearFeatureModel name="Light Poles">
  <Composite name="Composite">
   <OffsetPath name="Offset to the Left" offset="-3.6">
    <RegPopLinear name="Regularly place the poles" spacing="6">
     <Point name="Pole" relative="1" model="pole.flt" />
    </RegPopLinear>
   </OffsetPath>
   <OffsetPath name="Offset to the Right" offset="3.6">
    <RegPopLinear name="Regularly place the poles" spacing="6">
     <Point name="Pole" relative="1" model="pole.flt" />
    </RegPopLinear>
   </OffsetPath>
  </Composite>
 </LinearFeatureModel>

The branch (we'll call it 'branch 1')
      <Point name="Pole" relative="1" model="pole.flt" />
is found twice.

However, 'branch 2':
     <RegPopLinear name="Regularly place the poles" spacing="6">
      <Point name="Pole" relative="1" model="pole.flt" />
     </RegPopLinear>
is also found exactly twice, and includes branch 1.

The branch starting with the "OffsetPath" nodes is very similar in both
cases, but differs by both the "name" attribute and the "offset" attribute.
I'll call the first OffsetPath branch 3, and the second branch 4.

I'd like to be able to detect that:
* branch 2 occurs twice
* branch 1 is a sub-part of branch 2
* branches 3 and 4 are similar, and differ by 2 attributes or attribute
values.
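
To pin down what I mean by "differ by 2 attributes", here is a rough sketch
in Python rather than XSLT. It is only an illustration of the result I'm
after, not a proposed solution; the function name attribute_differences is
made up for this example, and it assumes that two branches only count as
"similar" when their element structure matches exactly.

```python
import xml.etree.ElementTree as ET


def attribute_differences(a, b):
    """Compare two branches element by element.

    Returns a list of (element name, attribute name) pairs where the
    attribute is missing on one side or has a different value, or None
    if the element structure itself differs. Text content is ignored.
    """
    if a.tag != b.tag or len(a) != len(b):
        return None
    diffs = [
        (a.tag, name)
        for name in sorted(set(a.attrib) | set(b.attrib))
        if a.get(name) != b.get(name)
    ]
    # Walk the children pairwise, in document order.
    for child_a, child_b in zip(a, b):
        sub = attribute_differences(child_a, child_b)
        if sub is None:
            return None
        diffs.extend(sub)
    return diffs


# Branch 3 and branch 4 from the example above.
left = ET.fromstring("""
<OffsetPath name="Offset to the Left" offset="-3.6">
 <RegPopLinear name="Regularly place the poles" spacing="6">
  <Point name="Pole" relative="1" model="pole.flt" />
 </RegPopLinear>
</OffsetPath>
""".strip())
right = ET.fromstring("""
<OffsetPath name="Offset to the Right" offset="3.6">
 <RegPopLinear name="Regularly place the poles" spacing="6">
  <Point name="Pole" relative="1" model="pole.flt" />
 </RegPopLinear>
</OffsetPath>
""".strip())

print(attribute_differences(left, right))
# [('OffsetPath', 'name'), ('OffsetPath', 'offset')]
print(attribute_differences(left[0], right[0]))
# []  (the two copies of branch 2 are identical)
```

An empty list means the two branches are exact duplicates, and the length of
a non-empty list is the "number of attributes" I'd want to put a limit on.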


There is already a first cut of a solution in place that I'm trying to
replace (done by someone else :-) ).  It only recognizes branches that are
exact duplicates, and does not recognize when a duplicated sub-branch sits
inside a higher branch that is itself identical in all cases.  It is also
extremely inefficient (it's at least O(n^2), if not worse).  The approach
is essentially a nested loop that compares the entire stringized XML
representation of each node (with all its descendants) against that of
every other node (and its descendants).
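
For reference, the shape of that existing approach is roughly the following.
This is a sketch in Python, not the actual code; the names stringize and
naive_duplicate_pairs are made up here, and the real stringized form may
well differ from this simplified one.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<LinearFeatureModel name="Light Poles">
 <Composite name="Composite">
  <OffsetPath name="Offset to the Left" offset="-3.6">
   <RegPopLinear name="Regularly place the poles" spacing="6">
    <Point name="Pole" relative="1" model="pole.flt" />
   </RegPopLinear>
  </OffsetPath>
  <OffsetPath name="Offset to the Right" offset="3.6">
   <RegPopLinear name="Regularly place the poles" spacing="6">
    <Point name="Pole" relative="1" model="pole.flt" />
   </RegPopLinear>
  </OffsetPath>
 </Composite>
</LinearFeatureModel>
"""


def stringize(elem):
    """Serialize an element and all its descendants to one string.

    Attributes are sorted so their order does not matter; surrounding
    whitespace in text is stripped and tail text is ignored.
    """
    attrs = "".join(f' {k}="{v}"' for k, v in sorted(elem.attrib.items()))
    text = (elem.text or "").strip()
    children = "".join(stringize(child) for child in elem)
    return f"<{elem.tag}{attrs}>{text}{children}</{elem.tag}>"


def naive_duplicate_pairs(root):
    """Compare every node with every later node: O(n^2) comparisons,
    each one re-serializing two whole subtrees."""
    nodes = list(root.iter())
    pairs = []
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if stringize(a) == stringize(b):
                pairs.append((a, b))
    return pairs


root = ET.fromstring(SAMPLE.strip())
print([a.tag for a, _ in naive_duplicate_pairs(root)])
# ['RegPopLinear', 'Point']
```

Note that the two Point nodes and the two RegPopLinear nodes come back as
unrelated pairs: nothing in the result says that the first pair is contained
in the second, and the two OffsetPath branches are not reported at all.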

What are some other approaches that I could take?  Thanks!

-Daniel


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

