Re: Tree Comparing Algorithm

Subject: Re: Tree Comparing Algorithm
From: "Vasu Chakkera vasucv@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 3 Feb 2020 20:10:04 -0000
Thanks, both. Martin's solution sort of worked, but it returned only 21
children, whereas the XML has around 21000 nodes. I am not sure to what
depth the comparison is happening.
Vasu

On Mon, 3 Feb 2020 at 12:16, Michael Kay mike@xxxxxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> The only facility in XSLT 3.0 to allow streaming of two input files
> "in parallel" is xsl:merge, and as Martin points out, that's rather
> specialised and not really suited to your requirements.
>
> In Saxon, streaming is in most cases done in push mode (where the parser
> owns the control loop, and sends events to the XSLT processor). You can't
> have two parallel control loops except with multi-threading, so the
> opportunities for streaming multiple files are limited (with xsl:merge,
> Saxon indeed uses multi-threading).
>
> At first sight, I don't see an XSLT-based answer to this one.
>
> Except, perhaps: you could do a streamed transformation of each input
> document into an XML representation of an event stream, like
>
> <startElement name="folder" path="" hash=""/>
> <startElement name="folder" path="" hash=""/>
> <endElement name="folder"/>
>
> etc
>
> and then attempt to do an xsl:merge of the two event streams.
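A minimal sketch of the event-stream idea, written in Python rather than XSLT, so it illustrates the shape of the approach and is not xsl:merge itself. It assumes the hash attribute is named mhash (as in the original sample), that both documents have the same shape so their events line up one-to-one, and the function names and sample data are hypothetical:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same tree hashed before and after a transfer.
SOURCE_XML = b"""<root path="/" mhash="r1">
  <folder path="/a" mhash="a1">
    <leaf path="/a/x" mhash="x1"/>
    <leaf path="/a/y" mhash="y1"/>
  </folder>
  <folder path="/b" mhash="b1">
    <leaf path="/b/z" mhash="z1"/>
  </folder>
</root>"""
TARGET_XML = SOURCE_XML.replace(b"r1", b"r2").replace(b"a1", b"a2").replace(b"y1", b"y2")


def event_stream(xml_file):
    """Yield (kind, tag, path, mhash) tuples while the file is being parsed."""
    for kind, elem in ET.iterparse(xml_file, events=("start", "end")):
        if kind == "start":
            # Attributes are already available on the start event.
            yield ("startElement", elem.tag, elem.get("path"), elem.get("mhash"))
        else:
            yield ("endElement", elem.tag, None, None)
            elem.clear()  # drop the element's content once it is closed


def mismatched_paths(src_file, dst_file):
    """Walk both event streams in lockstep and collect source paths whose hashes differ.

    Assumes both streams contain the same sequence of start/end events.
    """
    out = []
    for (k1, _, path, h1), (k2, _, _, h2) in zip(event_stream(src_file),
                                                 event_stream(dst_file)):
        if k1 == "startElement" and k2 == "startElement" and h1 != h2:
            out.append(path)
    return out


if __name__ == "__main__":
    print(mismatched_paths(io.BytesIO(SOURCE_XML), io.BytesIO(TARGET_XML)))
```

This sketch does not skip matching subtrees explicitly; in a Merkle tree a matching folder hash implies that every hash beneath it matches too, so nothing under such a folder is reported.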
>
> Michael Kay
> Saxonica
>
> On 3 Feb 2020, at 13:47, Vasu Chakkera vasucv@xxxxxxxxx <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi All,
> I am planning to write an XSLT that compares XML trees using streaming.
> The XML trees look something like this:
> <root path="" mhash="">
>   <folder path="" mhash="">
>     <folder path="" mhash="">
>       <leaf path="" mhash="">
>       </leaf>
>     </folder>
>   </folder>
> </root>
> There will be two such XML files to compare. They are generated
> before and after moving a folder from source to destination. Source and
> destination could be on two different operating systems.
> This is essentially the serialized Merkle tree output of a folder
> structure. The idea is to run a Merkle tree comparator that picks out the
> nodes that do not match. The rules are as follows.
>
>    1. If the root node hashes in both trees match, then there is no
>    difference in the entire tree (because of how the Merkle tree is generated).
>    2. If the root node hashes do not match, we go to the child container and
>    compare the hash of the child container in both XML files. (The
>    folder structure will be identical with respect to the hash, but the
>    folder path may differ because of the Linux and Windows path
>    conventions. Otherwise the folder structure is meant to be the same.)
>    3. If the hashes of a folder in both trees are the same, the entire
>    subtree under that folder is ignored.
>    4. If the hashes of a folder in both trees are not the same, then
>    the subtree is traversed further and step 3 is repeated.
>    5. The XSLT keeps writing out the nodes whose hashes do not match
>    between the source and target XML files.
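The five rules above can be sketched as a short recursive comparison. This is a non-streaming Python illustration, not the XSLT under discussion; it assumes that children appear in the same order in both trees so they can be paired by position, and the function name and sample data are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: one leaf (/a/y) changed during the transfer.
SOURCE_XML = """<root path="/" mhash="r1">
  <folder path="/a" mhash="a1">
    <leaf path="/a/x" mhash="x1"/>
    <leaf path="/a/y" mhash="y1"/>
  </folder>
  <folder path="/b" mhash="b1">
    <leaf path="/b/z" mhash="z1"/>
  </folder>
</root>"""
TARGET_XML = SOURCE_XML.replace("r1", "r2").replace("a1", "a2").replace("y1", "y2")


def diff_merkle(src, dst):
    """Return a pruned copy of src holding only nodes whose mhash differs, or None."""
    if src.get("mhash") == dst.get("mhash"):
        return None  # rules 1 and 3: equal hashes, so the whole subtree is skipped
    out = ET.Element(src.tag, dict(src.attrib))  # rule 5: write out the mismatching node
    # Rules 2 and 4: descend into the children.
    # Assumption: children are paired by position in both trees.
    for s_child, d_child in zip(src, dst):
        sub = diff_merkle(s_child, d_child)
        if sub is not None:
            out.append(sub)
    return out


if __name__ == "__main__":
    result = diff_merkle(ET.fromstring(SOURCE_XML), ET.fromstring(TARGET_XML))
    if result is not None:
        print(ET.tostring(result, encoding="unicode"))
```

Because the recursion only stops where hashes agree, the comparison reaches whatever depth the differences reach, and the leaves of the resulting tree are the items that changed.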
>
>
> So at the end of the processing, a comparator tree should be serialized
> that contains only the nodes that have a non-matching leaf node.
> Looking at the serialized tree, we can determine which files got messed
> up during the transfer from source to target.
>
>
>
> I am able to do this using non-streaming XSLT, but with streaming, since
> we need to stream two trees at a time and compare matching nodes, I am
> not sure how to proceed.
> I am able to do manipulations on one XML file with streaming. I tried a few
> tricks, but did not get anywhere (I am not very comfortable copying my
> code scribbles here).
>
> I need streaming because the XML files may be big.
> If someone has done something similar, or can point me to an intelligent way
> to do this, I will be thankful.
>
> Vasu
>
>
>
> XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
> EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/293509> (by
> email)


-- 
Vasu Chakkera
NodeLogic Limited
Oxford
www.node-logic.com
==============
