[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

RE: How Do I Generate A Set-Difference With Context -

Subject: RE: How Do I Generate A Set-Difference With Context - Part A
From: "Aron Bock" <aronbock@xxxxxxxxxxx>
Date: Sun, 13 Mar 2005 00:52:23 +0000
xslt set difference
This continues my earlier post, unfortunately unresponded to, on the same subject. The original post is here: http://www.biglist.com/lists/xsl-list/archives/200503/msg00332.html

I've put on the back burner my initial quest of writing a generic diff-finding stylesheet. While I know I could do this in a, say, procedural programming language (walk the trees, compare children, recurse), and since [I read that] XSLT is Turing Complete, I know I can achieve this in XSLT as well. Just not right away.

The diff quest came from the following problem: I get periodic XML "feeds" from a news syndicate; these feeds are parsed, formatted in HTML, and published on a website. Each feed is an XML file, and contains zero or more "stories". A story may be exactly like that in the immediately-prior feed, may be slightly different, or may be completely new. Hence my desire to "diff" 2 feeds rather than simply regenerate all stories. When only, say, 20 stories change among 1000+ stories, this is a processing win.

I've modified my initial diff quest to one that is content with looking for differences in known nodes of an input document. In this case I pass-through everything other than a <story> node; when I encounter a <story> node I look for a matching story in the prior feed, and pass through if not found. "matching" means local-names are same, attributes are same in number, name and value, and all child nodes of <story> are present and similarly equal to those in the prior <story>, if any. I realize "match" may be application dependant.

I used an augmented vset:difference from "XSLT Cookbook" to determine <story> differences. MSXML 4x (?) from a .NET app took 6-something minutes for a 1200-node transform. Saxon v653, JDK 1.4, from the commandline took 4-something minutes. Then I used keys to "index" stories in the prior, compared-to document. Now Saxon 653 took 42 seconds for the transform. Then I modified vset:difference to "short-circuit" at the first difference; now Saxon 653 took 30 seconds for the transform. I can't think of other algorithmic improvements to make; if anybody else can, please post.

Now for a question: each <story> has, say, a @date attribute; multiple stories in 1 feed may have the same @date value. If I find a changed (or new) story, I want to pass through not just that story, but also all other stories with that same @date value, no matter they're unchanged.

My first thougth about accomplishing this is to perform some sort of pipeline processing (it's immaterial if it's done in-situ in the XSLT processor). The first pass would generate a diff result, and a second pass would add back all stories of the same [unique] dates as the ones in the diff result.

It would be nice to do this in one [conceptual] pass:
--for every story check if it should pass through
--if a passthrough also pass through otheries stories with the same @date

Of course, a naive approach would have stories output multiply, unless I somehow "remember" what [say, dates] has already passed through. I could do that in a procedural language, but don't know how to [efficiently] in a functional one such as XSLT. While I read up on functional programming techniques, if anybody can suggest a way to accomplish this, please post.

Here are 2 sample feeds:

feed1.xml
======
<feed>
 <story @date="1" text="a"/>
 <story @date="2" text="a"/>
</feed>


feed2.xml ====== <feed> <story @date="1" text="a"/> <story @date="1" text="b"/> <story @date="2" text="a"/> </feed>

If feed2.xml were the "current" feed, and feed1.xml the propr one, I want to end up with the following, in 1 pass.

final.xml
=====
<feed>
 <story @date="1" text="a"/>
 <story @date="1" text="aa"/>
</feed>

Thank you

--A

_________________________________________________________________
Dont just search. Find. Check out the new MSN Search! http://search.msn.click-url.com/go/onm00200636ave/direct/01/


Current Thread

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.