RE: How Do I Generate A Set-Difference With Context
This continues my earlier post on the same subject, which unfortunately
got no replies. The original post is here:
http://www.biglist.com/lists/xsl-list/archives/200503/msg00332.html
I've put my initial quest of writing a generic diff-finding stylesheet on the back burner. I know I could do this in a procedural programming language (walk the trees, compare children, recurse), and since [I read that] XSLT is Turing complete, I know I can achieve it in XSLT as well. Just not right away.

The diff quest came from the following problem: I get periodic XML "feeds" from a news syndicate; these feeds are parsed, formatted in HTML, and published on a website. Each feed is an XML file and contains zero or more "stories". A story may be exactly like the one in the immediately prior feed, may be slightly different, or may be completely new. Hence my desire to "diff" 2 feeds rather than simply regenerate all stories. When only, say, 20 stories change among 1000+ stories, this is a processing win.

I've narrowed my initial diff quest to one that is content with looking for differences in known nodes of an input document. In this case I pass through everything other than a <story> node; when I encounter a <story> node I look for a matching story in the prior feed, and pass the story through if no match is found. "Matching" means the local-names are the same, the attributes are the same in number, name and value, and all child nodes of <story> are present and similarly equal to those in the prior <story>, if any. I realize "match" may be application dependent.

I used an augmented vset:difference from "XSLT Cookbook" to determine <story> differences. MSXML 4x (?) from a .NET app took 6-something minutes for a 1200-node transform. Saxon v653, JDK 1.4, from the command line took 4-something minutes. Then I used keys to "index" stories in the prior, compared-to document. Now Saxon 653 took 42 seconds for the transform. Then I modified vset:difference to "short-circuit" at the first difference; now Saxon 653 took 30 seconds for the transform. I can't think of other algorithmic improvements to make; if anybody else can, please post.
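For readers who want to see the key-based lookup concretely, here is a minimal XSLT 1.0 sketch. It is not the stylesheet described in this post: the parameter name prior-uri, the key name story-by-date, and the choice of @date as the key value are all assumptions made for illustration, and the full vset:difference comparison is replaced by a stand-in test on a text attribute. It also assumes the attributes are really named date and text.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: location of the prior feed. -->
  <xsl:param name="prior-uri" select="'feed1.xml'"/>
  <xsl:variable name="prior" select="document($prior-uri)"/>

  <!-- Index stories by date (an assumed choice of key value).
       In XSLT 1.0 a key applies to every document, but key() only
       searches the document that contains the context node. -->
  <xsl:key name="story-by-date" match="story" use="@date"/>

  <!-- Everything other than a story passes through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="story">
    <xsl:variable name="this" select="."/>
    <!-- Switch the context node to the prior document so that
         key() looks in its index, not the current feed's. -->
    <xsl:for-each select="$prior">
      <!-- Stand-in for the real comparison: the key narrows the
           candidates to stories with the same date, and only those
           are compared further (here, just on @text). -->
      <xsl:if test="not(key('story-by-date', $this/@date)
                        [@text = $this/@text])">
        <xsl:copy-of select="$this"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The point of the sketch is only the shape: the key reduces each lookup to a handful of candidate stories, so the expensive node-by-node comparison runs against those candidates instead of against every story in the prior feed.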
Now for a question: each <story> has, say, a @date attribute; multiple stories in 1 feed may have the same @date value. If I find a changed (or new) story, I want to pass through not just that story, but also all other stories with that same @date value, even if they're unchanged.

My first thought about accomplishing this is to perform some sort of pipeline processing (it's immaterial if it's done in-situ in the XSLT processor). The first pass would generate a diff result, and a second pass would add back all stories of the same [unique] dates as the ones in the diff result.

It would be nice to do this in one [conceptual] pass:

--for every story check if it should pass through
--if a passthrough also pass through other stories with the same @date

Of course, a naive approach would have stories output multiply, unless I somehow "remember" what [say, dates] have already passed through. I could do that in a procedural language, but don't know how to [efficiently] in a functional one such as XSLT. While I read up on functional programming techniques, if anybody can suggest a way to accomplish this, please post.

Here are 2 sample feeds:

feed1.xml
======
<feed>
  <story @date="1" text="a"/>
  <story @date="2" text="a"/>
</feed>
If feed2.xml were the "current" feed, and feed1.xml the prior one, I want to end up with the following, in 1 pass.

final.xml
=====
<feed>
  <story @date="1" text="a"/>
  <story @date="1" text="aa"/>
</feed>

Thank you
--A
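One way the single conceptual pass might be done in plain XSLT 1.0, offered as a sketch under assumptions rather than a tested answer: collect the dates of all changed or new stories once, in a global variable holding a delimited string such as "|1|", and then pass a story through whenever its date occurs in that string. Nothing has to be "remembered" while writing output, because each story is visited exactly once and simply asks whether its date is in the set. The names prior-uri, story-by-sig and changed-dates are hypothetical, the attributes are assumed to be named date and text, the "|" delimiter is assumed never to occur in a date, and the signature used for matching is a simplified stand-in for the full story comparison.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: location of the prior feed. -->
  <xsl:param name="prior-uri" select="'feed1.xml'"/>
  <xsl:variable name="prior" select="document($prior-uri)"/>

  <!-- Simplified signature (assumption): a story is "the same" if
       its date and text agree. Replace with the real comparison. -->
  <xsl:key name="story-by-sig" match="story"
           use="concat(@date, '|', @text)"/>

  <!-- Built once: a delimited list of the dates of every story in
       the current feed that has no match in the prior feed.
       Duplicate dates in the list are harmless. -->
  <xsl:variable name="changed-dates">
    <xsl:text>|</xsl:text>
    <xsl:for-each select="/feed/story">
      <xsl:variable name="sig" select="concat(@date, '|', @text)"/>
      <xsl:variable name="date" select="@date"/>
      <!-- Switch context so key() searches the prior document. -->
      <xsl:for-each select="$prior">
        <xsl:if test="not(key('story-by-sig', $sig))">
          <xsl:value-of select="$date"/>
          <xsl:text>|</xsl:text>
        </xsl:if>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:variable>

  <!-- Everything other than a story passes through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A story passes through if ANY story sharing its date changed,
       whether or not this particular story changed. -->
  <xsl:template match="story">
    <xsl:if test="contains($changed-dates,
                           concat('|', @date, '|'))">
      <xsl:copy-of select="."/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

The variable is a result tree fragment, which XSLT 1.0 allows to be used as a string, so no node-set extension function is needed. If the number of distinct changed dates grows large, the contains() scan could become the bottleneck; a two-pass pipeline or a processor-specific node-set extension would then be worth comparing.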