Re: Testing 2 XML documents for equality - a solution
> (i.e. introducing an extra character between attribute name and value, which is unlikely to occur in the attribute value; for e.g. a newline character)

How do you define unlikely? I can easily provide a counterexample. (Although actually adding such a separator works even if the separator is in the attribute value, as it uniquely terminates the name in the string; you only need to use a character that is not a name character.)

I mentioned attributes, but you do the same for elements, so you need the same fix there (with a different character), as you otherwise don't distinguish element nodes from attribute nodes of the same name.

I also notice that you don't record which element an attribute is on, so looking at your proposed fix

  <xsl:for-each select="$doc1//@*">
    <xsl:value-of select="name()" /><xsl:text>
</xsl:text><xsl:value-of select="." />
  </xsl:for-each>

the documents

  <x a="2"> <b/> </x>

and

  <x> <b a="2"/> </x>

would both generate the same attribute test string of "a
2" so would compare equal.

> These documents are reported not equal!

Are you sure?

> I think here I am right!

Hmm :-)

> For this example, the $doc1//node() path expression returns 4 nodes (2 element nodes and 2 "white space text nodes")

Yes.

> The "white space text nodes" will be filtered by the predicate [not(normalize-space(self::text()) = '')]

Yes, but any element node will be filtered as well: self::text() on an element node returns an empty node set (as it isn't a text node), and normalize-space() on that returns '', so the whole select expression on the for-each returns an empty node set.

> I agree that the XML parser is not expected to report attribute nodes in same order. But I guess we can reasonably assume that a "specific XML parser" would report attributes in same order.

More guesses.

> I have tested the same example with a single product multiple times, and always I am getting same result..

Probably true, but you never really know. Attributes are often put into some kind of hashed data structure, so the order they come out in can depend on all sorts of strange factors.

These things can be fixed (e.g. by sorting attribute nodes alphabetically), but as Michael just indicated, the process is always likely to be very inefficient. You _always_ generate a really huge string for each document; even if the top-level nodes are <foo version="1"> and <foo version="2">, you'd really like to stop there and not generate a text string of the 100001 child nodes below foo. Given that you are walking over the trees anyway to generate the strings, you should be able to walk over the two trees in parallel and stop whenever you find a difference.

David

See what saxon says:

  $ saxon eq.xsl eq.xsl iws=y
  Equal
  $ cat file1.xml
  <a> <b/> </a>
  $ cat file2.xml
  <x/>

so when ignoring white space text nodes the stylesheet reports

  <a> <b/> </a>

as equal to <x/>
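The parallel walk suggested in the message can be sketched roughly as follows. This is only an illustration in Python using the standard xml.etree.ElementTree module, not the stylesheet under discussion; the function names first_difference and documents_equal and the ignore_ws flag are made up for this sketch. It compares the two trees node by node, returns at the first mismatch, keeps each attribute tied to its own element, and compares attributes as an unordered set. Note that ElementTree drops comments and processing instructions when parsing, so they are not compared here.

```python
import xml.etree.ElementTree as ET
from itertools import zip_longest


def _text(value, ignore_ws):
    """Return text content, optionally treating whitespace-only text as absent."""
    value = value or ""
    if ignore_ws and not value.strip():
        return ""
    return value


def first_difference(a, b, ignore_ws=True, path=""):
    """Walk elements a and b in parallel.

    Returns a short description of the first difference found,
    or None if the two subtrees match. (Hypothetical helper for this sketch.)
    """
    if a.tag != b.tag:
        return f"{path or '/'}: element <{a.tag}> vs <{b.tag}>"
    here = f"{path}/{a.tag}"
    # attrib is a dict, so this comparison does not depend on attribute order,
    # and each attribute is compared on the element it belongs to.
    if a.attrib != b.attrib:
        return f"{here}: attributes {a.attrib} vs {b.attrib}"
    if _text(a.text, ignore_ws) != _text(b.text, ignore_ws):
        return f"{here}: leading text differs"
    for child_a, child_b in zip_longest(a, b):
        if child_a is None or child_b is None:
            return f"{here}: different number of child elements"
        diff = first_difference(child_a, child_b, ignore_ws, here)
        if diff is not None:
            return diff  # stop at the first mismatch, skip the rest of the tree
        if _text(child_a.tail, ignore_ws) != _text(child_b.tail, ignore_ws):
            return f"{here}: text after <{child_a.tag}> differs"
    return None


def documents_equal(xml1, xml2, ignore_ws=True):
    """True if the two XML strings parse to matching trees."""
    root1, root2 = ET.fromstring(xml1), ET.fromstring(xml2)
    return first_difference(root1, root2, ignore_ws) is None
```

With this sketch, the pair <x a="2"> <b/> </x> and <x> <b a="2"/> </x> is reported as different, as is <a> <b/> </a> against <x/>, while <x a="1" b="2"/> and <x b="2" a="1"/> compare equal. For <foo version="1"> against <foo version="2"> the walk returns at the root without visiting any children.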