
RE: Schema based XML compare

  • From: "David Lee" <dlee@calldei.com>
  • To: "'Mukul Gandhi'" <gandhi.mukul@gmail.com>
  • Date: Fri, 24 Dec 2010 07:39:06 -0500

Interesting ideas.   
What I have in mind is probably closer to deep-equal, but something that could run on a non-schema-aware processor.
I happen to have a StAX-based compare program which runs in streaming mode and can either honor or ignore whitespace as a global option.

I was thinking of "simply" keeping track of the XSTypeDefinition for each node as the program encounters it, and replacing the current string compare
with a data-type compare.
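
As a rough illustration of that idea (in Python rather than StAX, purely for brevity), the sketch below walks two documents in lockstep and swaps the string compare for a typed compare wherever a type is known. The `type_of` path-to-type table is a hypothetical stand-in for the type definition a real schema processor would supply, and the names `events`, `text_equal` and `typed_stream_equal` are made up for this example. It assumes data-oriented content: only the text at the start of each element is compared, and text that follows a child element is ignored.

```python
import io
import xml.etree.ElementTree as ET
from itertools import zip_longest

XML_WS = " \t\r\n"  # the four XML whitespace characters


def events(xml_text):
    """Parse incrementally, yielding (event, path, payload) tuples.

    payload is the attribute dict on "start" and the element's leading
    text on "end".
    """
    path = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            yield ("start", "/".join(path), dict(elem.attrib))
        else:
            yield ("end", "/".join(path), elem.text or "")
            path.pop()


def text_equal(xsd_type, a, b, ignore_ws):
    """Compare two text values, by value for xs:double, else as strings."""
    if xsd_type == "xs:double":
        try:
            return float(a.strip(XML_WS)) == float(b.strip(XML_WS))
        except ValueError:
            return False  # assumption: unparsable values never compare equal
    if ignore_ws:
        # Global option: ignore leading/trailing whitespace in string content.
        a, b = a.strip(XML_WS), b.strip(XML_WS)
    return a == b


def typed_stream_equal(xml_a, xml_b, type_of, ignore_ws=True):
    """Return True if both documents yield matching event streams."""
    for ev_a, ev_b in zip_longest(events(xml_a), events(xml_b)):
        if ev_a is None or ev_b is None:
            return False  # one document ended before the other
        (kind_a, path_a, data_a), (kind_b, path_b, data_b) = ev_a, ev_b
        if (kind_a, path_a) != (kind_b, path_b):
            return False  # structure differs
        if kind_a == "start":
            if data_a != data_b:
                return False  # attributes are compared as plain strings here
        else:
            # Hypothetical type lookup; untyped paths fall back to xs:string.
            xsd_type = type_of.get(path_a, "xs:string")
            if not text_equal(xsd_type, data_a, data_b, ignore_ws):
                return False
    return True
```

With `type_of = {"order/number": "xs:double"}`, the documents `<order><number>1.0</number></order>` and `<order><number>1</number></order>` compare equal; with an empty table they do not, because the fallback is a string compare.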

I did not consider something like

================================
the element sequence
<x>hello</x>

would be schema equivalent to

<x>hello</x>
<x>hello</x>
<x>hello</x>
============================

Which I don’t think actually follows ...
If I went that far, all I would have to do is validate the two documents against the schema and not bother comparing them.

For my use cases I would NOT consider the above to be equivalent.
By 'schema equivalent' I mean the document *instances* are equivalent, but the data comparison of text values uses schema type information,
so, say, for an xs:double "6.0" == "6", but not if it were an xs:string.
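
The type-dependent comparison described here can be sketched in a few lines. This is only an illustration in Python, assuming both values have already been validated against the schema; `schema_equal` and the `PARSERS` table are hypothetical names, and the handful of types covered is an arbitrary choice. NaN, precision and date/time cases are deliberately left alone.

```python
from decimal import Decimal

XML_WS = " \t\r\n"  # the four XML whitespace characters

# Lexical forms allowed for xs:boolean and the values they denote.
_BOOL = {"true": True, "1": True, "false": False, "0": False}

# Hypothetical table: XSD type name -> function mapping lexical form to value.
PARSERS = {
    "xs:double": float,
    "xs:decimal": Decimal,
    "xs:integer": int,
    "xs:boolean": _BOOL.__getitem__,
}


def schema_equal(xsd_type, a, b):
    """Compare two text values in the value space of the given type.

    Types without an entry in PARSERS (including xs:string) fall back to
    an exact string comparison.
    """
    parse = PARSERS.get(xsd_type)
    if parse is None:
        return a == b
    try:
        return parse(a.strip(XML_WS)) == parse(b.strip(XML_WS))
    except (ValueError, ArithmeticError, KeyError):
        # Assumption: values that do not parse never compare equal.
        return False
```

Here `schema_equal("xs:double", "6.0", "6")` is true, while `schema_equal("xs:string", "6.0", "6")` is false.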





----------------------------------------
David A. Lee
dlee@calldei.com
http://www.xmlsh.org

-----Original Message-----
From: Mukul Gandhi [mailto:gandhi.mukul@gmail.com] 
Sent: Thursday, December 23, 2010 11:59 PM
To: David Lee
Cc: xml-dev@lists.xml.org
Subject: Re:  Schema based XML compare


Hi David,
    I believe Mike's idea of using the XPath 2.x deep-equal function would
be useful if your notion of XML document equality is true deep equality
(i.e. the same number of child nodes, siblings, etc. at equivalent
locations in the XML documents you're comparing).

It also seems that the deep-equal function offers no option to
discount the effect of whitespace on document equality. But I
believe this concern plays out differently for data-oriented and
document-oriented XML: whitespace would be significant for
document-oriented XML but not for data-oriented XML.

None of the above is to say that the deep-equal function is not useful.
It's very useful for lots of use cases.

If your notion of XML document equality is purely XML Schema aware
(the schema example [1] below illustrates this) and not strictly
equivalent in the XPath 2.x deep-equal sense, then you could
explore using something like the JAXP Schema validation API to derive
this equivalence.

[1]
For this particular schema fragment,
<xs:element name="x" type="xs:string" maxOccurs="unbounded" />

the element sequence
<x>hello</x>

would be schema equivalent to

<x>hello</x>
<x>hello</x>
<x>hello</x>

And I'm not sure if your use-case considers the above two XML
fragments equivalent given the above XML schema element declaration
[1].

On Thu, Dec 23, 2010 at 5:59 PM, David Lee <dlee@calldei.com> wrote:
> I've run into an age-old issue but I don’t see any off-the-shelf solutions
> for.
>
>
>
> Suppose I have 2 XML documents I want to compare (not diff, just give me
> yes/no are they equivalent).
>
> This is pretty simple to do even with things like ignoring whitespace
> options etc.  Many tools out there, including one I wrote
>
> ( http://www.xmlsh.org/CommandXcmp)
>
>
>
> Now here's the twist …
>
>
>
> Suppose I want to compare for XSD  data model equivalence, not XDM
>  equivalence ?
>
>
>
> Example.
>
>
>
> <number>1.0</number>
>
> vs.
>
> <number>1</number>
>
>
>
> Without type annotation these are different.
> But if I declare the type for number to  be xs:double
>
> they should compare equal.
>
>
>
> Thus a compare tool should be able to be given a schema and do a comparison
> and report that these 2 documents are equivalent at the XSD data model
> level.
>
>
>
> Has anyone seen anything like this ?
>
> Would anyone have a use for it ? (I may end up writing it for my own uses).
>
>
>
> Not sure how far one can take this before entering murky waters …
>
> Even in the numeric cases there are edge cases where comparisons are not
> well defined (rounding/precision issues on floating point numbers).
>
> Then add in things like date/times …
>
> But suppose I'm willing to avoid the murky edges and just stick to the
> obvious cases … shouldn’t be too hard right ?
> In fact I suspect it's so obvious it's been done, but I can't find one
> anywhere.
>
>
>
> -David

>
>
>
>
> ----------------------------------------
>
> David A. Lee
>
> dlee@calldei.com
>
> http://www.xmlsh.org




-- 
Regards,
Mukul Gandhi


