
Re: comparing XML document structure

Subject: Re: comparing XML document structure
From: Graydon <graydon@xxxxxxxxx>
Date: Wed, 17 Aug 2011 21:35:06 -0400
On Thu, Aug 18, 2011 at 12:44:02AM +0100, Tony Graham scripsit:
> On Wed, August 17, 2011 11:48 pm, Wendell Piez wrote:
> > It sounds like you want to infer content models on the fly and then
> > validate against them. I can imagine approaches to this, but I doubt
> > that I'd trust many algorithms that actually attempted it -- not because
> > of XSLT, but because of the problem specifying the problem.
> ...
> > But why not use a schema? There are processors such as Trang that can
> > infer schemas from documents.
> 
> What Wendell said.

Using trang to generate a schema from the DTD in question has
historically tended to fail.  (Not a whole lot, but some; the result is
generally usable as a schema to get saxon to validate the output, but
not usable on the fly for structure.)

So I've got a relatively fixed content model, in the form of a
comprehensive DTD and a much less comprehensive example of how to use
that DTD for a particular content type.

Initially, what I want to do is eat the exemplar and use it to generate
a list of parent-child pairs -- so I'd have section/num, section/para,
and section/subsection -- then take an output file, get the same list
from it, compare the two lists, and produce a message for mismatches.
So if a particular output file had section/num, section/subsection, and
section/list in it, for example, there should be an exception noted for
the presence of the list. (Valid, but not expected.)
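
A minimal sketch of that comparison, in Python rather than XSLT purely
for illustration. The function names (parent_child_pairs,
unexpected_pairs) and the two sample fragments are hypothetical; they
are not taken from the real DTD or exemplar, and the sketch assumes the
documents parse without needing the external DTD.

```python
import xml.etree.ElementTree as ET


def parent_child_pairs(xml_text):
    """Return the set of (parent, child) element-name pairs in a document."""
    root = ET.fromstring(xml_text)
    pairs = set()
    # iter() visits the root and every descendant element; iterating an
    # element yields its direct element children.
    for parent in root.iter():
        for child in parent:
            pairs.add((parent.tag, child.tag))
    return pairs


def unexpected_pairs(exemplar_xml, output_xml):
    """Pairs found in the output but not in the exemplar, sorted for stable reporting."""
    return sorted(parent_child_pairs(output_xml) - parent_child_pairs(exemplar_xml))


# Hypothetical stand-ins for the exemplar and one output file.
exemplar = "<section><num>1</num><para>Optional intro text.</para><subsection/></section>"
output = "<section><num>1</num><subsection/><list/></section>"

for parent, child in unexpected_pairs(exemplar, output):
    print(f"valid but not expected: {parent}/{child}")
```

With these fragments the only pair reported is section/list.  Note that
the check runs in one direction only: section/para appears in the
exemplar but not in the output, and that is not flagged, which suits
optional content.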
> ...
> > On 8/17/2011 5:57 PM, Graydon wrote:
> ...
> >> The desired goal is to be able to programmatically pull the structure,
> >> at least to the extent of parent-child element pairs, from the
> >> semantics-defining file, and compare that to each output file in turn.
> >>
> >> So if the semantics-defining file gives an example section element,
> >> which has num, para, and subsection element children, what I want to be
> >> able to do is create a sequence of axis relationships and test the
> >> section elements of the output for axis relationships that are not
> >> members of that sequence.
> 
> It would help the rest of us wrap our heads around the problem if you
> could provide a sample fragment of the "semantics-defining file" so we can
> see what you are dealing with.

It would, but the whole NDA thing rears its ugly head.

It's just a document, to the same DTD as the output.  Instead of having
actual content in it, it has things like <para>This para is optional; if
present, it should contain introductory text</para> in it.

> You may be able to create the tests you want in Schematron, but it's a bit
> hard to tell without having an example to look at.  (If you can generate
> Schematron from your definitions, you could directly create XSLT for the
> axis tests about as easily, but the advantage could be that there are
> tools such as XML IDEs that already understand the Schematron report
> format.)

Schematron is certainly something to look at, yes.

Thanks!
Graydon
