
Re: Write an XSLT program that generates an XSLT program orwri

  • From: Rick Jelliffe <rjelliffe@allette.com.au>
  • To: xml-dev <xml-dev@lists.xml.org>
  • Date: Thu, 12 May 2022 17:49:20 +1000

People interested in doing this should feel free to grab code from https://github.com/Schematron/schematron/tree/master/trunk/xsd2sch (or even update it!)

In about 2008, JSTOR sponsored an R&D project to implement the reasonably large subset of XSD 1.0 that they used, to run as Schematron: this was not only to advance the state of the art, but because they were (I gather) finding XSD validators of the time just spewed out standard messages and numbers, which were as unhelpful as Voynich to editors and so on. (Perhaps they wanted to use apps and pipelines that did not support XSD too? Phases/progressive validation could also open up some extra workflow possibilities.)

The coverage is approximately:
  • simple datatypes: believed to be 100%
  • list and union datatypes: not supported
  • structural constraints on elements and attributes: supported (~)
  • multiple namespaces, import and include: supported (~)
  • identity constraints: not supported
  • dynamic constraints: (xsi:type, xsi:nil) not supported
  • tricky prefixes: (elementFormDefault) not supported

Obviously implementing identity constraints and xsd:assert would be a doddle. (There is a page on identity constraints at the link below to give the idea.) The code needs much more testing to be ready for commercial use, but it is good enough for targeted use or cannibalization.

The main difficulty of the project was retaining technical staff, if I recall: they absolutely hated having to deal with the XSD specification and found the technology had too many edge cases to be tractable, which meant that the project had to be organized in small discrete chunks, not for Scrum reasons but just because of mental fatigue. (These were not dummies: one was working through his PhD, another ended up in Redmond.)

Anyway, the code is there, and descriptions of the approaches (originally on O'Reilly's blog) are at Schematron.com (find "Converting XML Schemas to Schematron" for background), with details at https://schematron.com/document/2974.html

I guess the main surprise to come out of it was that we could validate content models using XPath 2. Originally we started with just pairwise validation for element content types (x/y can only be followed by z, etc.), but it dawned on me that we could make a string listing the names of child elements in sequence, separated by spaces (e.g. "head body"), and test whether that matched a regex generated from the content model, which took care of cardinality constraints too. (Which meant that Schematron was strictly more powerful than XSD 1.0.)
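To make the names-as-a-string idea concrete, here is a minimal sketch in Python rather than XPath 2 (where fn:string-join and fn:matches would play the same roles). It is not taken from the xsd2sch code: the content model (head, body+, foot?), the regex, and the function names are all made up for illustration, and each name gets a trailing space instead of a separator, which is a small simplification of the scheme described above.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model (head, body+, foot?), translated by hand into a
# regex over child names. Each name carries a trailing space so that "body"
# cannot accidentally match the start of a longer name such as "bodytext".
CONTENT_MODEL = re.compile(r"head (body )+(foot )?")


def child_names(parent):
    """Return the child element names in document order as one string."""
    return "".join(child.tag + " " for child in parent)


def content_is_valid(parent):
    """True if the whole sequence of children matches the content model."""
    return CONTENT_MODEL.fullmatch(child_names(parent)) is not None


good = ET.fromstring("<doc><head/><body/><body/></doc>")
bad = ET.fromstring("<doc><body/><head/></doc>")
print(content_is_valid(good), content_is_valid(bad))
```

The regex quantifiers (+, ?) carry the cardinality constraints, so order and occurrence counts are checked in a single test.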

The joy at finding we could do content model grammar validation was tempered by the realization that we could not give much better validation diagnostics: the messages always had to be in terms of where the error was detected rather than what caused it. E.g. if the content model was ( A, ( B, Z, X) | Z) and the instance had A, Z, X it would say "we found unexpected X here instead of Z" rather than e.g. "After A, B is missing, so you cannot have the Z followed by an X." Presumably some extra smarts could be added for this, and perhaps the XSD could have some annotations to help.
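A small Python sketch may help show why the message ends up at the point of detection. It reads the content model as A, ((B, Z, X) | Z), which is one assumed reading of the grouping, and simply writes out the two sequences that model accepts; the function name and the return convention are invented for this illustration and are not how the xsd2sch code works.

```python
# The two child sequences accepted by A, ((B, Z, X) | Z), written out by hand.
ACCEPTED = [["A", "B", "Z", "X"], ["A", "Z"]]


def first_error(children):
    """Locate the point where validation fails.

    Returns (index, name) for the first child that no accepted sequence
    allows at that position, (len(children), None) if the children stop
    too early, or None if the sequence is valid.
    """
    for i in range(len(children)):
        prefix = children[: i + 1]
        if not any(seq[: i + 1] == prefix for seq in ACCEPTED):
            return (i, children[i])
    if children not in ACCEPTED:
        return (len(children), None)
    return None


print(first_error(["A", "Z", "X"]))
```

For the instance A, Z, X the check only fails on reaching X, because A, Z is itself a valid sequence. Nothing in the mechanics says that the author probably meant A, B, Z, X and left out the B.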

The larger issue was that Schematron allows semantic assertions and diagnostics: you can express a constraint in natural language in the terms that the target user understands, and give feedback to them. (A real example: I was working on a pipeline system where the edited documents were translated into several intermediate XML vocabs and structures before being output and validated. The company employed devops people to look at the validation logs, then trace back to the original authoring format, then decide whether it was a programming error or a markup error.) So merely converting an XSD to Schematron did not give the advantage of efficient, specific, targeted feedback.

(It goes deeper than the names. The grammar-based schemas have no capability of capturing and transmitting intention: if an attribute or element is required, why is it required? If a content model is super-complicated, what simpler pattern is actually being modelled, albeit clumsily?)

I would not want to implement this again using XSLT 2. Maybe 3 is better (?) but I think doing at least some of the stages in some general-purpose language (Java, etc) that allowed decoratable objects would have reduced the mental complexity a lot: immutability just [expletive deleted] sometimes. 


Cheers
Rick



On Mon, 9 May 2022, 21:16 Roger L Costello, <costello@mitre.org> wrote:

Hi Folks,

 

The Schematron processor that I use is an XSLT program that takes a Schematron schema as input and transforms it into an XSLT program that is specific to that Schematron schema:

 

Schematron schema --> XSLT --> XSLT for the particular Schematron schema

 

Then the “XSLT for the particular Schematron schema” is run with the XML document to be validated as its input. The output is the validation results:

 

XML doc to be validated --> XSLT for the particular Schematron schema --> validation results

 

Rick et al chose to implement Schematron validation by generating a stylesheet for the particular Schematron schema.

 

An alternative strategy would have been to create a universal stylesheet that directly performs Schematron validation on the XML doc to be validated:

 

XML doc to be validated --> universal stylesheet --> validation results
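
The two strategies can be contrasted in a toy Python sketch. Nothing here is Schematron or XSLT: the rule format (element name, required child, message), the sample rules, and the function names are all hypothetical, chosen only to show a validator that interprets rules directly next to one that first generates code specific to the rules.

```python
import xml.etree.ElementTree as ET

# Toy "schema": (element name, required child name, message). Made-up format.
RULES = [
    ("book", "title", "A book must have a title."),
    ("chapter", "para", "A chapter must contain at least one para."),
]


def universal_validate(rules, doc):
    """Universal strategy: interpret the rules directly against the document."""
    errors = []
    for context, child, message in rules:
        for el in doc.iter(context):
            if el.find(child) is None:
                errors.append(message)
    return errors


def generate_validator(rules):
    """Generated strategy: emit source code specific to these rules, then load it."""
    lines = ["def validate(doc):", "    errors = []"]
    for context, child, message in rules:
        lines.append(f"    for el in doc.iter({context!r}):")
        lines.append(f"        if el.find({child!r}) is None:")
        lines.append(f"            errors.append({message!r})")
    lines.append("    return errors")
    namespace = {}
    exec("\n".join(lines), namespace)  # compile the generated source
    return namespace["validate"]


doc = ET.fromstring("<book><chapter><para/></chapter><chapter/></book>")
specific_validate = generate_validator(RULES)
print(universal_validate(RULES, doc) == specific_validate(doc))
```

Both routes report the same errors. The difference is when the schema is read: once, at generation time, or on every validation run.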

 

Interestingly, Michael Kay has a blog post (https://dev.saxonica.com/blog/mike/2018/02/could-we-write-an-xsd-schema-processor-in-xslt.html) in which he discusses the idea of using XSLT to build an XML Schema validator. He explores whether to write an XSLT program that generates another XSLT program (as Schematron does) or to write a universal XSLT program. At the end of his post, Michael writes:

 

I still have an open mind about whether a universal stylesheet should be used, or a generated stylesheet for a particular schema.

 

A fascinating parallel, I think.

 

/Roger


