Re: Specifying formal semantics in XML languages
Thanks, I have so far had three suggestions which I could see how to
implement - ideally they have to be based on XML syntax, as that
minimises the amount of new code (I do not wish to write complex
interpreters in a portable environment).

(A) little languages

At 10:13 20/06/2006, Rick Jelliffe wrote:
>In some of my company's products we use our own little schema
>language that says
>
>* what elements are allowed or required
>* what attributes are allowed or required
>* what elements are only ever found in first or last position

This is my preferred solution, but only if there is a critical mass of
other XML developers who have the same view.

>We also have "usage schemas" which sample documents and generate all
>the possible grandparent/parent/child paths in the document, and
>checks other documents against these.
>
>Checking lists of tokens is indeed a very problematic area for
>Schematron using the default XSLT 1 implementations.

Agreed. This is one reason for special languages. A related area is
checking dataTypes. For example, we might wish to check that a point in
a graphics language contains two positive integers, such as
<point2>12 34</point2>. I don't think Schematron has any special
support for asserting that something is a positive integer. So it could
make sense to have a function like:

  <assert test="dataType(point2, 2, xsd:positiveInteger)"/>

which checks both the length of the list and the dataType. This will
not work with custom simpleTypes (unless there is access to the schema
and tools to process it), so we may need tools to define custom types
by extending the XSD builtin types. It also doesn't allow us to do
arithmetic - we might wish to assert that the length sqrt(x^2+y^2) is
within given limits. This does not seem to me an unrealistically
complicated type of validation test.

>ISO DSDL was created to give a home and official status to these
>kind of little languages.
>If anyone can come up with a technically
>excellent and implemented little schema language that helps validate
>some significant kinds of markup idioms that XSD or the other ISO
>DSDL schema languages do not cover well (as is *entirely* possible),
>I am certain the ISO SC34 WG1 group would be interested in
>considering it for standardization, in typically unpanicked fashion.

If there are others interested then I would be interested in suggesting
use-cases for a little language that checked simpleTypes. It should be
fairly acceptable to add XSD facets to the language, perhaps like:

  minInclusive($list, value)
      // do all values correspond to the minInclusive criterion
  minInclusive(length($list), value)
      // does the length of the list correspond to the minInclusive criterion
  unique($list)
      // components of list are all distinct
  hasId($value, XPathContext)
      // does the $value correspond to the id of an element describable
      // by the context

(I'm sure there are better suggestions here)

... and I would like to be able to do STM maths (e.g. Math.* in Java).
I am not sure how much of this is covered by XSLT2.

(B) Schematron

>To be honest, I suspect that Schematron with a particular extension
>could pretty much do what Peter requires.
>In particular, ISO
>Schematron has a macro facility called abstract patterns that allows
>you to be much more declarative in labelling the participants in a
>schema relationship: you could have one like
>
><sch:pattern name="required-child" abstract="true">
>  <sch:rule context="$parent">
>    <sch:assert test="$child">The parent should have a child</sch:assert>
>  </sch:rule>
></sch:pattern>
>
>where the $ tokens are macro arguments that are replaced by their
>invocation to give conventional Schematron schemas
>
><sch:pattern name="eg" is-a="required-child">
>  <sch:param name="parent" value="Angela"/>
>  <sch:param name="child" value="Suhai"/>
>  <sch:param name="position" value="1" />
></sch:pattern>
>
>What this gives is enough markup that a custom processor can take
>the schema and generate code based on it. For example, to append a
>Suhai element to the Angela element in the first position. In fact,
>you might even decide not to ever validate using the Schematron
>schema per se, (use it as documentation) but to drive your superduper
>custom processor with the information specified using abstract
>patterns!
>
>Abstract patterns represent, I hope, a significant advance in
>home-made schema languages, because not only do you get the
>background boring power of XPath validation, but you also get the
>extra labelling required to enable identification of the parts of
>constraints and assertion tests. And that identification opens the
>door for re-targetting the schema for purposes such as code
>generation or any kind of useful purpose. XPaths are great because
>they are terse; abstract patterns overcome the concomitant lack of
>declarative expressiveness.

I have read the spec - thanks - and this may well be able to manage
much of the content validation that I currently require. It may be that
it is complementary to the dataTyping in (A).

(C) XQuery

Why not XQuery, combined with MUST / MAY / MUSTNOT conditions?
XQuery is a declarative language that can express the conditions given
below. And I'd expect it would be fairly easy to define the
user-declared functions you need.

Jonathan Robie

I have not used XQuery very much, but it looks sufficiently complex to
parse that it would be difficult to extract the declarative logic from
it without having an XQuery processor built in and called at each
stage. But I would be happy to see more detail.

Implementation.
===========

In general, XSD schema, Schematron and other approaches seem aimed
primarily at validating static or static-like instances of complete
documents. While this is important to me, there are at least two other
requirements:

(a) generating code. For example, I have an element scalar that can
have either a "value" attribute and element-only content, or PCDATA
content of the same value (this may not be the happiest design, but
that is how it is). I am increasingly finding that I need to add
children to elements that were designed for text-only content.
Example:

  <scalar dictRef="a:height">123.4</scalar>
  <scalar dictRef="a:height" value="123.4"><metadata name="dc:date" value="2006-06-23"/></scalar>

Currently my autogenerator will create:

  String Scalar.getXMLContent(); // reserved name for accessing PCDATA
  String Scalar.getValue();

If we allow something like:

  <assert test="(@value and normalize-space(.)='') or (not(@value) and count(*)=0 and not(normalize-space(.)=''))"/>

(my XSLT is rusty, but that is meant to say that exactly one of @value
and non-empty PCDATA is allowed) then the code logic would be something
like this (I use a XOM binding):

  // in class Scalar
  public String getValue() {
      String value = super.getValue();  // there is a superclass that provides a simple getter
      String x = super.getXMLContent(); // PCDATA content; may be null
      boolean hasText = x != null && !x.trim().equals("");
      if (value != null) {
          // a value attribute excludes text content
          Assert.assertTrue("cannot have value and text content", !hasText);
          return value;
      }
      // no value attribute: there must be text and no child elements
      Assert.assertTrue("cannot have text and children",
          hasText && this.getChildElements().size() == 0);
      return x;
  }

This will automatically capture the data in the required order and
should be autogeneratable from the declarative language.

(b) validation during parsing. I am increasingly using this approach to
validate as a document is parsed. Where possible XML tools are used,
but obviously some of this has to be bespoke (although it will be
autogenerated). This means there is no need for heavyweight tools such
as Xerces, and I only need as much apparatus to validate the input as
is defined in the schema.

(c) validation of complete documents. Ideally this should be possible
using Schematron and other commodity approaches without the custom
code. But it requires extensions to the current toolkit.

============

In summary, therefore, I would be interested in:

- a communal little language for validating dataTypes
- exploration of the range of concepts that are not supported in
  current schemas, ideally to find a consensus on the costs and
  benefits of extensions.
- any other experience and comments.
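As a rough illustration of the first item, the
dataType(point2, 2, xsd:positiveInteger) test from (A), together with
the sqrt(x^2+y^2) limit, might reduce to something like the Java below.
This is only a sketch under stated assumptions: the class ListTypeCheck
and its method names are hypothetical (they are not part of Schematron,
XOM or any XSD toolkit), xsd:positiveInteger is hard-coded rather than
read from a schema, and custom simpleTypes are not handled.

```java
import java.math.BigInteger;

/** Hypothetical helper for illustration only; not an existing library API. */
final class ListTypeCheck {

    /**
     * True if text is a whitespace-separated list of exactly `length`
     * tokens, each a valid xsd:positiveInteger (optional '+', digits, value >= 1).
     */
    static boolean isPositiveIntegerList(String text, int length) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return length == 0;
        }
        String[] tokens = trimmed.split("\\s+");
        if (tokens.length != length) {
            return false;
        }
        for (String token : tokens) {
            if (!token.matches("\\+?[0-9]+")) {
                return false; // sign other than '+', decimals, letters, ...
            }
            String digits = token.startsWith("+") ? token.substring(1) : token;
            if (new BigInteger(digits).signum() <= 0) {
                return false; // zero is not a positiveInteger
            }
        }
        return true;
    }

    /** True if text is a valid point2 and sqrt(x^2 + y^2) lies in [min, max]. */
    static boolean lengthWithin(String text, double min, double max) {
        if (!isPositiveIntegerList(text, 2)) {
            return false;
        }
        String[] tokens = text.trim().split("\\s+");
        double x = Double.parseDouble(tokens[0]);
        double y = Double.parseDouble(tokens[1]);
        double len = Math.hypot(x, y);
        return len >= min && len <= max;
    }

    public static void main(String[] args) {
        System.out.println(isPositiveIntegerList("12 34", 2));   // true
        System.out.println(isPositiveIntegerList("12 -34", 2));  // false
        System.out.println(lengthWithin("12 34", 0.0, 40.0));    // true
    }
}
```

The point is that the list length, the builtin type and the arithmetic
limit are each a few lines of code, so a little language that named
them declaratively ought to be cheap to implement and to autogenerate
from.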
Many thanks

P.

Peter Murray-Rust
Unilever Centre for Molecular Sciences Informatics
University of Cambridge,
Lensfield Road, Cambridge CB2 1EW, UK
+44-1223-763069