Re: Specifying formal semantics in XML languages

To: xml-dev@l...
Subject: Re: Specifying formal semantics in XML languages
From: peter murray-rust <pm286@c...>
Date: Sat, 24 Jun 2006 09:30:08 +0100
Cc: henry Rzepa <h.rzepa@i...>
In-reply-to: <4497BC56.7060905@a...>
References: <7.0.1.0.0.20060620075919.0478eba8@c...><4497BC56.7060905@a...>

Thanks,
         I have so far had three suggestions which I could see how to 
implement. Ideally they should be based on XML syntax, as that 
minimises the amount of new code (I do not wish to write complex 
interpreters in a portable environment).

(A) little languages

At 10:13 20/06/2006, Rick Jelliffe wrote:

>In some of my company's products we use our own little schema 
>language that says
>
>* what elements are allowed or required
>* what attributes are allowed or required
>* what elements are only ever found in first or last position

This is my preferred solution, but only if there is a critical mass 
of other XML developers who have the same view.

>We also have "usage schemas" which sample documents and generate all 
>the possible grandparent/parent/child paths in the document, and 
>checks other documents against these.
>
>Checking lists of tokens is indeed a very problematic area for 
>Schematron using the default XSLT 1 implementations.

Agreed. This is one reason for special languages. A related area is 
checking dataTypes. For example we might wish to check that a point 
in a graphics language contained two positive integers, such as 
<point2>12 34</point2>. I don't think Schematron has any special 
support for asserting that something is a positive integer. So it 
could make sense to have a function like:

<assert test="dataType(point2, 2, xsd:positiveInteger)"/>

which checks both the length of the list and the dataType.
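
As a rough illustration of what such a dataType() function would have to do behind the scenes, here is a minimal Java sketch. The class and method names are hypothetical (they are not part of Schematron or any existing library), and it handles only the lexical form of xsd:positiveInteger: an optional "+" followed by digits, with a value greater than zero.

```java
import java.math.BigInteger;

// Hypothetical helper: not part of Schematron or any existing library.
class ListTypeCheck {

    // True if 'content' is a whitespace-separated list of exactly
    // 'expectedLength' tokens, each a lexically valid xsd:positiveInteger.
    static boolean isPositiveIntegerList(String content, int expectedLength) {
        String trimmed = content.trim();
        if (trimmed.isEmpty()) {
            return expectedLength == 0;
        }
        String[] tokens = trimmed.split("\\s+");
        if (tokens.length != expectedLength) {
            return false;
        }
        for (String token : tokens) {
            if (!isPositiveInteger(token)) {
                return false;
            }
        }
        return true;
    }

    // xsd:positiveInteger: optional '+', then digits only, value > 0.
    static boolean isPositiveInteger(String token) {
        if (!token.matches("\\+?[0-9]+")) {
            return false;
        }
        String digits = token.startsWith("+") ? token.substring(1) : token;
        return new BigInteger(digits).signum() > 0;
    }

    public static void main(String[] args) {
        System.out.println(isPositiveIntegerList("12 34", 2));   // true
        System.out.println(isPositiveIntegerList("12 -34", 2));  // false
    }
}
```

A general version would need to dispatch on the type name rather than hard-coding one type.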

This will not work with custom simpleTypes (unless there is access to 
the schema and tools to process it). So we may need to have tools to 
define custom types by extending xsd builtin types.

It also doesn't allow us to do arithmetic - we might wish to assert 
that the length sqrt(x^2+y^2) is within given limits. It doesn't seem 
to me that this is an unrealistically complicated type of validation test.
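
To make the arithmetic case concrete, the following Java sketch tests whether the length sqrt(x^2+y^2) of a two-component point lies within given limits. The class name, method name and the limits used are assumptions for illustration only; nothing like this exists in the current schema languages discussed here.

```java
// Hypothetical helper; names and limits are illustrative assumptions.
class VectorLengthCheck {

    // Parses "x y" from the element content and tests whether
    // sqrt(x^2 + y^2) lies within [min, max] (both inclusive).
    static boolean lengthWithin(String content, double min, double max) {
        String[] tokens = content.trim().split("\\s+");
        if (tokens.length != 2) {
            return false;
        }
        try {
            double x = Double.parseDouble(tokens[0]);
            double y = Double.parseDouble(tokens[1]);
            double length = Math.hypot(x, y);
            return length >= min && length <= max;
        } catch (NumberFormatException e) {
            return false; // not numeric, so the assertion fails
        }
    }

    public static void main(String[] args) {
        // The length of (12, 34) is about 36.06.
        System.out.println(lengthWithin("12 34", 0.0, 40.0)); // true
        System.out.println(lengthWithin("12 34", 0.0, 30.0)); // false
    }
}
```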

>ISO DSDL was created to give a home and official status to these 
>kind of little languages. If anyone can come up with a technically 
>excellent and implemented little schema language that helps validate 
>some significant kinds of markup idioms that XSD or the other ISO 
>DSDL schema languages do not cover well (as is *entirely* possible), 
>I am certain the ISO SC34 WG1 group would be interested in 
>considering it for standardization, in typically unpanicked fashion.

If there are others interested then I would be interested in 
suggesting use-cases for a little language that checked simpleTypes. 
It should be fairly acceptable to add XSD facets to the language, perhaps like:

minInclusive($list, value)          // do all values correspond to the minInclusive criterion
minInclusive(length($list), value)  // does the length of the list correspond to the minInclusive criterion
unique($list)                       // components of list are all distinct
hasId($value, XPathContext)         // does the $value correspond to the id of an element
                                    // describable by the context (I'm sure there are better
                                    // suggestions here)
...
and I would like to be able to do STM maths (e.g. Math.* in Java).

I am not sure how much of this is covered by XSLT2.
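
A minimal sketch of how the first three facet functions suggested above might look in Java follows. All names are hypothetical, lists are assumed to have been parsed into Java collections already, and hasId is left out because it needs access to the document context.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical facet checks on already-parsed lists; illustrative only.
class ListFacets {

    // minInclusive($list, value): every value is >= bound.
    static boolean minInclusive(List<Double> values, double bound) {
        for (double v : values) {
            if (v < bound) {
                return false;
            }
        }
        return true;
    }

    // minInclusive(length($list), value): the list has at least 'bound' items.
    static boolean lengthMinInclusive(List<?> values, int bound) {
        return values.size() >= bound;
    }

    // unique($list): no two components are equal.
    static boolean unique(List<?> values) {
        return new HashSet<Object>(values).size() == values.size();
    }

    public static void main(String[] args) {
        List<Double> list = Arrays.asList(1.0, 2.0, 3.0);
        System.out.println(minInclusive(list, 1.0));              // true
        System.out.println(lengthMinInclusive(list, 4));          // false
        System.out.println(unique(Arrays.asList("a", "b", "a"))); // false
    }
}
```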

(B) Schematron

>To be honest, I suspect that Schematron with a particular extension 
>could pretty much do what Peter requires. In particular, ISO 
>Schematron has a macro facility called abstract patterns that allow 
>you to be much more declarative in labelling the participants in a 
>schema relationship: you could have one like
>
><sch:pattern name="required-child" abstract="true">
>   <sch:rule context="$parent">
>     <sch:assert test="$child">The parent should have a child</sch:assert>
>  </sch:rule>
></sch:pattern>
>
>where the $ tokens are macro arguments that are replaced by their 
>invocation to give conventional Schematron schemas
>
><sch:pattern name="eg"  is-a="required-child">
>     <sch:param name="parent" value="Angela"/>
>    <sch:param name="child" value="Suhai"/>
>    <sch:param name="position" value="1" />
></sch:pattern>
>
>What this gives is enough markup that  a custom processor can take 
>the schema and
>generate  code based on it. For example, to append a Suhai element 
>to the Angela
>element in the first position. In fact, you might even decide not to 
>ever validate using the Schematron schema per se, (use it as 
>documentation) but to drive your superduper custom processor with 
>the information specified using abstract patterns!
>
>Abstract patterns represent, I hope, a significant advance in 
>home-made schema languages, because not only do you get the 
>background boring power of XPath validation, but you also get the 
>extra labelling required to enable identification of the parts of 
>constraints and assertion
>tests. And that identification opens the door for re-targetting the 
>schema for purposes such as code generation or any kind of useful 
>purpose. XPaths are great because they are terse; abstract patterns 
>overcome the concomitant lack of declarative expressiveness.

I have read the spec - thanks - and this may well be able to manage 
much of the content validation that I currently require. It may be 
that it is complementary to the dataTyping in (A).
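
The macro replacement Rick describes, where the $ tokens of an abstract pattern are replaced by the sch:param values of the invoking pattern, can be sketched as plain text substitution. This is only an assumed illustration in Java with hypothetical names; it does not claim to reproduce how any actual Schematron implementation performs the expansion.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of abstract-pattern parameter substitution.
class AbstractPatternExpander {

    // A token is '$' followed by a name, e.g. $parent or $child.
    private static final Pattern TOKEN =
        Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_-]*)");

    // Replaces each $name in 'template' with its value from 'params';
    // tokens with no matching parameter are left unchanged.
    static String expand(String template, Map<String, String> params) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("parent", "Angela");
        params.put("child", "Suhai");
        System.out.println(expand("context=\"$parent\"", params)); // context="Angela"
        System.out.println(expand("test=\"$child\"", params));     // test="Suhai"
    }
}
```

A custom processor that generates code rather than validating would read the same parameter map directly instead of expanding the text.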

(C)  XQuery

>Why not XQuery, combined with MUST / MAY / MUSTNOT conditions? XQuery 
>is a declarative language that can express the conditions given 
>below. And I'd expect it would be fairly easy to define the 
>user-declared functions you need.
>
>Jonathan Robie

I have not used XQuery very much but it looks sufficiently complex to 
parse that it would be difficult to extract the declarative logic 
from it without having an XQuery processor inbuilt and called at each 
stage. But I would be happy to see more detail.

Implementation.
===========

In general XSD schema, Schematron and other approaches seem aimed 
primarily at validating static or static-like instances of complete 
documents. While this is important to me, there are at least two 
other requirements:

(a) generating code. For example I have an element scalar that can 
have either a "value" attribute with element-only content, or PCDATA 
content carrying the same value. (This may not be the happiest design, 
but that is how it is: I am increasingly finding that I need to add 
children to elements that were designed for text-only content.)

Example:
<scalar dictRef="a:height">123.4</scalar>
<scalar dictRef="a:height" value="123.4"><metadata name="dc:date" 
value="2006-06-23"/></scalar>

Currently my autogenerator will create:

String Scalar.getXMLContent(); // reserved name for accessing PCDATA
String Scalar.getValue(); // accessor for the "value" attribute

If we allow something like:
<assert test="
   (@value and normalize-space(.)='') or
   (not(@value) and count(*)=0 and not(normalize-space(.)=''))"/>
(my XSLT is rusty, but that is meant to say that exactly one of 
@value and non-empty PCDATA is allowed) then the code logic would be 
something like this (I use a XOM binding):

String Scalar.getValue() {
   String value = super.getValue();   // simple getter provided by the superclass
   String x = super.getXMLContent();  // PCDATA content, possibly null
   boolean hasText = x != null && !x.trim().equals("");
   if (value != null) {
      // @value present: there must be no text content
      Assert.assertTrue("cannot have value and text content", !hasText);
      return value;
   }
   // no @value: there must be text content and no child elements
   Assert.assertTrue("must have text and no children",
      hasText && this.getChildElements().size() == 0);
   return x;
}

This will automatically capture the data in the required order and 
should be autogeneratable from the declarative language.

(b) validation during parsing.
I am increasingly using this approach to validate as a document is 
parsed. Where possible XML tools are used but obviously some of this 
has to be bespoke (although it will be autogenerated). This means 
there is no need for heavyweight tools such as Xerces and that I only 
need as much apparatus to validate the input as is defined in the schema.

(c) validation of complete documents.
Ideally this should be possible using Schematron and other commodity 
approaches without the custom code. But it requires extensions to the 
current toolkit.

============
In summary, therefore, I would be interested in:
- a communal little language for validating dataTypes
- exploration of the range of concepts that are not supported in 
current schemas, ideally to find a consensus on the costs and benefits 
of extensions.
- any other experience and comments.

Many thanks

P.

Peter Murray-Rust
Unilever Centre for Molecular Sciences Informatics
University of Cambridge,
Lensfield Road,  Cambridge CB2 1EW, UK
+44-1223-763069
