[XML-DEV Mailing List Archive Home]
[By Thread]
[By Date]
[Recent Entries]
[Reply To This Message]
Re: defining correctness for an XML transformation - how?
- From: Rick Jelliffe <rjelliffe@allette.com.au>
- To: xml-dev <xml-dev@lists.xml.org>
- Date: Fri, 5 Jul 2024 16:42:59 +1000

I haven't read all the thread so apologies if I am re-making some point, or saying something dumb that has been pointed out.
Three quick points.
1.
If your schema is recursive, then the problem of determining whether it
is always correct is NP, isn't it? Infinite number of inputs. Which, if
so, probably needs to limit or condition our expectations or
approaches.
2. Obviously
people use Schematron to validate pre and post conditions, including
co-constraints between input and output all the time. it is a common
use.
(Where Schematron
is lacking is that it does not have a built in mechanism to force
completeness: obviously it can report if there are unexplained elements
that no rule in a pattern catches, but it cannot enforce eg if there is
an element found that is caught by no pattern... though you can make
another pattern for that. )
What
becomes useful (unless every element name is only used once per document instance) is for the
transformation to generate some tracing data; so that e.g. for
each output element we know which element ID or xpath of the input document(s) generated it. That
gets rid of many logical errors: the issue is rarely "does this input
value appear in the output" but rather "does this input value appear in
the output because it actually came from that input?"
3.
The problem might be considered as a category error: we do not actually
want to validate the Inputs and outputs, we want to validate the XSLT. I
came up with an approach for a bank years ago where I converted an XSLT
into just the possible element branches it could generate. This then
could be stitched in with other similar data from subsequent stages in
the pipeline, so that the result was an abstract document (on a
spreadsheet) that showed for each input what elements could appear in
the final output. This helped them feel on top of the pipeline, to
reason about flaws and behaviour, without needing XSLT expertise.
This
then could be paired with data extracted from a schema: for example to
just do pairwise tests: for every parent/child in the abstract output,
is it an allowed pair by the output schema? (IIRC James Clark put in a
partial validation mode in Jing that allowed you to just test
parent-child constraints not sequence or absense: this could be useful
too. ) Are any required pairs missing in the input or output?
Rick
Roger L Costello <costello@mitre.org> writes:
> Michael Sperberg-McQueen made this fascinating statement:
>
> ... my question [is] focused not on how to prove a transformation
> correct, but how to specify what correctness is for that
> transform.
>
> I would really like to understand that. What is the difference between
> “proving a transformation is correct” versus “specifying what
> correctness is for a transform”?
(Michael Kay has already answered this pretty well; this is partly to
say that I agree with what he said and partly to add a simple example.)
The second is (the formulation of) a statement; the first is (the
construction of) a proof that the statement is true.
For example: Suppose we specify as follows the correctness of a
transformation T taking input I and producing output O:
pre-condition: I is an XML document containing zero or more
Airport_Name elements.
post-condition: O is an XML document containing zero or more 'name'
elements, such that
(1) for every Airport_Name element $i in I, there is some name
element $o in O such that $o has only one child node (a text
node) and string($o) = normalize-space($i);
(2) for every name element $o in O, there is some Airport_Name
element $i in I such that string($o) = normalize-space($i).
Then for any specific input/output pair we can imagine checking that the
pre-condition and post-condition both hold. If I have succeeded in
making the specification unambiguous, it should always be clear whether
a given pair of I and O do or do not satisfy the conditions. From this
spec, for example, it's clear that any name in the input may occur any
non-zero number of times in the output, and any name in the output may
have occurred any non-zero number of times in the input, and that the
ordering of names in the output, and the structure of the output
document, are not constrained.
And to prove T correct, instead of just proving an individual I/O pair
correct, we can imagine proving that *whenever* the pre-condition is
satisfied, the result of running T will *always* satisfy the
post-condition. Suppose T is written in XSLT. Then we might want to
show that for every Airport_Name element in I, a particular template
will be evaluated, that that template produces zero or more suitable
'name' elements, and that no other template in the stylesheet will
produce any 'name' elements. Or we might show that a particular
variable is assigned a sequence of strings containing every distinct
Airport_Name string in I, and that a given expression in the stylesheet
serializes them all in 'name' elements (and again that nothing else in
the program will ever produce a 'name' element).
[Note: the example pre- and post-conditions I've given are formulated in
English; many people prefer a more formal language, because
statements in more formal languages can be manipulated
mechanically in ways that are helpful. Proving that T satisfies
its specification can similarly be done in prose or in a purely
formal way.
We get more confidence that the proof is correct if it can be
checked mechanically, but plenty of anecdotal evidence suggests
that even writing conditions and proofs in prose can reduce the
defect rate for software. (See for example the 'clean room
engineering' approach developed by Harlan Mills at IBM, perhaps
best described as 'semi-formal'.) There are also anecdotal
reports that just formulating the pre- and post-conditions helps,
even if there is no effort to produce a proof. I find this
plausible, since formulating pre- and post-conditions even in
prose makes me think about the program and possible edge cases in
a way that I do not otherwise always do.
I will be happiest if the answer to my question of the other day
describes a formal language analogous to those used in formal
proofs of correctness for imperative languages, but suitable for
XML. Even a coherent account of how to formulate clear, crisp,
complete, and accurate specifications of correctness in prose,
however, would look to me like progress. End of note.]
I hope this helps.
Michael
--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
_______________________________________________________________________
XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.
[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php

[Date Prev]
| [Thread Prev]
| [Thread Next]
| [Date Next]
--
[Date Index]
| [Thread Index]
|
PURCHASE STYLUS STUDIO ONLINE TODAY!
Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!
Download The World's Best XML IDE!
Accelerate XML development with our award-winning XML IDE - Download a free trial today!
Subscribe in XML format
RSS 2.0 |
 |
Atom 0.3 |
 |
|
Stylus Studio has published XML-DEV in RSS and ATOM formats,
enabling users to easily subcribe to the list from their preferred news reader application.
|
Stylus Studio Sponsored Links are added links designed to provide related and additional information to the visitors of this website.
they were not included by the author in the initial post. To view the content without the Sponsor Links please
click here.
|
|