[XML-DEV Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message] Re: defining correctness for an XML transformation - how?
On 7/4/2024 9:49 AM, C. M. Sperberg-McQueen wrote: By pure coincidence I just happened across this paper which deals in part with what software correctness is and how it can be measured -Roger L Costello <costello@mitre.org> writes:Michael Sperberg-McQueen made this fascinating statement: ... my question [is] focused not on how to prove a transformation correct, but how to specify what correctness is for that transform. I would really like to understand that. What is the difference between “proving a transformation is correct” versus “specifying what correctness is for a transform”? "Cleanroom Software Engineering Harlan D. Mills, M. Dyer, R. C. Linger" https://trace.tennessee.edu/cgi/viewcontent.cgi?article=1017&context=utk_harlan Harlan D. Mills was the IBM engineer behind "cleanroom software engineering. The following is a retrospective on his work - https://ieeecs-media.computer.org/assets/pdf/millslegacy.pdf One interesting twist is to measure software correctness by statistical measures instead of formally by mathematical deduction. You still need formal specifications, and they can and must be verified by humans, and that verification may involve mathematics, but it's different kind of thing. As part of correctness testing, one has to include a statistical distribution of expected inputs. Then in testing those are applied and failures are noted. (It sounds a lot like fuzzing, doesn't it?). From them the mean time to failure can be reliably estimated, among other things. (Michael Kay has already answered this pretty well; this is partly to say that I agree with what he said and partly to add a simple example.) The second is (the formulation of) a statement; the first is (the construction of) a proof that the statement is true. For example: Suppose we specify as follows the correctness of a transformation T taking input I and producing output O: pre-condition: I is an XML document containing zero or more Airport_Name elements. post-condition: O is an XML document containing zero or more 'name' elements, such that (1) for every Airport_Name element $i in I, there is some name element $o in O such that $o has only one child node (a text node) and string($o) = normalize-space($i); (2) for every name element $o in O, there is some Airport_Name element $i in I such that string($o) = normalize-space($i). Then for any specific input/output pair we can imagine checking that the pre-condition and post-condition both hold. If I have succeeded in making the specification unambiguous, it should always be clear whether a given pair of I and O do or do not satisfy the conditions. From this spec, for example, it's clear that any name in the input may occur any non-zero number of times in the output, and any name in the output may have occurred any non-zero number of times in the input, and that the ordering of names in the output, and the structure of the output document, are not constrained. And to prove T correct, instead of just proving an individual I/O pair correct, we can imagine proving that *whenever* the pre-condition is satisfied, the result of running T will *always* satisfy the post-condition. Suppose T is written in XSLT. Then we might want to show that for every Airport_Name element in I, a particular template will be evaluated, that that template produces zero or more suitable 'name' elements, and that no other template in the stylesheet will produce any 'name' elements. Or we might show that a particular variable is assigned a sequence of strings containing every distinct Airport_Name string in I, and that a given expression in the stylesheet serializes them all in 'name' elements (and again that nothing else in the program will ever produce a 'name' element). [Note: the example pre- and post-conditions I've given are formulated in English; many people prefer a more formal language, because statements in more formal languages can be manipulated mechanically in ways that are helpful. Proving that T satisfies its specification can similarly be done in prose or in a purely formal way. We get more confidence that the proof is correct if it can be checked mechanically, but plenty of anecdotal evidence suggests that even writing conditions and proofs in prose can reduce the defect rate for software. (See for example the 'clean room engineering' approach developed by Harlan Mills at IBM, perhaps best described as 'semi-formal'.) There are also anecdotal reports that just formulating the pre- and post-conditions helps, even if there is no effort to produce a proof. I find this plausible, since formulating pre- and post-conditions even in prose makes me think about the program and possible edge cases in a way that I do not otherwise always do. I will be happiest if the answer to my question of the other day describes a formal language analogous to those used in formal proofs of correctness for imperative languages, but suitable for XML. Even a coherent account of how to formulate clear, crisp, complete, and accurate specifications of correctness in prose, however, would look to me like progress. End of note.] I hope this helps. Michael
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] |
PURCHASE STYLUS STUDIO ONLINE TODAY!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Download The World's Best XML IDE!Accelerate XML development with our award-winning XML IDE - Download a free trial today! Subscribe in XML format
|