[XML-DEV Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message] Re: defining correctness for an XML transformation - how?
Thank you. I think you are right for some cases, but not all. For cases where what we want is to translate some body of information from one format to another (whether it's from one XML format to another or between XML and some non-XML format), I think you are right that the information of the data seems to provide a good handle. I tried to describe a systematic approach to that problem in C. M. Sperberg-McQueen, "What constitutes successful format conversion? Towards a formalization of 'intellectual content'" International Journal of Digital Curation (IJDC) 6.1 (2011): 153-164. http://www.ijdc.net/article/view/170/238 http://www.ijdc.net/article/download/170/238/0 A concrete but simple example is described in Sperberg-McQueen, C. M., Yves Marcoux and Claus Huitfeldt. "Document lattices: Equivalence, compatibility, and contradiction in document markup." Presented at Balisage: The Markup Conference 2014, Washington, DC, August 5 - 8, 2014. In Proceedings of Balisage: The Markup Conference 2014. Balisage Series on Markup Technologies, vol. 13 (2014). https://doi.org/10.4242/BalisageVol13.Sperberg-McQueen01 https://balisage.net/Proceedings/vol13/html/Sperberg-McQueen01/BalisageVol13-Sperberg-McQueen01.html Two caveats apply: (1) Those papers assume that what you want is to show that in converting a particular resource from one form to another, no information you care about has been corrupted, deleted, or added. That's challenging in part because it requires us to specify what information we care about and what information we don't care about, which is not always easy. But if the task is to show that the program performing the conversion is correct, more is required than showing that the right thing happened in a particular case. We will need a way to specify correctness (this is the question I was pondering yesterday), and then a way to prove that the conversion program meets the specification. (2) As Mike Kay has already pointed out, some programs we wish to write to process XML are not aimed at preserving all and only the important information in the input. Can we apply to them the methods we develop to check information preservation? I would have said "no, clearly not", but your mail (thank you!) has made me think again. Take the common case of a stylesheet to display an XML document in a web browser. Some information in the input document is almost certain to be lost in the HTML version of the document -- it is, as people sometimes say, a down-hill conversion (at least usually). But perhaps we can specify that that information counts as information we don't (in this case) care about. I'm not quite sure what to do with requirements like "Keywords should be typographically distinct, and there must be a blank line between function definitions" (I'm thinking about the HTML stylesheet I was writing the other day, for the Alloy modeling language). But perhaps we can treat those typographical points as conveying information (and any failure to observe them as conveying distinct and erroneous information). So, thank you. I'll have to think more about that direction. But to take another case -- suppose we have a program (XQuery or XSLT or something else) which takes (a) an invisible-XML grammar, in XML form, and (b) a string of characters, and has the task of producing either a correct parse tree for the input string, given the input grammar, or else an XML document correctly signaling that the input string is not a sentence in the language defined by the grammar. There is, perhaps, some sense in which all the information conveyed by the parse tree is already present, implicitly, in the input string and input grammar taken together. But that feels to me a little too much like Wittgenstein's claim that if we really understand any sentence S, then the logical consequences cannot possibly surprise us. (If that is so, then has anyone ever really understood any sentence in any language?) In practice, the information we can readily extract from the grammar and sentence is not the same as the information we can readily extract from the parse tree -- that is why the program is potentially useful. In that and similar cases, I don't see a way for comparison of information content to help us. Or to take a simpler case: for many implementations of ixml, the first thing they do with the input grammar is translate it into an equivalent grammar in a BNF-like form. The two grammars are (if the translation is correct) equivalent, in the sense that L(G) = L(G'). But since the relation between the form of the grammar and the language it describes is complex, I don't see a practical way to check for correctness by checking that the information content of G and of G' is the same. (If I remember correctly, the identity of L(G) and L(G') is in the general case undecidable.) Thank you for making me think harder. Michael Hans-Juergen Rennau <hrennau@yahoo.de> writes: > A most interesting question. My thoughts tend in a direction different > from pre- and post-conditions. The focus should be on the comparison > of structured information. As you are speaking of "translation from > one format to another", I consider information content as a possible > yardstick. Imagine source and target formats can be mapped to their > semantic information content - expressed e.g. by RDF or some other > representation as recently presented at xmlprague 2024 [1]. The > correctness of the transformation can then be assessed by comparing > the information content of source and target. I think the possibility > of mapping tree-structured information (XML, JSON, ...) to > tree-independent information content does not receive the interest > which it deserves. > > Kind regards, > Hans-Jürgen > > [1] Kottmann, Renzo; Cedric Pauken; Andreas Schmitz: Simple Semantic Data Modeling in XML (SeMoX), xmlprague 2024 > https://archive.xmlprague.cz/2024/files/xmlprague-2024-proceedings.pdf , p. 231 f > > Am Mittwoch, 3. Juli 2024 um 15:47:59 MESZ hat C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> Folgendes geschrieben: > > Roger Costello's recent question about how to show the correctness of a > translation from one XML format to another very similar one suggests a > related question. Forget *showing* that an XML transformation is > correct -- how would you define correctness formally, if you wanted to > be able in principle to provide a machine-checkable proof of > correctness? > > For imperative languages, one way is to define a pre-condition which the > caller of a program or function must guarantee, and a post-condition > which describes what the program or function will achieve. Written in a > Hoare triple, pre-condition P, post-condition Q, and code S can be > depicted as {P}S{Q}. > > But the logical world illustrated by typical descriptions of Hoare > triples feels remarkably simple -- atomic values assigned to variables. > > What language would one need in order to formulate plausible pre- and > post-conditions on XML transformations, or more generally on functions > or procedures that operate on XDM instances? > > Asking for a friend. -- C. M. Sperberg-McQueen Black Mesa Technologies LLC http://blackmesatech.com
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] |
PURCHASE STYLUS STUDIO ONLINE TODAY!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Download The World's Best XML IDE!Accelerate XML development with our award-winning XML IDE - Download a free trial today! Subscribe in XML format
|