[XML-DEV Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message] Re: defining correctness for an XML transformation - how?
Roger L Costello <costello@mitre.org> writes: > As I mentioned previously, I need to convert <Airport_Name> to > <name>. Here is an example conversion: > ... > The value of <name> is identical to the value of <Airport_Name>, > except trailing spaces are removed. > I have a file containing an <Airport_Name> for every airport in the > world. The airport names are expressed using ASCII characters. > > Suppose I iterate over every <Airport_Name> element in the file, > convert its value to the appropriate <name> value, and evaluate the > following predicate: > > normalize-space(Airport_Name) eq name > > Suppose the predicate evaluates to true for every conversion. > > Have I verified the correctness of my conversion? Maybe. Your specification of correctness seems to involve more hand-waving and more silent assumptions than I think is healthy if you care about verifying the correctness of the conversion. Even if I take the observation that The value of <name> is identical to the value of <Airport_Name>, except trailing spaces are removed. as a requirement and not just an observation about the one example, I have the feeling that you have left some things out. - Is it required that every 'Airport_Name' element in the input correspond to exactly one 'name' element in the output? Possible failures: omissions; duplications. - Is it required that every 'name' element in the output? correspond to exactly one 'Airport_Name' element in the output? Possible failures: injection of new data; merger of input elements. - Is it required that the airport names in the input be distinct? - Is it required that the airport names in the input be spelled with characters in the range U+0020 to U+007F? You say that, but the schema fragment you give does not enforce it. - Is the input an XML document or just a sequence of 'Airport_Name' elements? If the former, what is required to happen to the root element? - Is the output to be an XML document or just a sequence of 'name' elements? - Is the input allowed, required, or forbidden to have elements other than 'Airport_Name' elements and a root element? What happens to those elements? - Is the output allowed, required, or forbidden to have elements other than 'name' elements and a root element? - If the input does not satisfy the pre-condition(s), is the transformation obligated to recover silently (how?), recover with a warning (how?), or fail? - Is the order of 'name' elements in the output required to match the order of 'Airport_Name' elements in the input? Note that unless the order of names in the input carries some information, re-ordering in the output will not change the information content of the dataset. But if the presence of duplicates carries information, you won't be able to say "the set of names in the output is the same as the set of names produced by stripping whitespace from the names of the input"; you'll need to talk about bags or multi-sets instead. > If yes, then the way to verify the correctness of an XML-to-XML conversion is: > > 1 Validate the source element against the XML Schema for the source element > > 2 Validate the converted target element against the XML Schema for the target element > > 3 Create a predicate that the source-target conversion must satisfy and evaluate the predicate > > 4 If all predicates return true, then the correctness of the conversion is verified > > Do you agree? Not in the general case, no. For the task of information-preserving translation from one notation into another, these points seem to me likely to be necessary but not always sufficient. (If I could say when they won't be sufficient, I would be closer to a satisfactory understanding of this problem than I now am.) In other cases, I'm not sure they can be applied without more careful formulation. Consider an XML-to-XML conversion whose input is the standard XML representation of a conforming ixml grammar and whose output is an equivalent grammar in something like BNF form (no alternatives except at the top level of the right-hand side, no expressions of the form E* or E+ or E?, etc.). There is not a one to one relation between all input elements and output elements, and attempting to define a one-to-many or many-to-many relation so that I can check a predicate on all pairs seems likely to end badly. If G and G' are the input and output grammars respectively, then correctness here requires something like: - G' has no 'alts', 'repeat0', 'repeat1', 'option' or 'sep' elements. - In G' every 'alt' element has a 'rule' element as its parent. (This follows from the preceding plus schema validity, but just in case.) - L(G') = L(G). - For any input string s, the XML output produced by parsing s against G is identical to the XML output produced by parsing s against G'. This requires, of course, that L(G) have a specified meaning. (As Liam Quin has pointed out, sometimes the domain-specific meaning of the XML matters for correctness.) Or consider an XML-to-XML transformation which takes a document in (an XML form of) the Alloy modeling language and produces a document in the same format in which a new column is added to a particular set of relations. Again the domain-specific meaning is central to any specification of the requirements for the transformation. In both case, there are going to be clear correspondences between some of the input elements and some of the output elements, but there will not be a 1:1 mapping between the elements of the input and those of the output, and it may or may not be possible to define a predicate that captures just how pairs of corresponding elements must relate. For the EBNF to BNF translation, it is not required by the specification given above that the set of strings generated by a given nonterminal N in G be generated by N in G'. It is not even required that there *be* a nonterminal N in G'. Preserving all the nonterminals of G in G', and preserving the sets of strings they generate, is certainly a good idea because it makes proving the correctness of the transformation much easier. But that's an implementation choice, not a condition of correctness. Thank you for the concrete example. -- C. M. Sperberg-McQueen Black Mesa Technologies LLC http://blackmesatech.com
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] |
PURCHASE STYLUS STUDIO ONLINE TODAY!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Download The World's Best XML IDE!Accelerate XML development with our award-winning XML IDE - Download a free trial today! Subscribe in XML format
|