[XML-DEV Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

Re: How to assess the correctness of a Format1 --> Format2 map

  • From: Rick Jelliffe <rjelliffe@allette.com.au>
  • To: xml-dev <xml-dev@lists.xml.org>
  • Date: Wed, 25 Jan 2023 12:14:20 +1100

Re:  How to assess the correctness of a Format1 --> Format2 map
Yes to Michael.  

Perhaps it is better to say that the first are people who think they are being mathematical (often having an academic background), and the second are those who think they are being pragmatic (often having a management background.)  :-)

In reality, you typically get neither: a few low-hanging-fruit assertions and unit tests that neither meet any kind of proof standard that the mathematics require, nor are based on the prioritized risk (cost/benefit) assessments that the pragmatics equires. (These may compromise the usefullness of either approach, even when better than nothing.)

Of course, hybrid approaches are possible: the best pragmatalists emphasize feedback (quality-in-use metrics, continuous improvement) because where there is one problem, it indicates a point where there is likely some confusion or unclarity that may gave caused another error. It may be that the risk assessment at that point is that it is economically worthwhile to adopt the  "mathematic" techniques.

So it seems to me that the Intel story is of both the failure of pragmatism (because they failed to properly assess the risk/cost/benefit and so chose ad hoc? methods not up to job) and the success of pragmatism (they first tried a gamble whose odds seemed favourable to them, but then re-assessed based of feedback.)

I don't see that either approach stands without the other: how do we know that our formal proof system was proving the aspects of your system where problems actually occur, unless we have a risk asessment/prioritization/feedback system. For example, if for some organization the main risk could be that changes to business rules/logic cannot be deployed fast, some kinds of program proving may be irrelevant. "Quality" is more than "correctness".

Regards
Rick




On Wed, 25 Jan. 2023, 05:05 C. M. Sperberg-McQueen, <cmsmcq@blackmesatech.com> wrote:

Rick Jelliffe <rjelliffe@allette.com.au> writes:

> Michael's point are good.
>
> I think it is important to recognize that there are two distinct
> approaches/cultures/skill-pools to "testing".
>
> The first is the mathematical, and uses the language of program
> proving, correctness and completeness.
>
> The second is the pragmatic, and uses the language of risk,
> prioritization, sampling and loci of errors.
>
> Which is best? Budget often decides.

Good observations.

I think I'd take issue with 'pragmatic' as a descriptor of the second
approach; the descriptions are otherwise pretty much value-neutral and
reasonably fair, but describing one as 'pragmatic' tacitly suggests that
the other is not pragmatic and thus not practical.  I think that that is
a story those who prefer the second approach often tell about the first
approach.

It's quite true that budget often decides, and also that budget-driven
decisions depend a lot on whose budget it is and how cost accounting is
done.  Once Intel had established, empirically, that quality-control
methods that incur a $475 million charge for a chip recall do not
actually save money for the company (as opposed to the project or
department) over more expensive proofs of correctness, Intel is said to
have changed some of its quality assurance culture.

It's interesting, I think, that while the cultural divide Rick
identifies is very real, there are some approaches to this problem that
seem to straddle the divide.  The 'cleanroom software engineering'
approach feels very mathematical to developers who don't use formal
approachs (as well as feeling very informal to those who actually
practice formal methods), but its testing approach is firmly based on a
statistical characterization of expected inputs and the kinds of
mean-time-to-failure guarantees it allows its practitioners to offer
depend crucially on (and I assume are formulated in terms of) the
expected inputs, and not on formal proofs of correctness for all inputs.

So I think Rick is right to describe the word 'culture' for the
distinction -- the division between the two is a learned distinction,
not an intrinsic one.

Michael


> Regards
> Rick
>
> On Tue, Jan 24, 2023 at 4:06 PM C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
>
>  Roger L Costello <costello@mitre.org> writes:
>
>  > Hi Folks,
>  >
>  > There exists a proprietary data format called Format1.
>  >
>  > A smart group of people created a non-proprietary data format called
>  > Format2. It's used to store the same kinds of data as Format1. Of
>  > course, Format2 uses different fields and structures.
>  >
>  > I defined a mapping from Format1 to Format2.
>  >
>  > I want to know if my Format1 --> Format2 mapping is correct.
>
>  You may find a paper I wrote a few years ago of interest.  “What
>  constitutes successful format conversion? Towards a formalization of
>  ‘intellectual content’.” International journal of digital curation 6.1
>  (2011): 153-164.
>
>  Some further work specific to equivalence of XML documents is reported
>  in Sperberg-McQueen, C. M., Yves Marcoux and Claus Huitfeldt. “Document
>  lattices: Equivalence, compatibility, and contradiction in document
>  markup.” Presented at Balisage: The Markup Conference 2014, Washington,
>  DC, August 5 - 8, 2014. In Proceedings of Balisage: The Markup
>  Conference 2014. Balisage Series on Markup Technologies, vol. 13
>  (2014). https://doi.org/10.4242/BalisageVol13.Sperberg-McQueen01.
>
>  > Here's my idea on how to ascertain the correctness of the Format1 -->
>  > Format2 mapping:
>  >
>  > Use a technology to parse Format1 instances to XML and to parse
>  > Format2 instances to XML. Compare the resulting XML's. If they are
>  > identical, then the mapping is correct.
>
>  The identity of the XML may establish the correct mapping *for that one
>  document*, but only if some conditions hold.
>
>    - The Fubar elements generated from format1 and those generated from
>      format2 must have the same meaning.  That may be easy to guarantee
>      in a particular case, but I doubt that it's guaranteed in general.
>
>    - Your translation from format1 to XML must be correct.
>
>    - Your translation from format2 to XML must be correct.
>
>      I think you have started with an obligation to prove one proposition
>      (that your mapping from format1 to format2 is correct), and ended up
>      with an obligation to prove two propositions, each of which has the
>      same form and looks about as hard as the one you started with.
>
>  It's helpful, I think, to be able to show that the mapping works
>  correctly for a specific document.  But I notice that it's not what you
>  started out to prove.  You started out to prove that your mapping from
>  format1 to format2 was correct -- I think that that must mean that your
>  mapping is correct for all documents in format1. 
>
>  Proving correctness for one document doesn't in itself establish
>  correctness for all documents.
>
>  If I told you I have a program which produces the correct results for at
>  least one set of inputs, you wouldn't accept that as an argument that
>  the program was correct.  (Or, at least, you shouldn't.)
>
>  In that sense, your proposed approach is too weak:  it doesn't not
>  establish what you said you want.
>
>  In another sense, your proposed approach may be too strict:  if the
>  target XML format has more than one way to convey the same information,
>  then XML identity is too strict a test.
>
>  > Data Format Description Language (DFDL) is a standard language for
>  > parsing data formats. Here's the idea using DFDL:
>  >
>  > -  Create a DFDL schema for Format1.
>  > -  Create a DFDL schema for Format2.
>  > -  Design the DFDL schemas to produce the same XML format.
>  > -  To ascertain the correctness of the Format1 --> Format2 mapping do this:
>  >         -  Use the Format1 DFDL schema to parse a Format1 document to XML. 
>  >         -  Use the Format2 DFDL schema to parse a Format2 document to XML.
>  >         -  Compare the XML's:
>  >             -  If they are identical, then the mapping is correct.
>  >             -  If they are within a certain tolerance, then the mapping is correct.
>  >                E.g. of an acceptable tolerance: the Format2 XML omits an
>  >                optional element.
>  >             -  Otherwise the mapping is not correct.
>  >
>  > Am I thinking clearly about this problem (ascertain the correctness of
>  > the Format1 --> Format2 mapping)?
>
>  I'm not sure.
>
>  You say "If [the two XML documents] are within a certain tolerance, then
>  the mapping is correct.  E.g. of an acceptable tolerance: the Format2
>  XML omits an optional element."
>
>  If XML document d1 (derived from format1) specifies three things, and
>  XML document d2 (in format2) specifies two of those things but not the
>  third, I don't quite see why you should conclude that the mapping from
>  format1 to format2 was successful.  You've lost information.  Why would
>  the fact that you defined it as optional in your XML format make a
>  difference?
>
>  To take a simple case: we have translated images in format1 and format2
>  into XML -- for concreteness, let's say we have translated them into
>  something very like SVG.  Document d1 specifies a rectangle, shaded in
>  yellow, with blue text of a certain font and size.  d2 also species a
>  rectangle and text, but doesn't specify the color of either.  Color
>  information is optional in the XML format.
>
>  So ... blue text on a yellow background has become
>  &default-foreground-color; text on a &default-fill-color; background.
>
>  Depending on the goals of your mapping, this either matters (i.e. the
>  mapping has failed) or it doesn't (because all you care about is the
>  sequence of characters in the text, and everything else is irrelevant).
>
>  It's probably better to start by saying what information needs to be
>  preserved in the mapping and what information does not need to be
>  preserved (and perhaps should be thrown away systematically -- are you
>  worried about steganography?).
>
>  > Is mapping instances of the Format1 and Format2 data formats to a
>  > common XML format logical?
>
>  It's certainly a useful way to get them into a format that allows them
>  to be compared directly.  The problem of equivalence between XML
>  documents is not simple (as the example in the second paper mentioned
>  above illustrates), but I like to think it's soluble in principle.  But
>  if you have a skeptical audience, then the correctness of your
>  translations into XML must also be shown.  If you have an audience
>  willing to take the translations from format1 and format2 into XML on
>  faith, why do they not want to take the translation from format1 to
>  format2 on faith?
>
>  Thought-provoking question - thanks.
>
>  --
>  C. M. Sperberg-McQueen
>  Black Mesa Technologies LLC
http://blackmesatech.com
>
>  _______________________________________________________________________
>
>  XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>  to support XML implementation and development. To minimize
>  spam in the archives, you must subscribe before posting.
>
>  [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>  Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>  subscribe: xml-dev-subscribe@lists.xml.org
>  List archive: http://lists.xml.org/archives/xml-dev/
>  List Guidelines: http://www.oasis-open.org/maillists/guidelines.php


--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]


PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
 

Stylus Studio has published XML-DEV in RSS and ATOM formats, enabling users to easily subcribe to the list from their preferred news reader application.


Stylus Studio Sponsored Links are added links designed to provide related and additional information to the visitors of this website. they were not included by the author in the initial post. To view the content without the Sponsor Links please click here.

Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.