RE: Caution using XML Schema backward- or forward-compatibilit
Excellent discussion!

Michael has brought into the discussion a very useful idea: semantic drift. He asserts that it "happens naturally in the real world". I assert that it also occurs naturally, and often, in data versioning. Here are two examples of semantic drift in data versioning:

EXAMPLE #1

Consider this simple XML document:

    <distance>100</distance>

In the v1 XML Schema the <distance> element is declared as follows:

    <element name="distance" type="nonNegativeInteger"/>

The data specification document defines distance as:

    "Distance represents the length measurement from center of town."

In the v2 XML Schema there is no change to the declaration of the <distance> element:

    <element name="distance" type="nonNegativeInteger"/>

However, the data specification document redefines distance:

    "Distance represents the length measurement from the town line."

Thus we have an example of two versions that are "validation-compatible" but "semantically incompatible." The semantics of "distance" has drifted from v1 to v2.

EXAMPLE #2

Consider the same simple XML document:

    <distance>100</distance>

In the v1 XML Schema the <distance> element is declared differently than in Example #1:

    <element name="distance">
        <complexType>
            <simpleContent>
                <extension base="nonNegativeInteger">
                    <attribute name="units" fixed="miles"/>
                </extension>
            </simpleContent>
        </complexType>
    </element>

The <distance> element now has a "units" attribute which is fixed at "miles." The data specification document defines distance as:

    "Distance represents the length measurement from center of town."

In the v2 XML Schema the declaration of the <distance> element is modified; the units attribute is fixed at "kilometers":

    <element name="distance">
        <complexType>
            <simpleContent>
                <extension base="nonNegativeInteger">
                    <attribute name="units" fixed="kilometers"/>
                </extension>
            </simpleContent>
        </complexType>
    </element>

The data specification document is unchanged in its definition of distance:

    "Distance represents the length measurement from center of town."
Thus, we see a second example of two versions that are "validation-compatible" but "semantically incompatible." The semantics of "distance" has drifted from v1 to v2.

COMMENTS

1. I think that these examples illustrate two common changes in data. Do you agree?

2. In the examples, the XML instance document:

    <distance>100</distance>

validates fine against both the v1 and v2 XML Schemas. But if the applications that process the XML instance aren't changed, then the processing results may be incorrect.

CAUTION

Just because an application can validate an XML instance document doesn't mean it can process the XML instance document.

QUESTION

Can you state in one sentence the fundamental lesson to be learned in our discussion?

/Roger

-----Original Message-----
From: Michael Kay [mailto:mike@s...]
Sent: Thursday, December 27, 2007 6:13 AM
To: 'Stephen Green'; Costello, Roger L.; xml-dev@l...
Subject: RE: Caution using XML Schema backward- or forward-compatibility as a versioning strategy for data exchange

> e.g. because an element wasn't made optional it
> cannot be removed and so there is a temptation to change its
> semantics - to reuse it for something else rather than remove
> it.

Yes, "semantic drift" is a big problem and of course it happens even in the absence of schema change. Semantic drift happens naturally in the real world, for example credit card numbers which once identified an account might start to identify a specific card with access to that account. It's not surprising that it happens, because if a system is capable of meeting new requirements without requiring any software changes then people will use it creatively in new ways to meet those requirements.

One of the challenges in designing schemas (or database integrity constraints) is knowing whether you should try to resist semantic drift as a menace to information integrity, or whether you should allow your system to ride the waves, thus increasing its flexibility and longevity.

System designers often underestimate the creativity of users in applying semantic overloading to data structures. I saw one system where users were marking certain records for review the following day, simply by entering a particular code that was known to be invalid and would therefore appear in tomorrow's validation report. The system designers helpfully introduced stronger validation at data-entry time, and chaos ensued because the users had to invent a new process.

Michael Kay
http://www.saxonica.com/
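
As a rough illustration of the caution in Roger's Example #2, here is a minimal Python sketch. It is not from the thread: the names FIXED_UNITS, is_valid and distance_in_km are made up for illustration, and the hand-written check only approximates what an XML Schema validator would do (it is not a real validator). It assumes that a v1 application reads the value as miles and a v2 application reads it as kilometers, mirroring the fixed "units" attribute in each schema version.

```python
import xml.etree.ElementTree as ET

# Assumption: each schema version fixes "units" as in Example #2.
FIXED_UNITS = {"v1": "miles", "v2": "kilometers"}
KM_PER_MILE = 1.609344


def is_valid(xml_text: str, version: str) -> bool:
    """Simplified stand-in for validating <distance> against one schema version."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "distance":
        return False
    text = (elem.text or "").strip()
    # nonNegativeInteger, simplified to ASCII digits only.
    if not (text.isascii() and text.isdigit()):
        return False
    # The attribute may be omitted; if present it must equal the fixed value.
    units = elem.get("units")
    return units is None or units == FIXED_UNITS[version]


def distance_in_km(xml_text: str, version: str) -> float:
    """Interpret the value the way an application built for `version` would."""
    elem = ET.fromstring(xml_text)
    value = int((elem.text or "").strip())
    # An omitted attribute takes the version's fixed value.
    units = elem.get("units", FIXED_UNITS[version])
    return value * KM_PER_MILE if units == "miles" else float(value)


instance = "<distance>100</distance>"
for version in ("v1", "v2"):
    # Same instance, valid under both versions, but a different real-world length.
    print(version, is_valid(instance, version), distance_in_km(instance, version))
```

The instance passes the check for both versions, yet the two interpretations differ by roughly 61 kilometers, which is the gap between "can validate" and "can process" that the CAUTION describes.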