Fallacies of Validation, version #2
Hi Folks,

Many thanks for all the outstanding comments!

Below I have updated the list of fallacies (note 3 new fallacies) and elaborated upon the previous fallacies, using the examples and information that you provided. As always, comments are very welcome.

Fallacies of Validation

1. Fallacy of "THE Schema"
2. Fallacy of Schema Locality
3. Fallacy of Requisite Validation
4. Fallacy of Validation as a Pass/Fail Operation
5. Fallacy of a Universal Validation Language
6. Fallacy of Closed System Validation

Let's examine each of these fallacies.

1. Fallacy of "THE Schema"

This fallacy was identified by Michael Kay:

> ... there's no harm in using XML Schema to check data
> against the business rules, so long as you realize this
> is *an* XML Schema, not *the* XML Schema. We need to stop
> thinking that there can only be one schema.

Len Bullard made a similar statement:

> ... most fundamental errors are ... to consider only a single schema.

and at another point Len states:

> ... fall into the trap of thinking of THE schema and not
> recognizing the system as a declarative ecosystem of schemas
> and schema components.

Both Michael and Len are stating that in a system there should be numerous schemas. This is a big mindshift for me. I admit to being trapped into thinking that there should be a single schema.

Len responded to my query to define "declarative ecosystem". I think that this is a very important term, and it underlies much of what is presented here. Here's what "declarative ecosystem" means:

Every system lives within a world where there is a lot of variety, i.e., systems aren't islands. For example, the Wal-Mart system must coexist with its supplier systems, its distributor systems, and its retailer systems. One can think of this system-of-systems as an "ecosystem". Thus, the Wal-Mart system resides in an ecosystem. Each system within the ecosystem has its own local requirements, which are documented by its own (declarative) schemas.
Thus, not only are there a bunch of systems which must coexist, there are a bunch of schemas that must coexist. This ecosystem of schemas is a "declarative ecosystem". [Len, have I accurately defined the term?]

Oh, one more comment on declarative ecosystems. Len made this remark which I think is important:

> ... [if two systems are interoperating in a
> closed environment then] it doesn't matter how
> singular or multiple they [the schemas] are;
> but when they are in an ecosystem, they typically
> overlap and exchange information, and adapt as a
> result.

[Mindblowing ideas Len! Schemas exchanging information and adapting. Wow!]

Okay, now back to the fallacy of "THE schema" ...

Many examples were provided to demonstrate the value of multiple validations:

Len provided an example of a distributed reporting system:

> Look at any large reporting system. You can build
> that up a large schema but given local variations,
> do you have sufficient power/force/authority to
> make them stick or will you be constantly adjusting
> them, loosening them, strengthening them, and how
> will you know which is the right thing to so?

I would like to elaborate further on this. Suppose that a company has an office in London, Hong Kong, and Sydney. They all report to the main office in New York. With such a geographically dispersed collection of offices, it is easy to imagine that there will be local variations. There will probably be some data that is common to all the offices (Rick Jelliffe calls the constraints on this type of data invariant constraints). Then there will be locale-specific data (variant constraints). So, it doesn't seem reasonable to assume that a single reporting schema would suffice for this geographically-dispersed organization. [Len, have I captured your example accurately?]

Mary Holstege and Michael Kay gave examples of the value of multiple schemas in a workflow environment:

From Mary Holstege:

> ...
> suppose all you care about in some phase of
> processing is picking up the IDs in a document.
> Then you define a minimal schema where everything
> is open with the appropriate ID attributes. Maybe
> you're going to generate an index. In another
> phase of processing all you care about is checking
> that dates are in the right date range. So you have
> another minimal schema that only pays attention to dates.

From Michael Kay:

> One example I am thinking of is where a document is
> gradually built up in the course of a workflow. At
> each stage in the workflow the validation constraints
> are different. You can think of each schema as a filter
> that allows the document to proceed to the next stage of
> processing.

Finally, Len made a good statement:

> Sometimes, a single schema suffices for the whole
> system. Sometimes, you needs lots of little ones.

2. Fallacy of Schema Locality

Len identified this fallacy:

> ... most fundamental errors are to consider schemas only at the external system junctions ...

To be honest, I am not clear on this fallacy. I believe that what is being said is this: if you build a system with local customs hardcoded into it, but then deploy it into a global environment ... that's a really bad mistake. An example of this is Michael Kay's example of interacting with an online U.S. service that insisted on users providing a state code. Clearly, the online service was built with local customs hardcoded, but then deployed in a global environment.

Here's a comment that Len made on this fallacy:

> The problem of locale is that it is declared
> locally but might require global management.

Can someone tell me if I have captured this fallacy accurately?

3. Fallacy of Requisite Validation

Yesterday Michael Kay made a very compelling statement with regard to whether validation should be done at all in certain situations. Michael was responding to the example of an online service validating a user's address.
Here's what Michael said about the online service's insistence on validating the user's address:

> The strategy (validating the user's address) assumes that
> you know better than your customers what constitutes a
> valid address. Let's face it, you don't, and you never
> will. A much better strategy is to let them (the user) express
> their address in their own terms. After all, that's what they
> do in old-fashioned paper correspondence, and it seems
> to work quite well.

Michael argues very effectively that in this situation it makes no sense to do any validation at all!

4. Fallacy of Validation as a Pass/Fail Operation

Mary Holstege identified this fallacy. Here's what she said:

> [Many people think that validation is a pass/fail operation.]
> Not so, although lots of people are still stuck in that way
> of thinking, including, alas, a lot of the vendors.
> The schema design goes to great pains to make it possible to
> do things like this, for example: validate a document against
> a tight schema, and then ask questions of the result such as
> "show me all the item counts that failed validation because they
> were too high"

A quick scan of Rick Jelliffe's latest message indicates that he disagrees with Mary on this fallacy. Perhaps some more discussion is in order?

5. Fallacy of a Universal Validation Language

Dave Pawson identified this fallacy. He noted that the Atom specification cannot be validated using a single technology:

> From [Atom, version] 0.3 onwards it's not been possible
> to validate an instance against a single schema, not
> even Relax NG. They need a mix of Schema and 'other'
> processing before being given a clean bill of health.

6. Fallacy of Closed System Validation

This fallacy was identified by Len a long time ago. I still remember something he said one day when discussing closed versus open systems, "Systems leak. There's no such thing as a closed system". This is an important comment.
Many people imagine that they can create a monolithic, invariant schema because "there's just me and my well-known trading partners". This statement fails to recognize the existence of a changing world; more precisely, a changing ecosystem.

One last thing - my favorite term of the day (can you guess?), and my favorite quotes of the day.

Favorite Term: Declarative Ecosystem

Favorite Quotes:

1. [From Len] "I can't separate social rules from engineering fundamentals. I apply engineering fundamentals to implement social systems."

2. [Also from Len] "Even if one thinks it easier to manage a single schema, command and control in adaptive systems is distributed. A schema is control."

[Wow! I never thought of schemas in this fashion. Great stuff Len!]

Thanks again everyone! Please keep the comments coming.

/Roger
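
To make two of the ideas above a little more concrete (Mary Holstege's small, single-purpose schemas, and her point that validation need not be a pass/fail operation), here is a rough Python sketch. It is only an illustration under stated assumptions: it does not use XML Schema, Relax NG, or any real validation library; a "schema" is modelled as a plain function over a dictionary, and every name in it (Finding, ids_schema, item_count_schema, validate, the limit of 100) is hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]


@dataclass
class Finding:
    """One validation outcome that can be queried later (hypothetical type)."""
    schema: str   # which small schema raised the finding
    field: str    # which part of the document it concerns
    reason: str   # why it failed


# Assumption: a "schema" is just a function that returns a list of findings.
Schema = Callable[[Document], List[Finding]]


def ids_schema(doc: Document) -> List[Finding]:
    """Minimal schema that only cares about the presence of an ID."""
    if not doc.get("id"):
        return [Finding("ids", "id", "missing")]
    return []


def item_count_schema(doc: Document, limit: int = 100) -> List[Finding]:
    """Minimal schema that only cares about item counts (limit is made up)."""
    findings = []
    for name, count in doc.get("item_counts", {}).items():
        if count > limit:
            findings.append(Finding("item_counts", name, "too high"))
    return findings


def validate(doc: Document, schemas: List[Schema]) -> List[Finding]:
    """Run every schema and collect findings instead of returning True/False."""
    findings: List[Finding] = []
    for schema in schemas:
        findings.extend(schema(doc))
    return findings


order = {"id": "PO-1", "item_counts": {"widgets": 5, "gadgets": 250}}
findings = validate(order, [ids_schema, item_count_schema])

# "Show me all the item counts that failed validation because they were too high"
too_high = [f.field for f in findings if f.reason == "too high"]
print(too_high)
```

Because validate returns the findings rather than a single boolean, the caller decides what to do with them: use one schema as a gate for the next workflow stage, as in Michael Kay's filter description, or just report on the result.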