Re: Extensible schemas and xs:any
From: "Murray Spork" <m.spork@q...>

> I am however starting to question the value of doing an "all-at-once"
> validation - or if I shouldn't just do the validation in a 2 stage
> process - validate against Main first (ignoring any child elements of
> Stuff) - then extracting the Stuff element and validating it separately
> against its own schema. This is the question I intend to ask on
> xmlschema-dev - what approach do people think is better?

If you are a human validating documents, it is definitely better to be able to validate in stages:

* in layers (e.g. well-formedness, then structures, then co-occurrence constraints, then datatypes, then uniqueness, then references, to pick one possibility);
* in islands (e.g. my tables are all correct, then my prose sections are all correct, then my metadata sections are all correct, then my cross-references are all OK);
* by entity, or
* by severity level.

So your suggestion of validating some elements first, then others, is good. You can do this with XML Schemas, but the language provides no real support for it. (Contrast with Schematron, which provides "phases" to give language-level support for validating according to a test plan.)

For programmers, if we implement systems that are at the limit of our comprehension, we are begging for bugs and unmaintainability.

If we look at non-XML validators, we can see that features for staged validation are important. For example, SP (i.e., for SGML) provides a lot of options to customize which reports are generated. Looking at Java validators, you can see that tools like AntiC, JLint and ManMachine's wonderful metrics program JStyle (not the indenter of the same name) provide a good degree of user selection of which problems are reported.

But many of the XML libraries provide terrible support for validation. Xerces 2 for Java did not even report line numbers until recently, for example. And when errors are given, they are directed at programmers or gurus.
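To make the two-stage idea from the quoted question concrete, here is a minimal sketch in Python, using only the standard library. It is not an XML Schema validator: the hand-written checks stand in for the two schemas, and the Title and Item elements, the qty attribute, and the stage names are all made up for illustration. Only the names Main and Stuff come from the question. The point is the shape: each stage can be run and reported on its own, loosely in the spirit of Schematron phases.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <Main> is the outer vocabulary, <Stuff> is the
# extension point whose children belong to some other schema.
DOC = """<Main>
  <Title>Report</Title>
  <Stuff>
    <Item qty="2"/>
    <Item qty="many"/>
  </Stuff>
</Main>"""


def check_main(root):
    """Stage 1: check the outer structure, ignoring the children of Stuff."""
    problems = []
    if root.tag != "Main":
        problems.append(f"expected root element Main, found {root.tag}")
    if root.find("Title") is None:
        problems.append("Main has no Title child")
    if root.find("Stuff") is None:
        problems.append("Main has no Stuff child")
    return problems


def check_stuff(root):
    """Stage 2: pull out the Stuff island and check it on its own."""
    stuff = root.find("Stuff")
    if stuff is None:
        return []  # nothing to check; stage 1 reports the missing element
    problems = []
    for index, item in enumerate(stuff.findall("Item"), start=1):
        qty = item.get("qty", "")
        if not qty.isdigit():
            problems.append(
                f"Item {index}: qty should be a whole number, found '{qty}'"
            )
    return problems


# Named stages that the caller can select from (assumed names).
STAGES = {"main": check_main, "stuff": check_stuff}


def validate(xml_text, stages=("main", "stuff")):
    """Run only the requested stages and return {stage: [messages]}."""
    try:
        root = ET.fromstring(xml_text)  # layer 0: well-formedness
    except ET.ParseError as err:
        return {"well-formed": [str(err)]}
    return {name: STAGES[name](root) for name in stages}


if __name__ == "__main__":
    for stage, messages in validate(DOC).items():
        print(stage, "OK" if not messages else messages)
```

Calling validate(DOC, stages=("main",)) runs only the outer check, so someone still working on the Stuff content is not distracted by errors they are not ready to deal with. The result is a list of messages per stage rather than a single pass/fail answer.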
It is laughable to see error messages with the word "null" in them; what on earth is a normal user supposed to make of a programming term? Often the errors are incorrect anyway. A beta tester for our upcoming product reported that, when faced with this

<!DOCTYPE x PUBLIC "xxx">

the error message was to the effect that "a space is required before the system identifier". But there is no system identifier there!

I would say, apart from the understandable immaturity of XML libraries, there are two causes promoting this problem.

First, the Draconian error policy of XML, combined with the limitations of grammar-based languages or validators (where it may be difficult to get back on track after a parsing error has been found), tends to force people to validate in document order. But that may not be the order in which the user wants to work.

Second, the focus on validation as a (contractual act of) QA, of acceptance testing with a binary result, tends to sideline the needs of people who need incremental validation.

With XML Schemas, it would be nice for validation APIs to let us query the schema, for example, to ask "does this element have more than one definition?" (e.g. several local ones, or a global and a local) and, if so, to allow an element to be validated against the "or" union of all the types. You might need this if you are validating an entity without its parent, and you don't want to care about the state of construction of the parent, for example.

So, in general, I suggest using XML Schemas very conservatively, and using Schematron for as many of the fiddly bits as possible.

Cheers
Rick Jelliffe