Re: A single, all-encompassing data validation language - good

Well, this started up just as I was going on vacation, but:

> Up till this date, grammar-based and rule-based languages have been
> kept separate:
>
> Grammar-based Languages: XML Schema, Relax NG, DTD
>
> Rule-based Languages: Schematron, RuleML

James Clark once said something about wanting to make sure that your markup language addresses only one need (I can't find or remember the exact wording). That saying is problematic, however, because it needs some sort of constraint on how we decide whether a language addresses only one need (requirement, problem domain, whatever you want to call it).

Using the concept of validation as an example, I think we can see what this constraint should be: the need must be something we can talk about meaningfully from a computational aspect. I cannot discuss validation meaningfully from a computational aspect, and I don't think the many people of greater capabilities than mine on this list can either. What can be discussed meaningfully is stuff like graphs, grammars, contexts, etc. So from this premise I would say: don't have a general purpose validation language; have general purpose grammar languages, data typing languages, and so forth.

> What do you think about XML Schema working group incorporating
> rule-based capabilities into the language?

It's not as powerful as Schematron. So if I am starting a project that needs a grammar, I will use the grammar capabilities of the grammar language of choice. When I need context specific checking, I will switch to Schematron instead of using the context checking capabilities of my grammar language. Why? Because:

1. Schematron is simple.
2. I can't be sure that all of the context checking I need to do over the course of the project will be met by the capabilities of the grammar language.
3. I don't needlessly complicate the process of writing either validation language, and I can be more sure of having it work well across all platforms if things are kept relatively simple.
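
As a rough illustration of keeping the two concerns apart, here is a minimal Python sketch. It is only a stand-in: the structure check plays the role of a grammar language and the rule list plays the role of Schematron-style assertions, but it is neither of those. The element names (`order`, `x`, `y`), the two example rules, and all function names are made up for the example, and the rules assume integer content.

```python
import xml.etree.ElementTree as ET

# Stand-in for the grammar step: which children must appear, and in what order.
REQUIRED_CHILDREN = ["x", "y"]


def check_structure(root):
    """Return a list of structural errors (empty if the shape is as expected)."""
    names = [child.tag for child in root]
    if names != REQUIRED_CHILDREN:
        return [f"expected children {REQUIRED_CHILDREN}, got {names}"]
    return []


def x_divisible_by_y(root):
    x, y = int(root.findtext("x")), int(root.findtext("y"))
    return y != 0 and x % y == 0


# Stand-in for the rule step: (message, test) pairs kept apart from the grammar.
RULES = [
    ("x must be greater than 500", lambda r: int(r.findtext("x")) > 500),
    ("x must be divisible by y", x_divisible_by_y),
]


def check_rules(root):
    """Return the message of every rule whose test fails."""
    return [message for message, test in RULES if not test(root)]


def validate(xml_text):
    root = ET.fromstring(xml_text)
    errors = check_structure(root)
    # The rules rely on the structure being sound, so run them only if it is.
    return errors or check_rules(root)


print(validate("<order><x>600</x><y>3</y></order>"))
print(validate("<order><x>400</x><y>3</y></order>"))
```

Adding a new context check means appending to `RULES` without touching the structure check, which is the kind of separation argued for above.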

> Here are some potential advantages and disadvantages:
>
> ADVANTAGES
>
> 1. Need only one language to express all data validation requirements.

My point earlier. My data validation requirements might include that:

   x > 500
   value of x is divisible by value of y accessible over http
   value of x is in the allowedXvalues column of the big X database
   the encoding of the document is utf-8 (really, I have seen this requirement)

Data validation is an open ended thing that cannot be described meaningfully other than to say we know it when we see it. Therefore one should not try to make a domain specific language to handle what is evidently a general programming language problem.

> 2. Possible performance improvement (as compared to separate languages
> with separate validations).

Likely performance degradation if you have requirements for multiple types of validation. Why?

1. The simpler the model of what something does, the easier it is to optimize, in my experience.
2. In relation to the above, certain types of validation may be more amenable to streaming.
3. My experience of real world (data) validation implies that a pipeline is the best model for performance optimization, because I am always running into situations where certain conditions must be met in the incoming document, and if they are not met the document may be thrown away. Example: in the Danish UBL, routing is done via an EAN location number; if this number is not present or is not a correct EAN, then the document cannot be routed and should be returned. Obviously it makes no sense to validate the whole document if you have a constraint that means you won't need to. This relates to the streaming point above.

> 2. Is grammar validation of a fundamentally different nature than rule
> validation?

Perhaps. If we say that what one has is a grammar based language, then there are certain things that gives you. These are:

1. The ability to implement it easily in editors.
2. Most grammar based languages make reading the grammar and understanding what is allowed relatively easy.

I remember Rick Jelliffe suggesting that it would be easy enough to implement a Schematron based editor, but I think it would be pretty difficult to determine the contexts where one should edit or offer auto completion. Sure, finding the errors and reporting them is easy, but the more helpful aspects of modern editors?

> 3. If so, is it reasonable to merge two fundamentally different things?
> 4. Is it in the best interest of the marketplace to have a single,
> all-encompassing data validation language, or is it better to have
> multiple data validation languages that work together?

For me, multiple data validation languages. Especially as the word validation seems to be a very changeable one.

Cheers,
Bryan Rasmussen
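
The pipeline point made above (check the EAN location number first, and return the document without validating the rest if that check fails) could be sketched roughly as follows. This is a sketch under assumptions, not how any Danish UBL implementation actually works: the element names `EndpointID` and `Line` and all function names are hypothetical, the EAN test is the standard GS1 mod-10 check digit over 13 digits, and the document is parsed whole with ElementTree rather than streamed.

```python
import xml.etree.ElementTree as ET


def is_valid_ean13(value):
    """Return True for 13 ASCII digits with a correct GS1 mod-10 check digit."""
    if len(value) != 13 or not (value.isascii() and value.isdigit()):
        return False
    digits = [int(c) for c in value]
    # Weights alternate 1, 3, 1, 3, ... across the first 12 digits.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def check_routing(doc):
    """Cheap gate: can the document be routed at all?"""
    ean = (doc.findtext("EndpointID") or "").strip()  # hypothetical element name
    return None if is_valid_ean13(ean) else "missing or invalid EAN location number"


def check_lines(doc):
    """Stand-in for the more expensive rest of the validation."""
    return None if doc.findall("Line") else "document has no lines"


def run_pipeline(xml_text, stages):
    """Run stages in order; stop at the first failure and report what ran."""
    doc = ET.fromstring(xml_text)
    ran = []
    for stage in stages:
        ran.append(stage.__name__)
        error = stage(doc)
        if error:
            return {"status": "returned", "error": error, "ran": ran}
    return {"status": "accepted", "error": None, "ran": ran}


STAGES = [check_routing, check_lines]
good = "<Invoice><EndpointID>5790000000005</EndpointID><Line/></Invoice>"
bad = "<Invoice><EndpointID>5790000000000</EndpointID><Line/></Invoice>"
print(run_pipeline(good, STAGES)["status"])
print(run_pipeline(bad, STAGES)["ran"])
```

For the `bad` document only `check_routing` runs; the later stage is never reached, which is the saving the pipeline model is meant to give. The two EAN values are made-up numbers chosen only so that one passes and one fails the check digit.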