Re: Questions on DCD
>>6) I know this is not going to make me popular, but I think that there
>>are too many datatypes
>
>I do too, and I've warned my co-editors to expect massive amputations
>in the committee process, if DCD ever gets taken up.
>

I think everyone thinks that this is the case. The problem is that if you have too few, many applications will require ad hoc and inconsistent extensions of the mechanism. Too many, and you bog down all processors with lots of types that most people don't use (and you still don't cover all the bases).

I made a proposal related to the whole type validation mechanism, which really extends beyond DCD but would address a lot of its issues as well as allow for much more extensibility. I'd like to post it here for comment (with some internal information removed) in order to perhaps bash it out as a general type validation mechanism. However, right now it's in a Notes database, and if I post it here it's going to look so horrible that it will probably be unreadable. But, just in case, here it is. I'd appreciate any comments on it (if you can read it). Posting stuff to this mailing list from Notes generally seems to totally destroy its format.

Overview

This document is related to the DCD proposal made to the W3C, and more specifically to the DCD 'constraint mechanism'. DCD provides a means for indicating constraints on the values of elements and attributes. This mechanism is provided via the Min/MinExclusive and Max/MaxExclusive properties, and has 'falls within a single range' semantics. In other words, each element or attribute definition can express a value range (inclusive or exclusive) within which the value of each instance of that element or attribute in the target XML file must fall in order to meet the constraints. The purpose of this document is to propose an alternative constraint mechanism which we feel is no more complex and far more flexible than the one currently proposed.
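To make the 'falls within a single range' semantics concrete, here is a minimal Java sketch of how a processor might check an int value against the four DCD range properties. The class and method names are hypothetical, and treating an absent property (null) as "no bound" is an assumption, not something the DCD proposal spells out here.

```java
// Hypothetical sketch of DCD's existing single-range constraint check.
// A null bound means the corresponding DCD property was not specified.
final class RangeConstraint {
    private final Integer min, minExclusive, max, maxExclusive;

    RangeConstraint(Integer min, Integer minExclusive, Integer max, Integer maxExclusive) {
        this.min = min;
        this.minExclusive = minExclusive;
        this.max = max;
        this.maxExclusive = maxExclusive;
    }

    // True if the value falls within the one range the declaration allows.
    boolean accepts(int value) {
        if (min != null && value < min) return false;
        if (minExclusive != null && value <= minExclusive) return false;
        if (max != null && value > max) return false;
        if (maxExclusive != null && value >= maxExclusive) return false;
        return true;
    }
}
```

Under this sketch, Min="1" Max="10" corresponds to new RangeConstraint(1, null, 10, null), which accepts 1 and 10 but rejects 0 and 11. Whatever bounds are chosen, only one contiguous range can be expressed, which is the limitation the rest of this proposal addresses.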
Just to provide a refresher on the existing DCD constraint proposal, here is an example snippet from a DCD which defines an attribute of type int with a range of 1 to 10, inclusive:

    <AttributeDef Name="Foo" Datatype="int" Min="1" Max="10"/>

The same mechanism applies to element definitions as well. The content of the Min/Max values must of course make sense for the declared data type of the element or attribute.

After doing a quick and dirty demonstration program of DCD (based on the existing functionality in the XML4J parser), the XML Team at JTC-SV would like to put forward a proposal which we feel makes the constraint mechanism more useful and extensible, without placing an undue burden on the common case of a simple document with simple validation needs (i.e. no constraints required, just structural validation).

The overall goals of this constraint mechanism are:

  - Minimum code support requirements in the core parser architecture (i.e. minimal cost for those who don't use it)
  - Reasonable implementation size and complexity
  - Open-endedness and flexibility for the uncommon case and user
  - Simplicity of understanding and use for the common case and user

We feel that a constraint mechanism which meets these requirements is probably achievable. As the likely implementers of such a proposal, we obviously do not want to propose something which is not achievable and maintainable with reasonable effort; we certainly hope not to contribute to the growing perception that 'deep thoughts' in the XML world are out of hand, and that real-world implementation is suffering for it.

Driving Forces

The primary driving force of this proposal is a belief that the constraint mechanism currently expressed in the DCD proposal is insufficient to meet more than a small fraction of the needs of its potentially very wide target audience. We understand the reasoning behind this initial proposal, i.e. to maintain a level of simplicity that would increase the likelihood of acceptance and implementation; however, we feel that the current mechanism is so limited that its implementation might be counterproductive.

The reasoning is that almost any real-world application of the technology would require some amount of manual extension. Such extensions are not possible within the existing specification, and hence would almost certainly be implemented in a haphazard way, hindering interoperability between implementations. Also, since any such haphazard extensions have the potential to become de facto standards, we would like to avoid having such 'design by aggregation' imposed upon us by the marketplace. By providing a more extensible mechanism up front, we hope to avoid this scenario, since any reasonable extension of the mechanism could be made without stepping outside the system provided.

Thirdly, though obviously useful, the limited constraints expressible in the existing system do not seem sufficient to warrant the effort of implementing a constraint mechanism in the parser. Such a mechanism is non-trivial and imposes some minimum of unavoidable overhead on the parser. For such an effort to be made and such a performance burden to be accepted, we would very much prefer to get more powerful constraint checking for our buck.

The Basic Concept

Our concept is based loosely upon the existing experience of spreadsheets, which are probably the prototypical example of simple 'application development' for the end user; in particular, the spreadsheet 'function' concept, which provides an easy-to-understand mechanism for doing simple arithmetic and logic operations. These functions take the form of a simple function call which evaluates its parameters and returns a boolean pass/fail result.
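Because every constraint under this scheme is a single spreadsheet-style call, a processor needs only a small tokenizer to split the call into a function name and its parameters. The following Java sketch shows one way to do that under stated assumptions: the class name ConstraintCall is hypothetical, a leading '!' marks negation, and commas nested inside parentheses or single quotes do not separate parameters, so a combinator call such as And(IsInRange(1, 90), !IsMultipleOf(5)) yields two sub-expressions that can be parsed recursively.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parsed form of a constraint such as "!IsMultipleOf(5)".
final class ConstraintCall {
    final boolean negated;
    final String name;
    final List<String> args;

    ConstraintCall(boolean negated, String name, List<String> args) {
        this.negated = negated;
        this.name = name;
        this.args = args;
    }

    // Parses "Name(arg, arg, ...)" with an optional leading '!'.
    // Commas inside nested parentheses or single quotes are not separators;
    // quoted arguments are returned with their quotes intact.
    static ConstraintCall parse(String text) {
        String s = text.trim();
        boolean negated = s.startsWith("!");
        if (negated) s = s.substring(1).trim();

        int open = s.indexOf('(');
        if (open <= 0 || !s.endsWith(")"))
            throw new IllegalArgumentException("Not a function call: " + text);
        String name = s.substring(0, open).trim();
        String body = s.substring(open + 1, s.length() - 1);

        List<String> args = new ArrayList<>();
        int depth = 0;
        boolean inQuote = false;
        StringBuilder cur = new StringBuilder();
        for (char c : body.toCharArray()) {
            if (c == '\'') inQuote = !inQuote;
            else if (!inQuote && c == '(') depth++;
            else if (!inQuote && c == ')') depth--;

            if (c == ',' && depth == 0 && !inQuote) {
                args.add(cur.toString().trim()); // end of one top-level argument
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        String last = cur.toString().trim();
        // An empty body such as "IsTrue()" yields no arguments at all.
        if (!last.isEmpty() || !args.isEmpty()) args.add(last);
        return new ConstraintCall(negated, name, args);
    }
}
```

With this sketch, ConstraintCall.parse("IsInRange(1,10)") gives the name IsInRange and the arguments "1" and "10", while the negation flag is simply recorded for the evaluator to apply afterwards.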
So, at its simplest level, a constraint expression would look something like this:

    <AttributeDef Name="Foo" Datatype="int" Constraint="IsInRange(1,10)" />

In this scenario, a "Constraint" property is introduced. Its value is a string which expresses some constraint by way of a 'function syntax' expression. In this case the function is "IsInRange", and it takes two values, the minimum and maximum of the range. All constraints will be of this form.

High Level Implementation

The implementation of this proposed validation scheme is relatively straightforward. It can be delivered in three conceptual layers, each of which provides increasing sophistication in return for increasing effort and coding skill. These layers are discussed here in detail, along with how they fit together and are 'delivered'.

Intrinsic Functions

At the core of the validation system there will be a set of intrinsic functions, which are provided with the parser implementation and which should be required of any DCD implementation by the specification. This will ensure interoperability of core validation services. These functions will be selected for their high 'bang for the overhead buck' appeal, i.e. they will hopefully meet 90% of common-case needs with minimal overhead (since they will be packaged with the parser core). A likely set of core functions would be:

    Name                      Example
    IsEqualTo                 Constraint="IsEqualTo(5.0)"
    IsGreaterThan             Constraint="IsGreaterThan(&BaseLevel;)"
    IsLessThan                Constraint="IsLessThan(25)"
    IsInRange                 Constraint="IsInRange(&ValidRange;)"
    IsOneOf                   Constraint="IsOneOf(Blue, Red, Pink)"
    IsTrue                    Constraint="IsTrue()"
    IsFalse                   Constraint="IsFalse()"
    IsEven, IsOdd             Constraint="IsEven()"
    IsMultipleOf              Constraint="IsMultipleOf(255)"
    IsInMultiRange            Constraint="IsInMultiRange(1-10, 90-100)"
    IsStrEqualTo              Constraint="!IsStrEqualTo('We the People')"
    IsDigit, IsChar, etc...   Constraint="IsHexDigit()"
    And, Or, Xor              Constraint="And(IsInRange(1, 90), !IsMultipleOf(5))"

This set of functions should meet the needs of quite a wide range of applications, though there might be a few more fundamental ones that could or should be added. Though the semantics of these are quite obvious, a little discussion of the finer points is in order before we move on.

First of all, notice how these functions leverage the power of general entities, by allowing flexible replacement of function parameters. This capability provides a lot of power to modify the validation over time without changing the DCD itself. This is not in and of itself an improvement over the existing validation scheme, since entity replacement is inherent to XML; however, the more expressive the validation mechanism, the more leverage is gained.

Secondly, note the second-to-last line, which describes the 'character type' functions. These can be mapped fairly directly onto the language support for such things, and will provide a nice way to check many characteristics of single-character fields. There are language and locale issues involved here, which will be discussed at the end of this document.

Also, note the last line, which defines some boolean logic functions. These can be handled intrinsically by the processor itself, and will support much more complex constraints built from more basic ones. As long as we limit the nesting to something reasonable, such as a single level, the complexity of these functions will be quite small. They will merely be a recursive container and invoker of other functions, with a little evaluation of the boolean results of each one. Though the example shows two parameters, there is no reason why they cannot easily allow an open-ended number of subexpression parameters.

Negation is implemented by the '!' prefix before a function, as in the last line, where the constraint checks that the value is both in the range 1..90 inclusive and not a multiple of 5.
This provides a lot of flexibility, avoids the need for explicit Not versions of functions, and is trivial to implement. Similarly, the IsStrEqualTo() example checks that the value is not equal to "We the People".

The amount of code needed to implement these intrinsic functions, beyond the basic infrastructure required to support constraint checking at all, is very small. Most of them will resolve to single lines of evaluation code.

To ensure openness, the function mechanism will probably be based on the namespace proposal as well. So, in reality, the above functions would actually be part of a "Htpp://W3C.Constraints/DCDStd" namespace, for instance. This allows a convenient partitioning of the function namespace, as well as a very flexible way of providing alternative processing by just mapping the namespace prefix to another URI that maps to a different set of functions!

Third Party Functions

The next level of support would be the ability to plug in third party validation functions. This would open up the system considerably by providing a well-defined delivery mechanism for functions, against which third parties could write. As long as these functions can be expressed with the simple function syntax described above, they can be as complex as the developer wishes them to be and the user is willing to deal with.

Support for third party functions requires a well-defined interface against which they can be developed. This required interface is actually quite simple and convenient, and places very few semantic demands on implementers. These very simple semantics ensure that open-endedness is not compromised by the interface. A proposed interface is described below.

Custom Functions

At the upper end of the spectrum are custom applications which would provide their own functions for doing very domain-specific constraint validation. These could include PIN validation, database lookup of names, IDs or social security numbers, and so on.
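The namespace-based partitioning and the third party and custom layers can share one lookup structure. Here is a minimal Java sketch, assuming a registry keyed by namespace URI plus function name; FunctionRegistry, register, bindPrefix and resolve are hypothetical names, and java.util.function types stand in for whatever function interface the processor actually defines. Third party and custom functions are then just additional registrations, and rebinding a prefix to another URI redirects every call that uses it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical registry: functions are looked up by (namespace URI, name).
// Each entry is a factory that turns parsed parameters into a value test.
final class FunctionRegistry {
    private final Map<String, Function<String[], Predicate<String>>> factories = new HashMap<>();
    private final Map<String, String> prefixToUri = new HashMap<>();

    // Intrinsic, third party and custom functions all register the same way.
    void register(String uri, String name, Function<String[], Predicate<String>> factory) {
        factories.put(uri + "#" + name, factory);
    }

    // Remapping a prefix switches every call using it to another function set.
    void bindPrefix(String prefix, String uri) {
        prefixToUri.put(prefix, uri);
    }

    // Finds the function for prefix:name and builds a test from its parameters.
    Predicate<String> resolve(String prefix, String name, String[] params) {
        String uri = prefixToUri.get(prefix);
        Function<String[], Predicate<String>> factory =
            (uri == null) ? null : factories.get(uri + "#" + name);
        if (factory == null)
            throw new IllegalArgumentException("Unknown function " + prefix + ":" + name);
        return factory.apply(params);
    }
}
```

For example, registering IsLessThan under two illustrative URIs (say urn:example:std and urn:example:custom) with slightly different semantics lets a document switch between them purely by changing which URI its prefix is bound to; the DCD text itself does not change.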
Our proposal provides a flexible back door through which the validation mechanism can accommodate the most complex imaginable validation, without increasing the overhead of the common case by a single CPU cycle. Though there is no limit to the complexity or sophistication that these custom functions could achieve, there are no implementation issues here which go beyond those of the third party function scenario discussed above, at least from our perspective as the parser provider.

Implementation Details

This section puts forward a specific example implementation that we believe will meet all of the requirements and fulfill all of the promise of the proposed system. Example Java implementations are presented, but the implementation could easily be done in C++ or any other quality object-oriented language.

The Function Interface

A function is represented in the implementation as a simple abstract interface. The interface is extremely simple, but it allows the system to manage functions and invoke them generically and reasonably efficiently. For this discussion, the interface is called ValFunction. A concrete implementation of it would look something like the following Java class. Of course this is not a very complex example, and the same check could be achieved with the intrinsic IsLessThan() function, but it shows how one would implement a simple function class.
    class ValidSalary implements ValFunction {
        // Default constructor only, because functions are factory created
        ValidSalary() {
        }

        // 'Parsing' method: called once with the comma-separated parameters
        public void Parse(String[] astrParams) {
            // We only take one function parameter, the maximum salary
            if (astrParams.length != 1)
                throw new IllegalArgumentException("ValidSalary takes one parameter");

            // Convert it to our maximum salary member
            fMaxSal = Double.parseDouble(astrParams[0].trim());

            // Format our constraint into the description string
            strDesc = "< " + fMaxSal;
        }

        // Evaluation method: true if the value is below the maximum salary
        public boolean bEvaluate(String strValue) {
            try {
                return Double.parseDouble(strValue.trim()) < fMaxSal;
            } catch (NumberFormatException e) {
                // Non-numeric content cannot be a valid salary
                return false;
            }
        }

        // Reporting method for errors
        public String strConstrainDesc() {
            return strDesc;
        }

        // Private data
        private double fMaxSal = 0;
        private String strDesc = "";
    }

The constructor is a default constructor since functions will generally be 'factory created'. However, the factory can certainly invoke them with particular parameters. More on this below in the "Function Bundle Interface" section.

The Parse() method is called once during the evaluation of an element or attribute definition which declares a constraint that uses the function. The contents of the function call (the text after the function name, i.e. inside the function's parentheses) are passed to the Parse() method as an array of strings, one per comma-separated function parameter. The function evaluates these parameters, which represent the validity constraints set up for the element or attribute, and stores that information in some (hopefully) optimal internal format. In the example above, which validates maximum salaries, it converts the single parameter to a double and stores that for later use.
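The ValFunction interface itself is not shown in this message. The following is a hypothetical reconstruction inferred only from the three methods ValidSalary implements (the capitalized method names are kept to match that class), followed by an illustrative intrinsic, IsMultipleOf, and a wrapper showing why the '!' prefix is trivial to support. The class names IsMultipleOf and Negated are assumptions for illustration, not part of the proposal.

```java
// Hypothetical reconstruction of ValFunction, inferred from ValidSalary.
interface ValFunction {
    // Called once per declaration with the comma-separated parameters.
    void Parse(String[] astrParams);

    // Called for each instance value; true means the constraint is met.
    boolean bEvaluate(String strValue);

    // Human-readable form of the constraint, used in error reports.
    String strConstrainDesc();
}

// Illustrative intrinsic: IsMultipleOf(n) for integer values.
final class IsMultipleOf implements ValFunction {
    private long divisor = 1;

    public void Parse(String[] astrParams) {
        if (astrParams.length != 1)
            throw new IllegalArgumentException("IsMultipleOf takes one parameter");
        divisor = Long.parseLong(astrParams[0].trim());
        if (divisor == 0)
            throw new IllegalArgumentException("IsMultipleOf(0) is undefined");
    }

    // The evaluation really is a single line once parsing is done.
    public boolean bEvaluate(String strValue) {
        try {
            return Long.parseLong(strValue.trim()) % divisor == 0;
        } catch (NumberFormatException e) {
            return false; // non-numeric content fails the constraint
        }
    }

    public String strConstrainDesc() {
        return "multiple of " + divisor;
    }
}

// The '!' prefix: wrap any function and invert its result.
// Note: a value the inner function rejects as malformed therefore passes.
final class Negated implements ValFunction {
    private final ValFunction inner;

    Negated(ValFunction inner) { this.inner = inner; }

    public void Parse(String[] astrParams) { inner.Parse(astrParams); }

    public boolean bEvaluate(String strValue) { return !inner.bEvaluate(strValue); }

    public String strConstrainDesc() { return "not " + inner.strConstrainDesc(); }
}
```

In this sketch a constraint such as !IsMultipleOf(5) would be built as new Negated(new IsMultipleOf()), parsed once with {"5"}, and then evaluated against every instance value; the wrapper adds no per-function code, which is why separate Not versions of functions are unnecessary.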