Cross document validation with Schematron - XML syntax for Xpath?
First some background: we have a large, complex Web app that builds thousands of different input forms from metadata descriptions in the form of XML. This XML comes from many different places and describes global metadata, user-view-specific metadata, authorizations and the current data for a given screen. XSLT transforms take this input XML and merge it into an abstract object model, which is in turn forwarded to another XSLT that does presentation-specific transformations (for the Web app that means turning it into XHTML).

The user does a standard HTTP POST back to us. We get the request parameters back as XML, run another XSLT transform, and then run a Schematron transform to validate the input from the user. If Schematron throws an assert, we detect that and recycle the input back through the original loop with the appropriate error message; otherwise we continue on to the next screen.

The original metadata and the instance-specific screens that are built around them are built by business analysts using other screens (that are in turn built by the same system). In particular, we have a validation editor where they describe the validation rules for the input to any given screen. These rules are one step removed from Schematron statements; a simple transform turns them into Schematron. The main reason for not specifying Schematron directly is so that the "validation editor" can pick the rules apart into component pieces when a business analyst wants to go back and edit an existing validation rule. We use XML elements and attributes to build the XPath, so that we don't have to parse the XPath (though we're probably going to go to XSLT for regex support, so I suppose parsing the XPath with regex would be just about the same work in the long term).

All this works pretty well, but for one issue which I will describe shortly. However, we now have a new requirement, which is to be able to validate across multiple documents.
We manage clinical research data, so an example would be for someone to be able to specify that a surgery date was after any protocol on-study date, or that a surgery date is after a particular instance of a protocol on-study date. In this case, the data being validated is in the surgery document and the data it is being validated against is in the protocol document. (In reality all this is pulled out of a database on the fly, but the mechanics of how these documents are actually created should be more or less irrelevant to the problem at hand?)

First issue: writing Schematron asserts can be non-intuitive for a business analyst. Consider, for example, a document that reports many lab results. We may want to say that the ANC value is between 1000 and 10000. As a Schematron assert it is essentially:

    not(*[local-name() = 'ANC']) or (result_val > 1000 and result_val < 10000)

I.e., for things that aren't ANCs we are OK; otherwise check the result value. The problem is that a business analyst just doesn't get the "not(x) or" pattern. It might make sense to someone well versed in Boolean logic and XPath, but even some of our more experienced developers get confused by these rules.

Given this, and the requirement for cross-document validation, we'd like to move the input to our validation process one more step away from Schematron and find or create a language that the business analysts can use to specify the validation rules in a manner that is a little more natural to them. For example:

    element = 'ANC' and result_val > 1000 and result_val < 10000

For Schematron generation that's pretty straightforward. More importantly, however, we also need to be able to use this rule specification to tell us how to generate the other document.
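As a rough illustration of that generation step, here is a minimal Python sketch that turns an analyst-style rule (a precondition plus checks) into the "not(x) or" assert shown above. The `<rule>`, `<when>` and `<check>` elements and the `op` codes are hypothetical, invented for this example and not the actual format of the validation editor; the Schematron namespace is also left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical analyst-facing rule: <when> says which elements the rule
# applies to, <check> lists what must hold for them. This vocabulary is an
# assumption made for the sketch, not an existing format.
RULE_XML = """
<rule message="ANC must be between 1000 and 10000">
  <when element="ANC"/>
  <check field="result_val" op="gt" value="1000"/>
  <check field="result_val" op="lt" value="10000"/>
</rule>
"""

# Map the hypothetical op codes onto XPath comparison operators.
OPS = {"gt": ">", "lt": "<", "ge": ">=", "le": "<=", "eq": "=", "ne": "!="}


def rule_to_assert_test(rule):
    """Build the XPath test: not(precondition) or (check and check ...)."""
    when = rule.find("when")
    precondition = "*[local-name() = '%s']" % when.get("element")
    checks = " and ".join(
        "%s %s %s" % (c.get("field"), OPS[c.get("op")], c.get("value"))
        for c in rule.findall("check")
    )
    return "not(%s) or (%s)" % (precondition, checks)


def rule_to_schematron(rule):
    """Serialize an assert element (namespace omitted) carrying the test."""
    assert_el = ET.Element("assert", {"test": rule_to_assert_test(rule)})
    assert_el.text = rule.get("message")
    return ET.tostring(assert_el, encoding="unicode")


rule = ET.fromstring(RULE_XML)
assert_test = rule_to_assert_test(rule)
schematron_assert = rule_to_schematron(rule)
print(assert_test)
print(schematron_assert)
```

The point of the split into `<when>` and `<check>` is that the analyst only ever states "for ANC, the value must be in range"; the "not(x) or" wrapping is added mechanically.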
Considering my other example, we want something like:

    *[local-name() = 'surgery.date'] > *[local-name() = 'protocol.on_study_date']

or

    *[local-name() = 'surgery.date'] > *[local-name() = 'protocol.on_study_date' and protocol.mnemonic = 'TOTXV']

We want to be able to parse this rule specification to discover that we have to retrieve all the protocol data that is in context for this particular patient (or the protocol data in context that has mnemonic = 'TOTXV'). Essentially, I think what we need is an XML syntax for XPath that we can turn back into real XPath, or that can be easily parsed, so that things other than XPath-savvy processors can generate data sets that match the XPath.

We are running this all on top of Apache Cocoon with Saxon, so we more or less have any piece of XML or XSLT handling machinery we might need available to us: protocol resolvers, any and all manner of schema, XSLT in any version, Java classes, and even Java extensions for XSLT if needed, though I'd rather stay away from those.

Sorry for the windy post, but finally the real questions: does anyone know of any "obvious" way to do this? By obvious I mean some existing spec or best practice. If not, any thoughts on what a good structure would be for our artificial language that is going to be fed into Schematron and our document retrieval process? Am I missing something with respect to Schematron? Could we hook into some underlying part of an XPath parser and gain our understanding of the XPath there instead of at the higher level (and thus not need the XML syntax for XPath)? Other thoughts or comments?

Peter Hunsberger
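To make the "XML syntax for XPath" idea concrete, the following Python sketch shows one possible shape for it: a single XML rule that can be turned back into the XPath above and also inspected to find which other documents must be retrieved. The `<compare>`, `<field>` and `<where>` elements and their attributes are hypothetical, made up for this illustration; this is not an existing spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML form of the rule
#   surgery.date > protocol.on_study_date, restricted to mnemonic 'TOTXV'.
# Element and attribute names are assumptions made for this sketch.
RULE_XML = """
<compare op="gt">
  <field doc="surgery" name="date"/>
  <field doc="protocol" name="on_study_date">
    <where name="mnemonic" value="TOTXV"/>
  </field>
</compare>
"""

OPS = {"gt": ">", "lt": "<", "eq": "="}


def field_to_xpath(field):
    """Turn one <field> into a *[local-name() = '...'] step with its filters."""
    doc = field.get("doc")
    tests = ["local-name() = '%s.%s'" % (doc, field.get("name"))]
    for where in field.findall("where"):
        tests.append("%s.%s = '%s'" % (doc, where.get("name"), where.get("value")))
    return "*[%s]" % " and ".join(tests)


def rule_to_xpath(rule):
    """Rebuild the real XPath comparison from the XML rule."""
    left, right = rule.findall("field")
    return "%s %s %s" % (field_to_xpath(left), OPS[rule.get("op")], field_to_xpath(right))


def required_documents(rule, current_doc):
    """Return {document: {filter name: value}} for every document other than
    the one being validated, without parsing any XPath."""
    needed = {}
    for field in rule.findall("field"):
        doc = field.get("doc")
        if doc == current_doc:
            continue
        filters = needed.setdefault(doc, {})
        for where in field.findall("where"):
            filters[where.get("name")] = where.get("value")
    return needed


rule = ET.fromstring(RULE_XML)
xpath = rule_to_xpath(rule)
retrieval = required_documents(rule, current_doc="surgery")
print(xpath)
print(retrieval)
```

Here the retrieval step only reads attributes, so a non-XPath component could use the same rule to fetch the protocol data in context before the generated Schematron runs.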








