Re: dynamically generated XML Schema?! Re: R: [xml-d
On Thu, 04 Nov 2004 09:41:12 +0100, Burak Emir <burak.emir@e...> wrote:

> Peter Hunsberger wrote:
>
> >Burak Emir <burak.emir@e...> asks:
>
> <snip>XML syntax discussion and related</snip>
>
> >>One can of course endlessly discuss syntax, but I have never
> >>understood the obsessiveness of marking up descriptions of XML data
> >>in XML.
> >>
> >>Who needs to dynamically generate schemas?
> >
> >Umm, we do.
>
> Are you sure? :-)

Let me put it this way: if someone needs an XML schema, we can generate
one. In this particular application, for 99% of the current needs we
really don't need an XML schema at all. That will change as things open
up across more organizational boundaries.

> >>The whole point of schemas is
> >>to be a widespread, well understood description of instances.
> >
> >In our case we have a lot of metadata described in a relational
> >database. There are customizations of that metadata that select
> >specific pieces based on the authorizations of the user and the usage
> >context of the metadata. The only time we need a schema is for the
> >description of a piece of instance data that is travelling beyond the
> >boundaries of the system, so we generate the schema as we need it.
> >
> >This may sound like a problem of not having a powerful enough schema
> >language, and in a way it is. However, my general philosophy is that
> >I will generate no schema before it's time...
>
> OK, using schemas to describe the format of the data that is going
> out, from descriptions in a relational database.
>
> To make this a bit more concrete: I have a bug-tracking system, and a
> bug report has a field "product" which is an enumeration.
>
> Now, when there is a new product, the enumeration changes (somebody
> updates the database). One generates a new schema.
>
> But this is a bit one-way: the one who generates the schema changes
> his data at free will (maybe the product field even disappears)?
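[Ed.: Burak's bug-tracker example above can be sketched in a few lines. This is a minimal illustration, not code from either poster; the function name `build_product_schema` and the product values are invented for the example.]

```python
# Sketch: regenerate an XML Schema whose "product" enumeration tracks
# the product list in the database. When a product is added, rerunning
# this yields a new schema.
from xml.etree import ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def build_product_schema(products):
    """Return XSD text restricting the "product" element to `products`."""
    ET.register_namespace("xs", XS)
    schema = ET.Element(f"{{{XS}}}schema")
    simple = ET.SubElement(schema, f"{{{XS}}}simpleType", name="productType")
    restriction = ET.SubElement(simple, f"{{{XS}}}restriction",
                                base="xs:string")
    for product in products:
        # One xs:enumeration facet per current product.
        ET.SubElement(restriction, f"{{{XS}}}enumeration", value=product)
    ET.SubElement(schema, f"{{{XS}}}element", name="product",
                  type="productType")
    return ET.tostring(schema, encoding="unicode")


# Somebody adds "Gadget" to the database; the next generation run
# produces a schema with the extra enumeration value.
print(build_product_schema(["Widget", "Gadget"]))
```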
> Where does that leave the receiver of your data? Two options:
>
> 1) Either he cannot rely on any schema, because it may be subject to
> complete change.
> 2) Or the schema changes are actually very, very restricted to a few
> backwards-compatible details.
>
> Assuming the latter, I start seeing things more clearly now, namely
> that if you add a new complex type by derivation, you are effectively
> building a new schema; hence there is indeed a need to build new
> schemas if it is possible to "continuously specialize".
>
> Does this cover your requirement? If not, can you give a concrete
> example like the one above?

Not really; the dynamic generation occurs at well-defined points: the
introduction of a new clinical trial or the revision of a medical
protocol. Say a researcher wishes to revise a protocol to capture some
new information. We may already have metadata descriptions existing
elsewhere that describe this particular information (if not, they are
created by business analysts who know nothing about XML). Based on this
metadata, the researcher's current authorizations, and the context in
which the researcher wishes to use the new information, we can generate
a new schema. This schema will be consistent across all matching
instance data until the protocol is revised again. The old version of
the schema is retained at that point and can be used to audit and
validate previous versions of the data.

Any schema revision may not be completely backwards compatible;
sometimes information is no longer wanted. On some occasions data
changes format or type (the latter can be problematic and require
manual changes). Data elements can go from not being present in the
model at all to being required. Making elements optional wouldn't work:
until they become an official part of the protocol they can't exist at
all. The schema documents the current state of the protocol exactly.
(Or rather it will in the future; at the moment the schemas aren't as
precise as they need to be.)
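[Ed.: the "old schemas are retained for auditing old data" point above amounts to an immutable, revision-keyed store. A hypothetical sketch; the class and method names are invented for illustration, not part of the system Peter describes.]

```python
# Sketch: retain every generated schema, keyed by protocol and revision,
# so instance data captured under an old revision can still be audited
# and validated against the schema it was born under.
class SchemaRegistry:
    def __init__(self):
        self._schemas = {}

    def publish(self, protocol, revision, schema_text):
        # A published revision is immutable: never overwritten, never
        # deleted, so the audit trail stays intact.
        if (protocol, revision) in self._schemas:
            raise ValueError("revision already published")
        self._schemas[(protocol, revision)] = schema_text

    def schema_for(self, protocol, revision):
        return self._schemas[(protocol, revision)]


reg = SchemaRegistry()
reg.publish("gene-therapy", 1, "<xs:schema><!-- original --></xs:schema>")
reg.publish("gene-therapy", 2, "<xs:schema><!-- revised --></xs:schema>")
# Old data is validated against revision 1; new data against revision 2.
```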
In theory, the dynamically produced schema could serve as a basis for
negotiation. In practice this negotiation is done using higher-level
modelling and business-analyst-facilitated face-to-face meetings with
the concerned parties. The schema is in some ways an after-the-fact
documentation artifact. The fact is, XML Schema and the tools for
handling it don't work at the level of modelling that is required for
our application. (Thus, as I say, I will generate no XML schema before
it's time.)

> I am aware of XML Schema pitfalls that prevent typed programming
> languages (e.g. XSLT, XQuery) from using the specialized data, yet
> it's hard to really grasp the need for "continuous
> specialization/extension/adaptation".

I think in some ways it's part of the problem domain: we're doing
research, so by definition we don't have well-defined business rules
that can be evenly applied across all of the researchers. Nonetheless
the researchers will wish to exchange data with each other in some
well-defined way. Instead of proceeding top-down from business rules to
schema, we have to build many possible solutions and dynamically search
the solution space to see what fits at any given moment. In a way it's
a recursive data-mining project to find what schema works to describe
the data. Alternatively, perhaps it's a genetic algorithm for
determining the fitness of the schema to the data. (Both of those
characterizations are unfair; we actually have a better understanding
of the data than they imply.)

> ...
> <snip/>
>
> >>Now one can dwell in discussion of hypothetical families of
> >>schemas, but for all my experience tells me about modelling, if you
> >>manage to understand what the common things are that make a bunch
> >>of schemas a family, then you can anticipate the extensibility you
> >>need, which removes completely the need for dynamic generation.
> >
> >Yes and no. We have a meta-schema.
> >It's so abstract and so generalized that it's difficult to use for
> >specific instance data. The problem is, understanding of the schema
> >is often local to the schema writer. Not everyone "gets" 5th normal
> >form, and 5th normal form doesn't work when the data hits the data
> >warehouse.
>
> Does it happen that you need to change that one as well?
>
> Or is it a "parameterized" schema (like Java generics)?

It is largely a parameterized schema, though it is still being revised
as we figure out what works best. The biggest changes are a constant
evolution to make it more granular. It's becoming less and less like a
conventional relational database schema (not that it ever was) and more
and more like a graph management system.

> >>What is a use case for dynamically generated schemas?
> >
> >For one, you need different schemas for different stages in the life
> >of the data. I know of no technology that lets you adequately
> >describe all possible transformations of the schema over time from
> >within the schema itself. As a specific example (discussed
> >previously on the list), you need a way to match versions of the
> >schema to workflow.
>
> In my understanding of the problem, this drifts away from "dynamic
> generation". Schema evolution (or just backwards-incompatible change)
> makes configuration management, versioning, and many other things
> necessary.
>
> But having a meta-schema and generating schemas is of no use for the
> problem at hand, because the receiver of your data cannot write
> software that deals with the meta-schema, and hence with all versions
> of the schema.

I guess this depends on your perspective: are the schemas the starting
point or the end point? Do you negotiate from the schema, or do you
document the negotiations with the schema? If it's the latter, how do
you model and document before you have the schema?
If you have a good system for capturing the modelling and documentation
when you are working at the business-knowledge-capture level, then the
schema can become an after-the-fact documentation artifact. Yes, you
still need version management, but the audit trail that documents the
negotiations isn't based on the schema; it's external to it (and yes,
maybe you have a schema for exchanging that data also). At the end of
the day we end up with a Gene Therapy version of the drugs schema and a
Solid Tumor version of the drugs schema, and they may in turn have
their own revision levels. They have metadata in common, but they also
have completely different sets of elements within them.

> >>Why does one need to use XSL for it?
> >
> >You don't, but in our case we've got about 8 different pieces of
> >source metadata that have to be combined and transformed in order to
> >derive a specific schema. XSL is the best match to the problem I
> >know of.
>
> Unless I have misunderstood, I think your problem seems rather
> different, because you could also get away with not generating any
> schema at all, if it can change in unanticipated ways. Your problem
> and its solution (which may be elegant) does not take receivers into
> account - they may have to hand-patch their code to deal with the new
> data.

The changes are anticipated; they occur at well-documented points in
the life cycle of the protocol. When the changes happen the receivers
do indeed have to change the systems that accept the data. We're
working on ways to automate the process. The solutions are, in part,
based on the exchange of schemas that document the changes... :-)

-- 
Peter Hunsberger