
Re: dynamically generated XML Schema?! Re: R:[xml-de

data mining for xml schema

Peter Hunsberger wrote:

>Let me put it this way, if someone needs an XML schema we can generate
>one.  In this particular application for 99% of the current needs we
>really don't need an XML schema at all.  That will change as things
>open up across more organizational boundaries.
As I said before, half of your application sits across organizational 

Although it is a use case for the one who publishes the data, I am not 
sure whether there is a way to write a program that reacts to such a 
schema change and adapts its behavior automatically.

>>Where does that leave the receiver of your data? Two options
>>1) Either, he cannot rely on any schema, because it may be subject to
>>complete change.
>>2) Or, the schema changes are actually very very restricted to a few
>>backwards-compatible details.
>>Assuming the latter, I start seeing things clearer now, namely that if
>>you add a new complex type by derivation, you are effectively building a
>>new schema, hence there is indeed a need to build new schemas if it is
>>possible to "continuously specialize".
>>Does this cover your requirement? If no, can you give a concrete example
>>like the one above?
>Not really, the dynamic generation occurs at well defined points: the
>introduction of a new clinical trial or the revision of a medical
This discussion is on the process level (interacting with humans), where 
my initial question was on the level of interacting software.

It seems to be a case of schema evolution.


>>I am aware of XML Schema pitfalls that prevent typed programming
>>languages (e.g. XSLT, XQuery) from using the specialized data, yet it's
>>hard to really grasp the need for "continuous
>I think in some ways it's part of the problem domain: we're doing
>research, by definition we don't have well defined business rules that
>can be evenly applied across all of the researchers.  Nonetheless
>the researchers will wish to exchange data with each other in some
>well defined way.
The only constant thing is change, also business rules are not cast in 

>Instead of proceeding top down with business rules to schema we have
>to build many possible solutions and dynamically search the solution
>space to see what fits at any given moment.  In a way it's a recursive
>data mining project to find what schema works to describe the data.
>Alternatively, perhaps it's a genetic algorithm for determining the
>fitness of the schema to the data. (Both of those characterizations
>are unfair, we actually have a better understanding of the data than
>they imply.)
Both data mining and genetic algorithms are about machines; you add humans 
to the equation.

The point I tried to make was more or less that if you generate a schema 
dynamically, then humans have to rewrite software, meaning that the old 
software will not work. Your problem seems to go well beyond that: you 
never claimed that old software will work.

>>>Yes and no. We have a meta-schema.  It's so abstract and so
>>>generalized that it's difficult to use for specific instance data.
>>>The problem is, understanding of the schema is often local to the
>>>schema writer.  Not everyone "gets" 5th normal form, 5th normal form
>>>doesn't work when the data hits the data warehouse.
>>Does it happen that you need to change that one as well?
>>Or is it a "parameterized" schema (like the Java generics)?
>It is largely a parameterized schema though it is still being revised
>as we figure out what works best.  The biggest changes are a constant
>evolution to make it more granular.   It's becoming less and less like
>a conventional relational database schema (not that it ever was) and
>more and more like a graph management system.
[OT] Sounds like tricky stuff. Reminds me of a "professor for software 
engineering" whose only fascinations were Ada, Mercedes-Benz (as an ever 
repeating example of plain old industry in need of new software) and 
general graph replacement systems. I would never spend my time on a 
general approach to graph systems. For special purposes they can make a 
lot of sense.

>>>>What is a use case for dynamically generated schemas?
>>>For one, you need different schema for different stages in the life of
>>>the data. I know of no technology that lets you adequately describe
>>>all possible transformations of the schema over time from within the
>>>schema itself.  As a specific example (discussed previously on the
>>>list),  you need a way to match versions of the schema to work flow.
>>In my understanding of the problem, this drifts away from "dynamical
>>generation". Schema evolution (or just backwards-incompatible change)
>>makes configuration management, versioning, and many things necessary.
>>But having a meta schema and generating schemas is of no use for the
>>problem at hand, because the receiver of your data cannot write software
>>that deals with the meta schema, and hence with all versions of the schema.
>I guess this depends on your perspective: are the schema the starting
>point or the end point?  Do you negotiate from the schema or do you
>document the negotiations with the schema?  If it's the latter, how do
>you model and document before you have the schema?
Negotiations would mean "reconfiguration", and I doubt precisely that 
such a thing is possible (in the absence of Meta-XSLT :-)

>If you have a good system for capturing the modelling and
>documentation when you are working at the business knowledge capture
>level then the schema can become an after the fact documentation
>artifact.  Yes, you still need version management, but the audit trail
>that documents the negotiations isn't based on the schema, it's
>external to it (and yes, maybe you have a schema for exchanging that
>data also).
Surely, schemas do evolve, and having a documentation artefact is better 
than having none.

What I get out of the description is that probably no schema language 
and no fixed program would help here.

>>>>Why does one need to use XSL for it ?
>>>You don't, but in our case, we've got about 8 different pieces of
>>>source metadata that have to be combined and transformed in order to
>>>derive a specific schema.  XSL is the best match to the problem I know
>>Unless I have misunderstood, I think your problem seems rather
>>different, because you could also get away with not generating any
>>schema at all, if it can change in unanticipated ways. Your problem and
>>its solution (which may be elegant) does not take receivers into account
>>- they may have to hand-patch their code to deal with the new data.
>The changes are anticipated, they occur at well documented points in the
>life cycle of the protocol.  When the changes happen the receivers do
>indeed have to change the systems that accept the data.  We're working
>on ways to automate the process. The solutions are, in part, based on
>the exchange of schemas that document the changes... :-)
Automating "fixing a program for the new schema" is precisely what I 
think does not exist anywhere.

More specifically, even a statically typed bunch of XSLT stylesheets and 
XQuery programs cannot deal with a dynamic schema change.
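
A minimal sketch of the receiver's side of this, in Python rather than XSLT or XQuery. The element names (trial, patient, age, weight, demographics) and the helper read_ages are hypothetical and do not come from the application discussed in this thread. It only illustrates the difference between a backwards-compatible addition and a change that makes the structure more granular:

```python
import xml.etree.ElementTree as ET

# Hypothetical instance documents; all element names are invented for illustration.
V1_DOC = "<trial><patient><age>42</age></patient></trial>"

# Backwards-compatible change: a new element is added next to the old ones.
V2_COMPATIBLE = (
    "<trial><patient><age>42</age><weight>70</weight></patient></trial>"
)

# Incompatible change: the structure becomes more granular and <age> moves.
V2_INCOMPATIBLE = (
    "<trial><patient><demographics><age>42</age></demographics></patient></trial>"
)


def read_ages(xml_text):
    """Receiver code written against version 1: expects trial/patient/age."""
    root = ET.fromstring(xml_text)
    return [int(e.text) for e in root.findall("patient/age")]


ages_v1 = read_ages(V1_DOC)
ages_compatible = read_ages(V2_COMPATIBLE)
ages_incompatible = read_ages(V2_INCOMPATIBLE)

print(ages_v1, ages_compatible, ages_incompatible)
```

In this sketch the old receiver still reads the compatible document, but for the restructured one it raises no error and simply finds no ages. Somebody has to notice that and rewrite the path by hand.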

Burak Emir


