The technique where one creates instances until one gets an exhaustive set isn't unknown. Once known as "tag sprinkling", it is a typical bottom-up approach, seen a lot when we used to be given a stack of mil docs or a mil content standard and told to "make a DTD for this by next week". It isn't that much different from having a system do the job.

Lots of times when I start out to develop a language without a real imprimatur for it, I write instances until I get a feel for the domains. Then I start in on the DTD/Schema. This is probably fairly common. Maler and El Andaloussi wrote a book on structured techniques for DTD design, and most people who come from any kind of structured programming background understand these approaches. So the automation may be future-tense, but the approach is the oldest one out there.

Some suggest it is a bad approach and that structured design techniques should be used in every case. I think it depends on the situation. If I have a lot of folks (say, a committee) sitting around a table, structured techniques drive a discussion nicely. But I've seldom seen it done. Most of the time, they toss a load of docs on the table and say, "tag these". Only later, when they try to get a trading partner to adhere to their tags, do they hear that well-known refrain which will haunt the Semantic Web:

"Who sez?"
"The W3C sez!!"
"Screw 'em. This is our business deal."

And so it will go. How many really large relational schemas out there share standard namespaces? Some. But mostly, they share a weak data standard (e.g., NIBRS) for which endless local customization is done, semantics and all.

As I said: the challenge of vertical vocabularies is to get the two front-runners in a business ecology to give up their subject domain expertise to slower competitors. In most business markets, you are number one or number two or a loser. A little Objectivism; localize schemas.

Len
http://www.mp3.com/LenBullard

Ekam sat.h, Vipraah bahudhaa vadanti.
Daamyata. Datta. Dayadhvam.h

-----Original Message-----
From: Joel Rees [mailto:rees@s...]

[clipped]

> When it was suggested early in the XML rhubarb
> that DTDs would go away, (well-formed only),
> I laughed. It removes the biggest advantage
> of SGML: standard vocabularies for focused
> domains, the easy means to annotate a text with inline
> metainformation for interpretation. Now people
> are defending DTDs against the next new thing
> and so it goes, but the principle remains: once
> you get beyond a simple message, well-formedness
> isn't enough. You need the metadata to get around
> the outrageous and inefficient noise reduction
> techniques of open text searching.

My company is betting that there will be a large range of applications for which one would rather not have the DTD in the way.

I tend to figure that DTD-less is an intermediate step, something to use while trying to get a grasp of what a document class should include and what it should not.

When I think of writing XML documents with a word processor, I imagine formatting some piece, then selecting a range and assigning a semantic tag of my choosing to it. The word processor should split the semantic XML from the formatting XML, then save the format as an XSL document and the semantic as straight XML. An automatic DTD generator should eat a batch of similar files (possibly built from a single original as a template) and spit the DTD out as whatever is needed to describe everything in the batch. When editing the doc, a palette would appear showing the current set of tags not made directly manipulable by the current XSL.

So, following this line of reasoning, SW would simply take a DTD and allow selecting a node/nodeset and attaching some attributes or child elements that specify some common/standard qualitative semantic?

How far in the future am I imagining? I know Microsoft is doing the smoke-and-mirrors about Word saving as XML. Any bets (or inside info) about whether they have even considered semantics issues?
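
To make the "automatic DTD generator" idea above a little more concrete, here is a rough Python sketch of the simplest version I can think of: read a batch of instances and emit the loosest DTD that still describes every one of them. The function name `infer_dtd` and the sample memo documents are made up for illustration and do not refer to any existing tool. The sketch assumes no namespaces, declares every attribute as CDATA, and makes no attempt to recover element order or cardinality, so the result is a starting point for a human to tighten, not a finished design.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def infer_dtd(documents):
    """Infer a deliberately loose DTD from an iterable of XML strings.

    Hypothetical helper for illustration only: it records which children,
    text, and attributes each element was ever seen with, nothing more.
    """
    children = defaultdict(set)    # element name -> child element names seen
    has_text = defaultdict(bool)   # element name -> non-whitespace text seen
    attrs = defaultdict(dict)      # element name -> {attribute name: times seen}
    count = defaultdict(int)       # element name -> number of occurrences
    order = []                     # element names in first-seen order

    for doc in documents:
        root = ET.fromstring(doc)
        for elem in root.iter():
            name = elem.tag
            if name not in count:
                order.append(name)
            count[name] += 1
            if elem.text and elem.text.strip():
                has_text[name] = True
            for child in elem:
                children[name].add(child.tag)
                # Text following a child (its tail) is content of the parent.
                if child.tail and child.tail.strip():
                    has_text[name] = True
            for attr in elem.attrib:
                attrs[name][attr] = attrs[name].get(attr, 0) + 1

    lines = []
    for name in order:
        kids = sorted(children[name])
        if has_text[name] and kids:
            model = "(#PCDATA | " + " | ".join(kids) + ")*"   # mixed content
        elif has_text[name]:
            model = "(#PCDATA)"
        elif kids:
            model = "(" + " | ".join(kids) + ")*"             # any order, any count
        else:
            model = "EMPTY"
        lines.append(f"<!ELEMENT {name} {model}>")
        for attr in sorted(attrs[name]):
            # Required only if every occurrence of the element carried it.
            default = "#REQUIRED" if attrs[name][attr] == count[name] else "#IMPLIED"
            lines.append(f"<!ATTLIST {name} {attr} CDATA {default}>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Two made-up instances standing in for a batch of similar files.
    batch = [
        '<memo id="1"><to>Ann</to><body>Ship <em>today</em>.</body></memo>',
        '<memo id="2" priority="high"><to>Bob</to><cc/><body>Hold.</body></memo>',
    ]
    print(infer_dtd(batch))
```

The choice of `(a | b)*` for every content model is what keeps this honest: it never claims more structure than the batch proves, which is also why the output still needs the kind of structured review discussed above before anyone asks a trading partner to adhere to it.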