Re: Expert's advice needed about XML Schema and defining some
On Fri, 2003-12-05 at 15:50, Robert Koberg wrote:
> Hi,
>
> Michael Champion wrote:
<snip>
> > I had the same reaction to the original post. As useful as XML is for
> > lots of things, one needs to guard against the temptation to see nails
> > that need pounding simply because one has a hammer. XML per se has no
> > notion of cross-document referential integrity nor does XPath/XSLT 1.x
> > have the notion of a join. Obviously these are two great strengths of
> > the relational approach.
<snip>
> This is the second post I have seen that pooh-poohs the value of
> id/idref, XML Schemas and xslt1.0 to manage a project's validity. I am
> wondering if the context here is unique or if it is generally a bad
> practice to use these types of things. I use them for a few reasons:
> provide a UI for a user to manage their project and ensure validity for
> our cms.

There is a general tendency to disparage features when they are misapplied, or when we try to apply them without understanding them. I am _not_ saying anyone in this thread has done this. Rather, it is a general observation from my work experience. (Of course, me dissing relational database systems at every opportunity is well thought through and rational. Another matter entirely!)

I can't help stepping up on the soap box for a bit here (just skip this if you don't enjoy rants):

<rant>
<![CDATA[
ID/IDREF is a very simple mechanism for creating cross-references in documents, usually technical documents. For that purpose it works well. However, ID/IDREF was never intended for use in a Web environment, nor was it intended to create links between different documents. SGML used other mechanisms for that purpose (HyTime). For describing relationships between resources XML has XLink, and a couple of other recommendations. (For some reason everyone seems intent on inventing their own linking mechanism, duplicating work that has already been done. I don't quite understand why.)
XML Schemas are intended to specify document structure. (It is arguable whether W3C Schema does this well.) The intent is not to validate link relationships, or entire projects.

XSLT was originally designed as a language for transforming XML documents to XSL-FO. The idea of using XSLT as a general transformation language was hit upon quite early in the design process, though, so I do not think XSLT suffers too much from the change in scope and purpose. Still, XSLT _is_ a transformation language, not a general purpose programming language. XSLT even specializes in the kinds of transformations it does, handling some things exceedingly well, and other things rather poorly.

There is absolutely nothing wrong with finding a new use for a tool. (I once got a tool for removing the stems from strawberries as a gift. I use it as a sugar tong. Works very well.) However, it is (or should be) a calculated risk. There is always a risk that the tool is not really suited for the task. Knowing as much as possible about the tool, what it was designed for, the circumstances under which it came to be, and what alternative tools may be available, certainly helps.

The uses of W3C Schema, ID/IDREF and XSLT discussed in this thread are not the uses these tools were originally intended for. There is nothing wrong with trying to find a new use for them; in fact it is a very good thing, but it is not the fault of the tools if it does not work.

Since I am in rant mode, I might also point out that XML itself was designed for publishing content on the Web. Originally, it was not designed for content creation. The original idea was more along the lines of creating content using SGML, and then transforming it to XML for publication. This idea died very quickly, but there are still traces of it left, for example in the idea of making SYSTEM identifiers in DOCTYPE declarations required, and the nonexistent support for remapping SYSTEM identifiers in some parsers (notably MSXML).
To this day things like these cause considerable problems when designing and building XML-compliant document production systems. (Which does not stop me from doing it, or suffering for it.)
]]>
</rant>

>
> Below is a simplified example of some things I do; could you (anyone)
> comment on it? (tear it to shreds if you like; I have a thick skin)
>
> This is a config XML that describes a brochure-type website (site.xml):
> <site ...>
>   <folder id="f123" index_page="a123" ...>
>     <page id="a123" ...more system independent metadata...>
>       <region name="wideColumn">
>         <content ref="c123"/>
>       </region>
>     </page>
>   </folder>
>   <page id="a234" ...>
>     ...
>   </page>
> </site>
>
> This is a config XML that describes a kind of topic mapping or dmoz-type
> website (topics.xml):
> <topics>
>   ...
>   <topic id="t123" label="some_grouping">
>     <topic id="t234" label="some_sub_grouping">
>       <content id="c123" label="blah" ...more system independent
>         metadata.../>
>     </topic>
>   </topic>
>   ...
> </topics>
>
> When validating I bring in config files like so:
>
> <config>
>   &site;
>   &topics;
> </config>

In other words, you have one document, the config file, and two well-formed fragment files. There is nothing wrong with this approach, provided that you don't have so much data that the size of the normalized file becomes a problem.

>
> and here is the content piece referenced/identified in the content
> elements above (c123.xml):
> <article>
>   <p>blah blah <link page_idref="a234">blah</link></p>
> </article>

Here things go a bit weird, in my dochead opinion. If article is the root element of c123.xml, then c123.xml and the site.xml file must both be imported into the same file before validating the article. It looks as if the system has very tight couplings where it shouldn't. An article can't be validated on its own, and is tied to the web site where it is published.
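The entity-assembled `<config>` is the case ID/IDREF was designed for: once `&site;` and `&topics;` are expanded, every ID and every reference sits in one document. The following is a minimal Python sketch of that in-document check, not Robert's actual setup. It assumes the entities have already been expanded (ElementTree does not resolve external entities), assumes every `@id` is an xs:ID, and hard-codes which attributes act as IDREFs, because ElementTree does not validate against a DTD or W3C Schema and so cannot tell which attributes are typed as IDREF.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified stand-in for the assembled config, with the &site; and
# &topics; entities already expanded inline (extra attributes trimmed).
CONFIG = """<config>
  <site>
    <folder id="f123" index_page="a123">
      <page id="a123">
        <region name="wideColumn">
          <content ref="c123"/>
        </region>
      </page>
    </folder>
    <page id="a234"/>
  </site>
  <topics>
    <topic id="t123" label="some_grouping">
      <topic id="t234" label="some_sub_grouping">
        <content id="c123" label="blah"/>
      </topic>
    </topic>
  </topics>
</config>"""

# Assumption: the (element, attribute) pairs typed as xs:IDREF in the schema.
# ElementTree knows nothing about schema types, so they are listed by hand.
IDREF_ATTRS = [("folder", "index_page"), ("content", "ref")]


def duplicate_ids(root):
    """Return IDs that occur more than once (assumes every @id is an xs:ID)."""
    counts = Counter(el.get("id") for el in root.iter() if el.get("id") is not None)
    return sorted(value for value, n in counts.items() if n > 1)


def dangling_idrefs(root):
    """Return (element, attribute, value) for each IDREF that matches no @id."""
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    problems = []
    for tag, attr in IDREF_ATTRS:
        for el in root.iter(tag):
            value = el.get(attr)
            # Elements without the attribute (e.g. <content id="c123"/>) are skipped.
            if value is not None and value not in ids:
                problems.append((tag, attr, value))
    return problems


root = ET.fromstring(CONFIG)
print(duplicate_ids(root))    # []
print(dangling_idrefs(root))  # []
```

A validating parser performs the equivalent check itself when the attributes are declared as ID/IDREF; the point of the sketch is only that no second document is involved.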
I realize that for a brochure site that is completely self contained, this tight coupling may not matter much, but as a generic design model, it does not work very well. Again, XLink, or another link model with a similar purpose, would have felt more natural here, and would have enabled a simpler, more flexible design.

It is worth noting that in general it is a good thing to keep structural validation and link validation separate. It is necessary to be able to validate the structure of a document at the time it is written, at least for anything even moderately complex. On the other hand, external resources the document refers to may not be available when the document is written. Indeed, they may not even exist, because a document may refer to some other document that has not been written yet. In such cases, and they are frequent, tying link validation to structure validation would be a big mistake.

<snip>
> In addition and among other things, I occasionally validate that the
> content pieces referenced in the site.xml//page/region/content exists in
> topics.xml/topics//content (automated).
>
> site.xml/site//folder has the index_page attribute which is an xs:IDREF
> while site.xml//page/@id is an xs:ID.
>
> c123.xml//link/@page_idref is not defined in a schema as an xs:IDREF.
> Rather, I use XSL to verify that the 'virtual page' (to be rendered as
> HTML to the file system or sent back to a browser) that the content
> piece attempts to link to actually exists in the site.xml.
>
> If the above is understandable (:-o), am I following bad or good practices?

Yes, it is understandable. It is not the way I would have designed it, but it works and it isn't too complex. What I have against the design is that it locks out standard tools and techniques. How does an XML editor validate an article? For that matter, how does an authoring application validate an IDREF link when the target is in another document?
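Keeping structural validation and link validation apart can be sketched as two independent passes over c123.xml: one that needs nothing but the article, and one that also needs site.xml and whose findings are treated as warnings, since the target page may simply not exist yet. This is a minimal Python sketch, not a replacement for the XSL check Robert describes; `ALLOWED_CHILDREN` is a hypothetical stand-in for a real schema, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The article from c123.xml (with the closing </p> added).
ARTICLE = '<article><p>blah blah <link page_idref="a234">blah</link></p></article>'

# A cut-down site.xml holding the pages that links may target.
SITE = ('<site><folder id="f123" index_page="a123"><page id="a123"/></folder>'
        '<page id="a234"/></site>')

# Hypothetical stand-in for a schema: which children each element may have.
ALLOWED_CHILDREN = {"article": {"p"}, "p": {"link"}, "link": set()}


def structure_errors(article_root):
    """Pass 1: structural check that needs only the article itself."""
    errors = []
    if article_root.tag != "article":
        errors.append(f"unexpected root <{article_root.tag}>")
    for el in article_root.iter():
        allowed = ALLOWED_CHILDREN.get(el.tag)
        if allowed is None:
            errors.append(f"unknown element <{el.tag}>")
            continue
        for child in el:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> not allowed in <{el.tag}>")
    return errors


def unresolved_links(article_root, site_root):
    """Pass 2: link check that needs site.xml; results are warnings, not errors."""
    page_ids = {page.get("id") for page in site_root.iter("page")}
    targets = (link.get("page_idref") for link in article_root.iter("link"))
    return [t for t in targets if t is not None and t not in page_ids]


article = ET.fromstring(ARTICLE)
print(structure_errors(article))                        # [] (no site.xml needed)
print(unresolved_links(article, ET.fromstring(SITE)))   # []
```

In an authoring tool, the first pass could run every time the article is saved, while the second runs only at publishing time, when site.xml is actually available.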
How do you reuse content that is currently published on this web site somewhere else, where the publishing mechanism is entirely different?

Again, these considerations may not be relevant for everyone, but they certainly are to me and the systems I work with. Most of the differences in our respective approaches are probably due to the fact that we work with different things. I am mainly concerned with content creation, and you (it seems to me) with publishing. Also, we deal with different kinds of information; a brochure and the technical documents I work with are very different. It is only natural that we have different perspectives and come up with widely different solutions to similar problems.

/Henrik