RE: Weak DTDs
The strength of the DTD is in giving a limited set of possibilities for a processing engine to work with. There are obviously other ways to do this (see below), but for a lot of applications the DTD provides sufficient constraints for authors of the information.

A common example is a title element. Often a title is required to provide feedback in a UI, to act as link text in a hypertext link, etc. If your DTD says:

    <!ELEMENT anything (title, anything.else+)>

then you know for a fact that you can pick out the title, given a valid document. Also, the parser will tell you whether the document is valid, and you can then decide whether to attempt processing it. In our application, the RTF processing engine will still attempt to process a document but says "hey, you might not get what you expect". In other situations, an application just says "go away and come back with something valid".

It sounds like in your situation you aren't worried about the vast majority of elements but just want to pick up on key things like <atom>, <bond>, etc. The "Eliot" way to do this would be with an architecture DTD which defines attributes to identify important elements. Your derived DTD can then use any content model (or even element names) you want. For example:

    <!element atom - - (bond+)>
    <!attlist atom CMLNAME NAME #FIXED atom>
    <!element bond - - EMPTY>
    <!attlist bond CMLNAME NAME #FIXED bond>

Your derived DTD might then go something like:

    <!ELEMENT myatom - - (title, mybond+, otherstuff)>
    <!ATTLIST myatom CMLNAME NAME #FIXED atom>
    <!ELEMENT mybond - - (title, description)>
    <!ATTLIST mybond CMLNAME NAME #FIXED bond>

(I'm still new to AFs, but this is the basic idea.)

Now your processing engine can identify items by their fixed attributes and process accordingly, ignoring all other elements. Other people can happily derive from your architecture DTD to add their application-specific elements.
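As a rough illustration of that last point (an editorial sketch, not part of the original message): a processing engine could select elements by their CMLNAME attribute and ignore element names entirely. The class name ArchRoles, the method elementsWithRole and the sample document are hypothetical, and the sketch assumes the attribute is written explicitly on each element in the instance rather than supplied by a DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: finds elements by architectural role, not by element name.
class ArchRoles {
    // Attribute carrying the role, as in the DTD fragments above.
    static final String ROLE_ATTR = "CMLNAME";

    /** Returns the tag names of all elements whose CMLNAME equals the given role. */
    static List<String> elementsWithRole(String xml, String role) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList all = doc.getElementsByTagName("*"); // every element, in document order
        List<String> names = new ArrayList<>();
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            // getAttribute returns "" when the attribute is absent, so such elements are skipped.
            if (role.equals(e.getAttribute(ROLE_ATTR))) {
                names.add(e.getTagName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Made-up instance using the derived element names from the example DTD.
        String xml = "<doc>"
                + "<myatom CMLNAME='atom'><title>C1</title>"
                + "<mybond CMLNAME='bond'><title>b1</title><description>single</description></mybond>"
                + "<otherstuff/></myatom></doc>";
        System.out.println(elementsWithRole(xml, "atom")); // prints [myatom]
        System.out.println(elementsWithRole(xml, "bond")); // prints [mybond]
    }
}
```

Elements without the attribute (title, description, otherstuff) simply fall through, which is the "ignoring all other elements" behaviour described above.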
If you are using XML without a DTD, things are exactly the same except that you need to explicitly set the attribute on the relevant elements (as I understand it). It should be trivial to write a normaliser which would generate XML from an SGML instance (SGMLNORM would probably do it).

I think one of the major problems with the Web today is the plethora of badly formed HTML pages which have been allowed to grow and flourish by browsers which don't check for validity in any way at all. There is a danger that lack of DTDs in XML documents will lead to even greater "tag soup".

----------
From: peter@u...
Sent: 17 October 1997 08:21
To: xml-dev@i...
Subject: Weak DTDs
--------------------------------------------------------------------------

I am in the throes of revising CML (Chemical Markup Language - an XML-based application) and trying to work out what the value of conventional DTDs is. The previous version has a traditional SGML-like DTD - lots of parameter entities and other clever stuff. I am finding this too restrictive for several reasons, mainly because:

(a) XML-* is moving so rapidly (e.g. LINK, STYLE, etc.). This is a Good Thing, but CML has to react to it.
(b) RDF, DC, MathML etc. will be involved in CML and I can't say exactly how at present.
(c) My ideas on CML itself keep changing as I gain experience of new problems.

I'd like *constructive* views on the value of DTDs in XML. [I know that the community has strongly held ones, so please avoid too much passion :-). There was a very interesting discussion a few weeks back on the aesthetics of DTDs - a good DTD is a thing of beauty.]

I can see the following reasons for DTDs.

(a) the author has to conform to a pre-defined spectrum of ideas (e.g. a tax-return). [This is not required for CML, and any conformance is outside what a DTD can deliver - e.g. value verification.]
(b) the document may get corrupted in transmission or elsewhere. I suspect this is not a very important reason these days.
(c) it *may* make it easier to develop authoring tools
(d) it *may* give guidance to implementers of applications.
(e) it should (but doesn't always) act as an incentive to develop human-readable documentation of the semantics.
(f) it shows that the author has defined the language at some point in time.

I'd be grateful for other reasons. For CML I expect that (c-e) have some limited value. (f) may impress some people and horrify others.

In creating CML documents I find myself:

(a) wanting to introduce foreign names (e.g. <DC:author>, or <MathML:EQN>). These could reasonably come at many places in the document.
(b) forgetting my own 'rules', e.g. order of elements within a content model. So I can't expect others to follow them :-)
(c) adding new components to content models - for good reasons. There is no reason why a <MOLECULE> cannot contain a <FIGURE>, but I didn't think of that earlier. I don't want to have to think of all combinations and ask 'is that reasonable?'.

However, the power of structured documents means that I can often use very fuzzily constructed documents. Thus:

'if a MOLECULE contains ATOMS and BONDS, the software can draw a picture'
'if any parent contains a FIGURE, allow that to be displayed by the reader'.
'if a VARiable has attribute BUILTIN=FOO, inform the software that it could process this with special FOO-specific code'

and so on. These are powerful conditions, but if we try to express them in DTDs, validation will fail. What I'd like to have is a wildcard #ANY (this has already been suggested) which can be used for content models, something like the (currently illegal) XML:

    <!ELEMENT MOL (#ANY,ATOMS,BONDS)*>

This says that MOL can contain anything, but that ATOMS and BONDS have a special role. The authoring tool might present a menu with the items ATOMS, BONDS, Other.
The software for MOL.java could contain routines to identify children:

    // Count the children that have a special role; all others are ignored.
    int natom = 0;
    int nbond = 0;
    for (int i = 0; i < this.getChildCount(); i++) {
        Node n = getNode(i);
        if (n instanceof ATOMS) {
            /* atom-specific stuff */
            natom++;
        } else if (n instanceof BONDS) {
            /* bond-specific stuff */
            nbond++;
        }
    }
    // A picture can only be drawn when both atoms and bonds are present.
    if (natom > 0 && nbond > 0) {
        displayMol();
    }

Obviously this can't be written automatically, but the 'DTD' helps the author. In some cases there will be stricter rules such as:

    <!ELEMENT VAR (#PCDATA)>
    <!ATTLIST VAR
        BUILTIN CDATA #IMPLIED
        TYPE (INTEGER|FLOAT|STRING) "STRING"
        ...>

which clearly help both authoring tool authors and applications authors.

At present I would like to keep a simple DTD, but most of the content models will be 'ANY' and most of the attribute values will be CDATA. It would be nice to have attribute values which could take a list of values *and* CDATA :-) - like:

    <!ATTLIST VAR TYPE (INTEGER|FLOAT|STRING|#ANY)>

which would inform the software that it should cater for three specific values, but that the user can add FOO if they really want.

Any sympathisers out there :-)?

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
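An editorial sketch of how software might treat the open-ended TYPE list proposed in the message above (three known values, plus anything else the user supplies). Nothing here comes from CML itself: the class VarType, the method classify and the "OTHER:" prefix are hypothetical, and the fallback to STRING is only an assumption based on the default shown in the ATTLIST.

```java
import java.util.Set;

// Hypothetical handler for the TYPE attribute of VAR.
class VarType {
    // The three values the software is expected to cater for specifically.
    static final Set<String> KNOWN = Set.of("INTEGER", "FLOAT", "STRING");

    /** Returns the known type unchanged, or a tagged pass-through for any other value. */
    static String classify(String type) {
        if (type == null || type.isEmpty()) {
            return "STRING"; // assumed default when the attribute is missing
        }
        // Unknown values such as FOO are accepted, not rejected, and are only marked as such.
        return KNOWN.contains(type) ? type : "OTHER:" + type;
    }

    public static void main(String[] args) {
        System.out.println(classify("FLOAT")); // prints FLOAT
        System.out.println(classify("FOO"));   // prints OTHER:FOO
        System.out.println(classify(null));    // prints STRING
    }
}
```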