
Come On, DTD, Come On! Thoughts on DSDL Part 9


ISO/IEC 19757-9 is currently an empty hole titled "Datatype- and
namespace-aware DTDs".  This is a ragbag of ideas to fill that hole.

I am assuming that the context for extending DTDs is not redefining
XML, but rather creating an enhanced XML DTD format which can be used by an
external validator.  Examples of existing external validators are Jing
for RELAX NG, XSV for W3C XML Schema, and Sun MSV for many different
schema formats (including DTDs).  Enhanced DTDs would not be acceptable
to XML validating parsers.

I think the following enhancements to standard XML DTDs are worth
considering.  They are directed to making DTD authoring easier and
more flexible.  Nothing is introduced that is beyond the current schema
language state of the art.

1) The NS declaration.  Declarations of the form <!NS name SYSTEM "uri">
are allowed to define the namespaces associated with QNames in ELEMENT
and ATTLIST declarations.  As is the case for other schema languages, in
the presence of a known prefix, name matching is done on the universal
name (URI + local-part) rather than the QName.  The default namespace
is declared using #DEFAULT in place of the name.

Example:

<!NS foo SYSTEM "http://www.example.com/foo">
<!NS bar SYSTEM "http://www.example.com/bar">
<!ELEMENT foo:a (foo:b)>
<!ELEMENT bar:a EMPTY>

Issue: Is it an error to mention a prefix that is not declared?  My
answer: no; if this is done, name matching falls back to string identity.

Issue: is the keyword SYSTEM useful?

Issue: this does not help when prefixes are not used consistently
throughout an instance.  Do we care?  My answer: no.
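
The matching rule in #1 can be sketched in a few lines of Python.  This
is only an illustration under stated assumptions: the function names and
the dict representation of NS declarations are hypothetical, the sketch
covers element names only, and the instance's prefix bindings are
assumed to have been collected by the caller.

```python
# Hypothetical representation: NS declarations as a dict from prefix to URI,
# with the key "#DEFAULT" holding the default namespace, if any.
DEFAULT = "#DEFAULT"

def universal_name(name, ns_decls):
    """Return (uri, local-part) when the prefix is known, else the raw name."""
    prefix, sep, local = name.partition(":")
    if not sep:
        # Unprefixed name: use the default namespace if one was declared.
        uri = ns_decls.get(DEFAULT)
        return (uri, name) if uri is not None else name
    uri = ns_decls.get(prefix)
    # Undeclared prefix: fall back to string identity on the whole name.
    return (uri, local) if uri is not None else name

def names_match(schema_name, schema_ns, instance_name, instance_ns):
    """Compare a name from the DTD with a name from the instance."""
    return (universal_name(schema_name, schema_ns)
            == universal_name(instance_name, instance_ns))
```

With the two NS declarations from the example above, "foo:a" in the DTD
would match "f:a" in an instance that binds f to the same URI, while two
names with undeclared prefixes match only if the strings are identical.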

2) Attribute data types.  The names that can appear in an ATTLIST
declaration directly after an attribute name are extended to include
the datatype names of part 5 (i.e. XSD simple types).

Example:

<!ATTLIST baz
	foo integer #IMPLIED
	baz integer #REQUIRED>

Issue: do we need to make the datatype list extensible?  If so, we could
use QNames and a DATATYPE declaration, rather like the compact syntax
of RELAX NG.

3) Element simple datatypes.  Likewise, unparenthesized content models
in ELEMENT declarations are extended from just ANY and EMPTY to include
these same datatypes.

Example: <!ELEMENT foo nonNegativeInteger>

4) Datatype lists.  In either #2 or #3 context, a simple datatype name
can be replaced by "LIST(name)" to indicate a whitespace-separated
list of strings matching the datatype.  IDREFS is equivalent to
LIST(IDREF), and ENTITIES is equivalent to LIST(ENTITY).
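
A minimal Python sketch of how a validator might check LIST(name),
assuming a hypothetical table of datatypes whose lexical forms are
approximated by regular expressions (only two XSD types are shown, and
the function names are illustrative):

```python
import re

# Hypothetical datatype table; the patterns approximate the XSD lexical forms.
DATATYPES = {
    "integer": re.compile(r"[+-]?[0-9]+"),
    "boolean": re.compile(r"true|false|1|0"),
}

def matches(datatype, text):
    """True if text is a lexical form of the named datatype."""
    return DATATYPES[datatype].fullmatch(text) is not None

def matches_list(datatype, text):
    """True if every whitespace-separated token matches the datatype."""
    # Split on XML whitespace (space, tab, CR, LF) and drop empty tokens.
    tokens = [t for t in re.split(r"[ \t\r\n]+", text) if t]
    return all(matches(datatype, t) for t in tokens)
```

Note that in this sketch an empty string counts as a valid (empty) list;
whether LIST should require at least one item is a separate question.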

5) Datatype choice.  In either #2 or #3 context, a simple or LIST-wrapped
datatype name can be replaced by |-separated names, to indicate a choice
(derivation by union in WXS terms).

Example: <!ELEMENT bar integer|Name>

Issue: what do we do about XSD facets?  They are important but don't
easily fit into the rigid DTD syntax.
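
The choice in #5 amounts to trying each named datatype in turn.  A rough
Python sketch, assuming a hypothetical pattern table and using
integer|boolean rather than the types in the example above:

```python
import re

# Hypothetical pattern table approximating two XSD lexical spaces.
PATTERNS = {
    "integer": re.compile(r"[+-]?[0-9]+"),
    "boolean": re.compile(r"true|false|1|0"),
}

def matches_choice(choice, text):
    """choice is the declared type expression, e.g. 'integer|boolean'."""
    value = text.strip()  # both types collapse surrounding whitespace
    return any(PATTERNS[name].fullmatch(value) is not None
               for name in choice.split("|"))
```

For instance, matches_choice("integer|boolean", "true") succeeds because
the second alternative matches, even though the first does not.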

6) Restore & connector.  Bring back the & connector, either with the
SGML semantics (A,B)|(B,A), or preferably with the RELAX NG "interleave"
semantics.  The difference is that, given the content model "A & B+",
the element sequences A, B, B, B and B, B, B, A will match in either case,
but B, A, B, B will only match using interleave semantics.

Issue: SGML or interleave?  My answer: interleave.
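
The difference between the two readings of "A & B+" can be checked with
a small Python sketch.  It is hard-wired to this one content model and
assumes single-letter element names; the function names are made up for
illustration.  The interleave check relies on the fact that, when the
two operands use disjoint element names, a sequence matches the
interleave exactly when the A's taken alone match "A" and the B's taken
alone match "B+".

```python
import re

def sgml_and(seq):
    """SGML reading of 'A & B+': (A, B+) | (B+, A)."""
    return re.fullmatch(r"AB+|B+A", "".join(seq)) is not None

def interleave(seq):
    """Interleave reading of 'A & B+'."""
    if any(name not in ("A", "B") for name in seq):
        return False
    # Project the sequence onto each operand's element names.
    a_part = "".join(name for name in seq if name == "A")
    b_part = "".join(name for name in seq if name == "B")
    return (re.fullmatch(r"A", a_part) is not None
            and re.fullmatch(r"B+", b_part) is not None)

for seq in (["A", "B", "B", "B"], ["B", "B", "B", "A"], ["B", "A", "B", "B"]):
    print(seq, sgml_and(seq), interleave(seq))
```

The first two sequences pass both checks; B, A, B, B passes only the
interleave check.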

7) Abandon SGML 1-ambiguity rules.  Instead, allow complete flexibility of
content models.  See James Clark's discussion in "The Design of RELAX NG".

8) Restore multiple element and attribute names separated by |s.
This makes for conciseness and easy authoring.  These constructs were
dumped in XML DTDs because they imposed extra cost on validating parsers,
but in this model validation is something done outside parsing, so the
higher cost is acceptable.

9) Fixed element content.  Allow ELEMENT declarations to specify "#FIXED
'value'" after a datatype.

Example: <!ELEMENT foo integer #FIXED "5">

This means that the content of any foo element must be equivalent to 5
according to the "integer" datatype's equivalence relation: therefore,
05, 005, +5, etc. will pass validation.
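
A minimal Python sketch of the comparison in #9 for the "integer"
datatype only, assuming equivalence means equality in the value space
(the function names are illustrative):

```python
import re

INTEGER = re.compile(r"[+-]?[0-9]+")

def integer_value(text):
    """Map an integer lexical form to its value, or raise ValueError."""
    text = text.strip()
    if INTEGER.fullmatch(text) is None:
        raise ValueError("not an integer: %r" % text)
    return int(text)

def matches_fixed_integer(content, fixed):
    """True if content and the #FIXED value denote the same integer."""
    try:
        return integer_value(content) == integer_value(fixed)
    except ValueError:
        return False
```

Here matches_fixed_integer("005", "5") and matches_fixed_integer("+5", "5")
both succeed, while "6" and "5.0" are rejected.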

General issue: Should there be some way to indicate candidate roots?
In existing DTDs, any element can be a root.

General issue: We need to figure out what to do if the instance contains
an internal DTD (by which I mean an internal subset, a reference to an
external subset, or both).  Should internal validation be required,
permitted, or forbidden when doing external validation?  (I take it
for granted that if it is to be done, it will be done in the parser,
i.e. first.)  What is the effect of attribute defaulting specified by
the internal DTD on the external validation process?  Should internal
validation be done before external validation, or be turned off?

-- 
John Cowan <jcowan@r...>     http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, _LOTR:FOTR_
