Re: SGML default attributes.
These are really two different subject domains: entities (content-level reuse) and document types (defining and determining correctness of instances against some understood set of rules).

On general entities:

General entities are absolute evil. They should never be used under any circumstances. Fortunately, the practical reality of XML is that they almost never are used. I only see them in XML applications that reflect recent migration from legacy SGML systems.

The alternative is link-based reuse, that is, reuse at the application processing level, not at the serialization parser level. Or more precisely: reuse is an application concern, not a serialization concern.

Entities in SGML and XML are string macros. To the degree that string macros are useful, they have value, and in the context of DTD declarations parameter entities have obvious value and utility. Parameter entities are not evil. But in the context of content, that is, the domain of the elements themselves, string macros are a big problem, not because they aren't useful, but because people think they do something they don't, namely provide a way to do reliable reuse. The set of use cases where string macros are useful is so small relative to the set where they are actively dangerous that their value is not at all worth the cost of their certain misuse. Even for apparently simple use cases like string value parameterization in content (e.g., product names or whatever), string macros fail because they cannot be related to specific use contexts. When you push on the requirements for reuse you quickly realize that only application-level processing gives you the flexibility and opportunities required to properly implement reuse requirements, in particular, providing the correct resolution for a given use in a given use context.

The solution was in HyTime, namely the content reference link type, which was a link with the base semantic of use by reference. Because it is a link, it is handled in the application domain, not the parsing domain. This is transclusion as envisioned by Ted Nelson.

You see this in DITA through DITA's content reference facility and the map-and-topic architecture, both of which use hyperlinks to establish reuse relationships. With DITA 1.3 the addressing mechanism is sufficiently complete to satisfy most of the requirements (the only missing feature is indirection for references to elements within topics, but I defined a potential solution that does not require any architectural changes to DITA, just additional processing applied to specific specializations).

I'm not aware of any other documentation XML application that has equivalent use-by-reference features, but DITA is somewhat unique in being driven primarily by reuse requirements, which is not the case for older specifications like DocBook, NLM/JATS, and TEI. Of course, there's no barrier to adding similar features to any application. However, there are complications and policy considerations that have to be carefully worked out, such as: what are the rules for consistency between referencing and referenced elements? DITA has one policy, but it may not be the best policy for all use cases.

On DTDs and grammars in general:

I do not say that DTDs (or grammars in general) are evil. I only say that the way people applied them was (and is) misguided, because they misunderstood (or willfully ignored, in the face of no better alternative) their limitations as a way to associate documents with their abstract document types. Of course DTDs and grammars in general have great value as a way of imposing some order on data as it flows through its communication channels and goes through its life cycle. But grammars do not define document types.
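Returning to the entity-versus-link contrast above, here is a hedged sketch. The element and file names are hypothetical; the second fragment follows DITA's @conref convention of addressing the reused element by topic ID and element ID:

```xml
<!-- String-macro reuse: a general entity. The replacement text is fixed
     at parse time and cannot vary by use context. -->
<!DOCTYPE topic [
  <!ENTITY prodname "Widget Pro">
]>
<topic id="install">
  <body>
    <p>Install &prodname; before continuing.</p>
  </body>
</topic>

<!-- Link-based reuse: a DITA content reference. Resolution happens in the
     application, which can apply context-specific rules (filtering,
     key-based indirection, consistency checks, etc.). -->
<p conref="shared-text.dita#shared-text/install-para"/>
```

The difference is not the syntax but where resolution happens: the entity is expanded by the parser before any application sees it, while the conref is just an attribute until an application chooses how, and in what context, to resolve it.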
At the time namespaces were being defined I tried to suggest some standard way to identify abstract document types separate from any particular implementation of them: basically a formal document that says "This is what I mean by abstract document type 'X'". You give it a URI so it can be referred to unambiguously and you can connect whatever additional governing or augmenting artifacts to it you want. By such a mechanism you could have as complete a definition of a given abstract document type as you wanted, including prose definitions as well as any number of implementing artifacts (grammars, Schematrons, validation applications, phone numbers to call for usage advice, etc.). But of course that was too heavy for the time (or for now). Either people simply didn't need that level of definitional precision or they used the workaround of pointing in the other direction, that is, by having specifications that say "I define what abstract document type 'X' is".

This was in the context of the problem that namespace names don't point to anything: people had the idea that namespace names told you something, but we were always clear that they did not--they were simply magic strings that used the mechanics of URIs to ensure that you have a universally unique name. But the namespace tells you nothing about the names in the space (that is, what is the set of allowed names, where are their semantics and rules defined, etc.). The namespace spec specifically says that you should not expect to find anything at the end of the namespace URI and you should not try to resolve it.

So if the namespace name is not the name of the document type, what is? I wanted there to be one because I like definitional completeness. But in fact it's clear now that that level of completeness is either not practical or not sufficiently desired to make it worth trying to implement.
So we're where we were 30 years ago: we have grammar definitions for documents but we don't have a general way to talk about abstract document types as distinct from their implementing artifacts (grammars, validation processors, output processors, prose definitions, etc.). But experience has shown that it's not that big of a deal in practice. In practice, having standards or standards-like documents is sufficient for those cases where it is important.

As for addressing the problem that the reference from a document instance to a grammar in fact tells you nothing reliable, a solution is what DITA does: stop caring about the grammar as an artifact and care only about the set of (abstract) vocabulary modules the document says it (may) use. That is, actually declare the abstract document type in an unambiguous way and worry about validation details separately. DITA does this as follows:

1. Defines an architecture for layered vocabulary. The DITA standard defines an invariant and mandatory set of base element types and a mechanism for the definition of new element types in terms of the base types. All conforming DITA element types and attributes MUST be based on one of the base types (directly or indirectly) and must be at least as constrained as the base type (that is, you can't relax constraints). This is DITA specialization. It ensures that all DITA documents are minimally processable in terms of the base types (or any known intermediate types). It allows for reliable interoperation and interchange of all conforming DITA documents. Because the definitional mechanism uses attributes, it is not dependent on any particular grammar feature in the way that HyTime is. Any normal XML processor (including CSS selectors) can get access to the definitional base of any element and thus do what it can with it. The definitional details of an element are specified on the required @class attribute, e.g. class="- topic/p mydomain/my-para ", which reflects a specialization of the base type "p" in the module "topic" by the module "mydomain" with the name "my-para". Any general DITA-aware processor can thus process "my-para" elements using the rules for "p" or, through extension, can have "mydomain/my-para" processing, which might be different. But in either case you'll get something reasonable as a result.

2. Defines a modular architecture for vocabulary such that each kind of vocabulary definition (map types, topic types, or mix-in "domains") follows a regular pattern. There is no sense of "a" DITA DTD, only collections of modules that can be combined into document types (both in the abstract sense of "DITA document type" and in the implementation sense of "a working grammar file that governs document instances that use a given set of modules"). DITA requires that a given version in time of a module is invariant, meaning that every copy of the module should be identical to every other (basically, you never directly modify a vocabulary module's grammar implementation). Each module is given a name that should be globally unique, or at least unique within its expected scope of use. Experience has shown us that it's actually pretty easy to ensure practical uniqueness just by judicious use of name prefixes and general respect for other people's namespaces. There is no need to step up to full GUID-style unique-ification a la XML namespaces. In addition to vocabulary modules, which define element types or attributes, you can have "constraint modules", which impose constraints on vocabulary defined in other modules. Constraint modules let you further constrain the vocabulary without the need to directly modify a given module's grammar definition. Again, the rule is that you can only constrain, you can't relax.

3. Defines a "DITA document type" as a unique set of modules, identified by module name.
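These mechanisms can be sketched in an instance as follows. The "mydomain" module and its element names are hypothetical; the attribute shapes follow DITA 1.x conventions, with the root declaring its module set on @domains and each element carrying its full specialization ancestry in @class:

```xml
<!-- A specialized topic. A processor that knows nothing about "mydomain"
     can still fall back to the base topic/p rules declared in @class. -->
<myTopic id="t1" domains="(topic mydomain)"
         class="- topic/topic mydomain/myTopic ">
  <title class="- topic/title ">Example</title>
  <body class="- topic/body ">
    <my-para class="- topic/p mydomain/my-para ">Hello</my-para>
  </body>
</myTopic>
```

Because @class values are whitespace-separated tokens, even a plain CSS attribute selector such as `*[class~="topic/p"]` matches every specialization of the base paragraph type, which is what makes the definitional base accessible to any ordinary XML processor.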
If two DITA documents declare the use of the same set of modules then by definition they have the same DITA document type. This works because of rule (2): all copies of a given module must be identical, so it is sufficient to simply identify the modules. In theory one could go from the module names to some set of implementations of the modules, although I don't know of any tools that do that, because in practice most DITA documents have associated DTDs that already integrate the grammars for the modules being used. But it is possible. The DITA document type is declared on the @domains attribute, which is required on DITA root elements (maps and topics).

Note that you could have a conforming DITA vocabulary module that is only ever defined in prose. As long as documents reflected the types correctly in the @class attributes and reflected the module name in the @domains attribute, the DITA definitional requirements would be met. It would be up to tool implementors to do whatever was appropriate for the domain (which might be nothing, if the vocabulary exists only to provide distinguishing names and doesn't require any processing different from the base). Nobody would do this *but they could*.

Thus DITA completely divorces the notion of "document type" from any implementation details of grammar, validation, or processing, with the clear implication that there had better be clear documentation of what a given vocabulary module is.

Cheers,

E.
----
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com

On 5/4/16, 11:06 AM, "Steve Newcomb" <srn@coolheads.com> wrote:

>Eliot,
>
>In order to avoid potential misunderstandings, I think it might be worth
>clarifying your position on the following points:
>
>(1) Resolved: the whole idea of entity identity was a mistake, is
>worthless, and is evil.
>
>(2) Resolved: the whole idea of document type identity was a mistake, is
>worthless, and is evil.
>
>I have deliberately made these statements extreme and obviously silly in
>order to dramatize the fact that, even though there are problems with
>SGML's and/or XML's operational approaches to them, we cannot discard
>these ideas altogether. The ideas themselves remain profound and
>necessary. They will always be needed. The usefulness of their various
>operational prostheses will always be limited to certain cultural
>contexts. Even within their specific contexts, those prostheses will
>always be imperfect. They will always require occasional repair and
>replacement, in order that they remain available for use even as that
>context's notions of "entity", "document", and "identity" continue to
>evolve and diversify.
>
>The operational prostheses with which these ideas were fitted at SGML's
>birth are things of their time. That was then, this is now, and "time
>makes ancient good uncouth". Their goodness in their earlier context is
>a matter of record; they were used, a lot, for a lot of reasons and in a
>lot of ways. At the time, it was not stupid or evil to make the notion
>of document type identity depend on the notion of entity identity, nor
>was it stupid or evil to make the notion of entity identity dependent on
>PUBLIC identifiers. And in many ways, it still isn't. What is your
>proposed alternative, and why is it better?
>
>Steve
>
>On 05/04/2016 11:23 AM, Eliot Kimber wrote:
>> SGML requires the use of a DTD--there was no notion of a "default" DTD.
>> This requirement was, I'll argue, the result of a fundamental conceptual
>> mistake--understandable at the time but a mistake nevertheless.
>>
>> The conceptual mistake that SGML made was conflating the notion of an
>> abstract "document type" with the grammar definition for (partially)
>> validating documents against that document type. That is, SGML saw the
>> DTD as being equal to the definition of the "document type" as an
>> abstraction. But of course that is nonsense.
>> There was (remains today) the misguided notion that a reference to an
>> external DTD subset somehow told you something actionable about the
>> document you had. But of course it tells you nothing reliable, because
>> the document could define its "real" DTD in the internal subset, or the
>> local environment could put whatever it wants at the end of the public
>> ID the document is referencing.
>>
>> Consider this SGML document:
>>
>> <!DOCTYPE notdocbook PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" [
>> <!ELEMENT notdocbook ANY >
>> <!ELEMENT bogus ANY >
>> ]>
>> <notdocbook>
>> <bogus><para>This is not a DocBook document</para></bogus>
>> </notdocbook>
>>
>> This document will be taken as a DocBook document by any tool that
>> thinks the public ID means something. But obviously it is not a DocBook
>> document. It is, however, 100% DTD valid. QED: DTDs are useless as tools
>> of document type definition. The only reason the SGML (and now XML)
>> world didn't collapse under this fact is that the vast majority of SGML
>> and XML authoring and management tools simply refused to preserve
>> internal subsets (going back to the discussion about DynaBase's problems
>> with entity preservation).
>>
>> Standoff grammars like XSD and RELAX NG at least avoid the problem of
>> internal DTD subsets, but they still fail to serve as reliable
>> definitions of document types in the abstract, because they are still
>> only defining the grammar rules for a subset of all possible conforming
>> documents in a document type.
>>
>> Because of features like tag omission, inclusion exceptions, and short
>> references, it was simply impossible to parse an SGML document without
>> having both its DTD and its SGML declaration (which defined the lexical
>> syntax details). There is a default SGML declaration, but not a default
>> DTD.
>>
>> A lot of what we did in XML was remove this dependency by having a fixed
>> syntax and removing all markup minimization except attribute defaults.
>>
>> XML does retain one markup minimization feature, attribute defaults.
>> Fortunately, both XSD and RELAX NG provide alternatives to DTDs for
>> getting default attribute values.
>>
>> Cheers,
>>
>> Eliot
>> ----
>> Eliot Kimber, Owner
>> Contrext, LLC
>> http://contrext.com
>>
>> On 5/4/16, 6:16 AM, "Norman Gray" <norman@astro.gla.ac.uk> wrote:
>>
>>> Greetings.
>>>
>>> (catching up ...)
>>>
>>> On 29 Apr 2016, at 17:58, John Cowan wrote:
>>>
>>>> On Fri, Apr 29, 2016 at 8:54 AM, Norman Gray <norman@astro.gla.ac.uk>
>>>> wrote:
>>>>
>>>> In the XML world, the DTD is just for validation
>>>>
>>>> That turns out not to be the case. There are a number of XML DTD
>>>> features which affect the infoset returned by a compliant parser.
>>>> If they are in the internal subset, the parser MUST respect them;
>>>
>>> I stand corrected; I was sloppy. I think this doesn't change my
>>> original point, however, which was that in SGML the DTD was integral to
>>> the document, and to the parse of the document, and that it's easy to
>>> forget this after one has got used to two decades of XML[1]. I can't
>>> remember if there was a trivial or default DTD which was assumed in the
>>> absence of a declared one, in the same way that there was a default
>>> SGML Declaration, but taking advantage of that would probably have been
>>> regarded as a curiosity, rather than normal practice.
>>>
>>> In XML, in contrast, the DTD has a more auxiliary role, and at a first
>>> conceptual look, that role is validation (even though -- footnote! --
>>> it may change other things about the parse as well). Thus _omitting_
>>> an XML DTD (or XSchema) is neither perverse nor curious.
>>>
>>> Practical aspect: When I'm writing XML, I use a DTD (in whatever
>>> syntax) to help Emacs tell me if the document is valid, but I don't
>>> even know whether the XML parsers I use are capable of using a DTD
>>> external subset. That careless ignorance would be impossible with SGML.
>>>
>>> The rational extension of that attitude, of course, is MicroXML, which
>>> (as you of course know) doesn't use any external resources at all, and
>>> doesn't care about validation.
>>>
>>> Best wishes,
>>>
>>> Norman
>>>
>>> [1] Hang on, _two_ decades?! I've just checked and ... 1996 doesn't
>>> seem that long ago.
>>>
>>> --
>>> Norman Gray : https://nxg.me.uk
>>> SUPA School of Physics and Astronomy, University of Glasgow, UK
>>>
>>> _______________________________________________________________________
>>>
>>> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
>>> to support XML implementation and development. To minimize
>>> spam in the archives, you must subscribe before posting.
>>>
>>> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
>>> Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
>>> subscribe: xml-dev-subscribe@lists.xml.org
>>> List archive: http://lists.xml.org/archives/xml-dev/
>>> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php