Re: external parsed entities (was: A unique ID question ?)
Len Bullard wrote:
>
> Tim Bray wrote:
> >
> > At 06:30 PM 11/9/99 -0600, Len Bullard wrote:
> > >We can't bag the cat once it is in the alley
> > >unless we are faster than the cat, or know where it sleeps. In either
> > >case, we will have a mad cat when we take it out of the bag.
> >
> > I enjoyed reading that but I have *no* idea what you mean, Len... -T
>
> External parsed entities are a done deal for XML 1.0.
> But hey, just a version number, right? We get to
> have this fight again. :-)
>
> If you and eliot truly think they are bogus and a
> screwup to use, sounds like an issue to me.
>
> How do subdocs fix this problem, Eliot? Or
> better, what problems do subdocs fix for markup systems?

By "subdocs" I presume you mean the use of separate documents to define a single logical "compound" document (the SGML concept of "subdoc" is in fact a red herring--what's important is the use of separate documents, not the fact that they are declared as SUBDOC entities--thus the lack of SUBDOC in XML is absolutely no loss).

It solves the problem by forcing you to use and manage truly reusable objects, rather than just doing syntactic copy and paste through the inclusion of external parsed entities. It also forces you to recognize that all use-by-reference needs to occur at the semantic (DOM, grove) level, not the syntactic level. Once you realize that, all sorts of apparently hard problems or non-sensical cases become easy and quite sensical, because we are out of the syntactic domain and into the semantic domain.
For example, it is non-sensical for one element to be replaced by another element in another document at the syntactic level (parse time), but it is perfectly sensical for an element node to redirect to another element node in another DOM, and the processing needed to achieve this is trivial:

    # See if node is a redirection and resolve it:
    try:
        pointer = node.atts["redir"].value
        node = xpath.resolve_pointer_to_node(pointer)
    except (KeyError, IndexError):
        # No redir attribute, just go on as before
        pass

It couldn't be easier at this level. All that's required is that your processing software understand the semantics of the documents that might be pulled together this way. Those documents might of course use different document types, but that's no different from needing to understand a document that uses elements with different namespace prefixes. It is a constant part of the XML processing problem, and this approach doesn't change it (except possibly to make it more obvious that the problem exists and clearer how you define a framework for handling the case).

The problem with external parsed entities is that they are not true objects, in the sense that they have no independent existence outside the contexts in which they are used--they cannot be parsed or validated in isolation and must conform *syntactically* to all the contexts in which they are used. Thus the problems with ID conflict, entity names, etc.

When using multiple documents, each element maintains its original document context and therefore its fundamental identity, so there is no possibility of ID or name conflict, and you can always examine the element in its original context as well as in any contexts in which it might be used.
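The redirection idea above can be sketched with Python's standard xml.dom.minidom. This is only an illustration under stated assumptions: the "redir" attribute, the "file#id" pointer syntax, and the helper names load_all, find_by_id and resolve are all hypothetical conventions, not part of XML, XLink, or any DOM API. Note that both sample documents use the ID "intro" and nothing conflicts, because each element stays in its own DOM.

```python
from xml.dom import minidom

# Two independent documents. Both use id="intro"; there is no conflict
# because each document is parsed into its own DOM tree.
DOCS = {
    "book.xml": '<book><chapter id="intro" redir="shared.xml#intro"/></book>',
    "shared.xml": '<lib><chapter id="intro"><title>Shared intro</title></chapter></lib>',
}


def load_all(sources):
    """Parse every document into its own, separate DOM."""
    return {name: minidom.parseString(text) for name, text in sources.items()}


def find_by_id(doc, id_value):
    """Scan one document for an element whose 'id' attribute matches.

    Without a DTD the attribute is not typed as ID, so getElementById
    cannot be relied on; a plain attribute scan is used instead.
    """
    for element in doc.getElementsByTagName("*"):
        if element.getAttribute("id") == id_value:
            return element
    return None


def resolve(node, doms):
    """Follow 'redir' pointers (assumed form: 'file#id') to the real node."""
    seen = set()
    while node.hasAttribute("redir"):
        pointer = node.getAttribute("redir")
        if pointer in seen:
            raise ValueError("redirection cycle at " + pointer)
        seen.add(pointer)
        doc_name, _, id_value = pointer.partition("#")
        target = find_by_id(doms[doc_name], id_value)
        if target is None:
            raise LookupError("unresolved pointer " + pointer)
        node = target
    return node


doms = load_all(DOCS)
placeholder = doms["book.xml"].documentElement.firstChild
target = resolve(placeholder, doms)
# The resolved node still belongs to its original document.
print(target.ownerDocument is doms["shared.xml"])
```

The resolved node keeps its ownerDocument, so a processor can always inspect it in its original context as well as in the context that referred to it.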
Of course, if you want to write a transform that generates a new single instance as output, you have to disambiguate the names and IDs, but there's no programming difficulty there: it's just an exercise in rewriting pointers and, possibly, applying namespace prefixes to element type names (if you're so inclined). But this is only one way to take advantage of semantic use-by-reference--you should never assume that the processing result of compound document processing is another XML document.

Using GroveMinder we have a grove-aware browser that does all this resolution dynamically at run-time, generating HTML as output. There's nothing particularly difficult or inventive about this except that we did it (and that it happens to implement the part of the HyTime standard that deals with use-by-reference, the value reference facility).

Note also that if, for example, XLink made a clear distinction between use-by-reference relationships and hyperlink relationships, it would be clearer how one can have a highly generic, standards-based infrastructure for doing this stuff. As it is, you can define your own conventions for using XLinks to mean use by reference. The "show=embed" part of XLink is almost there, but it is not sufficiently flexible: for example, it doesn't let you define the value of an attribute by reference, which is quite useful, if not a hard requirement, for certain problems.

By using independent documents you get objects (documents) that have their own independent existence. They can be reliably re-used because they are combined with other documents *semantically*, not syntactically, at the processing level. That is, I construct a bunch of DOM trees (or groves) and then another layer of processing decides how to use them together. No document directly interferes with any other. Of course, there will be dependencies between the documents, such as one document linking to something in another document.
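The single-instance transform mentioned above (rewriting IDs and pointers when independent documents are flattened into one output) might look roughly like the following. It is a sketch, not a description of GroveMinder or any HyTime facility: the "id" and "ref" attribute names, the "#id" (local) and "doc#id" (cross-document) pointer forms, the dotted prefix scheme, and the merge function itself are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def merge(sources, root_tag="compound"):
    """Copy each document under one new root, making IDs unique.

    Every ID is prefixed with the short name of the document it came
    from, and every pointer is rewritten to the prefixed form, so two
    documents that both define id="s1" no longer collide.
    """
    root = ET.Element(root_tag)
    for prefix, text in sources.items():
        doc = ET.fromstring(text)
        for element in doc.iter():
            if "id" in element.attrib:
                element.set("id", prefix + "." + element.get("id"))
            if "ref" in element.attrib:
                # "#s1" points into the same document; "b#s1" points into b.
                doc_name, _, target = element.get("ref").partition("#")
                element.set("ref", "#" + (doc_name or prefix) + "." + target)
        root.append(doc)
    return root


sources = {
    "a": '<doc><sec id="s1"/><link ref="#s1"/><link ref="b#s1"/></doc>',
    "b": '<doc><sec id="s1"/></doc>',
}
merged = merge(sources)
print(ET.tostring(merged, encoding="unicode"))
```

The source documents themselves never need to agree on IDs; the disambiguation happens only in this one output-specific transform, which is what lets other processors use the same documents in other ways.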
Because the processing of compound documents is a separate layer, there can be many different ways of processing the same compound document and therefore different sets of constraints that you might want to enforce. You can have a policy that says all the members of the document must have the same DTD, or maybe you don't care because your processing isn't DTD sensitive (e.g., a generic XML structure browser). The system can be more or less sophisticated depending on your requirements.

You don't have to do twisted things like use namespaces to disambiguate element type names in the source documents or rationalize all the documents to a single over-arching document type. Individual documents can be optimized for their own local purposes and still combined meaningfully with other documents, given some rules for playing nice together (e.g., architectures, archetypes, etc.).

Once you've built the infrastructure to handle this way of doing things, the range of problems you can solve and the range of requirements you can support increase dramatically, and the incremental cost of the system drops rapidly.

I note that tools like FrameMaker and WordPerfect (and I think even Word) have *always* worked this way. In FrameMaker, for example, a "book" is composed of multiple documents. Each document is completely syntactically independent of the other documents in the book: it can have its own templates, customizations, etc. The addressing between documents for cross references and hyperlinks is document-to-document addressing, because each document establishes its own ID namespace in FrameMaker. There is no syntactic interference between different documents in the same book. [NOTE: this is not true for the SGML version of FrameMaker--for whatever reason, the designers chose not to carry this model into the SGML version, which they could have done very easily.]

Cheers,

E.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@i... the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)