Re: Why the Infoset?
Rick JELLIFFE wrote:

> I disagree.

I believe that we actually agree on the salient points and disagree only on the question of how (and where) processing should be done (and processing decisions made).

> The basis of SGML'86 was that the rooted, directed, cyclic graph with attribute-value
> tree framework that allowed a handful of general distinctions on the edges (child,
> parent, next, attribute, IDREF, etc) and a handful of general types on the named nodes
> (element, comment, PI, etc) was, when coupled with a simultaneous rooted graph of
> entities with a handful of general types (NDATA, sgml), was sufficient for an enormous
> number of complex problems. On top of this information model, the need to cope with an
> enormous number of possible notations and syntaxes.

I accept and agree with this characterization of SGML. Without disturbing these premises in the cases where they are useful, the concept of well-formedness in XML 1.0 posits an entirely different starting point: effectively it is to dismiss the graph mechanism and the tree metaphor, the apparently necessary 'general types', and even the abstraction of a document node, and to deal instead solely with a mechanically correct syntax.

> XML's WF-only is not "entirely" self-sufficient for anything to do with graphs.

I never said that it was, for graphs or for anything else which the processor of a document must 'do'. I note, for example, that "the true content model of an instance document might be uniquely derived at the time and place of its use". That requires an appropriate processor, which makes appropriate assumptions and affords appropriate ancillary resources for that time and place. The instance document on which that processor acts is not 'entirely self-sufficient' for the execution of that process (if it were, it would qualify for my objection to SOAP); it is simply that the document is entirely self-sufficient *as syntax* for input to that processor.
That is, the document does not require a DTD, schema, or stylesheet in order to be interpreted by the processor in a way which is appropriate for the unique circumstances of its processing.

> (The problem is only fixed by hard-coding, i.e. XLink, and hardcoding requires
> universal names, i.e. Namespaces using some public-identifier-like registration system,
> i.e. URIs)

You (and the Infoset) want these hardcoded in the document or registered in some canonical form at a fixed address. I (and the philosophy of WF, I assert) want these universalities to be determined by the processor as appropriate for the particular circumstances in which a document instance is processed.

At this point in the argument I always use the same examples, but those examples are from working, production systems which I have built for many years and to which many millions of real dollars are committed daily.

Consider a securities order ticket. The 'intended' use of that document is to instruct a trader to execute a buy or sell in a given security, subject to optionally given conditions. Yet in the full sequence of processing which that trade, once executed, requires there are necessary subsequent uses of that document which its creator may be only vaguely aware of, and for which he did not--and did not know how to--provide or specify the necessary 'universalities' for that document's subsequent correct interpretation.

After execution, that order ticket must be routed for trade comparison, cashiering, securities receive and deliver, custody notification, regulatory compliance, and portfolio analysis. In all of these cases it *should* be the original order ticket document, rather than some re-statement of it, which is routed for input to a process entirely unaccounted for in the original composition of that document and utterly absent from whatever 'intent' might have been expressed by that document's creator.
Re-using that same document not only avoids errors of transcription but makes the necessary auditable connection between that document and the outcome of such processes as cash payment, securities delivery and tax reporting. The only way to re-use that document appropriately is for each processor which acts upon it to resolve references, scope and links, and to execute transformation, in a manner which is unique to the circumstances of that particular process and dependent not upon any such universalities hardcoded in the document or available from some single canonical reference, but expressed as the particular expertise of that processor for the specialized job which it does.

> What XML did was to say that a lot of users only need simple AV-trees, so let's allow
> them to have them with little fuss.

They are not necessarily trees. That metaphor is only one of many possible semantic elaborations, by an instance process, from simple WF syntax.

> And it said that people could agree on a syntax.

That is, in fact, all that WFness dares to claim.

> I think this is too hard on schemas: what a schema does, in part, is specify which
> additional constraints the document has more than WF XML. These constraints allow more
> optimal handling of data: if I know that my content model is (a,b,c)+ and
> that it is closed, then I can allocate a list with three slots for them and I know that
> the XPath a[position()=1] on the parent will always succeed. If I know that an
> element is a date, I can store it in a database as a date not a string. If I know that
> a value or combinations of value is unique, I can use them as keys for faster access to
> data.

I agree entirely with this. It is the point of schemas. The question is whether those schemas are pre-ordained or are derived in a form specifically meaningful and useful to the instance process.
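
To make the (a,b,c)+ case concrete, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree and a made-up instance document. It is one possible illustration, not a description of any production system mentioned in this message: the parser checks well-formedness only, and the content model lives in the processor rather than in a DTD or schema shipped with the document. The names DOC, LOCAL_MODEL, conforms and groups are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical instance document: well-formed, with no DTD or schema attached.
DOC = "<rec><a>1</a><b>2</b><c>3</c><a>4</a><b>5</b><c>6</c></rec>"

# The processor's own, locally held statement of the content model (a,b,c)+.
# Child names are joined with trailing commas so the regex matches whole names.
LOCAL_MODEL = re.compile(r"(?:a,b,c,)+")


def conforms(parent: ET.Element) -> bool:
    """Return True if the children of `parent` match (a,b,c)+ and nothing else."""
    names = "".join(child.tag + "," for child in parent)
    return LOCAL_MODEL.fullmatch(names) is not None


def groups(parent: ET.Element) -> list:
    """Split the children into three-slot records, as the closed model permits."""
    texts = [child.text or "" for child in parent]
    return [tuple(texts[i:i + 3]) for i in range(0, len(texts), 3)]


root = ET.fromstring(DOC)          # checks well-formedness only
if conforms(root):
    records = groups(root)         # safe only because the local model held
    first_a = root.find("a[1]")    # the first <a> child is guaranteed to exist
```

The three-slot allocation and the always-succeeding lookup of the first a child are available either way; what differs is only whether the constraint is asserted once for all consumers or held by the process that needs it.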

> The idea of a syntax with no schema/DTD is hardly new: in part, it was the infelicities
> of these that caused SGML'86 to take such a strong and radical view: if anyone can put
> any element anywhere, how can a consumer contractually require an information producer
> to produce certain information? LISP or ADA or any of the languages with
> position-independent and nameable parameters provides the same basic capabilities as
> XML WF: why are they not good markup languages? It is the ability to constrain data by
> schemas that is the key.

I agree with your statement of the problem. I (and, I assert, the WF philosophy) propose a radically different approach to its solution than do you and SGML'86.

> If data were all atomic, and each datum was described by a universal name, and no two
> documents were similar, then I think
> William's view would be pretty close to the mark: documents could be ad hoc assemblages
> of elements used by applications which handled each document as best it could, perhaps
> with the aid of private schemas to check that all the information required was present.
> But truth is not atomic: a number may be complex, a quantity will have a unit as well
> as a value, a table has rows, love and marriage go together like a horse and carriage.

This is the crux of the matter. In the instance document, understood purely as WF syntax, the data *is* effectively atomic and for that reason effectively useless. A particular processor running a particular process in particular circumstances is required to elaborate a useful instance data structure and, with it, locally useful semantics from atomic data supplied by the document, together with other inputs appropriate for that process.
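
The order ticket discussed earlier in this message can serve as a rough illustration of that elaboration. The sketch below assumes Python's standard xml.etree.ElementTree; every element name, attribute, identifier and lookup value in it is invented for the example and is not taken from any real trading system. Two processors read the same well-formed ticket, and each builds only the structure its own job requires, one of them with an ancillary input that the ticket never mentions.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical order ticket: element and attribute names are invented here.
TICKET = """
<order id="T-1001">
  <side>buy</side>
  <security>XYZ</security>
  <quantity>500</quantity>
  <price currency="USD">12.50</price>
  <account>ACCT-77</account>
</order>
"""


def cashiering_view(doc: ET.Element) -> dict:
    """Cashiering elaborates a cash movement; it ignores what it does not need."""
    amount = Decimal(doc.findtext("quantity")) * Decimal(doc.findtext("price"))
    # A buy means cash leaves the account; a sell means cash comes in.
    signed = -amount if doc.findtext("side") == "buy" else amount
    return {
        "ticket": doc.get("id"),
        "account": doc.findtext("account"),
        "currency": doc.find("price").get("currency"),
        "cash": signed,
    }


def delivery_view(doc: ET.Element, depository: dict) -> dict:
    """Receive-and-deliver elaborates a securities movement, using its own
    ancillary resource (a depository lookup) that the ticket never mentions."""
    qty = int(doc.findtext("quantity"))
    return {
        "ticket": doc.get("id"),
        "security": doc.findtext("security"),
        "depository": depository[doc.findtext("security")],
        "shares": qty if doc.findtext("side") == "buy" else -qty,
    }


ticket = ET.fromstring(TICKET)   # well-formedness is the only check applied
cash = cashiering_view(ticket)
delivery = delivery_view(ticket, {"XYZ": "DEPOSITORY-A"})
```

Both results carry the same ticket id, which is the auditable link back to the original document; neither view was anticipated by whoever wrote the ticket.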

As Demokritos realized, there are only atoms and nothingness--all the rest is opinion--until the moment that something happens, that something is done, which for that particular purpose in those particular circumstances requires a particular instance structure, or opinion of the appropriate form, relationships and, by elaboration, semantics of that data.

Respectfully,

Walter Perry