Re: The privilege of XML parsing [in an internetworked web]
"Henry S. Thompson" wrote: > Indeed -- that's why W3C XML Schema _loosened_ the binding between document and > schema, compared to XML 1.0 wrt DTDs -- an application (read 'consumer') is free > to mandate its own W3C XML Schema (or none) in preference to whatever the author > provides. What's the problem? The problem is that, whether the schema is specified by the document creator or by the document consumer, we are still looking at constraints upon input, or at questions of conformance decided by the form of input to a process. This is not surprising: it is our heritage from SGML's fundamental concept of validity (or for others of us, from the fundamental concept of programming to an interface). However, such notions of validity or interface conformance are inimical to the fundamental architecture of the Web or, more broadly, to processing in general in the internetwork topology. Web architecture does not specify interfaces nor validity constraints upon documents input to processes. Web architecture specifies the use of the http verbs PUT, POST, GET when given URLs as arguments. Web architecture is about resolving a URL to PUT or POST a document to the location addressed, or about dereferencing a URL in order to GET an entity body representing the current state of a document. There is no Web mechanism to specify, let alone enforce, validity or interface conformance constraints upon the use of http verbs given URLs as arguments. A process making appropriate use of the Web architecture or internetwork topology has clear-cut, well-understood standard methods on-the-Web for publishing the documents which it produces or for fetching entity bodies representing the documents which others publish. There are no standard on-the-Web methods for verifying--let alone enforcing--validity or interface conformance (nor even any generally accepted means on-the-Web for specifying those constraints or enforcing their association with particular documents). 
Validity or conformance checking of input takes place inside the process boundary of an idiosyncratic operation, and therefore effectively off-the-Web. This distinction is not specious. Understanding that any validity or conformance checking or enforcement must be done by idiosyncratic code inside the boundary of a particular process is very helpful in understanding the mechanism by which any data moves onto or off of the Web through that process boundary. What moves on the wire is an entity body--and, if properly standards-compliant, one accurately described by a MIME type. In this mechanism XML is in no way privileged above other types; nor, despite the history of the Web, is HTML.

However, because it is XML, that entity body must at the process boundary first be parsed and successfully verified as well-formed. From that point, what happens to it is under entirely local control within the process, but is implicitly off-the-Web, because its current form--whatever it may be as the output of a parse--is not the entity body which legitimately travels the internetwork. With XML the normal procedure is to build a tree on the output of that parse. How that tree is shaped by the particular processing of includes, links, entity expansions, etc., and how that tree is decorated on any one occasion by type information or by various annotations, is controlled by the local process and clearly may result in an idiosyncratic outcome. In other words, there is no reason to believe that the tree instantiated by, and within the boundary of, any particular process will conform to any other tree, particularly not the tree which the original publisher of a document might have had in mind.

This is the privilege of XML parsing: the entirely local control of how a data structure is instantiated on the output of the parsing which is required when an XML entity body is brought into a process.
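The boundary crossing described above can be sketched in a few lines of Python: well-formedness is the only check XML itself requires at the boundary, and everything built on the output of that parse is a local, consumer-chosen structure. The element names and the dict-shaped structure below are illustrative assumptions, not anything mandated by the post or by XML.

```python
# At the process boundary: parse the entity body. A malformed body fails
# here (ET.ParseError); that is the only check XML itself imposes.
import xml.etree.ElementTree as ET

def bring_into_process(entity_body: bytes) -> ET.Element:
    """Parse an XML entity body; raises ET.ParseError if not well-formed."""
    return ET.fromstring(entity_body)

def local_structure(tree: ET.Element) -> dict:
    """One idiosyncratic structure instantiated on the output of the parse.

    Another process could build an entirely different structure from the
    same entity body; neither structure is privileged by the Web.
    """
    return {child.tag: (child.text or "") for child in tree}
```

The dict returned here is under entirely local control: it is not the entity body, and it never travels the internetwork.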
Whether the choice is to instantiate a structure specified by the publisher of the original document, or to enforce the validity constraints that publisher prefers, the choice is local within a process which is opaque to that original publisher. It is therefore a choice to use the consumer's data structure and validity constraints, even if what is chosen comes from, or is approved by, the original document publisher.

I think a fundamental misunderstanding is that interoperability requires the instantiation of the same data structure at each of two interoperating processes. That is the fundamental assumption of two-phase commit, but it is an assumption which can be implemented only within a homogeneous traditional enterprise network. It is also the underlying assumption of validity, which is why validity is incongruous and in general unachievable on the Web.

On an internetwork, the internal operations of processes which might seek to interoperate are opaque to each other, including the data structures which they expect as input. Outside the process boundary, on the internetwork, there are only the entity bodies of documents, ideally conforming to appropriate MIME types. Interoperability is achieved when one process can use the output of another--that is, what is published at a URL and can be retrieved with an HTTP GET--for its own purposes. Necessarily, 'for its own purposes' means that the consuming process instantiates a data structure specifically suited to the operation of that process. As it is virtually inevitable that this structure will differ from the structure used by an upstream process for its own particular purposes, interoperability is based on a particular instance entity body shared through the operation of HTTP verbs. That entity body is the very stuff which moves on the Web, but it is not a data structure as is required by the operation of processes, nor is it an archetype for such structures.
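The interoperability claim above can be illustrated with a sketch: two consuming processes share nothing but an entity body, and each instantiates a data structure suited to its own purposes. The order/item vocabulary and both view functions are hypothetical examples, not taken from the post.

```python
# One entity body, two consumers, two deliberately different local
# structures. Both processes interoperate with the publisher through
# the shared bytes alone; neither structure is an archetype.
import xml.etree.ElementTree as ET

ENTITY_BODY = b"<order><item sku='A1' qty='2'/><item sku='B2' qty='1'/></order>"

def inventory_view(body: bytes) -> dict:
    """A warehouse process: quantities keyed by SKU, as integers."""
    root = ET.fromstring(body)
    return {i.get("sku"): int(i.get("qty")) for i in root.iter("item")}

def audit_view(body: bytes) -> list:
    """An auditing process: an ordered log of line items, as strings."""
    root = ET.fromstring(body)
    return [(i.get("sku"), i.get("qty")) for i in root.iter("item")]
```

Neither view conforms to the other, nor to any tree the publisher might have had in mind, yet both consumers have used the publisher's output for their own purposes.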
That entity body is itself a concrete instance, and on a particular occasion might be the nexus through which processes interoperate, whether or not its content, or content model, is in any way specific to what a receiving process operates upon.

So, finally, no 'loosening'--short of disconnection--between a document instance and a possible schema for that instance is sufficiently loose to fit--or be natively implementable in--the Web architecture. Processes operating on the Web may use Web verbs to effect particular connections on a particular occasion which result in an idiosyncratic data structure appropriate to that operation of a particular process. This is utterly at odds with the premise of validity, which insists that, however a document is connected to a particular schema, it must conform to that schema before it might legitimately be processed.

Respectfully,
Walter Perry