Re: A multi-step approach on defining object-oriented nature o
Paul Prescod wrote:

> The use of the word "data structure" is confusing. I see "data structure" as being an
> aspect of implementation and unrelated to the data on the wire. A purchase order schema
> is not defining a "data structure", it is defining an XML "vocabulary" or "file format".

I would say that the primary feature of XML documents is that each is explicitly a data structure. I am actually astonished to see that questioned. A DTD or schema describes the data structure to which a document conforms, but any instance of simple well-formed XML no less explicitly presents its data structure through markup.

> I will defend to the death the right of applications to define their own implementation
> data structures without any concern whatsoever for the rest of the world. But I cannot
> understand why you deny the widely held belief (held both inside and outside the XML
> world) that the signature of a function is both its input and its output and that both
> should be defined formally.

Because that design traps an application in a weave of shared a priori agreements, which vitiates the expertise of the application. Applications are valuable because they execute functions with particular expertise. Very significant components of that expertise are data collection and instantiation on the input side and, on the output side, the presentation of a form most specifically suited to expression of the value which the expertise of the application has added. If either of these is constrained by anything other than the expertise implemented in the application itself, that expertise is thereby compromised. It is because of the particularity of their expertise that a pair of applications is unlikely to have a single specific data structure in common, as the best expression both of the expert output of one and the particular data input requirements of the other. Three applications are orders of magnitude less likely to share such a structure.
Yet the internetwork topology of the Web--and, I would argue, your own REST principles--operates because of the publication of each output for the potential use of many applications for different purposes. This freedom to share is implemented with HTTP verbs. It is madness to constrain it further by demanding that interprocess communication--already well implemented by HTTP verbs--be cast in agreed data structures which serve the particular needs of neither party to a communication and, in fact, constrain both parties from the necessary exercise of their expertise in data collection or output presentation.

> Well, no, there are (at least?) three different roles for input validation. The first is
> merely to offload syntactic error checking from your application to a purpose-built
> component, the schema validator. The second is to communicate these expectations to a
> third party (to create either a compatible producer application or an alternate consumer
> application). The third is to build agreement between multiple parties before
> implementation begins. I believe you are concentrating on the last.

Only to damn it, to inveigh against it, and to persuade implementors that that way lies madness.

> It would really help if you could provide some specific, concrete examples. The usual
> model is that in order to buy something you submit a purchase order in a well-known
> vocabulary and receive in return a receipt (modulo all kinds of negotiations, acceptance
> protocols, etc.). Input->function->output. Input and output are publicly, formally
> defined. The function is defined only by its implementation. I can show this in
> mind-numbing syntactic detail if it helps. Now what does your model look like?

[These are much-repeated examples from my day job, offered here with apologies to those who may have seen them more than once before.]
In my world, in order to buy something you submit an order in a vocabulary well known to the order document creator but quite possibly unknown and unusable on its own terms to the application which will fill the order. A money manager in heartland USA has never before submitted an order to buy securities in Malaysia, but is now persuaded by a salesman (of Malaysian securities, presumably) that this would be a good idea. That money manager has a computer system which it must use to produce that order, because all of the automation of the compliance and regulatory reporting tasks required of that money manager is built around the basic operations of that system. Therefore, as processed by that system, the buy order produced as output is specific to the US-only domestic form which makes up the majority of that money manager's business. Unfortunately, that form is unknown to, and contains content which does not apply to, the Malaysian order execution application which must now process it.

In the early 1980s we solved these problems by building massive any-to-any transformation switches, capable of going from the output of any process used by any of our customers or their counterparties to the input of any other application to which we had ever seen it connected. This is a disaster (though still the norm in our industry). The permutations increase geometrically, as does the complexity involved in applying any changes required by the input or the output of any one application. And that is before you deal with the scoping issues of which pairs of applications (and in which order of operation!) have private understandings of each other's vocabularies. For the past 12 years (first with homegrown syntactic rules, and since 1998 with well-formed XML) we have built and operated all of our systems on the principles I am promoting here.
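The arithmetic behind that complaint is easy to check: an any-to-any switch needs one translator for every ordered pair of distinct applications, so the count grows quadratically, while a per-application instantiation layer grows only linearly. A minimal sketch of that comparison (the function names are mine, purely for illustration):

```python
# Any-to-any transformation switch: every ordered pair of distinct
# applications needs its own translator, so the count is n * (n - 1).
def pairwise_translators(n: int) -> int:
    return n * (n - 1)

# Instantiation-layer approach: one wrapper per application.
def instantiation_layers(n: int) -> int:
    return n

for n in (3, 10, 50):
    print(n, pairwise_translators(n), instantiation_layers(n))
# At 50 connected applications the switch needs 2450 translators,
# the wrapper approach 50 -- and each new application adds 2 * n more
# translators to the switch but only one new wrapper.
```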
We assume that the form of order output by the money manager's system is fixed by the expertise (and in this case the legal compliance and reporting requirements) of its operation in its local milieu. We also assume, which the designers of that system did not, that its output will have to be used as input by other applications which have never seen anything quite like it. Correspondingly, we assume that the only output of the Malaysian order processing application is an order execution of the form understood in Malaysia, but that that application will have to take in 'orders' from all over the world, each in a format local to its origin.

Building systems which put expertise first turns out to be straightforward and, even better, when existing systems are adapted to operate in this way all of their specialized processes can continue to run unchanged. These systems can be quite cleanly wrapped in a data instantiation layer which looks at the internal data structures of existing processes in order to derive the form of data presentation which the application requires if it is to operate.

In the case of the Malaysian order execution application, the data instantiation layer begins with the assumption that the input document presented is, in some sense, an order, because it appears at this location where orders are presented. It may, of course, turn out not to be an order, in which case it will have to be rejected because the order execution application can do nothing with it. The data instantiation layer begins by searching its locally maintained history of previous successful data instantiations for a match, first on the provenance of this document or, failing that, on its element structure. In practice, the overwhelming majority of orders from a given source exhibit exactly the same structure, and the internal application data structure required can therefore be immediately instantiated on the model of previously successful instantiations in the history.
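That lookup order--provenance first, element structure second--can be sketched in a few lines. This is not Perry's actual implementation; the `history` record shape and helper names are my own assumptions, and "element structure" is reduced here to the tree of element names with content ignored:

```python
from xml.etree import ElementTree as ET

def element_structure(xml_text: str) -> tuple:
    """Reduce a document to its tree of element names, ignoring content."""
    def shape(el):
        return (el.tag, tuple(shape(child) for child in el))
    return shape(ET.fromstring(xml_text))

def find_instantiation(history: dict, source: str, xml_text: str):
    """Search the local history of successful instantiations:
    first by provenance (source), then by element structure."""
    if source in history:                      # same source seen before
        return history[source]["mapping"]
    shape = element_structure(xml_text)
    for record in history.values():            # different source, but the
        if record["shape"] == shape:           # same vendor's order format
            return record["mapping"]
    return None  # fall through to the other identification strategies
```

A match on provenance short-circuits everything else, which reflects the observation in the text that the overwhelming majority of orders from a given source exhibit exactly the same structure.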
Where a given instance is different from the usual pattern from that source, the change is usually small and quite often actually occurs in some portion of the offered input which is not used by this particular application. In such cases, again, the locally required data instantiation can be accomplished immediately. Where there does not appear to be another order from the same source recorded in the history, the correct instantiation will very often be immediately identified from the form of the input presented. This is simply an acknowledgment that in a given specialized domain there are often only a few software vendors, and while this Malaysian node may never have seen an order from this particular money manager before, it has probably seen an identical form of order from another money manager running the same order creation software.

Failing both of those routes to instantiation, there are still only a few fields which are likely to appear in a securities order, and only a subset of those are of interest to this Malaysian order execution software. Quite often some, even all, of the fields of interest can be identified through the form of their content as examined through simple regex processing. I hesitate (greatly!) to use the term 'brute force', but in my experience there have also been quite a few cases of data identified and correctly instantiated because the data instantiation layer knew that there were, for example, only five fields of interest to the application, and it did not take much compute time to try instantiating every field presented as every one of the possible fields of interest, to see if any permutation gave a whole which made sense.

Bear in mind also that in securities processing every step of execution is followed by a step of comparison of the outcomes between counterparties.
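Both of those last-resort strategies--identifying fields by the shape of their content, and trying every permutation of offered values against the fields of interest--are simple to sketch. The field names, patterns, and `makes_sense` predicate below are invented for illustration, not taken from any real order format:

```python
import itertools
import re

# Hypothetical content patterns for the fields this application cares about.
# An ISIN is 12 characters: 2-letter country code, 9 alphanumerics, check digit.
FIELD_PATTERNS = {
    "isin":     re.compile(r"^[A-Z]{2}[A-Z0-9]{9}\d$"),
    "quantity": re.compile(r"^\d+$"),
    "price":    re.compile(r"^\d+\.\d{2}$"),
}

def identify_by_content(values):
    """Assign each offered value to the first unclaimed field whose
    content pattern it matches."""
    found = {}
    for value in values:
        for field, pattern in FIELD_PATTERNS.items():
            if field not in found and pattern.match(value):
                found[field] = value
                break
    return found

def brute_force(values, fields, makes_sense):
    """Try instantiating every offered value as every field of interest,
    keeping the first assignment that makes sense as a whole."""
    for perm in itertools.permutations(values, len(fields)):
        candidate = dict(zip(fields, perm))
        if makes_sense(candidate):
            return candidate
    return None
```

With only five fields of interest the permutation space stays tiny, which is why the text can shrug off the compute cost; and, as the next paragraph notes, a wrong guess that somehow executes is still caught at the counterparty comparison step.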
In the extremely unlikely case that a brute force instantiation attempt through all the permutations of input to internal data fields could have resulted in a sensible whole of a data structure which then successfully executed in the application--if against those astronomical odds the data was in fact processed in error, that error will be picked up at the very next step, when it fails to compare.

Yet there will of course be some very small number of input documents which the instantiation layer can do nothing with, particularly when it is seeing a form of data input for the first time. Humans will have to get involved here. But, again, securities processing has long established the standard business practice of kicking out exceptions for fixup offline. Getting humans involved in the instances where truly necessary is not a case of abdicating the automation of domain expertise, but actually the appropriate deference to industry practice.

> > ... The autonomous processing nodes of the
> > internetwork topology are of value because of what they produce, which is to
> > say the expertise which they implement.
>
> Not necessarily. Sometimes they are of value simply because they know how to accept
> information.

Knowing how to accept information--that is, knowing how to instantiate it--is knowing how to process it. As implemented, it is, in fact, processing it. That is one of my two main points. (The other is that *how* a process knows how to instantiate data is based on its internal data structure, not on some external schema.)

> It is the *side effects* of this information acceptance (i.e. the shipping of running
> shoes to a consumer) that is of value. The same goes for email. My computer produces
> little interesting output after accepting an email.

These are not side effects; this is process outcome. Process outcome is the intended effect, and in fact I am bucking the orthodoxy when I insist that it is the *only* intended effect.
The purpose of executing a process is to produce an expert outcome, not to fulfill the expectations of some upstream process which believes that it is invoking another process by presenting it with a given data structure.

> Perhaps you differ in your view of the architecture because you have a different problem
> domain than most other people.

Perhaps. I hope that I have adequately illustrated it above.

> How do you get the output until you've given input?

Autonomous processes produce very particular output and publish it in Web-accessible locations. (This strikes me as the soul of REST.) Go get it and see what it is. If you declare that you are an interested party to that output, even if it is only a sample, and you satisfy standard Web tools controlling access to it, you should be permitted to GET it.

> When you say "discovery" do you mean the human being looking at the output at design
> time or the runtime software process doing so without human intervention? I can't say I
> enjoy picking through electronic scat but neither do I know how to write a program that
> can do it.

As detailed above, this is not at design time but at the time of resolving exceptions from run time. I think I describe above how much of this picking through scat can be automated, and what small percentage of it falls to humans.

> How can a process select the data it will work upon without a priori knowledge of the
> semantics of the element types?

It has a priori (and authoritative) knowledge of its own data needs.

> If you use "P" to mean paragraph and I use it to mean purchase order (perhaps even with
> the same namespace qualification) then the process cannot reliably act on the data it
> receives from us.

This is the usual objection, and it is a red herring. It doesn't matter what I use "P" to mean, or what you use it for. We have not set out to agree on a data structure or a standard data vocabulary.
All that my data instantiation model is concerned with is whether in your "P", or anywhere else in what you present me, I can find what I need to instantiate the particular data structure which is my true prerequisite to executing an instance of my expert processing.

> I feel, therefore, that semantics must be agreed upon in advance (or at least mappable to
> agreed semantics). Standardizing syntax in a schema at the same time bolsters this
> semantic agreement and makes application writing easier.

Nope. Otherwise my specific expertise, and its requisite domain vocabulary, become muddied with yours, and both of our processes are dragged down by it into a common denominator of mediocrity.

Respectfully,

Walter Perry