
Re: Ontologies

  • From: "W. E. Perry" <wperry@f...>
  • To: XML DEV <xml-dev@l...>
  • Date: Wed, 20 Dec 2000 10:26:30 -0500

Martin Bryan wrote:

> Walter Perry wrote:
>
> > In fact, aren't we ready to go the whole way and acknowledge that the
> > ontologies addressable at the nodes of a semantic web must in fact *be*
> > executable against the various inputs we put to them?
>
> Not unless we agree a fixed API for returning node data in a given language.

No. As I argue regularly at length (and will spare you the full version here),
the scalability and adaptability of a semantic web (or whatever we call this
network of autonomous nodes) depends in the first instance on each node's
ability to handle a significant variation of input presentation. Considered from
its own point of view, each node implements one or more processes. Within its
autonomous black box, that node knows what input it requires at the threshold of
executing a process and it knows the output which successful completion of that
process produces. From the node's viewpoint, the forms of both its input and
output are semantically fixed. That is, both input and output necessarily
exhibit some structure, and such structure necessarily implies some particular
understanding of the internal relationships of its constituent data and some
epistemological perspective on that body of data as a whole. That is, the form
of the data, together with such incidentals as what is included or not, conveys
significant semantics. A 'fixed API', in exhibiting such a structure,
necessarily conveys such semantics of its own. Those API semantics, however, are
unlikely to be the 'native' semantic understanding which both sender and
receiver bring to the data that is the substance of any specific exchange
between them. In a sufficiently complex network, an agreed fixed API is unlikely
to represent fully and accurately the semantic understanding of either party.
This has immediate and devastating consequences for the scalability and
adaptability of the system as a whole.

If there is a single API across the entire semantic web, then as each node grows
more specialized and the interactions among them more complex, an increasing
percentage of each node's work will be wasted on conversions in and out of the
API with each process it executes. The maintainer or designer of function
implementation at each node will face a similarly increasing percentage of
effort squandered in figuring out how to get from the data presented to the data
required in building the ever more complex functionality required at each
specialized node. This problem should look very familiar:  those of us who have
been harnessing processes distributed across different enterprises, national
practices and regulations, time zones, hardware and operating system platforms,
and interchange protocols have already lived through three generations of
proposed solutions just since 1980. Need I point out that it is just this
problem which the semantic web proposes to solve with nodes that understand the
substance of their interaction at a semantic, rather than a purely syntactic,
level (all of it based on the underlying ability of markup to assure
self-describing data)? Fine; then get on with it, but don't introduce the
eventually fatal bottleneck of conversion in every case through a static
intermediate level.

It is just as futile to try solving this problem by creating a separate API for
each class of node-to-node pair interactions. This is the flaw in the agreed
vertical market data vocabularies (ESteel, FpML, etc.--more than 2000 of them
when I last gave up trying to count them accurately, or to discover whether
even one of them operated from a different premise). To the extent that
a particular node is truly expert--that is, that within the semantic web the
introduction or application of its unique ontology facilitates a semantic
outcome more elaborate, more nuanced, or in some other way more than the common
denominator of the vertical market vocabulary--that node requires a richer
semantic vocabulary to express the output of its process. To my mind, this is
precisely why we use *extensible* markup as our basic syntax. For any node to
make use of the particular expertise of another surely means using primarily
what is unique to the output of that node. This means using as an input what is
outside the standard API in the output of that node. So, in order both to have
access to the particular expertise of other nodes, and also to avoid in the
general case the proliferating waste of constant conversions into and out of
standard APIs, why don't we try a different premise entirely:  that it is the
responsibility of each node to instantiate for its own purposes (and therefore
in its unique native semantics) whatever it might take as input from the unique
output of another node.
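
As a minimal sketch of that premise (not a definitive implementation), the following Python shows a receiving node building its own native form from whatever markup two different senders happen to emit. Everything named here is an assumption made for illustration: the `Payment` form, the hint lists, and both sample documents are hypothetical and belong to no agreed vocabulary.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical native form of one receiving node. The field names are
# illustrative only; they are private to this node.
@dataclass
class Payment:
    amount: float
    currency: str

# The receiver's own hints: element names it has learned to look for.
# They live inside the receiver, not in any shared API.
AMOUNT_HINTS = ("amount", "total", "sum")
CURRENCY_HINTS = ("currency", "ccy")

def find_first(root, hints):
    """Return the text of the first element whose lowercased tag is a hint."""
    for elem in root.iter():
        if elem.tag.lower() in hints and elem.text:
            return elem.text.strip()
    return None

def instantiate(xml_text):
    """Build this node's native Payment from whatever markup arrives."""
    root = ET.fromstring(xml_text)
    amount = find_first(root, AMOUNT_HINTS)
    currency = find_first(root, CURRENCY_HINTS)
    if amount is None or currency is None:
        raise ValueError("input could not be instantiated for this node")
    return Payment(float(amount), currency.upper())

# Two hypothetical senders, each using its own vocabulary and structure.
sender_a = "<invoice><total>125.50</total><ccy>usd</ccy><terms>net 30</terms></invoice>"
sender_b = "<trade><leg><Amount>99</Amount><Currency>EUR</Currency></leg></trade>"

payment_a = instantiate(sender_a)
payment_b = instantiate(sender_b)
```

Neither sender had to know about `Payment`, and the receiver simply ignores what it has no use for (the `terms` element above). The cost is that the extraction rules are code the receiver itself must write and maintain.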

Not only does this put the solution to the problem in the right place
philosophically, but as a practical matter it correctly factors the process
design, implementation and maintenance tasks incumbent on the expertise of each
node. The node's expectations for input data necessarily reflect its unique
understanding of the process it is to perform, the unique ontological structure
it instantiates. Part of what is unique about that node is the process by which
it gets data from a form in currency outside of it into the unique internal form
which directly expresses its epistemology. To share that form with an upstream
data provider might well mean sharing, for example, the algorithmic processes by
which that form is achieved, which may be the very raison d'etre of this node.
Why undo the useful factoring of a problem which incorporates this node into its
solution? It may also be that the data provider's understanding of its output is
fundamentally at odds with this node's understanding of that same data as input.
This touches directly upon the very nature of reuse. Surely one crucial premise
of a semantic web is that different nodes may take the output product of any one
and do vastly different things with it, predicated on entirely different notions
of how, within their individual ontologies, the product of that node should be
understood. This is why it is a semantic web and not a semantic pipeline, though
the thread of any sequence of processes within it might be most easily
understood as a unidimensional pipe.

And, finally, the practical business of
instantiating input data afresh to fit the ontology of each node just isn't that
difficult in most cases. The node already knows what it needs, so the 'output'
side of the 'transform' (both contentious terms here, I know) is already fixed.
To the extent it is looking for some component of the input which is unique to
the data source node, behavior which utilizes that unique element has already
been built into this receiver and with it, presumably, some hints to identify
what it is looking for. To the extent that it has some previous experience with
this data source node, this receiver has previous familiarity with the forms of
data which it has received and with how it has previously instantiated that data
for its own purposes. Finally, even where input data may appear at first
incomprehensible, it is often possible for the receiver to instantiate it by
brute force into something usable. The receiver knows, after all, the data
semantics which its own processes require. If by working the permutations of
instantiating what it receives into possible versions of what it needs, and then
by performing its unique processes upon one of those versions, the receiver is
able to achieve a useful--for it--outcome, no one else has the standing to say
that this was not a correct instantiation of the input data. The point is that
in all these cases there is processing code--unique to each node--to be written,
and nothing will magically obviate that chore. This is why I agreed so
wholeheartedly with Jonathan Borden's analogy of the specification of an ontology
to source code. If the 'ontological node' is going to *do* anything useful then
yes, of course, executable code is what is required.
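
The brute-force case described above might be sketched as follows, assuming a hypothetical node that prices one order line and receives three unlabeled values in unknown order. The field names, the `process` function, and its acceptance rules are all illustrative assumptions. The receiver tries each permutation of the input against what its own process requires and keeps the first one that yields a usable outcome.

```python
from itertools import permutations

# What this (hypothetical) node's process requires, in order.
FIELDS = ("sku", "quantity", "unit_price")

def process(sku, quantity, unit_price):
    """The node's own process: price one order line.

    Raises ValueError when the candidate instantiation is unusable.
    """
    qty = int(quantity)        # ValueError if not an integer string
    price = float(unit_price)  # ValueError if not numeric
    if not sku or not sku[0].isalpha() or qty <= 0 or price <= 0:
        raise ValueError("not usable by this node")
    return {"sku": sku, "line_total": round(qty * price, 2)}

def instantiate_by_brute_force(values):
    """Try every ordering of the input; return the first useful outcome."""
    for candidate in permutations(values, len(FIELDS)):
        try:
            return process(*candidate)
        except ValueError:
            continue  # this reading of the input does not work; try another
    return None  # nothing usable could be made of the input

outcome = instantiate_by_brute_force(["3", "A-17", "19.99"])
```

Whether an outcome counts as useful is decided entirely by the receiver's own `process`, which is the point of the argument. A real node would presumably need stricter acceptance tests than these to avoid accepting a plausible but wrong permutation.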

>
>
> > Let us submit the same body of input simultaneously to various different
> > diagnostic methodologies--each expressed as an ontology to which we can
> > form a nexus at an addressable node--and, provided that we can retrieve or
> > address the output of each, we can ignore the particulars of what happens
> > in those opaque boxes.
>
> Works OK for short-term data, but try looking at medical records over the
> 90-year life of a patient on this basis and you will run into problems. Even
> Jonathan will admit that drugs get reclassified in their lifetime. You need
> to know the classification at the time they were administered, not the
> classification today. Opium was de rigueur in our grandparents' time. Do you
> want it administered to your grandchildren?

As I hope I have very nearly exhaustively covered above (and I said that this
would not be the version in full!--sorry), this is simply a task fulfilled by
the proper factoring of the problem at each node, based upon the unique
expertise of that node. In the example case, if a constellation of symptoms is
submitted to multiple diagnostic nodes (good practice), there must then be a
node which effectively proxies for this particular patient, or perhaps for this
particular patient as treated under the philosophy of his trusted primary care
physician, which then evaluates the various diagnoses and prescriptives from the
unique perspective which none of the diagnostic nodes can have--that of the
unique individual who will implement the next step of the process.
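
The factoring just described could be sketched like this. It is an illustration under stated assumptions only: both diagnostic nodes, their outputs, and the `PatientProxy` rule are hypothetical, and the nodes are queried in turn rather than simultaneously to keep the sketch short. Each diagnostic node is an opaque callable, and a separate proxy node applies the patient's own perspective to their outputs.

```python
# Hypothetical diagnostic nodes. Each is an opaque box: the caller sees
# only its output, never how it reasons. The rules are invented.
def node_conservative(symptoms):
    if "cough" in symptoms:
        return {"diagnosis": "bronchitis", "treatment": "rest"}
    return {"diagnosis": "unknown", "treatment": "observation"}

def node_historical(symptoms):
    if "cough" in symptoms:
        return {"diagnosis": "catarrh", "treatment": "opium tincture"}
    return {"diagnosis": "unknown", "treatment": "observation"}

class PatientProxy:
    """Stands in for one patient and applies knowledge no diagnostic node has."""

    def __init__(self, refused_treatments):
        self.refused = set(refused_treatments)

    def evaluate(self, proposals):
        # Keep only the proposals this patient is prepared to act on.
        return [p for p in proposals if p["treatment"] not in self.refused]

def consult(symptoms, nodes, proxy):
    """Submit the same input to every node, then let the proxy decide."""
    proposals = [node(symptoms) for node in nodes]
    return proxy.evaluate(proposals)

accepted = consult(
    {"cough", "fever"},
    [node_conservative, node_historical],
    PatientProxy(refused_treatments=["opium tincture"]),
)
```

The diagnostic nodes need not agree with each other or know anything about the patient. The judgment about which output to act on sits in the one node whose expertise is that patient.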

Respectfully,

Walter Perry

