In short, attention to the details of versioning is a matter of the business(es) affected by a change and the *tools* for propagating that change to both human and automated users, weighed against the cost of the change.
This is one of the most daunting challenges facing the medical community. The impact of changes can be very positive, but the costs are commensurate. The complexity of the domain lists created over literally centuries of practice is incredible.
Hmmm... Roger, in complexity theory, have you ever come across a term for "resistance" to versioning vs. patching? We might use 'density' or 'arabesqueness' metaphorically, but there should be a mathematical construct that describes this. Is a phase transition management system 'beyond' a versioning system? Is there a type of criticality control parameter affecting phase transitions that could be associated with versioning? Temperature, energy, or cost comes to mind, but in that sense a versioning system is a means of tuning external parameters, and percolation is a suitable model.
http://www.worldscibooks.com/physics/p365.html
len
-----Original Message-----
From: Cox, Bruce [mailto:Bruce.Cox@U...]
Sent: Monday, December 10, 2007 6:02 PM
To: Greg Hunt; xml-dev@l...
Subject: RE: Data versioning strategy: address semantic, relationship, and syntactic changes?
Greg, Roger, I hope you
won't mind if I give some of your interesting ideas a bit of a reality
test.
In summary:
For us, changes are usually business-driven and decided on cost, and, no, it makes little or no difference what kind of change it is.
In exhausting detail:
At the USPTO, our versioning strategy for the DTDs and style sheets used for patent publications is driven almost entirely by cost. When a change in a business process provokes a change in patent publications (about 10,000 documents per week), we look at the entire pipeline, including data source, storage, processing, validation, export to the publishing contractor, publication, dissemination, consumption by internal search systems, consumption by international exchange partners, consumption by commercial value-added resellers, archives, and final disposition. Changes to the governing DTD and style sheets are based on that entire analysis. To the extent possible, changes are made no more than once a year and announced six months in advance, primarily so that everyone can get funding in place in time, make changes, test changes, notify customers, test changes, retrain staff, test changes, update product descriptions, test changes, etc. We like to test changes on at least two weeks of data (20,000 to 40,000 documents), but sometimes do so across many months of data through parallel runs.
Granted, our universe is limited in scope. There are only about 120 patent offices in the world, only a handful of them use our XML data, and there are fewer than 50 value-added resellers that we know of who use our XML data. Nevertheless, we communicate all changes to everyone we know to be using the data, since we cannot predict what will or won't break someone else's system. Our business is such that we cannot even dream of placing any constraints on the consumers of the data. If we miss some of the unknown users and a change breaks their system, we usually hear about it, especially if it threatens to put them out of business. This has happened with the most innocuous or seemingly trivial changes as well as with the more dramatic ones. Sometimes we can fix it, sometimes not; you can imagine the rest...
It has happened here more than once that some bright idea that seemed to solve a major problem received enough analysis for us to realize that the cost of implementation far outweighed the benefit. All our changes are "strong" in the sense of being well-specified. If they aren't well-specified, they become well-specified, or they don't survive analysis and don't get implemented. Even the bright ideas that are ultimately abandoned have to be sufficiently well-specified to determine whether they can be implemented.
Ontologies and such are usually indecipherable to those who don't know the business they describe, and superfluous to those who do. Most major business changes in the patent system occur as a result of an act of Congress or as the outcome of some litigation. In both cases, the Office writes rules that set the meaning of terms for better or worse (and the rules sometimes get revised accordingly), usually based on the language used by Congress or the court. I don't think there is any mechanical substitute for learning the business you want to engage with. The world of commerce is far too dynamic for that. In any case, all changes bite someone, hard or not, sooner or later, so we have little choice but to treat them all much the same; once a change is agreed, we don't categorize it in any way.
During analysis, we weigh the expected benefit against the cost, and it can sometimes be useful to understand a change as purely syntactic (very low cost as a rule) or structural (more costly, depending on the scope). Semantic changes are always very costly in the sense that habitual users of the data have to be retrained in the new interpretations required. However, this rarely impacts the DTD (unless there are corresponding changes in structure as well) and is therefore not usually funded from the IT budget. Nevertheless, the cost of training can stop an otherwise inexpensive DTD change.
There are a number of WIPO standards (http://www.wipo.int/standards/en/part_03_standards.html) that document the meaning of industrial property terminology. These formed the basis of the vocabulary used in WIPO Standard ST.36 (http://www.wipo.int/export/sites/www/standards/en/pdf/03-36-01.pdf), which the USPTO implements in its Red Book (http://www.uspto.gov/web/offices/ac/ido/oeip/sgml/st32/redbook/index.html). For the most part, all the member states of WIPO assign the same meaning to a given element name. However, the harmony is often somewhat superficial, hiding a multitude of variations in rules, traditions, and understanding among the member states. That there is as much agreement as there is might be considered an achievement worthy of note. Without it, I dare say ST.36 could not exist.
And yes, the intellectual property community uses those two-letter ISO country codes for a number of purposes, including place of birth, primary residence, place of filing, mailing address, agent's address, states designated under the PCT, etc., etc. WIPO Standard ST.3 (http://www.wipo.int/export/sites/www/standards/en/pdf/03-03-01.pdf) incorporates, sometimes modifies, and even augments the list with codes for regional authorities that act as the patent office for more than one country. WIPO member states frequently revisit the list as political boundaries change, since the scope of a patent is generally limited to a political territory. Countries usually enact legislation defining how the scope of the rights attached to a patent changes in response to changes in political boundaries.
Bruce B Cox
Manager, Standards Development Division
U.S. Patent & Trademark Office