Re: Public Identifiers
At 10:54 AM 9/20/98 -0500, Paul Prescod wrote:
>If I may paraphrase: "FPIs only provided
>a reliable way to interchange SGML data between heterogenous systems for
>the last 15 years, and will continue to for the next 5 that it takes
>symbolic linking to become popular on Microsoft platforms." To me, the
>word "only" is out of place in such a statement.

I would argue that it has not in fact ever been generally possible to interchange SGML data among heterogeneous systems in the way I think you mean. If you send me a package of SGML entities, it is up to me, the receiver, to make sense of them, including reworking any entity declarations and/or public identifier mappings that may be necessary. I spent the last years of my tenure at IBM transferring SGML documents between OS/2, DOS, and VM/CMS systems, and it was non-trivial to manage. To make it work, I had to set up what was essentially a homogeneous system.

The only hope SGML ever had for interchange was the SDIF mechanism, which defines a standard for packaging entities together so that automatic processes can unpack them on the target system--but even there the standard assumes rewriting of entity declarations to update system identifiers. Unfortunately, nobody has ever fully implemented SDIF in a publicly available system (even though it shouldn't be that hard to do and would be mighty useful if done). [NOTE: ISO 9069 (SDIF) *DOES NOT* require the use of ASN.1. Do not be fooled. You can use any mechanism you want for representing the package, even tar or Zip.]

The idea that you might be able to interchange documents that refer to public entities (that is, entities that are somehow publicly available) is a nice one, but without a generally available networking system for accessing those entities, it's only an idea. Today, only URLs come close to providing a useful way to name truly public entities. That is not to say that URLs are the best choice, just that they're our only option at the moment.
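To make the "rewriting of entity declarations" step concrete, here is a minimal Python sketch of what a receiving process might do after unpacking a package. It is a hypothetical helper, not part of SDIF or of any SGML tool: it assumes the receiver already knows where each unpacked file landed (the invented `new_locations` mapping), and it only handles double-quoted SYSTEM identifiers, leaving PUBLIC declarations alone.

```python
import re

# Hypothetical helper, not part of SDIF or any SGML tool: after a package of
# entities has been unpacked on the receiving system, point each entity
# declaration's system identifier at the file's new local location.

# Matches '<!ENTITY name SYSTEM "' (or '<!ENTITY % name SYSTEM "' for parameter
# entities), then the system identifier, then the closing quote.
# Assumption: system identifiers are double-quoted; PUBLIC declarations are ignored.
ENTITY_SYSTEM_DECL = re.compile(
    r'(<!ENTITY\s+(?:%\s+)?[^\s>]+\s+SYSTEM\s+")([^"]*)(")'
)


def rewrite_system_ids(declarations: str, new_locations: dict) -> str:
    """Return declarations with every system ID found in new_locations replaced."""

    def replace(match: re.Match) -> str:
        old_id = match.group(2)
        # System IDs not listed in the mapping are left untouched.
        return match.group(1) + new_locations.get(old_id, old_id) + match.group(3)

    return ENTITY_SYSTEM_DECL.sub(replace, declarations)


# Example (hypothetical names): a chapter declared with a VM/CMS-style file ID,
# unpacked onto a Unix-like system.
sender_decls = (
    '<!ENTITY chap1 SYSTEM "CHAP1 SGML A">\n'
    '<!ENTITY logo SYSTEM "logo.gif" NDATA GIF>\n'
)
print(rewrite_system_ids(sender_decls, {"CHAP1 SGML A": "/home/docs/chap1.sgm"}))
```

Here the chapter's declaration is rewritten while the logo's is left alone, because it does not appear in the mapping. A real receiver would still have to rework public identifier mappings by hand, which is the burden described above.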
It is the use of entity declarations that provides even a hope of interchange for SGML, not formal public identifiers. Public identifiers (formal or not, it doesn't matter) help by giving you the option of being even more indirect, but that's not a requirement for deriving most of the benefit from entity declarations (centralizing the mapping from references to storage objects in the document instance to the storage objects themselves--that is, avoiding "embedded filenames" in instances).

Paul is right that if SGML hadn't required some form of indirection, vendors never would have provided it, certainly not with the level of consistency we have today with SOCATs. But even there, I don't have a complete solution, because not all useful tools support SOCATs and not all support the latest version (ADEPT*Editor, for example, only supports the first version of the SOCAT spec, while SP supports the second).

Steve Newcomb asked whether there is a difference between FPIs and URNs generally, and John Cowan answered, correctly, "no, there's no difference". Public identifiers, and formal public identifiers in particular, are just a special case of URN. They have no unique properties beyond those of URNs generally (with one exception; see below). They are not magic. They do nothing special (except part you with 80 or 90 US dollars if you want a registered owner name that is not an ISBN publisher prefix). ISO 9070 does standardize owner name registration mechanisms, and three are currently implemented: ISBN numbers, ISO registered owner names (administered by the GCA, see <www.gca.org>), and Internet domain names (added by TC 2 to ISO 8879). This has value because it provides a pretty solid infrastructure for managing name ownership, one of the requirements for URNs generally.
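As an illustration of the centralized mapping that catalogs provide, here is a minimal Python sketch of SOCAT-style lookup. It is an assumption-laden toy, not an implementation of either version of the SOCAT spec: it understands only PUBLIC entries, skips every other entry type, keeps the first entry seen for a given public identifier, and compares identifiers after collapsing whitespace. The function names are hypothetical.

```python
import re

# Hypothetical sketch of catalog-based resolution in the style of SGML Open
# catalogs (SOCATs). Only PUBLIC entries are understood; every other entry type
# is skipped, so this is not a conforming catalog processor.

# A token is a "-- comment --", a double- or single-quoted literal, or a bare word.
TOKEN = re.compile(r'--.*?--|"([^"]*)"|\'([^\']*)\'|(\S+)', re.S)


def normalize_public_id(public_id: str) -> str:
    # Compare public identifiers with runs of whitespace collapsed to one space.
    return " ".join(public_id.split())


def parse_public_entries(catalog_text: str) -> dict:
    """Map each normalized public identifier to its system identifier."""
    tokens = []  # (value, is_bare_word) pairs, comments dropped
    for m in TOKEN.finditer(catalog_text):
        if m.group(0).startswith("--"):
            continue
        if m.group(1) is not None:
            tokens.append((m.group(1), False))
        elif m.group(2) is not None:
            tokens.append((m.group(2), False))
        else:
            tokens.append((m.group(3), True))

    entries = {}
    i = 0
    while i < len(tokens):
        value, is_bare = tokens[i]
        if is_bare and value.upper() == "PUBLIC" and i + 2 < len(tokens):
            public_id, system_id = tokens[i + 1][0], tokens[i + 2][0]
            # Keep the first entry seen for a given public identifier.
            entries.setdefault(normalize_public_id(public_id), system_id)
            i += 3
        else:
            i += 1
    return entries


def resolve(entries: dict, public_id: str):
    """Return the system identifier for public_id, or None if it is unmapped."""
    return entries.get(normalize_public_id(public_id))


catalog = '''
-- map public identifiers to local files --
PUBLIC "-//W3C//DTD HTML 4.0//EN"  "html40.dtd"
PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN" isolat1.ent
'''
entries = parse_public_entries(catalog)
print(resolve(entries, "ISO 8879:1986//ENTITIES   Added Latin 1//EN"))  # prints: isolat1.ent
```

The document instance only ever mentions the entity name; the entity declaration names the public identifier; and only the catalog knows the local file. Moving the file means editing one catalog line, not every instance.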
That said, I must admit that Paul's arguments, along with things others have said, have made me rethink my original statement that there's no useful distinction between URLs and URNs (but see below). It is still true that URLs can be just as persistent as URNs. However, it is also the case that we need a formal mechanism for associating names with name spaces, which is what URNs do. URLs have a built-in name space, namely the universe of resources on the Web (which is tautologically defined as those things you can address by URL, but no matter). In other words, we need to be able to say where to go to look up a name.

It doesn't matter how direct or indirect that lookup is. Indirection isn't the issue, because we always have some amount of it, regardless of the addressing scheme (even a phone number is an indirection, even though we tend to think of it as a direct address). It is always up to the machine doing the resolution of names in a particular space to provide appropriate optimizations--we shouldn't care what they might be when we specify a pointer to something.

Thus, the concept of URN as a binding of name-space name to name is very useful, in fact essential, because we need to be able to point to things that exist in different universes (as Steve wants to do in his Topic Map example) and we want different ways of naming things (FPI, ISBN number, URL, etc.).

But... I think that several design errors have been made getting here:

1. The expectation a name engenders as to its persistence is a function of the name, not its use. Therefore, the PUBLIC/SYSTEM distinction made by SGML (and XML) is inappropriate as a matter of syntax. A name is a name, and there should be exactly one declared for each entity.
Within an SGML context, the formal system identifier mechanism (Annex A.6 of ISO/IEC 10744, see <ftp://ftp.ornl.gov/pub/sgml/wg8/document/n1920/html/clause-A.6.html>) could be used to distinguish formal public IDs from other forms of name, e.g.:

    <!-- Declare notations that represent storage managers: -->
    <?IS10744 FSIDR IS9070>

    <!-- Declare a storage manager, in this case, formal public identifiers: -->
    <!NOTATION IS9070 SYSTEM "ISO 9070//DOCUMENT ...//EN" >

    <!-- Now declare an entity that uses that storage manager: -->
    <!ENTITY foo SYSTEM "<is9070>+//IDN drmacro.com//..." NDATA SGML >

2. URLs are a special case of URN. Thus the term URI, meaning "URN or URL", is unnecessary and misleading. There are only URNs, of which URL is a special case where the prefix "urn:url:" can be omitted. URLs can be recognized because they don't start with "urn:", which all other URNs must. URLs are really an optimization of URN where the name-space resolver is already known and all Web browsers must know how to resolve URLs (thus there's no need to apply the more general "look up the name-space resolver" mechanism you must use with any other form of URN). If this design had been used on the Web from the start, then "urn:url:http://www.drmacro.com" would be recognized by all Web clients.

Of course, URLs have this special status only within the context of Web browsers and data formats that give special meaning to the syntactic constructs that hold URLs (e.g., the "href" attribute of HTML). Outside this context, a URL would be no more privileged than anything else. In a different context, other forms of names could be privileged (as public IDs are in an SGML context).

Finally, note that URNs as currently defined are simply *a syntax* (one of an infinite number of possible syntaxes) for representing the binding of name space to name. The formal system identifier example above is another, and my suggestion of a few days ago for a <urn:name> element is a third.
The current URN syntax is appropriate for use in HREF attributes, but it shouldn't be seen as the one and only way to do this binding. URN resolution mechanisms should be independent of the syntax used for the binding--they should simply expect two arguments, a name-space name and a name in that name space. How the client that makes the resolution request gets those two arguments is its business. Particular data representations can then define their own conventions for representing the binding, whether it's the current URN syntax or something different.

3. We've confused the persistence of names with the persistence of resources, which has led us to think that URLs (and system IDs) are somehow fundamentally different from URNs (and public IDs). We've set the expectation that the naming method can solve the persistence problem when in fact it can't. The evidence that this expectation has been set is that everything I've read about so-called "persistent names" goes out of its way to stress that names alone can't guarantee persistence. They wouldn't have to say this if people didn't expect it to be the case.

Assuming my analysis is correct, here's what I'd like to see happen:

1. A general recognition of the need for name-space/name bindings in data representation standards, regardless of the kind of data. If these bindings are further standardized along URN lines (its semantics, not necessarily its syntax), so much the better.

2. Given item (1), data management systems (including operating systems and networking systems) providing generalized name-space-to-resolver services that reflect the general approach defined by item (1). For Internet-based resources, the DNS proposal is probably appropriate and reasonable.

3. Web clients upgraded to accept "urn:url:" as a prefix to otherwise normal URLs.

4. People and enterprises providing non-URL name resolution servers.
These could be along the lines of the PURL services currently being provided (and could probably be implemented with the existing PURL software). For example, OASIS could fund a couple of public identifier servers. Note that these services needn't be free--it costs money to maintain machines, and it would be reasonable to charge a modest fee to people who want to provide published names for their resources.

And now, having said that SGML formal public identifiers have no special properties, let me point out that because registered formal public identifiers are registered, you could use owner names to direct public ID resolution to servers maintained by the name owner, rather than relying on a central FPI resolution server (that is, "DNS for FPIs"). If I understand the DNS-for-URN resolution proposal (which I very well may not, not being an Internet expert by any stretch), the ability to do this is inherent in the proposal.

If we could do these things, and none of them seems to me to be that onerous, then we would, I think, be well on our way to realizing the dream of "universal names" with some hope that persistence, whatever you want that to mean, could be provided by those who care to. [As Robin Cover pointed out in private mail to me, we will always be dependent on human nature for these systems to work, and it is not always human nature to provide persistence for names, at least not outside the scope of your own Web server.]

Cheers,

Eliot

--
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 75202. 214.953.0004
www.isogen.com
</Address>

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)