Web Resource Identity
There is a new document on the W3C site which is both encouraging and disturbing:

"""In characterizing the structure and content of the Web, it is necessary to establish precise semantics for Web concepts. The Web has proceeded for a surprisingly long time without consistent definitions for concepts which have become part of the common vernacular, such as "Web site" or "Web page"."""

"""This document represents an effort on the part of the W3C Web Characterization Activity to establish a shared understanding of key Web concepts."""

http://www.w3.org/1999/05/WCA-terms/

It is encouraging because it is long needed. It is disturbing because I believe it identifies a key problem with the Web (or with my understanding of the Web).

This document refers to the URI specification in its definition of "resource": "...anything that has identity." This is troubling because there is no definition of identity. In the HyTime and object-oriented worlds, I believe that the defining characteristic of things with identity is that you can take two references and determine if they refer to the same object. I do not see how to do this on the Web.

Consider the following URLs:

http://www.mitre.org/index.html
http://www.mitre.org/
http://www.mitre.org

Do they refer to the same resource? Let's try the answer both ways:

YES: How do we know, other than common sense? What if the URLs were more radically different -- if the mitre site was also accessible as miter because French and English authors always swap their r's and e's? I would love to hear that there is some such thing as a "canonical URL" that I can retrieve through HTTP or WebDAV. If there is, it should be referred to in WCA-terms.
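To make the "how do we know?" question concrete, here is a minimal sketch of purely syntactic URL comparison, using Python's standard `urllib.parse`. The `normalize` function is a hypothetical helper, not part of any Web specification mentioned here, and it only applies rules that can be checked without asking the server: lowercase the scheme and host, and treat an empty path as "/".

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    """Syntactic normalization only (a hypothetical helper).

    Lowercases the scheme and the network location, replaces an empty
    path with "/", and drops any fragment. It does NOT know that a
    server may serve index.html for "/"; that is server configuration.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # Assumption: the URL carries no case-sensitive userinfo, so
    # lowercasing the whole network location is safe for this sketch.
    host = parts.netloc.lower()
    path = parts.path or "/"
    return urlunsplit((scheme, host, path, parts.query, ""))

urls = [
    "http://www.mitre.org/index.html",
    "http://www.mitre.org/",
    "http://www.mitre.org",
]

for url in urls:
    print(url, "->", normalize(url))
```

Under these rules the second and third URLs compare equal, but the first does not, and nothing in the URL syntax says whether it should. Equating `/index.html` with `/` takes knowledge that only the server has.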
Because the Web has a distinction between Web resources and resource manifestations, it is even possible that when you access the same logical resource from different URLs it could return a different byte sequence ("entity" in HTTP terminology), so that even a byte compare will not reveal that the URLs refer to the same _logical resource_.

NO: This is more disturbing. It makes robust, scalable hypertext linking essentially impossible. Consider it from an RDF point of view. If I use RDF to attach a hundred properties to one URL and someone else uses it to attach a hundred properties to another one, then our property groupings cannot be merged. This also affects XLink. If one group of externally imposed XLinks refers to the site under one name and another group refers to the site under another, then those groups cannot be merged to create a single view.

The only solution, if we assume a one-to-one correspondence between URLs and objects, is to have EVERY NON-CANONICAL name for the object explicitly do a redirect to the canonical name. This is not common practice on the Web, and as long as URLs are human-typable it is not likely to become common practice. If you move an object from the bowels of your Website (a hundred-character URL) closer to the "top" (a 20-character URL), you aren't going to use HTTP redirect to redirect people from the nice new name to the older, canonical name. But if you change the canonical name, then anything currently attached to the document through out-of-line links will break.

---

Summary: I believe that the Web needs a concept of a canonical URL, if it doesn't already have one. Retrieving a document or the HEAD for the document should describe the canonical URL. I wouldn't mind if the canonical URL was a totally unreadable UUID as long as I can take two URLs and figure out whether they refer to two things that happen to have the same content or actually refer to the SAME THING.
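The RDF merging problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `merge_properties` is a hypothetical function, the property names and values are made up, and the `aliases` table stands in for whatever a server would have to report if a canonical URL mechanism existed.

```python
from collections import defaultdict

def merge_properties(statements, canonical_of=None):
    """Group (url, property, value) statements by subject.

    canonical_of is a hypothetical alias -> canonical URL mapping.
    Without it, each distinct URL string is its own subject.
    """
    canonical_of = canonical_of or {}
    merged = defaultdict(dict)
    for url, prop, value in statements:
        subject = canonical_of.get(url, url)
        merged[subject][prop] = value
    return dict(merged)

# Illustrative statements from two independent authors (made-up values).
mine = [("http://www.mitre.org/", "title", "MITRE home page")]
theirs = [("http://www.mitre.org/index.html", "language", "en")]

# No canonical information: the two property groups stay apart.
separate = merge_properties(mine + theirs)
print(len(separate))

# Assumed alias table: both names resolve to one subject and merge.
aliases = {"http://www.mitre.org/index.html": "http://www.mitre.org/"}
combined = merge_properties(mine + theirs, aliases)
print(combined)
```

The merge itself is trivial. The hard part is where the `aliases` table comes from, which is exactly the information the Web does not currently give me.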
--
Paul Prescod - ISOGEN Consulting Engineer speaking for only himself
http://itrc.uwaterloo.ca/~papresco

Alabama's constitution is 100 years old, 300 pages long and has more than 600 amendments. Highlights include "Amendment 393: Amendment of Amendment No. 351", "Validation of Laws Regulating Court Costs in Randolph County", "Miscegenation laws", "Bingo Games in Russell County", "Suppression of dueling".
- http://www.legislature.state.al.us/ALISHome.html

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)