how HTTP URIs and URI references work (or don't)
The use of HTTP URIs in a number of contexts is important to XML work in general, and the nature of HTTP URIs is important to particular aspects of XML processing, notably namespaces and RDDL, so it seems worth exploring how these things actually work.

RFC 2616 defines the HTTP 1.1 protocol and also the http scheme for URLs:

>3.2.2 http URL
>
>The "http" scheme is used to locate network resources via the HTTP
>protocol. This section defines the scheme-specific syntax and
>semantics for http URLs.
>
>http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
>
>If the port is empty or not given, port 80 is assumed. The semantics
>are that the identified resource is located at the server listening
>for TCP connections on that port of that host, and the Request-URI
>for the resource is abs_path (section 5.1.2)....

Although this defines the scheme in the now-unfashionable (so 1999) terminology of URLs, it both conforms to common expectations about what something that starts "http://" is for and defines what a resource does. A resource identified by a URI using the http scheme is not merely something that is (or isn't); instead, it is something "listening for TCP connections..."

This notion of resource as listener makes it very easy to discuss HTTP resources in the abstract, without concern for what the listener might say in response. http://www.cnn.com is the web site for CNN, whatever the news of the day might be or the ownership of the station, http://dilbert.com is an eternal fount of truth, etc. There's a listener identified by those URIs, perhaps even a distributed listener, and it works quite nicely.

This level of abstraction, however useful, is a far cry from using HTTP URIs to identify resources which are not in fact HTTP listeners, which seems to be a more recent trend since the publication of RFC 2616.
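To make the "resource as listener" reading concrete, here is a minimal Python sketch of what section 3.2.2 implies for a client: an http URI resolves to a host, a TCP port (80 when none is given), and a Request-URI. The function name http_listener is my own invention for illustration; the "/" default for a missing abs_path follows RFC 2616 section 5.1.2.

```python
from urllib.parse import urlsplit

# RFC 2616 section 3.2.2: port 80 is assumed when no port is given.
DEFAULT_HTTP_PORT = 80


def http_listener(uri):
    """Return (host, port, request_uri) for an http URI (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.scheme.lower() != "http":
        raise ValueError("not an http URI: %r" % uri)
    port = parts.port or DEFAULT_HTTP_PORT
    # RFC 2616 section 5.1.2: a missing abs_path is sent as "/".
    path = parts.path or "/"
    request_uri = path + ("?" + parts.query if parts.query else "")
    return parts.hostname, port, request_uri


if __name__ == "__main__":
    print(http_listener("http://www.cnn.com"))
    print(http_listener("http://example.org:8080/a/b?x=1"))
```

Nothing in the result says what the listener will send back, which is the point: the URI identifies where to knock, not what answers.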
Being able to discuss HTTP URIs as abstract identifiers for listening resources is very different from being able to use HTTP URIs as abstract identifiers for arbitrary subjects.

Another set of related issues arises because many of the specifications that incorporate URIs don't incorporate just URIs themselves. Rather, they incorporate URI references, a more fully-featured toolkit that includes both relative addressing and fragment identifiers. Those features are both defined in a different specification, RFC 2396, which is not HTTP-specific.

The appropriate use of relative addressing has been previously discussed as it applies to namespaces, and the conclusion reached seems pretty simple: use relative addressing only for information that needs to change depending on context, and don't use it as a shortcut for information that should remain stable. Having concluded that namespace identifiers should remain stable, the XML Plenary deprecated the use of relative URI references in namespace identifiers.

Fragment identifiers are a very different set of problems. Although fragment identifiers (anything after a #, possibly including nothing at all) are defined generally by RFC 2396, the interpretation of fragment identifiers is left to client processing and is dependent on the media type of the information returned by the resource to the client, as defined in Section 4.1:

>When a URI reference is used to perform a retrieval action on the
>identified resource, the optional fragment identifier, separated from
>the URI by a crosshatch ("#") character, consists of additional
>reference information to be interpreted by the user agent after the
>retrieval action has been successfully completed....
>
>The semantics of a fragment identifier is a property of the data
>resulting from a retrieval action, regardless of the type of URI used
>in the reference. Therefore, the format and interpretation of
>fragment identifiers is dependent on the media type [RFC2046] of the
>retrieval result. The character restrictions described in Section 2
>for URI also apply to the fragment in a URI-reference. Individual
>media types may define additional restrictions or structure within the
>fragment for specifying different types of "partial views" that can be
>identified within that media type.
>
>A fragment identifier is only meaningful when a URI reference is
>intended for retrieval and the result of that retrieval is a document
>for which the identified fragment is consistently defined.

URI references clearly demand a tighter coupling between the identifier and the type of the thing identified. With HTTP, it is entirely possible and perhaps even more and more likely (thanks to XML-based kits like Cocoon and AxKit) that requests to the same URI will produce substantially different "data resulting from a retrieval action" depending on contexts which are not specified in the URI reference itself. (XHTML, for instance, has a lot of linking elements with separate type attributes for optional identification of the MIME Content-Type desired.)

While it might be nice for multiple formats to have common fragment identifiers, the difficulties are fairly obvious once you examine the diversity of types the Web supports, from HTML to plain text to graphics to audio and video. To single out a particular (and very useful) case, SVG defines the svgView() fragment identifier scheme, as in:

MyDrawing.svg#svgView(viewBox(0,200,1000,1000))

The complications that have slowed progress on XPointer are worth consideration as well, as is the scheme-based approach the XPointer WG appears to have settled on, with its (I think necessary) options for diversity of implementation.
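The dependence of fragment meaning on media type can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name interpret_fragment and the three-way dispatch are invented here, and a real user agent would consult each media type's own registration rather than a hard-coded table. The same URI reference gets a different reading depending on what the listener happened to return.

```python
import re
from urllib.parse import urldefrag


def interpret_fragment(uri_reference, media_type):
    """Describe how a client might read the fragment for one media type.

    Hypothetical helper: the dispatch below is a simplification, not a
    statement of what any particular user agent does.
    """
    # The fragment is split off on the client; only the URI is retrieved.
    uri, fragment = urldefrag(uri_reference)
    if not fragment:
        return uri, "no fragment"
    if media_type == "image/svg+xml":
        match = re.fullmatch(r"svgView\((.*)\)", fragment)
        if match:
            return uri, "SVG view specification: " + match.group(1)
        return uri, "SVG element with id " + fragment
    if media_type in ("text/html", "application/xhtml+xml"):
        return uri, "anchor or element named " + fragment
    return uri, "no interpretation in this sketch for " + media_type


if __name__ == "__main__":
    ref = "MyDrawing.svg#svgView(viewBox(0,200,1000,1000))"
    for returned_type in ("image/svg+xml", "text/html", "audio/mpeg"):
        print(returned_type, "->", interpret_fragment(ref, returned_type))
```

Note that media_type has to come from somewhere other than the URI reference, which is exactly the coupling problem described above.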
The value of fragment identifiers in ordinary linking situations where the type of "data resulting from a retrieval action" is constrained through mechanisms beyond the URI reference itself is pretty obvious, I think. Pointing to particular locations within documents is frequent and useful, and a pointer system is necessary for effective use of out-of-line hypertext.

The value of fragment identifiers in situations where the type of "data resulting from a retrieval action" is not constrained is far less clear. Namespaces in XML, for example, provides no information whatsoever beyond a URI reference. Many other uses of URI references similarly provide only the URI reference and no further context.

Many of these specifications appear to have lost sight of the notion that, for example, an http-schemed URI reference involves a listening resource which returns a variety of types of data. While the use of URI reference syntax for string identifiers may seem acceptable to URI proponents who have long since abandoned a notion of resources as active beings participating in conversations, this use has little if anything to do with the practice defined for URI references generally and http URIs particularly by RFCs 2396 and 2616.

It may be a stretch to describe URIs and URI references beginning with "http" as contracts which bring expectations for performance, but there are clearly both formal and informal descriptions of those expectations. Within those expectations, http URIs and URI references function very well. When pressed beyond those expectations into a world of arbitrary identification, http URIs and URI references create confusion rather than reduce it.

For those of us in XML-land, this has a few implications:

1) It's not clear what namespaces containing fragment identifiers (even if they aren't http) are about; it may make more sense to use plain URIs and, if they are http URIs, to put a RDDL document there whose fragment identifiers identify tools.

2) Pretending that the URI in a namespace identifier identifies the namespace rather than a listening (for http) resource is foolish; it may make more sense to redescribe namespaces in a context which offers namespaces-as-affiliation-with-a-URI rather than namespaces-as-a-URI.

3) In other contexts where URI references are used, additional constraining information regarding the expected type of "data resulting from a retrieval action" should be provided either in the specification or explicitly in the document, as XHTML does with type attributes. This will help to ensure that fragment identifiers are interpreted in an appropriate context. XLink notably fails to do this, leaving content-type identification to further URI interpretation rather than MIME type identification.

4) If you provide an identifier which looks like it points to a listener which provides responses (like an http URI or URI reference), make sure there's actually a listener. That listener can then provide representations describing the affiliation between itself and your use of the identifier.

5) Seriously consider specifying URIs rather than URI references, even in contexts where 'just HTTP' is in use, unless you actually need and are prepared to deal with the additional features/consequences of URI reference usage.

I'm not entirely sure why some people prefer Platonic Forms to the practices defined in the specifications, but the specifications seem to offer enough abstraction to be useful without the ever-expanding complications that appear as HTTP identifiers are separated from their foundations.
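As a rough illustration of implications 1 and 5, the following Python sketch walks the namespace declarations in a document and flags any whose identifier is a relative URI reference or carries a fragment identifier. The function name audit_namespaces and the sample document are hypothetical; the check is my own reading of the points above, not anything required by Namespaces in XML.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def audit_namespaces(xml_text):
    """Yield (namespace_name, problems) for each namespace declaration."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events report each declaration as a (prefix, uri) pair.
    for _event, (_prefix, name) in ET.iterparse(source, events=("start-ns",)):
        problems = []
        if not urlsplit(name).scheme:
            problems.append("relative URI reference")
        # A bare trailing "#" still makes this a URI reference with a fragment.
        if "#" in name:
            problems.append("fragment identifier")
        yield name, problems


# Hypothetical sample: one plain URI, one with a fragment, one relative.
SAMPLE = (
    '<doc xmlns="http://www.w3.org/1999/xhtml"'
    ' xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns:local="../schemas/local"/>'
)

if __name__ == "__main__":
    for name, problems in audit_namespaces(SAMPLE):
        print(name, "->", ", ".join(problems) or "plain absolute URI")
```

A check like this says nothing about whether a listener actually exists at an http namespace name (implication 4); that would take a retrieval, which is deliberately left out here.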
- http://www.ietf.org/rfc/rfc2616.txt (June 1999)
- http://www.ietf.org/rfc/rfc2396.txt (August 1998)
- http://lists.w3.org/Archives/Public/xml-uri/2000Sep/0083.html
- http://www.w3.org/TR/SVG/linking.html#SVGFragmentIdentifiers

-------------
Simon St.Laurent - SSL is my TLA
http://simonstl.com may be my URI
http://monasticxml.org may be my ascetic URI
urn:oid:188.8.131.52.4.1.6320 is another possibility altogether