How to create URIs out of system ids
The XML 1.0 spec indicates that "[t]he SystemLiteral that follows the keyword SYSTEM [which] is called the entity's system identifier is a URI, which may be used to retrieve the entity." The questions I have are less issues with the spec than issues of practical implementation, hence this posting.

I am trying to grapple with the following question: given an SGML external identifier with a possibly omitted system identifier, how would an application most appropriately generate the system id part of a valid XML ExternalID? The basic scenario is that I'm starting with something that is not necessarily XML (perhaps it's SGML), and I'm trying to automate the process of producing XML, specifically in this case valid XML ExternalIDs.

Here's my cut of the issues. Consider the following cases, all allowed by SGML:

1. <!ENTITY foo PUBLIC "public id">
2. <!ENTITY foo PUBLIC "public id" "sysid">
3. <!ENTITY foo SYSTEM "sysid">
4. <!ENTITY foo SYSTEM "">
5. <!ENTITY foo SYSTEM>

In cases 1 and 5, we can assume that the application has some way to determine an implicit system id at least some of the time.

Note that the relevant section in the XML PR goes on to say: "Unless otherwise provided by information outside the scope of this specification..., relative URIs are relative to the location of the resource within which the entity declaration occurs."

A system identifier in general could be:

a. a file pathname relative to the location of the resource within which the entity declaration occurs;
b. a file pathname relative to something else (e.g., the catalog in which the sysid was found as a result of the public id lookup);
c. an absolute file pathname on the local computer's file system;
d. a URL relative to the encapsulating entity;
e. a URL relative to some other base URL somehow specified;
f. an absolute URL;
g. empty;
h. something else (e.g., "this is garbage").
The basic question is what SystemLiteral to generate to create the most appropriate valid XML ExternalID in each case. Below I'm using the term "sysid" to refer to the system id as specified in the external id or in the catalog, and "URL" to refer to the SystemLiteral that will get put into the XML ExternalID.

For a, the relative file pathname would get converted to the equivalent relative URL just by converting the syntax; on Unix and NT, this would consist just of escaping characters not allowed in URLs, whereas on DOS-based machines and Macs, etc., it is also the case that the path separator character (\ or :, etc.) would get converted to /. Alternatively, the application could make the sysid absolute and then handle it as case c, which would make the document more likely to work if it were moved elsewhere. Thoughts?

For b, either the application could try to get fancy and translate the sysid that is relative to something else into one that is relative to the containing document and then handle it as case a; otherwise, it could make the sysid absolute and then handle it as case c.

For case d, there is nothing to do. Alternatively, the application could make the URL absolute and then handle it as case f.

For case e, either the application could try to get fancy and translate the URL that is relative to something else into one that is relative to the containing document and then handle it as case d; otherwise, it could make the URL absolute and then handle it as case f.

For case f, there is nothing to do.

For g, the application could leave it empty since that is a valid URL, though probably not what's intended. Or it could write some URL such as "http://unknown.netloc/unknown.url". Any other ideas?

For case h, the application could leave it alone and just pass on the "garbage", or it could handle it as case g. Thoughts?

For c, I'm not sure what makes the most sense.
Presumably, the application could try to get fancy and, if the referenced file is in fact accessible via some http-URL, make the conversion, but this seems tricky and questionable and certainly can't work in all cases. That leaves writing out the absolute file name as a file-scheme URL. (Am I missing some other alternative?)

My reading of RFC1738 seems to indicate that, for a file path name of c:\pbg\webpages\pbghome.htm on my local machine, the file-scheme URL could be either:

  file://localhost/c:/pbg/webpages/pbghome.htm

or

  file:///c:/pbg/webpages/pbghome.htm

The latter works in NS3.0 and IE3.0 on my W95 machine (the former works in NS3.0 but not in IE3.0 per my experiments; I think I've heard from others that "localhost" does work now in IE4.0).

So it sounds like what I'd do in case c is do the syntax conversion as in case a (e.g., \ to / and escape characters as necessary), then prepend "file:///" to the result. Is that reasonable?

Another angle I've heard is that user-specified sysids (cases 2-4 above) should be left untouched since that's what the user said, and only sysids that the application must intuit (cases 1 and 5) should be subject to any of the massaging I've discussed in a-h above. If you subscribe to "my gun, my bullet, my foot, my health insurance", then I suppose I can see that point. If you subscribe to "do what I mean, not what I say, I'd prefer you made my life smoother despite myself because all this technical stuff shouldn't be so hard to figure out in the first place", then I can see arguments for trying to turn all sysids that aren't already absolute URIs into absolute URIs for maximal portability.

I'd be interested in hearing others' thoughts on this.

paul

http://www.w3.org/TR/PR-xml#sec-external-ent
http://www.w3.org/TR/PR-xml
Other sources include RFC1738 and RFC1808.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
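
As a rough illustration of the conversion discussed in the post for cases a and c (convert the path separator to /, escape characters as necessary, and for an absolute path prepend "file:///"), here is a minimal Python sketch. The function names and the `separator` parameter are hypothetical, invented for this example, and it assumes that the percent-escaping done by Python's standard `urllib.parse.quote` is acceptable for the target URL. It does not attempt to handle UNC paths or any of the other cases (b, d-h).

```python
from urllib.parse import quote


def path_to_url_syntax(path: str, separator: str = "\\") -> str:
    """Case a: convert a file pathname to URL syntax without resolving it.

    `separator` is the platform's path separator (an assumption of this
    sketch): "\\" for DOS-style paths, "/" for Unix-style paths.
    """
    segments = path.split(separator)
    # Percent-escape each segment separately so that "/" only ever appears
    # as the URL path separator. ":" is left alone so that a drive letter
    # such as "c:" survives unchanged.
    return "/".join(quote(segment, safe=":") for segment in segments)


def absolute_path_to_file_url(path: str, separator: str = "\\") -> str:
    """Case c: do the case-a syntax conversion, then prepend "file:///"."""
    converted = path_to_url_syntax(path, separator)
    # Drop any leading "/" so that Unix-style absolute paths do not end up
    # with four slashes after the scheme.
    return "file:///" + converted.lstrip("/")


if __name__ == "__main__":
    print(absolute_path_to_file_url(r"c:\pbg\webpages\pbghome.htm"))
    # prints file:///c:/pbg/webpages/pbghome.htm
    print(path_to_url_syntax(r"sub dir\my file.sgm"))
    # prints sub%20dir/my%20file.sgm
    print(absolute_path_to_file_url("/usr/local/a b.xml", separator="/"))
    # prints file:///usr/local/a%20b.xml
```

The sketch produces the empty-host form (file:///...) rather than the file://localhost/... form, since the post reports the former working in both browsers tested.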