Re: SAX and delayed entity loading

I suspect this is going to be long-winded, mostly because I'm replying to Eliot. So a summary: MIME types or notations eventually come down to a magic word of some sort. MIME types work now; notations don't yet. I want a notation to make it *go*, not tell me what to read to make something to make it go. Since we're not all using the same programming language, magic cookies are the only workable solution; MIME combines this with robust, well-defined fallback behavior.

[W. Eliot Kimber]
> Unless I've misunderstood something, a MIME type is still an
> indirection to the definition of that MIME type.

In the epistemological sense, yes. In the real world, though, it's a key into a hash of handlers in your software. And since the list of keys is well-known, the success rate of using a given key is pretty good.

> I.e., "text/xml" is a pointer to the RFC that establishes that MIME
> type. But then a problem is: where do I go to figure out what RFC a
> given MIME type maps to?

You go to the IANA, the designated authority for MIME. When you get a public notation identifier, where do you go? Oops... there was a ten-year delay in establishing a registry, and now the one registrar has about eight owners registered, of which one (the ISO) already had a formal reference mechanism. So you can get a catalog that resolves the public identifier to... what? A DLL? A Java class? That's portable. The indirection keeps getting pushed farther down the line.

MIME says, "This is the list of things. Here's what they are. If you implement this thing, this is the name to look for." It's a magic cookie system, but it's a robust one that works very, very well.

> What if the MIME type is an "x-*" MIME type, what do I do then?
> Note that the external ID for a notation could, in theory be a MIME
> type:
>
> <!NOTATION xml SYSTEM "urn:mime:text/xml" >

Early in XML's development, there was talk up at the formalism level that system identifiers were HyTime FSIs, but only <url> FSIs were allowed, and were the default, so the <url> tag was omitted. This left room for expansion into other system identifier types. I argued that the default for notation system identifiers should be treated similarly, but the default should be <mimetype>, allowing

<!NOTATION xml SYSTEM "text/xml">

> The short answer is that they are a highly general way to associate
> data objects with the definition of the rules that governs the
> interpretation of that data object.

I like the phrase "highly general way". HyTime is a highly general way to associate any piece of information in any format anywhere in the world with any other piece of information in any format anywhere in the world. And about six people actually understand it, and three of them have been institutionalized from the shock. (HyTime II is much better, thank you, Eliot.)

In other words, it's so general that it's useless. It can be made useful with certain user conventions, like that the public identifier is treated as a magic string that is just known, or that the system identifier is a piece of software usable within a closed system. But otherwise, it's like SGML without a stylesheet standard, or public identifiers without a resolution mechanism.

> I think that the Web and Windows have established an unreasonable
> expectation that software will "just know" how to deal with things.
> Unfortunately, you can't always rely on registered MIME types and
> magic numbers.

I don't think the expectation is at all unreasonable. The software *does* "just know" nearly all of the time. The MIME specification was developed in the Internet, where what works, wins.

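To make the "key into a hash of handlers" point concrete, here is a rough Python sketch of that kind of lookup, with a fall-back from the full type to the top-level type. Everything in it is an assumption for illustration: the handler functions, the `HANDLERS` table, the `type/*` wildcard convention, and `find_handler` are hypothetical names, not any particular browser's or library's API.

```python
from typing import Callable, Dict

Handler = Callable[[bytes], str]


def show_text(data: bytes) -> str:
    """Fallback for any text/* entity: show the raw characters."""
    return data.decode("utf-8", errors="replace")


def render_html(data: bytes) -> str:
    """Stand-in for a real HTML renderer (hypothetical)."""
    return "[rendered HTML, %d bytes]" % len(data)


def ask_user(data: bytes) -> str:
    """Last resort when not even the top-level type is known."""
    return "[no handler: ask the user what to do with %d bytes]" % len(data)


# The "hash of handlers": exact types first, "type/*" entries as fallbacks.
# The wildcard key is a convention of this sketch, not part of MIME itself.
HANDLERS: Dict[str, Handler] = {
    "text/html": render_html,
    "text/*": show_text,
}


def find_handler(mime_type: str) -> Handler:
    # Drop parameters such as "; charset=utf-8" and normalize case.
    essence = mime_type.split(";", 1)[0].strip().lower()
    if essence in HANDLERS:
        return HANDLERS[essence]
    # Unknown subtype: fall back on the top-level type, if we handle it.
    top_level = essence.split("/", 1)[0]
    return HANDLERS.get(top_level + "/*", ask_user)


# An old client that has never heard of text/xml still shows it as text.
print(find_handler("text/xml")(b"<doc/>"))  # prints: <doc/>
```

The magic cookie is just the dictionary key; the fall-back is one extra lookup when the exact key is missing.
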
The most important aspect of MIME is its hierarchicality(?): types, sub-types, sub-sub-types, ad infinitum. Like with RFC 1766 language specifications, you can make a reasonable guess about an entity even if you don't recognize the whole MIME type. An old browser, confronted with text/xml, will say, "Oh... I don't know about xml, but I can do text. Here ya go." And the user will see markup, but it'll be sensible.

> Perhaps part of the problem is that in the Web world we have tended
> to remove the need for such a generalized mechanism by hard-coding
> knowledge of the semantics of everything? But you can't do that
> forever, and MIME only seems to make the problem worse by requiring
> that all interchangeable types be registered before they can be
> used.

Unlike notations, which will work by magic without telling anyone what they are.

> Notations don't require that because the external ID of a
> notation can be anything (including MIME types or their RFC
> documents).

Yay... so all my software has to handle is... anything. Fun.

I love abstract theory. But in the end, it comes down to software *doing something* with what it gets. A function whose range is unbounded across the set of the universe is not a useful function. A notation for planets whose system identifier is a bibref to Magrathea's operating manual is not of very much use to me. At least with application/planet I know that it's an application, but not one I handle, and can ask the user for suggestions.

Before anyone misconstrues my position (too late!) I don't have anything at all against the ISO process. (I only say this because I know there are some people who *seem* to hold a grudge against the ISO or the W3C or both.) I love the abstraction of SGML and HyTime. But notation identifiers have always seemed to me to be a bizarre bit of Pollyannaism, and the constant use of system identifiers in examples blew my mind the first time I read the standard.
For all that SGML is of great utility for open systems, it shows definite signs of having grown up in a pre-Internet world where openness and portability were much smaller words.

> But I do mind: if I see "x-whatever/whatever", how do I know where
> to look, as a programmer or document recipient, to understand what
> the rules for that MIME type are?

And after you've looked, then what? Designing a system that tells programmers where to go to implement a processor for a new notation is bizarre. Most users are not programmers, and the idea that a notation would point to a formal spec would shatter their heads. They want a notation to point to something that will do it for them. In the absence of a One World Programming Language (pipe down, Python heads), a hierarchical magic cookie system works best.

> If someone gives you a document with a useless external ID for a
> notation, that's a problem between you and the author of that
> document and no mechanism can fix that problem.

And the difference between this and MIME is... that with a well-defined registry of notation types, and hierarchical fall-back system, a useless external ID is far less likely.

> But it's not just about *viewing*, it's about processing of all
> sorts. Pulling down a plug-in for viewing a particular kind of data
> is only one small application of notations. If you are only
> thinking about the problem in terms of viewing things on the Web,
> then you are missing the point.

Take viewing as one form of possible processing, used here as an example. The problem is one of finding the processor, and MIME types are equally good at finding the processor for analyzing as they are for viewing.

[Simon St.Laurent]
> I still argue that notations are a waste of time based on the
> misguided notion that information about dependencies (of whatever
> type) actually belongs in the document.

Only "document" in the sense that the document as an informational unit necessarily includes the description of its type.
Notations (whether MIME or otherwise) are associated with the type of a document; like common entities, common notations should be defined in common files.

> Let the dependent pieces be self-describing (MIME or something
> better),

Now, I think you may be falling into the trap Eliot describes. MIME entities aren't self-describing; they're wrapped in headers that describe them. And someone still has to understand that description; it's just that the MIME implementations are more robust and flexible, with better fall-back behavior, than SGML notations.

[W. Eliot Kimber]
> You're right, the URL for the XML spec is not sufficient (but
> neither is "<?xml?>", since you need to know what thing defines what
> that magic number means).

The URL for the XML spec is *equally* useful to a processor as is "<?xml?>", "text/xml", and "foobie bletch". Assuming, of course, that each string is used in a system designed to understand it.

The namespaces specification uses URIs, but they are essentially magic cookies. When used with a stylesheet, the URIs are compared. When fed to a processor (like XSL's xsl: and fo: namespaces), the processor either recognizes the URI on sight, or else it isn't a processor for that data type. This is exactly like MIME without fallback.

> In thinking about it, I think the only thing that will be reliable is to
> depend on a non-electronic, human-primary, long-term repository like the
> Library of Congress.
>
> Thus, the declaration for XML as a notation should be something like:
>
> <!NOTATION somelocalname
> PUBLIC "+//IDN loc.us.gov//NOTATION TZ 1234:W3C eXtensible Markup Language
> (XML) Recommendation 1.0//EN"
> >

For cyberarchæology, that works well. You can find the spec and re-implement a processor. But as a user, I want to use the data I got, not write a bloody parser for it myself. And (once again) since we don't have universally portable software, you can't give me a pointer to a chunk of code.
It's got to be a well-recognized name, and MIME provides this better than any other system.

Whew.

-Chris
--
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)