Identifying XML Document Types (was XML media types revisited)
We've had a good deal of give and take over the last few days about various ways of naming and otherwise identifying elements and documents, and I'd like to summarize a lot of the issues that have arisen (for me at least) from the discussion.

I'm concerned that XML is a significant break from 'the old way of doing things', which, crummy as it was, had certain advantages of familiarity. Proprietary documents came with their own identifiers and their own rules for doing things, and I don't think anyone expected to open Word documents in a statistical program and get meaningful results. With XML the expectations (for being able to process documents with both specific and generic tools) are much higher, yet the tools for identifying document types are actually weaker in many ways.

I'll list most of the tools for identifying document types here, along with their potential strengths and weaknesses. I'm hoping I'm wrong about some of these, but I'm also hoping I'm wrong in ways that can make users' lives simpler, not ways that just have workarounds requiring users to trek 50 miles through mountains while wearing a straitjacket and ball-and-chain.

1) Filename extensions - The classic for the PC world, used to some extent in Unix, and typically sneered at by the Macintosh community.

Advantages: Can be created on a whim. Easily connected to other systems, like MIME identifiers, when used in a supportive (HTTP) environment.

Disadvantages: No central registry, so conflicts abound. Typically limited to three characters by old DOS rules, though longer extensions are becoming a bit more common. Makes it difficult to use periods in file names. Doesn't fit well with 'smarter' file systems that store document type and application information separately from the name of the document.

Recurring Question: Why isn't using .xml enough to identify XML documents precisely to applications?
(Recurring answer: Because not every application should work with every XML document fed to it, finer-grained identification is a good idea.)

----------------------------------------------------------------

2) MIME types - The classic Internet standard, used by a variety of Internet applications and becoming more widespread in other systems.

Advantages: IANA provides a central registry, with a mechanism (x-) for unregistered types. Can be made into public identifiers and notations fairly easily.

Disadvantages: Like the .xml file extension, application/xml and text/xml provide no information about the _type_ of XML document inside the file they roughly describe, leaving applications to determine whether or not the information is actually meaningful.

Recurring Question: Why isn't using application/xml or text/xml enough to identify XML documents precisely to applications?

(Recurring answer: Because not every application should work with every XML document fed to it, finer-grained identification is a good idea.)

----------------------------------------------------------------

3) DOCTYPE declarations - The de facto SGML standard, about the only thing that provides a description of the contents of a document.

Advantages: Public Identifier vocabulary is suitably rich to avoid most naming conflicts without requiring the use of a central repository.

Disadvantages: Only reliable in validating environments when public identifiers are actually used, which isn't very often. SYSTEM pointers seem much more typical. Even when public identifiers are present, many declarations can be added or overridden in the internal subset, muddying the waters for applications that need a particular structure. The validation process doesn't make clear if this has happened. ANY content models open black holes.

Recurring Questions: Where do I buy a public identifier? Can I use a public identifier for documents that are only well-formed?

(Recurring answer: Pretty much no on both counts.)
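To make the DOCTYPE concerns above a little more concrete, here is a minimal sketch in Python (not part of the original discussion) that uses the standard library's expat binding to report what a DOCTYPE declaration claims, without validating anything. The function name `sniff_doctype`, the sample `memo` document, and the public identifier `-//Example//DTD Memo 1.0//EN` are all hypothetical, invented purely for illustration.

```python
import xml.parsers.expat


def sniff_doctype(xml_text):
    """Report what the DOCTYPE declaration claims. Nothing is validated."""
    found = {
        "name": None,
        "public_id": None,
        "system_id": None,
        "has_internal_subset": False,
    }

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat calls this once, when it reaches the DOCTYPE declaration.
        found.update(
            name=name,
            public_id=public_id,
            system_id=system_id,
            has_internal_subset=bool(has_internal_subset),
        )

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found


# Hypothetical document: a public identifier is present, but the internal
# subset redeclares the root element with an ANY content model.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE memo PUBLIC "-//Example//DTD Memo 1.0//EN" "memo.dtd" [
  <!ELEMENT memo ANY>
]>
<memo><anything/></memo>
"""

if __name__ == "__main__":
    print(sniff_doctype(SAMPLE))
    # A SYSTEM-only declaration carries no public identifier at all.
    print(sniff_doctype('<!DOCTYPE memo SYSTEM "memo.dtd"><memo/>'))
```

In this sketch the parser will say that an internal subset exists, but it is still up to the application to decide whether a document that carries one can be trusted to have the structure its public identifier suggests.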
----------------------------------------------------------------

4) Root elements using Namespaces - A new possibility that gained some prominence with the accession to W3C Recommendation of 'Namespaces in XML'.

Advantages: Namespaces ensure unique element names, making it less likely that you have someone else's DOCUMENT element.

Disadvantages: Just because the root element is X doesn't mean its contents are Y. Especially given the problems of validating documents in namespace-aware environments, namespaces may not always be available. Half the XML community regards Namespaces as the worst thing since the plague. Because namespaces aren't supposed to point to anything, you can't sneak a DTD in at the URL identified by the namespace.

Recurring Question: So how do I make this work reliably in a validating environment?

(Recurring answer: Ask again next year, please.)

Perhaps I'm being a little too hard, but none of these solutions seems viable. If all we were talking about was generic documents with style sheets, it might not matter so much, but unfortunately, we're not. Lots of XML standards are under development where putting the square document in the round processor is not a good idea. It seems wise to provide a generic mechanism to keep the square documents created with our generic tools away from the round processor.

Or maybe that's too much. I guess we'll see.

Simon St.Laurent
XML: A Primer / Cookies Sharing Bandwidth Building XML Applications (March)
http://www.simonstl.com

xml-dev: A list for W3C XML Developers.
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/