RE: Encoded XML Content
The discussion has covered some good points up to now. I'll try to build on it, and move forward.

Let's be clear about what we're trying to solve here. Unicode has essentially solved the text problem. This note focuses on non-textual data, or places where a different character encoding is required inside your document.

For some applications, base64 will be easy to use. Binary data will be present in particular locations in the XML tree, and the applications will simply know to decode it. These don't really need anything new, but will benefit if there is a common technique for handling it.

I think the real target is 'container' elements, where the designer needs to allow for flexibility in content at runtime. It is possible to do part of this with elements, but you run into two difficulties. First, you eventually hit your non-text data, and you have to provide some indication of the content and format. Second, you may have a real need to allow for formats that have never been foreseen.

What we don't need to do is provide another mechanism for managing XML markup and structure. XML parsers will not be asked to do anything different. This is entirely about how developers will use XML's features to resolve an often-encountered problem. (That's why this still belongs on xml-dev.)

That said, the moment you move away from Unicode data content, you face a number of issues. You will probably have to specify a wrapper layer used to make the data XML-friendly. Once that wrapper is removed, you will have to note what format or conventions apply to the next layer. Ultimately you will reach either a text layer or a binary data layer, which cannot be further unwrapped. That layer may need a descriptor, to specify what type of data was carried with all this effort.

The question I still haven't completely resolved is - is there a need for allowing an arbitrary number of layers, or are three sufficient? That is, the 'content encoding', 'content format', and 'content type'?
I'm not certain it's sufficient, but I can't see a use for much more at the moment. (I'm not tightly attached to the labels, but I think they work, and at least they're a start.)

The most likely implementations seem to be with these as attributes. Attributes that are not present would have a default of a zero-length string.

Below, I've listed a number of items, in the interests of ensuring that any proposed solution can handle them all. (Ultimately, such a table would be useful to developers.)

What Is It?     Content    Content           Content
                Encoding   Format            Type
--------------  --------   ---------------   ----------------------
JPEG image      base64                       mime:image/jpeg
ASCII text      base64     ISO-8859-1        mime:text/plain
HTML text       base64     ISO-8859-1        mime:text/html
XML content
XML carried                                  xml:
XML carried     base64     ISO-10646-UCS-2   mime:text/xml
XML data only                                xml:pcdata
private data    hex                          x-private:somedata
private text    base64     Commodore64       x-private:sometext
embedded item   base64     ISO-8859-1        rfc:822
embedded item   base64                       mime:application/x-zip

I thought about separating content-type from the content-domain, but I can't see that you would specify them separately all that often.

The above seems to support several required ideas:

1) Standard XML content requires no settings at all. This is the degenerate case, and it is good that it works this way.

2) Standard XML content could be structured using a DTD specified using namespace techniques. This appears to be an available option without changing any of the infrastructure around encoding.

3) It supports MIME types, but does not require them. Other domains can be used besides MIME, including completely private or proprietary formats.

4) There is some consistency. Notice that whenever you specify a text type, you must provide a content-format. Otherwise, the text is in the same character encoding as the surrounding XML. Whenever you specify any content-format that is different from the surrounding XML, you must use a content-encoding to restore XML friendliness.
5) So far, just about anything you can throw in there that has any current structure looks to be workable.

An example element using these, called 'container', could be defined as shown below.

<!ELEMENT container ANY>
<!ATTLIST container
    content-encoding (base64|hex|none) "none"
    content-format   CDATA             ""
    content-type     CDATA             ""
>

I've limited the strings in content-encoding. Is this a good idea? There would be some structure applied to the content-format and content-type, but I don't think it would be effectively captured in the DTD.

Comments aren't just welcome - they're essential!

---------------------------------------------------------------------------
Chris Smith <smith@i...>

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
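
To make the layering in the message above concrete, here is a minimal Python sketch of how an application might unwrap a 'container' element carrying the three proposed attributes. The function name `unwrap` and its return shape are illustrative assumptions, not part of the proposal. It handles only the base64, hex, and none encodings named in the ATTLIST, and it treats content-format as a character set name that Python's codecs recognise (a private label such as Commodore64 would need its own handling).

```python
import base64
import binascii
import xml.etree.ElementTree as ET


def unwrap(container_xml):
    """Peel the wrapper layers off one <container> element (hypothetical helper).

    Returns (payload, content_type). The payload is str when a
    content-format (character set) was named, bytes when only a
    content-encoding was given, and the element's leading text when
    no content-encoding was used.
    """
    elem = ET.fromstring(container_xml)
    # Absent attributes fall back to "none" or the zero-length string.
    encoding = elem.get("content-encoding", "none")
    charset = elem.get("content-format", "")
    content_type = elem.get("content-type", "")
    text = elem.text or ""

    # Layer 1: undo the wrapper that made the data XML-friendly.
    if encoding == "none":
        # Ordinary XML content; this sketch returns only the leading text
        # and leaves any child elements to the caller.
        return text, content_type
    if encoding == "base64":
        raw = base64.b64decode(text)
    elif encoding == "hex":
        # unhexlify does not accept whitespace, so strip it first.
        raw = binascii.unhexlify("".join(text.split()))
    else:
        raise ValueError("unknown content-encoding: " + encoding)

    # Layer 2: a named character set means the bytes are text.
    payload = raw.decode(charset) if charset else raw

    # Layer 3: content-type is passed back for the application to interpret.
    return payload, content_type
```

For the "ASCII text" row of the table, calling `unwrap` on `<container content-encoding="base64" content-format="ISO-8859-1" content-type="mime:text/plain">SGVsbG8=</container>` would return the string `Hello` together with `mime:text/plain`. Standard XML content with no attributes takes the first branch and needs no decoding at all, which matches the degenerate case in point 1.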