Re: Binary Data in XML (a first pass?)
On Wed, 30 Sep 1998, Tim Bray wrote:

> From: Tim Bray <tbray@t...>
> Subject: Re: Binary Data in XML
>
> Suppose I wrote up a NOTE, should occupy less than one page, proposing
> a reserved attribute xml:packed with, for the moment, only two
> allowed values, "none" and "base64". The default value is "none".
> If an element has xml:packed="base64" this means that
>
> (a) the content of the element to which this is attached must be
>     pure #PCDATA, no child elements and no references, and
> (b) the content is encoded in base64, leading and trailing spaces allowed

If I may be so bold -- this was addressed in the development of Open Trading Protocol, and we simmered it down to a concise form. Don Park and Gavin McKenzie helped sift out a standalone form, which then went *back* into the OTP group. The result is summarized below, minus most of the document surrounds.

Use of this internally in OTP has allowed for encapsulation and a framework structure without a lot of development overhead.

----------------------------------------------------------------

XML Packaging

Basic Goals

It is suggested that you read the entire document, since there are some forward references in the Goals section that may only make sense after reading through the whole thing.

1) Inclusion of a variety of items

This variety of items can potentially be defined dynamically by the groups/parties/systems involved. Some systems will be "static" implementations - not driven directly by the DTD, but using a parser and embedding the understanding of the DTD in the system itself. It is for this reason that parameter entities are not used. Some people will only develop their system from the DTD, not run their system using it.

2) Simplest possible inclusion of plain text items

This means so simple that it should look like PCDATA. More to the point, XML can already handle the plain text case, so we should not have to step out to something else (MIME or otherwise) to handle plain text.

3) Easy inclusion of graphic or other binary entities.

This is for the cases where most groups would agree what is desired (i.e. a GIF or JPEG), but XML does not allow for direct embedding. This is the target for the MIME:mimetype allowance. Data can be directly converted using standard BASE64 routines, and no generation or checking of headers needs to be done.

4) Leverage MIME power!

This is the origin of the generalized MIME allowance. In particular, MIME:mimetype simply can't work with multipart types.

5) Allow for private customization.

This is a somewhat contentious inclusion. It can be argued that private customization can already be achieved using the MIME application/x-private notation, so why duplicate that capability?

However, there is a growing body of XML private customization, and it would be preferable not to have to go through MIME in order to get to it. The XML content provides a straightforward indication that the content, likely straight PCDATA (not transformed), is an embedded XML document, perhaps XML/EDI. I also think this is wise, since those doing their work in XML may not provide a standardized private MIME label for their work.

For exactly the same reasons, the general MIME availability should be kept as well. If a group has a reasonably standardized MIME label for a private custom format, then we need the full MIME capability to support it.

Finally, the x-ddd:usercode version has already proven useful where different parts of a system may communicate using this mechanism, because it's easier than trying to communicate through a (non-existent!) private channel.

6) To be used in place of ANY

ANY content is understandably difficult to parse. There may or may not be guidelines to help you. It is preferable to match extensibility with a little more structure.

DTD For Package

This then leads to a very compact DTD item (more definitions below).

    <!ELEMENT Package (#PCDATA)>
    <!ATTLIST Package
        content    CDATA          "PCDATA"
        transform  (NONE|BASE64)  "NONE" >

Note that any special details, especially custom attributes, must be represented at a higher level. For example:

    <!ELEMENT SpecializedData (Package)>
    <!ATTLIST SpecializedData
        ID          ID     #REQUIRED
        CustomerId  CDATA  #IMPLIED
        PaymentId   CDATA  #IMPLIED
        SoftwareId  CDATA  #IMPLIED >

Detailed interpretations of the attributes follow:

Attribute: content

The content attribute defaults to the value "PCDATA", to imply that the content consists only of legal PCDATA characters for XML. When used in this manner, the content of the Package element effectively substitutes for a simple #PCDATA content in the parent element.

Attribute value for "content": PCDATA

The content of Package can be treated as PCDATA with no further processing.

Attribute value for "content": MIME

The content of Package is a complete MIME item. Processing should include looking for MIME headers inside the Package content.

Attribute value for "content": MIME:mimetype

The content of Package is MIME content, with the following headers implied:

    Content-Type: mimetype

Although it is possible to have MIME:mimetype with transform="NONE", it is far more likely to have transform="BASE64". Note that if transform="NONE" is used, then the entire content must still conform to PCDATA. Some characters will need to be encoded either as the XML default entities, or as numeric character entities.

Attribute value for "content": XML

The content of Package can be treated as an XML document. This document may include an XML declaration, and it may refer to a different DTD than that of the enclosing document.

Character entities and CDATA sections, or transform="BASE64", must be used to ensure that the Package contents are legitimate PCDATA. Enclosing a raw XML document will cause parsing errors while attempting to parse the enclosing document.

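To illustrate the escaping requirement above, here is a minimal Python sketch. It is not taken from the OTP work; the helper names make_package and read_package are invented for illustration, and only the standard library modules xml.etree.ElementTree and base64 are used. It assumes UTF-8 text for transform="NONE". ElementTree does not read the DTD, so the attribute defaults from the ATTLIST are supplied in code.

```python
import base64
import xml.etree.ElementTree as ET


def make_package(data: bytes, content: str = "PCDATA",
                 transform: str = "NONE") -> ET.Element:
    """Build a Package element (hypothetical helper, not part of any spec)."""
    pkg = ET.Element("Package", {"content": content, "transform": transform})
    if transform == "BASE64":
        # Binary payloads go through a standard BASE64 routine.
        pkg.text = base64.b64encode(data).decode("ascii")
    else:
        # transform="NONE": the serializer escapes <, > and & on output,
        # so the result is still legal PCDATA.
        pkg.text = data.decode("utf-8")
    return pkg


def read_package(pkg: ET.Element) -> bytes:
    """Recover the payload; the parser has already expanded entities."""
    text = pkg.text or ""
    # Default taken from the ATTLIST, since ElementTree ignores the DTD.
    if pkg.get("transform", "NONE") == "BASE64":
        # b64decode skips surrounding whitespace by default.
        return base64.b64decode(text)
    return text.encode("utf-8")


# An embedded XML document with its own XML declaration.
inner = b'<?xml version="1.0"?><Order id="7"><Item>A&amp;B</Item></Order>'

outer = ET.Element("SpecializedData", {"ID": "sd1"})
outer.append(make_package(inner, content="XML"))

# Serialize the enclosing document, parse it again, and pull the payload out.
serialized = ET.tostring(outer, encoding="unicode")
recovered = read_package(ET.fromstring(serialized).find("Package"))
print(serialized)
print(recovered == inner)
```

The same two helpers cover the binary case, for instance make_package(gif_bytes, "MIME:image/gif", "BASE64"), where gif_bytes stands for whatever image data is to be carried.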
The well-formedness or validity of the document inside the Package has no effect on the parsing of the enclosing document. Obviously, a non-well-formed or invalid inclusion may still cause errors within an application. However, for some purposes, such as user support, there are legitimate reasons to enclose XML documents that are not well-formed.

Attribute value for "content": x-ddd:usercode

The content is private, where ddd represents a domain name of a user, and usercode represents a particular content format defined by that user.

The guidelines around x-ddd are very loose. Given company FFGGHH Inc, all of x-www.ffgghh.com, x-ffgghh.com and x-ffgghh are legitimate examples. However, only one should be the correct format, as defined by FFGGHH Inc. The usercode mechanism is intended to reduce the possibility of content attribute collisions, not to provide a mechanism that can eliminate them entirely.

Attribute: transform

Attribute value for "transform": NONE

The PCDATA content of Package is the correct representation of the data. Note that entity expansion must occur first (i.e. replacement of &amp; and &#9;) before the data is examined. CDATA sections may legitimately occur in a Package marked transform="NONE".

Attribute value for "transform": BASE64

The PCDATA content of Package represents a BASE64 encoding of the actual content. Although entity expansion must occur before decoding of the BASE64 stream, it is not expected that any entities will be present under normal circumstances.

...Chris Smith
...Don Park
...Gavin McKenzie

---------------------------------------------------------------------------
Chris Smith <smith@i...>

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)