XML-DEV Mailing List Archive

Lexical vs value spaces (re: Binary content and allowed characters in XM
Hi,

Nicolas LEHUEN wrote:

> I don't think readability alone is a sufficient reason to forbid binary
> content from appearing in an XML document.

I agree: there is a much better and simpler reason to forbid binary content from XML documents ;=) ...

>
> What defines the set of allowed characters in XML content ? Is it technical
> reasons, or readability reasons ?

IMO, neither of them, but rather a fundamental design decision: an XML entity is Unicode text (possibly stored in another encoding) and not a stream of bytes. This should be a sufficient reason to close the debate IMO!

The problem with including arbitrary binary content would not so much be the "control characters" as the fact that the physical value of this content, read as bytes, would change depending on the encoding used for the document (what if I save it as utf-16 when it was created as utf-8?). We are using a layered model where XML is built on Unicode, and that would short-circuit the lower level...

That being said, nobody seems to find it a problem to use XML as a serialization format for integers, floats or dates, so why should it be one for binary data? The trick is just to realize that, to borrow a notion which I find very useful in W3C XML Schema, there is a decoupling between lexical and value spaces, and to define the best lexical space for the binary content you want to serialize.

For arbitrary binary data, hex or base64 seem to be obvious choices, but for data which is "almost text" with special "things" embedded, other solutions can be found. One is to serialize the "things" found in the text as elements (you then have mixed content); the other is to define a specific lexical space for them (like "=00" or whatever). Which one you want to use comes back to the debate about using structured values in elements or attributes.
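To make the point concrete, here is a minimal sketch in Python (my choice of language for illustration only; the sample payload, the `serialize` helper and the `<data>` element name are all hypothetical). It maps a binary value to a hex or base64 lexical form, saves the same document as utf-8 and as utf-16, and checks that the bytes on disk differ while the decoded value does not.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical binary value; it contains a NUL byte, which XML text cannot carry.
payload = bytes([0x00, 0x01, 0xFF, 0x41, 0x0A])

# Value space -> lexical space: two possible textual forms of the same value.
as_hex = payload.hex()
as_b64 = base64.b64encode(payload).decode("ascii")


def serialize(text: str, encoding: str) -> bytes:
    """Wrap text in a tiny document and store it in the given encoding."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><data>{text}</data>'
    return doc.encode(encoding)


utf8_doc = serialize(as_b64, "utf-8")
utf16_doc = serialize(as_b64, "utf-16")

# The physical bytes depend on the document encoding...
bytes_differ = utf8_doc != utf16_doc

# ...but the parser hands back the same Unicode text either way, so the
# lexical -> value mapping recovers the original bytes from both documents.
from_utf8 = base64.b64decode(ET.fromstring(utf8_doc).text)
from_utf16 = base64.b64decode(ET.fromstring(utf16_doc).text)

print(bytes_differ, from_utf8 == payload, from_utf16 == payload)
```

Had the raw bytes been pasted into the document instead, re-saving it in another encoding would have rewritten them, which is exactly the short-circuit of the Unicode layer described above.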
I think that it's important to realize that the cases where the lexical and value spaces are identical are fairly uncommon (except in the "document" world) and that, for the vast majority of datatypes, some coding needs to be performed and these spaces are different.

BTW, when you think about it, this decoupling goes beyond the XML world... In Europe, the Euro has already been there for a couple of years, and what will happen in 12 days is just a harmonization of the many lexical spaces to cut the processing costs ;=) ...

Eric
--
See you in Paris for the Electronic Business Days 2002.
http://www.edifrance.org/ebd/index.htm
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
http://xsltunit.org      http://4xt.org           http://examplotron.org
------------------------------------------------------------------------