Re: Plug and Play XML
> From: Peter Murray-Rust <peter@u...>
> What will differentiate a text/xml document from an application/xml one? When is each appropriate?

I think the idea is to use text/xml in the normal case, and application/xml as a fallback. I think I first suggested it, but it certainly was not my preferred option. I would prefer everything to be application/xml, because I do not like the idea of dumb HTTP/MIME systems fiddling with and transcoding data, which they may do for text/xml. Application/xml is a binary transmission; no bits are molested en route.

The trouble with text/xml is that XML positively encourages the use of all ISO 10646 characters, for example all the symbol and publishing characters. If the data is "transcoded" en route from a large character set encoding (e.g. Unicode or an East Asian one) to a small encoding (e.g. 8859-n), then a dumb transcoder will not translate a character outside the target encoding's repertoire into its numeric character reference; it will probably swallow it, or put out something strange.

In practice this means that XML document generators should write all characters above 127 as numeric character references rather than directly. Smart intermediate XML systems should also attempt to replace such characters in data and attribute values with numeric character references. When you are devising your own PI notations and comment conventions, you should also adopt the same numeric character reference convention.

The unpleasant implication in all this is for native-language markup. If your XML data will be sent to users who use other scripts, do not use characters in XML names that are not available in their regional character sets. Numeric character references do not, currently, apply to names. (I hope this will eventually be changed in SGML and XML, but I think the facts and the affected users will speak for themselves in due time.) This is why you should be conservative in your choice of name characters. The characters up to 127 (ASCII) are OK.
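A minimal sketch of the two rules above, assuming Python's standard library (the post names no language or tool): a generator writes every character above 127 in character data as a numeric character reference, and names, which cannot use such references, are checked against the recipients' character set. The function names `to_ascii_xml_text` and `name_fits_repertoire` are hypothetical; the substitution itself is done by the stdlib codec error handler `xmlcharrefreplace`.

```python
from xml.sax.saxutils import escape


def to_ascii_xml_text(data: str) -> str:
    """Escape character data for XML, writing every character above 127
    as a decimal numeric character reference (e.g. U+00E9 -> '&#233;')."""
    # First escape the markup-significant characters &, < and >.
    markup_safe = escape(data)
    # Then let the ASCII codec replace each non-ASCII character with &#NNN;.
    return markup_safe.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def name_fits_repertoire(name: str, encoding: str = "ascii") -> bool:
    """Return True if every character of an XML name exists in `encoding`.
    Names cannot use numeric character references, so a name that fails
    this check cannot be delivered intact in that encoding."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    text = "café ≤ 10 € & co"
    print(to_ascii_xml_text(text))  # caf&#233; &#8804; 10 &#8364; &amp; co
    # A "dumb" transcoder that substitutes '?' loses the characters instead:
    print(text.encode("iso-8859-1", errors="replace").decode("iso-8859-1"))
    print(name_fits_repertoire("price"))                   # True
    print(name_fits_repertoire("prix-été"))                # False
    print(name_fits_repertoire("prix-été", "iso-8859-1"))  # True
```

The sketch escapes character data only. Running `xmlcharrefreplace` over a whole serialized document would also turn non-ASCII name characters into references, which XML does not allow. For attribute values, `xml.sax.saxutils.quoteattr` could stand in for `escape`.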
The 128-255 range of characters in 8859-1 and ISO 10646 is probably pretty safe too. This problem affects even users within a single nation, if the nation has a few different repertoires in common use. In particular, in Japan, Unix systems using EUC have several thousand more kanji available than older PC (i.e. Shift-JIS) and Macintosh systems, so it is probably prudent for Japanese users to use only those characters available in Shift-JIS for naming.

None of these considerations were new to the XML discussion. What was new was that XML works with a particular operating model, which says that documents must cope with HTTP/MIME systems but must also provide enough information to create the MIME headers in the first place.

The restriction that numeric character references cannot be used in markup, only in data and attribute values, comes from the old character model of SGML. In this model it made no sense to allow numeric character references in names; indeed, doing so would have been considered bad, because it created markup that could not be read in a simple editor.

XML is probably one of the most thoroughly internationalized software systems around: in particular, its internationalization has been in place and under discussion from the very beginning, not "tacked on". Internationalization (i18n) is one area of XML that parser writers will find hard to get right. But the benefit is that once they have it right, it makes life much simpler and richer for users. Which is not to say that XML i18n is perfect, but it is certainly near the state of the art, given the need to fit in with HTTP/MIME and operating systems. I certainly hope that XML will not remain "state of the art" for long, and that advances in various technologies (in particular, operating system vendors agreeing on a charset/encoding labelling scheme that they all implement in their OSes, or the adoption of MIME as a file format, e.g. .MIM) will overtake it.

Rick Jelliffe

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)