Re: Processing XML 1.1 documents with XML Schema 1.0 processor
(I smell a troll.)

Henri Sivonen wrote:

> BTW, is there any actual research about the demand for non-ASCII
> element names? XML 1.0 allows a large chunk of non-ASCII on element
> names. Is any real-world XML vocabulary actually exercising the
> freedom to go beyond ASCII in element and attribute names (except
> perhaps some vocabulary that is only used in Japan)?

What the **** does that question mean? That element names only used in one country should not be supported in a standard designed to suit the whole world?

It is a simple fact that ASCII transliterations of many languages, in particular those with tonal pronunciation, homophones and ideographic scripts, can frequently be incomprehensible. (Add to this that there are regional concepts, e.g. in addresses, for which there may be no English analog.)

The most direct way of putting the question is "Why should W3C put out a standard that arbitrarily makes things easier for white people than for yellow people?" A space can easily be replaced by a "_"; what should the ideograph for a mountain be replaced by: the sound, the meaning, a translation? How does a reader reconstruct the ideograph? XML's name rules are important precisely because they don't adopt the bogus minimalist approach.

I am not saying that anyone who wants ASCII-only markup is a greedy, lazy, selfish, unjust, uncaring, clock-back-turning, unpragmatic racist or Western supremacist; on the contrary, there are lots of reasons why an organization or individual *should* use ASCII for Western and international document types. But ISO standards like SGML must support international requirements, and W3C profiles like XML must support world-wide adoption.

A less inflammatory response is that the importance of names in markup is not that they are easy to write, but that they are meaningful to read. The better analogy to make isn't the inconvenience of making you write ASCII, but the inconvenience if you had to write using, say, Greek characters.
You probably could do it, but it would add a layer of inconvenience that would probably make you avoid using the technology where you had a choice.

<boring_old_geezer_mode>
I designed the original naming rules that XML 1.0 adopted pretty much intact. In 1994-5 or so, I had been given a project by Allette Systems in Australia to figure out why adoption of SGML was slow in East Asian countries. During this time I visited several Asian countries (I had learned SGML while living in Japan working in publishing) and made contacts with many people in publishing.

I made up something called the ERCS (Extended Reference Concrete Syntax), with input from many people: Gavin Nicol, Tony Graham and James Clark are three Western names who have posted to XML-DEV, for example. ERCS included some features that were later adopted by XML (for example, that Unicode characters should be available through hex references regardless of the document format), in particular to support "native language markup". (Note: not "natural language markup".)

ERCS was adopted by the SPREAD (Standardization Project Regarding East Asian Documents) of the CJK DOCP (China/Japan/Korea Document Processing Experts Group), which was a liaison group between industry, academia and standards bodies. That gave ERCS credibility, so it was already a pretty workable package by the time XML came along. (The SPREAD entities occasionally crop up but are obsoleted by XML; the W3C Charmod spec mentions them though, which is nice.)

More info, if anyone cares, is at http://xml.ascc.net/en/utf-8/ercsretro.html

Here is what the ERCS document, now 10 years old, says about Native Language Markup:

"Much of the value of using SGML markup, especially for structure-based searches in hypertext, is that the tag names and other markup can have meaning to the user rather than being cryptic mnemonics. This is most true for SGML documents that contain fielded data.
So the provision of native-language tagging is a key facility that SGML will need to supply to be successful.

"So the best concrete syntax for a given character set is one that does not artificially or gratuitously restrict what characters are available for use as markup. In the absence of other factors, if a character appears in words in the native language, it should be available for use in NAMEs. And similarly, if a symbol character is readily available from the keyboard, it should be available for use in short references."

In particular, it is important to recognize that NAMEs in XML/SGML are not just used for element and attribute names, but also for IDs. An ID is often taken from the value of content, which usually will be in some native language. If you have looked at C programs written by Chinese or Japanese programmers, for example, you will see that people like using their native language (and script) for writing the names of things.

So the XML 1.0 naming rules are the end of a chain that began by looking at how to solve an actual problem with the acceptability of ASCII-only markup. A solution was worked out in consultation with people from East Asia and the ASCII West. It was adopted by standards bodies, and influenced SGML, HTML and most directly XML. XML's enormous popularity has often been attributed (notably by Tim Bray) in large part to its good bottom-line for internationalization.

(XML 1.1 removes the checks on particular characters, but in the direction of more openness, not more restrictiveness, which is good from this perspective at least. But it is interesting to see someone saying the world is not ready for names above the first seven bits of Unicode, while XML 1.1 had discussion about whether the world was ready for names above the first 16 bits of Unicode :-)

Murata Makoto is speaking at XTech 2005 in a fortnight's time on the Japanese Government's adoption of XML. It will be interesting to see to what extent they use ASCII.

Cheers
Rick Jelliffe