XML-DEV Mailing List Archive

Re: Processing XML 1.1 documents with XML Schema 1.0 processor


Eric van der Vlist wrote:
> On ven, 2005-05-13 at 11:52 +0100, Michael Kay wrote:
>>With all these things, I think one has to ask what is the approach that
>>causes the least amount of pain to the average user. Asking everyone to
>>change a namespace URI so that a few users can identify clearly whether or
>>not their patterns are intended to match Ethiopian letters isn't a net win
> 
> Only those whose patterns are intended to match Ethiopian letters would
> have to change the namespace URIs and that should reduce the number of
> such users by several orders of magnitude!

I beg to differ, Eric. When I use a string or a sequence of name 
characters, I want it to be just a damn string, and the last thing I want 
to think about is whether it will be usable in Ethiopian, Myanmar, 
Khmer, or Mongolian. I don't want the users of my 
specification/schema/tool to have to figure out for themselves (or to 
ask me) whether they can use the Katakana middle dot in Japanese element 
names or not. A string, a name character, or a whitespace character within 
an electronic document MUST be recognized as such according to the 
current state of the art. It MUST be able to be whatever the latest 
version of Unicode says it is.
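
To make the point concrete, here is a rough Python sketch of a name check against the XML 1.1 NameStartChar and NameChar productions. The ranges are transcribed from the XML 1.1 Recommendation as I understand it; the function names are made up for illustration and do not come from any library. Because XML 1.1 defines names by broad code point ranges rather than by an enumerated list of letters, Ethiopic, Myanmar, Khmer and Mongolian letters and the Katakana middle dot (U+30FB) all pass.

```python
# Sketch only: XML 1.1 Name check by code point range.
# Ranges follow the NameStartChar / NameChar productions of XML 1.1;
# all helper names here are hypothetical.

NAME_START_RANGES = [
    (0x3A, 0x3A),                     # ":"
    (0x41, 0x5A),                     # A-Z
    (0x5F, 0x5F),                     # "_"
    (0x61, 0x7A),                     # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),  # covers Myanmar, Ethiopic, Khmer, Mongolian
    (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF),  # covers U+30FB, Katakana middle dot
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

# Characters allowed after the first position, in addition to the above.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E),                     # "-" and "."
    (0x30, 0x39),                     # 0-9
    (0xB7, 0xB7),                     # middle dot
    (0x300, 0x36F),                   # combining diacritical marks
    (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    """Return True if the code point of ch falls in any (lo, hi) range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch):
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch):
    return is_name_start_char(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def is_xml11_name(s):
    """True if s is non-empty and matches NameStartChar NameChar*."""
    return (
        bool(s)
        and is_name_start_char(s[0])
        and all(is_name_char(c) for c in s[1:])
    )
```

With this sketch, `is_xml11_name("\u1200\u1208")` (two Ethiopic syllables) and `is_xml11_name("a\u30FBb")` both hold, while `is_xml11_name("1abc")` does not, since a digit cannot start a name.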

Of all people *we* should know that the encoding of text on a global 
scale is not a static science, it evolves and needs to evolve as Unicode 
improves. Yes, this implies a phase during which XML processors may lose 
some interoperability, but whoever puts XML interoperability above human 
language operability needs to have their priorities seriously revised. 
Yes, this may break software that is making stupid assumptions about the 
content of certain tokens, but such software was written based on a 
misunderstanding of text and deserves to break (and then to be shot in 
the kneecaps, tied to a horse and dragged all around town, dipped in 
boiling lead, dismembered piece by piece with a rusty spoon, and finally 
dumped in a ditch to agonize).

XML is about text, dammit, and text is meant to encode something very 
much alive called language. It will change and it will move, under the 
effect of both language evolution and of the progress made by the 
Unicode Consortium in encoding more and more of it -- a task of 
gargantuan proportions, comparable to the attempts at mathesis that 
everyone had given up on.

Anyone expecting it to be different is still living in a legacy US-ASCII 
world that just happens to have a larger set of characters.

How can XML be the universal data format without the ability to handle 
universal text? Heck, it's SGML for the *WORLD WIDE* Web we're talking 
about, not a falsely ubiquitous data interchange format for big American 
companies.

-- 
Robin Berjon
   Research Scientist
   Expway, http://expway.com/
