Re: A dandy little technique for constraining your strings to

  • From: Michael Kay <mike@saxonica.com>
  • To: John Cowan <johnwcowan@gmail.com>
  • Date: Wed, 21 Oct 2015 23:11:17 +0100

Restrictions in a schema are often there because we know that the IT system we are sending data to is restricted in what it can handle, and we want to prevent stuff reaching that IT system if we know it can’t handle it. Very often we don’t have the ability to change that IT system. We would love, for example, to allow non-ASCII characters in email addresses, but the internet can’t cope with them and we don’t have the ability to fix the internet. 

I made yet another attempt to use non-ASCII characters in the design of an XQuery extension recently. The WG chose to define the syntax using only ASCII characters instead. All kinds of reasons: difficulties entering the characters on a keyboard, difficulty making sure the characters aren’t corrupted in transmission, etc. The fact is, use of non-ASCII characters still creates hassle. The 20% is almost certainly an underestimate. Building IT components that handle Unicode strings is dead easy; debugging system problems when messages between the different IT components get mangled can often be a nightmare, and a lot of the pain falls not on IT developers but on end-users who have to cope with inadequate data entry tools and mis-displayed output.

Michael Kay
Saxonica

On 21 Oct 2015, at 18:30, John Cowan <johnwcowan@gmail.com> wrote:


On Wed, Oct 21, 2015 at 1:07 PM, Costello, Roger L. <costello@m...> wrote:

You want each string constrained to just ASCII characters.

You may want that, but it's a very bad idea.  As Tim Bray said years ago, the cost of internationalization is maybe 20% extra if you build it in from the beginning, whereas it's about 100% if you try to retrofit it.  Restricting your data to ASCII or any other set less than Unicode is a Bad Thing from day one.  Instead of restricting your data to fit your obsolete processing model, upgrade your processing model to reflect the realities of textual data in the real world.


--
GMail doesn't have rotating .sigs, but you can see mine at http://www.ccil.org/~cowan/signatures


