
Re: [xml-dev] Unicode is universal, so how come that

  • From: David Carlisle <d.p.carlisle@gmail.com>
  • To: Roger L Costello <costello@mitre.org>
  • Date: Wed, 16 Dec 2020 14:25:23 +0000



On Wed, 16 Dec 2020 at 13:50, Roger L Costello <costello@mitre.org> wrote:

Hi Folks,

Unicode makes it possible to write things in different languages.

For example, rather than this XML:

<Number_Students>42</Number_Students>

a Bengali-speaking person can write this:

<সংখ্যা_ছাত্র>42</সংখ্যা_ছাত্র>

Or, in a programming language, rather than this assignment statement:

              Number_Students = 42

a Bengali-speaking person can write this:

              সংখ্যা_ছাত্র = 42

That’s awesome.

But, but, but, … how come that universality doesn’t extend to digits?

How come we can only use these digits: 0 (hex 30), 1 (hex 31), …, 9 (hex 39)?

Why, for example, can’t a Bengali-speaking person use the Bengali digits: Bengali digit 0 (U+09E6), Bengali digit 1 (U+09E7), …, Bengali digit 9 (U+09EF)?

Why, for example, can’t a Bengali-speaking person create XML such as this:

<সংখ্যা_ছাত্র>৪২</সংখ্যা_ছাত্র>

or write a program assignment statement like this:

              সংখ্যা_ছাত্র = ৪২

Let me explain why I assert that the Bengali-speaking person “cannot” do that.

Numbers in an XML document or in a program are just strings, and to perform arithmetic on them, those string numbers must be converted to actual numbers. I looked at the source code for the C function strtol, which converts strings to numbers, and here is the key to how it converts a character digit to a number digit:

              digit_number = digit_character - '0'

Yikes!

That generates a number digit by treating the character digit as a number and subtracting the number corresponding to the character '0'. For example, if the character digit is '4' (hex 34), subtracting '0' (hex 30) gives the number 4. Perfect! But… only if we allow European digits (0, 1, …, 9). If we subtract '0' (hex 30) from the Bengali digit 4 (U+09EA), we get hex 9BA (2490), not the number 4.
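The arithmetic in that paragraph can be checked with a small C sketch. It applies the `- '0'` trick to a Unicode code point held in a `uint32_t`; that is an assumption for illustration, since strtol itself works on `char` bytes, not decoded code points. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the strtol-style trick applied to a code point.
   Nothing stops a caller passing a non-ASCII digit, so for Bengali
   digit four (U+09EA) it returns 0x09EA - 0x30 = 2490, not 4. */
static long naive_digit_value(uint32_t cp)
{
    return (long)cp - (long)'0';
}

/* The same trick with its hidden precondition made explicit:
   it is only correct for U+0030..U+0039 ('0'..'9'). */
static int ascii_digit_value(uint32_t cp)
{
    assert(cp >= '0' && cp <= '9');
    return (int)(cp - '0');
}
```

The trick is not wrong; it simply encodes the assumption that input digits are ASCII, which `ascii_digit_value` states as an assertion.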

Thus I conclude:

  • When expressing numbers, the only digits that can be used are the European digits
  • Unicode is universal, but that universality does not apply to digits or numbers

I don't see that the conclusion follows from the earlier comments. Clearly you can convert various number string forms to numbers; the fact that strtol doesn't do so doesn't justify a universal conclusion that it is not possible.
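As a sketch of that point, the C below maps a code point to its value by finding which run of ten decimal digits it belongs to, then accumulates a number as strtol does. It relies on Unicode encoding each script's decimal digits as a contiguous run from zero to nine. It is only a sketch under stated assumptions: it hard-codes a handful of runs (a real library would consult the Unicode Character Database), it assumes the input has already been decoded from UTF-8 into code points, and `DIGIT_ZEROS`, `digit_value` and `parse_decimal` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Code point of ZERO in a few decimal-digit runs (illustrative only). */
static const uint32_t DIGIT_ZEROS[] = {
    0x0030, /* ASCII (European) */
    0x0660, /* Arabic-Indic */
    0x06F0, /* Extended Arabic-Indic */
    0x0966, /* Devanagari */
    0x09E6, /* Bengali */
};

/* Returns 0..9 if cp is a digit in one of the runs above, else -1. */
static int digit_value(uint32_t cp)
{
    for (size_t i = 0; i < sizeof DIGIT_ZEROS / sizeof DIGIT_ZEROS[0]; i++) {
        if (cp >= DIGIT_ZEROS[i] && cp <= DIGIT_ZEROS[i] + 9)
            return (int)(cp - DIGIT_ZEROS[i]);
    }
    return -1;
}

/* Parses a non-empty sequence of decimal-digit code points.
   Returns 0 and stores the value in *out on success, -1 otherwise.
   Overflow is not checked in this sketch. */
static int parse_decimal(const uint32_t *cps, size_t n, long *out)
{
    assert(out != NULL);
    if (n == 0)
        return -1;
    long value = 0;
    for (size_t i = 0; i < n; i++) {
        int d = digit_value(cps[i]);
        if (d < 0)
            return -1;
        value = value * 10 + d;
    }
    *out = value;
    return 0;
}
```

With the Bengali code points U+09EA U+09E8 this yields 42, the same as ASCII "42". The sketch accepts digits from mixed scripts within one number; a stricter parser might require every digit to come from the same run.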

It is a choice in the language design what syntax to allow for digits, just as it is for keywords. In XSLT, for example, you have to use for-each as the element name; you can't use a Bengali (or French) name. It just is what it is.


You say that you could write this: <সংখ্যা_ছাত্র>42</সংখ্যা_ছাত্র>, but if the input follows some standard schema for student records, chances are that isn't valid and it has to be <Number_Students>42</Number_Students>. You could write a filter to take the localised form and map the element names, but any such filter could map the digits at the same time.
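A minimal C sketch of the digit half of such a filter, assuming UTF-8 input: it rewrites each Bengali digit (U+09E6..U+09EF, encoded in UTF-8 as the bytes `E0 A7 A6` through `E0 A7 AF`) as the ASCII digit with the same value, in place. The function name is hypothetical, and mapping the element names would need a separate, schema-specific lookup table that is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Rewrites, in place, every Bengali digit in a NUL-terminated UTF-8
   string as the corresponding ASCII digit. All other bytes are copied
   unchanged. Each 3-byte digit becomes 1 byte, so the result is never
   longer than the input and the write pointer never passes the read
   pointer. */
static void bengali_digits_to_ascii(char *s)
{
    assert(s != NULL);
    const unsigned char *r = (const unsigned char *)s;
    char *w = s;
    while (*r != '\0') {
        /* && short-circuits, so r[2] is only read when r[1] is 0xA7,
           i.e. not the terminating NUL. */
        if (r[0] == 0xE0 && r[1] == 0xA7 && r[2] >= 0xA6 && r[2] <= 0xAF) {
            *w++ = (char)('0' + (r[2] - 0xA6));
            r += 3;
        } else {
            *w++ = (char)*r++;
        }
    }
    *w = '\0';
}
```

Run over `<সংখ্যা_ছাত্র>৪২</সংখ্যা_ছাত্র>`, it leaves the Bengali letters in the element name untouched (none of them fall in the digit range) and turns the content into `42`.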

You could allow multiple language keyword interfaces and number syntaxes, but in most programming languages (for good or bad) the choice is not to do that at the lower-level programming end of things, and to offer localised interfaces only at a higher level.
So a spreadsheet might accept localised digits and localised date forms such as "Wednesday 16th December 2020", but at some lower level the date might need to be in a standard form, 2020-12-16 or whatever.

David


Obviously I am not understanding something correctly. Please help me to understand.

/Roger


