
RE: UTF-8+names




James Clark wrote:
> 
> But with +names you don't want to work at the encoding level. 
>  For example, if you have a ü in your text file, that will be 
> two bytes in
> UTF-8+names, but you would want to work with it as a single character.
> To edit a UTF-8+names text file, you need to make your text 
> editor treat it as if it were encoded in UTF-8. In other 
> words, to make things work you have to edit it in the wrong 
> encoding.  This will be extremely confusing to users.


This is precisely what I meant when I wrote:

---------------------------
What you and Tim are proposing is to define additional bit patterns for certain Unicode characters, which, when re-interpreted as (sequences of) UTF-8 bit patterns, look like XML entity references.

Therefore at the very heart of your proposal is a re-interpretation trick of bit patterns between UTF-8 on one side and UTF-8+names on the other side.

Indeed, if one uses UTF-8+names just as an encoding of Unicode (with no re-interpretation trick), no human user will ever see those &nbsp; things. All that humans will see is some displayable form of the NO-BREAK SPACE character, which happened to be encoded as 0x26 0x6E 0x62 0x73 0x70 0x3B rather than as 0xNN1 0xNN2 (the two bit patterns being equivalent).

In other words, if the UTF-8+names encoding is used to go from Unicode code points to bit patterns and vice versa (which is how an encoding is supposed to be used), the whole point of defining human-readable alternatives is defeated.  For the human-readable alternatives to be useful, you need to resort to reinterpretation of this encoding as if it were a different encoding.
---------------------------
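To make the re-interpretation point concrete, here is a minimal Python sketch. UTF-8+names was only a proposal, so the decoder below is hypothetical: the function name decode_utf8_plus_names is invented for illustration, and the HTML entity table from Python's standard library stands in for whatever name table the proposal would have defined.

```python
import re
from html.entities import name2codepoint  # stand-in name table (an assumption)

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

# Plain UTF-8 encodes U+00A0 as two bytes.
utf8_bytes = NBSP.encode("utf-8")

# The six bytes quoted in the message; read as UTF-8 they are the
# six ASCII characters of an entity reference.
named_bytes = bytes([0x26, 0x6E, 0x62, 0x73, 0x70, 0x3B])
seen_as_utf8 = named_bytes.decode("utf-8")


def decode_utf8_plus_names(data: bytes) -> str:
    """Hypothetical UTF-8+names decoder: decode as UTF-8, then map
    each recognised &name; sequence to a single code point."""
    text = data.decode("utf-8")

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in name2codepoint:
            return chr(name2codepoint[name])
        return match.group(0)  # unknown names are left untouched

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


# Decoded "properly", the readable name disappears: one character remains.
seen_as_utf8_plus_names = decode_utf8_plus_names(named_bytes)
```

Under these assumptions, the readable form "&nbsp;" is visible only when the bytes are read as UTF-8, which is the "wrong" encoding for such a file. Read through the hypothetical decoder, they collapse to the single character U+00A0, which is the same character that plain UTF-8 would have stored in two bytes.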


> 1. General publishing. This community wants the HTML entity 
> sets.  I think the problem here is a software/education 
> problem which is decreasing all the time.  Almost all modern 
> systems have fonts that can display almost all the characters 
> in these entity sets. The desktop environments that I'm 
> familiar with all offer a character map applet which is 
> sufficient (albeit not very efficient) for entry of 
> characters for which you have fonts. The quality of Unicode 
> support offered by standard text editors is improving all the time.
> 
> CJK users have long dealt with the problem of how to enter 
> characters for which their keyboard has no key. CJK software 
> typically provides "input methods" to allow efficient, 
> user-friendly entry of such characters. This sort of 
> technology should be applied for entering Unicode characters. 
>  Input methods can easily leverage the standard Unicode 
> names, rather than having to invent and maintain a competing 
> set of shorter names.


This is what I meant when I said that the whole issue should probably be addressed at the software level, rather than by introducing a new encoding.

We seem to be in agreement on these two basic points.
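As a rough illustration of what "at the software level" could look like, the sketch below resolves a standard Unicode character name to the character itself, which is the lookup an input method or editor command would need. The helper name char_from_name is made up for this example; unicodedata.lookup and unicodedata.name are part of the Python standard library.

```python
import unicodedata
from typing import Optional


def char_from_name(name: str) -> Optional[str]:
    """Return the character with the given standard Unicode name,
    or None if no such name exists (hypothetical helper)."""
    try:
        return unicodedata.lookup(name.strip().upper())
    except KeyError:
        return None


# What an input method might do after the user types a name.
nbsp = char_from_name("no-break space")
u_umlaut = char_from_name("LATIN SMALL LETTER U WITH DIAERESIS")

# The reverse direction, e.g. for a "what is this character?" command.
umlaut_name = unicodedata.name("\u00fc")
```

With something like this in the editor, the file on disk stays ordinary UTF-8 and no second set of shorter names has to be invented or maintained.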

Alessandro


