RE: UTF-8+names


John Cowan wrote:
> 
> 
> Mike Champion scripsit:
> 
> > Sure! The question is how to do something to make our
> > lives less unpleasant while The System plots forward.
> > Be patient, vote with our feet against crappy software
> > that can't handle Unicode decently, or try to hack up
> > something in the interim?  The whole point of Unicode
> > encodings is to map conveniently enterable text onto
> > codepoints, and whatever the technical virtues or
> > flaws of Tim's strawman proposal, this seems like the
> > right layer to address it.
> 
> Character naming isn't just a hack for 8-bit users; it's
> just as practical for someone using Unicode directly.
> The human issue of referencing characters over a huge
> codespace is just as great whatever the underlying encoding.


Sorry, I am still unconvinced.

It seems to me there is a confusion of layers here, between the displayable
form of a Unicode character and the bit pattern of its encoding.

What you and Tim are proposing is to define additional bit patterns (*) for
certain Unicode characters, which, when re-interpreted as (sequences of)
UTF-8 bit patterns, look like XML entity references.

Therefore at the very heart of your proposal is a re-interpretation trick of
bit patterns between UTF-8 on one side and UTF-8+names on the other side.

Indeed, if one uses UTF-8+names just as an encoding of Unicode (with no
re-interpretation trick), no human user will ever see those &nbsp; things.
All that humans will see is some displayable form of the  NON-BREAK SPACE
character, which happened to be encoded as  0x26 0x6E 0x62 0x73 0x70 0x3B
rather than as  0xNN1 0xNN2 (the two bit patterns being equivalent).  
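
To make the two bit patterns concrete, here is a minimal Python sketch. It assumes NO-BREAK SPACE is U+00A0 and that its ordinary UTF-8 form is what "0xNN1 0xNN2" stands for; the variable names are illustrative only.

```python
# The character under discussion: U+00A0 NO-BREAK SPACE.
nbsp = "\u00a0"

# Ordinary UTF-8 encoding of that character (a two-byte pattern).
utf8_bytes = nbsp.encode("utf-8")

# The six-byte pattern from the footnote. Read as plain UTF-8 (or ASCII),
# it is just the six characters '&', 'n', 'b', 's', 'p', ';'.
named_bytes = bytes([0x26, 0x6E, 0x62, 0x73, 0x70, 0x3B])

print(" ".join(f"0x{b:02X}" for b in utf8_bytes))  # 0xC2 0xA0
print(named_bytes.decode("utf-8"))                  # &nbsp;
```

Under the proposal both byte sequences would denote the same code point; only the second one looks like an entity reference, and only when read as plain UTF-8.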

In other words, if the UTF-8+names encoding is used to go from Unicode code
points to bit patterns and vice versa (which is how an encoding is supposed
to be used), the whole point of defining human-readable alternatives is
defeated.  For the human-readable alternatives to be useful, you need to
resort to reinterpretation of this encoding as if it were a different
encoding.
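
The reinterpretation step can be sketched in Python as follows. This is only a guess at how a UTF-8+names decoder might behave: the function `decode_utf8_names`, the one-entry `NAMES` table, and the matching rule are all hypothetical and are not taken from the proposal.

```python
import re

# Hypothetical name table; the proposal's real list of names is not given here.
NAMES = {"nbsp": "\u00a0"}

# Assumed syntax for a named character: '&', a name, ';'.
_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def decode_utf8_names(data: bytes) -> str:
    """Decode as UTF-8, then replace known &name; sequences with their character."""
    text = data.decode("utf-8")
    return _REF.sub(lambda m: NAMES.get(m.group(1), m.group(0)), text)

raw = b"a&nbsp;b"

# Used as an encoding: the name disappears and only the code point remains.
as_names = decode_utf8_names(raw)

# Reinterpreted as plain UTF-8: the human-readable name is visible.
as_utf8 = raw.decode("utf-8")

print(repr(as_names))  # 'a\xa0b'
print(repr(as_utf8))   # 'a&nbsp;b'
```

The same bytes yield the readable name only in the second reading, which is the reinterpretation described above.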

That is, unless you want to modify Unicode itself, by introducing a macro
mechanism that would also affect the displayable form of the characters.  In
other words, I wouldn't see any point in defining a macro mechanism at the
level of the encoding, because it is not reflected in the displayable form.
Who is the end-user of Unicode after all?  I am sure it is the person that
sees the displayable form of the characters, not the person that uses
technical tricks such as a sister encoding to prevent the macros from being
expanded.

I am not actually proposing to add this macro functionality to Unicode, but
I am saying that there are two places where the initial problem can be
addressed:  either at the XML level or at the Unicode level (which involves
the displayable form).  Not at the encoding level.

Alessandro



(*) The byte sequence  0x26 0x6E 0x62 0x73 0x70 0x3B  would be such a bit
pattern.



> 
> -- 
> John Cowan  jcowan@r...  www.reutershealth.com  www.ccil.org/~cowan
> [R]eversing the apostolic precept to be all things to all men, I usually
> [before Darwin] defended the tenability of the received doctrines, when I
> had to do with the [evolution]ists; and stood up for the possibility of
> [evolution] among the orthodox--thereby, no doubt, increasing an already
> current, but quite undeserved, reputation for needless combativeness.
> --T. H. Huxley

-----------------------------------------------------------------
The xml-dev list is sponsored by XML.org <http://www.xml.org>, an initiative
of OASIS <http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/

To subscribe or unsubscribe from this list use the subscription
manager: <http://lists.xml.org/ob/adm.pl>


