
Defn. of Extender (Pdn. 89) again

  • From: "Paul W. Abrahams" <abrahams@v...>
  • To: XMLDev list <xml-dev@i...>
  • Date: Wed, 25 Aug 1999 20:51:18 -0400

Having searched the unicode.org website, I'm still puzzled as to what an
extender character is.  The issue was raised once before, back in
January, in the following exchange:

------------------

Re: Extender characters, Production 89 of XML 1.0

John Cowan (cowan@l...)
Mon, 11 Jan 1999 14:07:50 -0500

Elliotte Rusty Harold wrote:

> In XML ["extender"]
> characters can be used anywhere a base character or ideographic
> character can be used.

This is not quite true, because extenders are not name-start characters
in either XML or Unicode.

> However I have been unable to find in the Unicode book or Web site any
> definition of what makes a character an extender. Can anyone clue me in on
> why some Unicode characters have the extender property while others don't?
> What's the logic behind this grouping of characters across languages?

Roughly (and unofficially) speaking, an extender is something that isn't
a letter or combining mark but often appears embedded in words.
For example, one may use L plus MIDDLE DOT as a compatibility equivalent
of L WITH MIDDLE DOT in writing Catalan, and we do not want a
Catalan name to break into two names at the MIDDLE DOT.
(The dot is used to distinguish two successive Ls, written with
a dot, from the unitary Catalan letter "ll", written without a dot.)

Extenders are enumerated (but not explained) in Section 5.14 of
the Unicode Standard.

-----------

The description of the Unicode 2.1 character database says nothing about
what an extender is.  The extenders listed in that database are:

00B7;MIDDLE DOT;Po;0;ON;;;;;N;;;;;
02D0;MODIFIER LETTER TRIANGULAR COLON;Lm;0;ON;;;;;N;;;;;
02D1;MODIFIER LETTER HALF TRIANGULAR COLON;Lm;0;ON;;;;;N;;;;;
0387;GREEK ANO TELEIA;Po;0;ON;00B7;;;;N;;;;;
0640;ARABIC TATWEEL;Lm;0;R;;;;;N;;;;;
0E46;THAI CHARACTER MAIYAMOK;Lm;0;L;;;;;N;THAI MAI YAMOK;;;;
0EC6;LAO KO LA;Lm;0;L;;;;;N;;;;;
3005;IDEOGRAPHIC ITERATION MARK;Lm;0;L;;;;;N;;;;;
3031;VERTICAL KANA REPEAT MARK;Lm;0;L;;;;;N;;;;;
3032;VERTICAL KANA REPEAT WITH VOICED SOUND MARK;Lm;0;L;;;;;N;;;;;
3033;VERTICAL KANA REPEAT MARK UPPER HALF;Lm;0;L;;;;;N;;;;;
3034;VERTICAL KANA REPEAT WITH VOICED SOUND MARK UPPER HALF;Lm;0;L;;;;;N;;;;;
3035;VERTICAL KANA REPEAT MARK LOWER HALF;Lm;0;L;;;;;N;;;;;
309D;HIRAGANA ITERATION MARK;Lm;0;L;;;;;N;;;;;
309E;HIRAGANA VOICED ITERATION MARK;Lm;0;L;309D 3099;;;;N;;;;;
30FC;KATAKANA-HIRAGANA PROLONGED SOUND MARK;Lm;0;L;;;;;N;;;;;
30FD;KATAKANA ITERATION MARK;Lm;0;L;;;;;N;;;;;
30FE;KATAKANA VOICED ITERATION MARK;Lm;0;L;30FD 3099;;;;N;;;;;

The extenders each fall into category Po (Punctuation, Other) or
category Lm (Letter, Modifier).  However, many other characters fall
into these categories also.  For example:

02B2;MODIFIER LETTER SMALL J;Lm;0;L;<super> 006A;;;;N;;;;;
02B3;MODIFIER LETTER SMALL R;Lm;0;L;<super> 0072;;;;N;;;;;

Both of these fall into category Lm.  And the following, among many others,
fall into category Po:

0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;
0022;QUOTATION MARK;Po;0;ON;;;;;N;;;;;
0023;NUMBER SIGN;Po;0;ET;;;;;N;;;;;

So despite the statement in the XML spec that ``the character classes
defined here can be derived from the Unicode character database as
follows:'', there doesn't seem to be anything in that database that
would uniquely characterize the extenders.  The statement "Character
#x00B7 is classified as an extender, because the property list so
identifies it" is puzzling since there's nothing in the property list
cited above that would identify it as being such; in fact, the property
list is identical to that of `0021;EXCLAMATION MARK'.
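
The comparison can be checked mechanically.  What follows is only a
minimal sketch: it assumes that the "property list" means the
semicolon-separated fields after the character name, and the names
RECORDS, parse and same_properties are invented for the illustration.
The records themselves are copied from the excerpts above.

```python
# Records copied from the Unicode 2.1 database excerpts quoted in this message.
RECORDS = """\
00B7;MIDDLE DOT;Po;0;ON;;;;;N;;;;;
0021;EXCLAMATION MARK;Po;0;ON;;;;;N;;;;;
02D0;MODIFIER LETTER TRIANGULAR COLON;Lm;0;ON;;;;;N;;;;;
02B2;MODIFIER LETTER SMALL J;Lm;0;L;<super> 006A;;;;N;;;;;
"""


def parse(line):
    """Split one record into (code point, name, remaining property fields)."""
    fields = line.split(";")
    return fields[0], fields[1], tuple(fields[2:])


# Map each code point to its name and its property fields.
table = {code: (name, props)
         for code, name, props in map(parse, RECORDS.splitlines())}


def same_properties(a, b):
    """True if two code points carry identical fields after the name."""
    return table[a][1] == table[b][1]


# MIDDLE DOT (an extender) versus EXCLAMATION MARK (not an extender).
print(same_properties("00B7", "0021"))  # prints True
```

Under that assumption, nothing in these fields separates MIDDLE DOT from
EXCLAMATION MARK, which is the puzzle described above.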

Can anyone elaborate on John Cowan's statement that "an extender is
something that isn't a letter or combining mark but often appears
embedded in words"?

And finally:  I have the Unicode 2.0 book in front of me, and "extender"
appears neither in the General Index nor, as far as I can tell, in the
Table of Contents.

Paul Abrahams




