
RE: Character Entities: An XML Core WG View


Tim Bray wrote:

> John Cowan wrote:
> > >Why not Unicode.org?  It could create short name "aliases" 
> > >of the long name descriptions.

> > Are you really prepared to create short names 

No -- that's why I suggested Unicode.org.  :)

> > (other than ones involving hex digits) for all 95,156
> > characters in Unicode 3.2?  Or even if we leave out the Han
> > and Hangul characters, the 13,791 characters that are left?
> > It is a biiiiiiiiiiiiiiiiiiiig job.
 
> Yes, but it sure would be nice if it were done.  If this were done, 
> I think that a lot of people would be willing to focus support 
> on this and nothing else.  I wonder how much could be automated?  
> Hmm... -Tim

None of the Latin, Greek, or Math used in today's markup should, IMO, be
automated.  Those should come from the XHTML, DocBook, MathML
traditions, as "unified" by David C. & Co.  

As to the rest, the writing groups are, well, different -- especially as
to case, letters, characters, vowel signs, intent, etc.  A few random
samples from UnicodeData.txt:
	
	BOX DRAWINGS RIGHT LIGHT AND LEFT VERTICAL HEAVY
	RECYCLING SYMBOL FOR TYPE-4 PLASTICS
	UPWARDS HARPOON WITH BARB LEFT BESIDE DOWNWARDS HARPOON WITH BARB RIGHT

	CYRILLIC CAPITAL LETTER GHE WITH UPTURN

	ARABIC LETTER DAL WITH DOT BELOW AND SMALL TAH
	ARABIC LIGATURE FEH WITH KHAH WITH MEEM INITIAL FORM

	SINHALA LETTER MAHAAPRAANA PAYANNA
	SINHALA VOWEL SIGN KOMBUVA HAA AELA-PILLA

	TIBETAN MARK GTER YIG MGO -UM GTER TSHEG MA
	TIBETAN SUBJOINED LETTER CHA
	TIBETAN VOWEL SIGN REVERSED II
	TIBETAN SIGN NYI ZLA NAA DA

	HANGUL CHOSEONG CEONGCHIEUMSSANGCIEUC
	HANGUL JUNGSEONG SSANGARAEA
	HANGUL LETTER KAPYEOUNSSANGPIEUP
	PARENTHESIZED HANGUL MIEUM A

You might possibly automate *some* of it group by group.  A lot of them
don't seem to yield very well to "entification", automatic or otherwise.
:) And the alternative underscore trick could cause too many to end it
all with an
&UPWARDS_HARPOON_WITH_BARB_LEFT_BESIDE_DOWNWARDS_HARPOON_WITH_BARB_RIGHT;.
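
For the curious, the underscore trick is easy enough to try. What follows
is only a rough sketch, assuming Python and its standard unicodedata
module (neither comes from this thread); the function name
underscore_entity is made up for illustration. It swaps spaces for
underscores and leaves hyphens alone, since XML names allow them.

```python
import unicodedata


def underscore_entity(ch: str) -> str:
    """Return an entity reference built from a character's Unicode name.

    Hypothetical naming scheme: spaces become underscores, hyphens are
    kept as-is. Raises ValueError if the character has no name.
    """
    name = unicodedata.name(ch)
    return "&" + name.replace(" ", "_") + ";"


# Look the character up by the name quoted above instead of hard-coding
# a code point.
harpoon = unicodedata.lookup(
    "UPWARDS HARPOON WITH BARB LEFT BESIDE DOWNWARDS HARPOON WITH BARB RIGHT"
)
reference = underscore_entity(harpoon)
print(reference)
print(len(reference), "characters to type for one symbol")
```

Generating the names is the trivial part; the sketch does nothing about
their length, which is the real objection.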

It's probably best to start with a single unified western set from
XHTML, DocBook, and MathML that people can bring in -- *if they desire*
-- and ten years or so from now, we'll rarely need it (or any other
entified Unicode) anyway.


/Jelks

