
Re: Allowed characters for NCName

  • From: David Carlisle <davidc@n...>
  • To: desmond.kirrane@g...
  • Date: Thu, 13 Dec 2007 12:52:01 GMT


Something is strange, as dotless i is U+0131, which does have a Unicode
letter class (Ll) and is allowed in XML names.


> The links show the characters in Hexadecimal. Is there anywhere that
> actually displays the list of characters?

It's quite a long list.


There are plenty of places where you can look up the Unicode names of
characters, the default places being

http://www.unicode.org/charts/
or
http://www.unicode.org/ucd/

(for pdf charts or textual tables and documentation respectively)


However, I find it's useful to have the information available as XML.

The following XQuery, for example, returns the Unicode number and name of
all characters in Unicode 3.0 that have the Lu or Ll (upper or lower case
letter) character class.

saxon9q -u -s http://www.w3.org/2003/entities/2007xml/unicode.xml
"{//character[unicodedata/@category=('Ll','Lu')][number(description/@unicode)<=3]/(string(@id),string(description),'&#10;')}"

You might want to fetch the file and run it locally: unicode.xml is
5.6MB in size and the above returns lots of lines, starting with
 U00041 LATIN CAPITAL LETTER A 
 U00042 LATIN CAPITAL LETTER B 
 U00043 LATIN CAPITAL LETTER C 
 U00044 LATIN CAPITAL LETTER D 
 U00045 LATIN CAPITAL LETTER E 
 U00046 LATIN CAPITAL LETTER F 
 U00047 LATIN CAPITAL LETTER G 
 U00048 LATIN CAPITAL LETTER H 
 U00049 LATIN CAPITAL LETTER I 
 U0004A LATIN CAPITAL LETTER J 
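
For anyone without Saxon to hand, a rough equivalent can be sketched in
Python with the standard unicodedata module. The assumptions: it uses
whatever Unicode version ships with the Python interpreter rather than
filtering to Unicode 3.0 as the query does, the function name "letters" is
made up for illustration, and it prints only the first ten entries.

```python
import itertools
import sys
import unicodedata


def letters(categories=("Lu", "Ll")):
    """Yield (id, name) for each code point whose general category matches."""
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) in categories:
            # Same "Uxxxxx" id style as the ids in unicode.xml;
            # unnamed code points get an empty name.
            yield f"U{cp:05X}", unicodedata.name(ch, "")


if __name__ == "__main__":
    # Only the first ten, since the full list is long.
    for ident, name in itertools.islice(letters(), 10):
        print(ident, name)
```

Dropping the islice call gives the full list, which will be longer than the
Unicode 3.0 list because later Unicode versions added more letters.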

As explained at
http://www.w3.org/TR/REC-xml/#NT-Letter
other character classes need to be included (Ll, Lu, Lo, Lt, Nl, Mc,
Me, Mn, Lm, Nd) and some character ranges are excluded. The above
XPath could be adjusted for that (or the character range list as given
in the XML spec could be made into a regexp), but the above is probably
about as long as you'd want to type on the command line rather than
putting the XQuery/XPath into a file.
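
As a rough illustration of the category test only, here is a minimal Python
sketch. It is an approximation under stated assumptions: it checks just the
ten categories listed above, ignores the excluded ranges and any
punctuation that the spec's productions allow explicitly, and relies on the
Unicode version bundled with Python rather than the one the XML spec was
written against. The name "in_name_categories" is hypothetical.

```python
import unicodedata

# The ten general categories named in the message above.
NAME_CATEGORIES = frozenset(
    ["Ll", "Lu", "Lo", "Lt", "Nl", "Mc", "Me", "Mn", "Lm", "Nd"]
)


def in_name_categories(ch):
    """Return True if the single character ch falls in one of the listed categories.

    This is NOT a full XML name check: exclusions and explicitly
    allowed punctuation from the spec are not handled here.
    """
    return unicodedata.category(ch) in NAME_CATEGORIES


if __name__ == "__main__":
    # Dotless i (U+0131) is category Ll, so it passes the category test.
    print(in_name_categories("\u0131"))
```

A real check would be better driven by the explicit character ranges in the
spec, since those are normative and do not shift with the Unicode version.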


David
