
RE: Specifying Character Sets

> I am working on a small schema language for an XML language that I
> will be using in an open source program. In this schema I am defining
> a text data type. I want the schema developer using my schema language
> to have the option of specifying the character set of the text data type.
>
> A given XML document is only in one character set.  To support multiple
> character sets you'll have to do something like base64-encode the content.

I read the question differently (though people often use "character set" to
mean "character encoding", so I might be wrong). XML allows the Unicode
character set (or some version of it). You may want in a schema to restrict
the user to a subset of the characters in that character set, for example
the subset of characters defined in iso-8859-1, or the subset defined in
iso-8859-2, or some subset of your own choosing such as [A-Z0-9.,-].

There are international names for character encodings such as iso-8859-1
(search for IANA register of character sets). They define the encodings of
the characters, which you aren't interested in, but in doing so they also
define the repertoire of characters (that is, the character set in its
strict meaning).
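
As a rough illustration of treating an encoding name as a repertoire, the
following Python sketch asks whether every character of a string can be
encoded in a named encoding. The function name in_repertoire is made up for
this example, and it relies on Python's own codec tables, which also accept
the control characters in the 8-bit range, so it is only an approximation of
what a schema processor might do.

```python
def in_repertoire(text: str, encoding: str = "iso-8859-1") -> bool:
    """Return True if every character of `text` exists in `encoding`.

    Hypothetical helper: the encoding name is used only to look up
    the set of characters it can represent, not to produce bytes.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        # At least one character is outside the encoding's repertoire.
        return False
    return True


if __name__ == "__main__":
    print(in_repertoire("caf\u00e9"))             # e-acute is in iso-8859-1
    print(in_repertoire("\u0142"))                # l-stroke is not in iso-8859-1
    print(in_repertoire("\u0142", "iso-8859-2"))  # but it is in iso-8859-2
```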

I would think that a more useful approach, however, is to use the names of
blocks of characters defined in Unicode, which are available for use in XML
Schema regular expressions. For example, <xs:pattern value="\p{IsHebrew}*"/>
limits you to characters in the Unicode code point range U+0590 to U+05FF.
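
To make the effect of such a block restriction concrete, here is a small
Python sketch that mimics the \p{IsHebrew}* pattern with a plain code point
range check. Python's standard re module does not support the XML Schema
block escapes, so the range is written out by hand; the names HEBREW_BLOCK
and in_block are invented for this example.

```python
# The Unicode Hebrew block, U+0590 to U+05FF inclusive.
HEBREW_BLOCK = range(0x0590, 0x0600)


def in_block(text: str, block: range = HEBREW_BLOCK) -> bool:
    """Return True if every character of `text` lies inside `block`.

    Hypothetical helper. An empty string passes, which matches the
    behaviour of the `*` quantifier in the schema pattern.
    """
    return all(ord(ch) in block for ch in text)


if __name__ == "__main__":
    shalom = "\u05e9\u05dc\u05d5\u05dd"           # Hebrew letters only
    print(in_block(shalom))                       # inside the block
    print(in_block("shalom"))                     # Latin letters, outside
    print(in_block(shalom + " " + shalom))        # the space is outside
```

Note that a space (U+0020) is not part of the block, so a value containing
several Hebrew words would be rejected unless the pattern is widened to
allow whitespace as well.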

Michael Kay
http://www.saxonica.com/


