RE: Specifying Character Sets
> I am working on a small schema language for an XML language that I
> will be using in an open source program. In this schema I am defining
> a text data type. I want the schema developer using my schema language
> to have the option of specifying the character set of the text data
> type.
>
> A given XML document is only in one character set. To support multiple
> character sets you'll have to do something like base64-encode the
> content.

I read the question differently (though people often use "character set"
to mean "character encoding", so I might be wrong).

XML allows the Unicode character set (or some version of it). You may
want in a schema to restrict the user to a subset of the characters in
that character set, for example the subset of characters defined in
iso-8859-1, or the subset defined in iso-8859-2, or some subset of your
own choosing such as [A-Z][0-9][.,-].

There are international names for character encodings such as iso-8859-1
(search for IANA register of character sets). They define the encodings
of the characters, which you aren't interested in, but in doing so they
also define the repertoire of characters (that is, the character set in
its strict meaning).

I would think that a more useful approach, however, is to use the names
of blocks of characters defined in Unicode, which are available for use
in XML Schema regular expressions, for example

    <xs:pattern value="\p{IsHebrew}*"/>

limits you to characters in the Unicode Hebrew block, codepoints U+0590
to U+05FF.

Michael Kay
http://www.saxonica.com/
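
For a home-grown schema language, the two kinds of restriction discussed above (the repertoire implied by a named encoding, and a Unicode block) could be checked roughly as follows. This is only a sketch: the function names `in_repertoire` and `in_block` and the constant `HEBREW_BLOCK` are hypothetical, and it assumes Python's built-in codecs are an acceptable stand-in for the IANA-registered repertoires. Note that Python's iso-8859-1 codec accepts every codepoint from U+0000 to U+00FF, control characters included.

```python
def in_repertoire(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be represented in `encoding`.

    The encoding is used only as a way of naming a repertoire of characters;
    the encoded bytes themselves are discarded.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Unicode Hebrew block, U+0590 to U+05FF inclusive (hypothetical constant name).
HEBREW_BLOCK = range(0x0590, 0x0600)


def in_block(text: str, block: range) -> bool:
    """Return True if every character of `text` has a codepoint inside `block`.

    An empty string passes, as it would against a pattern ending in `*`.
    """
    return all(ord(ch) in block for ch in text)
```

Testing against an encoding name keeps the schema syntax simple (the schema author just writes "iso-8859-2"), while the block check corresponds more closely to what `\p{IsHebrew}*` expresses in an XML Schema pattern.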