RE: Specifying Character Sets
Hi Mike,

On Thursday, 26 January 2006 at 09:02 +0000, Michael Kay wrote:
> > I am working on a small schema language for an XML language that I
> > will be using in an open source program. In this schema I am defining
> > a text data type. I want the schema developer using my schema language
> > to have the option of specifying the character set of the text data type.
> >
> > A given XML document is only in one character set. To support multiple
> > character sets you'll have to do something like base64-encode the content.
>
> I read the question differently (though people often use "character set" to
> mean "character encoding", so I might be wrong). XML allows the Unicode
> character set (or some version of it). You may want in a schema to restrict
> the user to a subset of the characters in that character set, for example
> the subset of characters defined in iso-8859-1, or the subset defined in
> iso-8859-2, or some subset of your own choosing such as [A-Z][0-9][.,-].
>
> There are international names for character encodings such as iso-8859-1
> (search for IANA register of character sets). They define the encodings of
> the characters, which you aren't interested in, but in doing so they also
> define the repertoire of characters (that is, the character set in its
> strict meaning).

Right, but to be exhaustive, I'd add that they define the repertoire of
characters that can be included directly in an XML document, but they still
do not prevent adding characters outside this repertoire as numeric
character references.

> I would think that a more useful approach, however, is to use the names of
> blocks of characters defined in Unicode, which are available for use in XML
> Schema regular expressions, for example <xs:pattern value="\p{IsHebrew}*"/>
> limits you to characters with Unicode codepoints 590-5FF.

Yep, except that you can't apply this constraint with W3C XML Schema (nor
with RELAX NG) to mixed content models, which makes it quite useless for a
lot of real-world applications.

<plug href="http://dsdl.org/" type="shameless">
Solving this specific issue is the goal of DSDL Part 7 Character Repertoire
Description Language - CRDL and everyone interested in this issue is welcome
to help!
</plug>

Note that this restriction can be expressed with ISO Schematron using XPath
2.0 as its expression language (or with plain XSLT 2.0).

Eric
--
GPG-PGP: 2A528005
Freelance consulting and training.
http://dyomedea.com/english/
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(ISO) RELAX NG   ISBN:0-596-00421-4 http://oreilly.com/catalog/relax
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------
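
As a rough illustration of checking a character repertoire across mixed content, here is a minimal Python sketch using only the standard library. It is not the Schematron or XSLT 2.0 approach mentioned above, just the same idea applied to the parsed document. The function names (`chars_outside`, `in_latin1`, `in_hebrew_or_space`, `check_element`) are hypothetical, and treating Python's `iso-8859-1` codec as the repertoire is an assumption: that codec accepts every code point from U+0000 to U+00FF, control characters included.

```python
import xml.etree.ElementTree as ET

# Unicode Hebrew block, U+0590..U+05FF (the range \p{IsHebrew} refers to).
HEBREW_BLOCK = range(0x0590, 0x0600)


def chars_outside(text, allowed):
    """Return the sorted distinct characters of `text` rejected by `allowed`."""
    return sorted({ch for ch in text if not allowed(ch)})


def in_latin1(ch):
    """Assumption: a character is in the repertoire if the codec can encode it."""
    try:
        ch.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


def in_hebrew_or_space(ch):
    """Hebrew block plus whitespace, so that ordinary word spacing is allowed."""
    return ord(ch) in HEBREW_BLOCK or ch.isspace()


def check_element(xml_source, allowed):
    """Check all text in an element, including text inside child elements.

    itertext() walks the element's own text, its descendants' text, and the
    tails between them, so mixed content is covered. Numeric character
    references are already resolved by the parser at this point.
    """
    root = ET.fromstring(xml_source)
    return chars_outside("".join(root.itertext()), allowed)


# The euro sign arrives as a numeric character reference and is still caught.
print(check_element("<p>caf&#233; <b>prix</b> &#x20AC;5</p>", in_latin1))
# -> ['€']

# Hebrew text spread over mixed content passes the block check.
print(check_element(
    "<p>\u05e9\u05dc\u05d5\u05dd <em>\u05e2\u05d5\u05dc\u05dd</em></p>",
    in_hebrew_or_space,
))
# -> []
```

Because the check runs on the parsed text rather than on the serialized bytes, it constrains the characters themselves regardless of the document's encoding or of whether a character was written as a numeric character reference.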