
RE: Specifying Character Sets

  • To: xml-dev@l...
  • Subject: RE: Specifying Character Sets
  • From: Eric van der Vlist <vdv@d...>
  • Date: Thu, 26 Jan 2006 10:15:18 +0100
  • In-reply-to: <20060126090311.13B026D00E3@g...>
  • Organization: Dyomedea (http://dyomedea.com)
  • References: <20060126090311.13B026D00E3@g...>

Hi Mike,

On Thursday, 26 January 2006 at 09:02 +0000, Michael Kay wrote:
> > I am working on a small schema language for an XML language that I
> > will be using in an open source program. In this schema I am defining
> > a text data type. I want the schema developer using my schema language
> > to have the option of specifying the character set of the text data type.
> > 
> > A given XML document is only in one character set.  To support multiple
> > character sets you'll have to do something like base64-encode the content.
> 
> I read the question differently (though people often use "character set" to
> mean "character encoding", so I might be wrong). XML allows the Unicode
> character set (or some version of it). You may want in a schema to restrict
> the user to a subset of the characters in that character set, for example
> the subset of characters defined in iso-8859-1, or the subset defined in
> iso-8859-2, or some subset of your own choosing such as [A-Z][0-9][.,-].
> 
> There are international names for character encodings such as iso-8859-1
> (search for IANA register of character sets). They define the encodings of
> the characters, which you aren't interested in, but in doing so they also
> define the repertoire of characters (that is, the character set in its
> strict meaning).

Right, but to be exhaustive, I'd add that they define the repertoire of
characters that can be directly included in an XML document, but they still
do not prevent you from adding characters outside this repertoire as numeric
character references.
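
To make the point about numeric character references concrete, here is a
minimal sketch in Python. The sample text and variable names are made up
for illustration; the only assumption is a document declared as iso-8859-1
that also needs to carry a Hebrew word.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: "café" fits in iso-8859-1, the Hebrew word does not.
text = "caf\u00e9 \u05e9\u05dc\u05d5\u05dd"
document = '<?xml version="1.0" encoding="iso-8859-1"?><p>' + text + "</p>"

# Characters the encoding cannot represent are written as numeric
# character references (&#NNNN;) instead of raising an error.
raw = document.encode("iso-8859-1", errors="xmlcharrefreplace")

# The parser resolves the references, so the full text comes back.
parsed_text = ET.fromstring(raw).text
```

The encoded bytes stay within iso-8859-1, yet the parsed content still
contains characters from outside that repertoire, which is why the encoding
alone cannot serve as a repertoire constraint.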

> I would think that a more useful approach, however, is to use the names of
> blocks of characters defined in Unicode, which are available for use in XML
> Schema regular expressions, for example <xs:pattern value="\p{IsHebrew}*"/>
> limits you to characters with Unicode codepoints 590-5FF.

Yep, except that you can't apply this constraint to mixed content models
with W3C XML Schema (nor with RELAX NG), which makes it quite useless
for a lot of real-world applications.

<plug href="http://dsdl.org/" type="shameless">

Solving this specific issue is the goal of DSDL Part 7, Character
Repertoire Description Language (CRDL), and everyone interested in this
issue is welcome to help!

</plug>

Note that this restriction can be expressed with ISO Schematron using
XPath 2.0 as its expression language (or with plain XSLT 2.0).
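
As a rough illustration of what such a check over mixed content amounts to,
here is a sketch in Python rather than Schematron. The function name, the
sample documents, and the choice to allow whitespace alongside the Hebrew
block (U+0590 to U+05FF) are all assumptions made for the example. In
Schematron, the XPath 2.0 matches() function would play the role of the
regular expression here.

```python
import re
import xml.etree.ElementTree as ET

# Assumed repertoire: the Unicode Hebrew block plus whitespace.
HEBREW_OR_SPACE = re.compile(r"[\u0590-\u05FF\s]*")


def text_outside_repertoire(xml_source, allowed=HEBREW_OR_SPACE):
    """Return every text chunk that contains a character outside `allowed`.

    itertext() walks all text in the element and its descendants,
    so text interleaved with child elements (mixed content) is covered.
    """
    root = ET.fromstring(xml_source)
    return [chunk for chunk in root.itertext() if not allowed.fullmatch(chunk)]


# Hypothetical mixed-content samples.
good = "<p>\u05e9\u05dc\u05d5\u05dd <b>\u05e2\u05d5\u05dc\u05dd</b></p>"
bad = "<p>\u05e9\u05dc\u05d5\u05dd <b>world</b></p>"

good_result = text_outside_repertoire(good)
bad_result = text_outside_repertoire(bad)
```

The point is only that the constraint is applied to each text node wherever
it occurs, instead of to the content of a single simple-typed element.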

Eric
-- 
GPG-PGP: 2A528005
Freelance consulting and training.
                                            http://dyomedea.com/english/
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(ISO) RELAX NG   ISBN:0-596-00421-4 http://oreilly.com/catalog/relax
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------


