
Re: Your XML documents may use different sets of characters

  • From: David Carlisle <davidc@nag.co.uk>
  • To: "Costello, Roger L." <costello@mitre.org>
  • Date: Tue, 17 May 2011 16:09:11 +0100

On 17/05/2011 15:49, Costello, Roger L. wrote:
> Hi Folks,
>
> Excellent feedback!  Thanks!
>
> Here's a summary of what I've learned. Please tell me where I err.
>
> The following statements apply to "data" not to "markup" (i.e.,
> element names, attribute names).
>
> 1. Except for unpaired surrogate codepoints and a few control
> characters, you can use any character you want in XML documents.
>
> 2. The characters don't have to be defined in the Unicode
> specification.
>
> 3. For characters that don't have a visual representation or aren't
> in the Unicode character set, you can use them  via XML's character
> entity mechanism, e.g.,&#xffed;

&#xffed; does not use XML's entity mechanism: it is a numeric character
reference, not an entity reference. Also, whether you enter a
character directly or via a reference is totally unrelated to whether it
has a Unicode description or a visual representation.
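
As a rough illustration of that distinction, the sketch below parses the
same code point written as a numeric character reference, as a literal
character, and through an entity declared in an internal DTD subset. It
uses Python's standard xml.etree.ElementTree; the element name "d" and
the entity name "sq" are made up for the example.

```python
import xml.etree.ElementTree as ET

# The same code point written three ways. The element name "d" and the
# entity name "sq" are hypothetical, chosen only for this illustration.
by_char_ref = ET.fromstring("<d>&#xffed;</d>")   # numeric character reference
by_literal = ET.fromstring("<d>\uffed</d>")      # character entered directly
by_entity = ET.fromstring(
    '<!DOCTYPE d [<!ENTITY sq "&#xffed;">]><d>&sq;</d>'  # entity reference
)

# After parsing, the application sees the same character in every case.
texts = [by_char_ref.text, by_literal.text, by_entity.text]
print([hex(ord(t)) for t in texts])
```

Only the third form needs a declaration; the character reference works
without any DTD, and none of the three depends on whether the character
has a glyph.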


>
> 4. Implementers of XML applications are free to choose which version
> of Unicode they will support. Thus, one implementer of an XML Schema
> validator may choose to support Unicode 2.0, while another
> implementer of an XML Schema validator may choose to support Unicode
> 2.1. One implementer of an XSLT processor may choose to support
> Unicode 2.0, while another implementer of an XSLT processor may
> choose to support Unicode 2.1.

It may be a user choice or configurable rather than an implementer
choice, and since we're at Unicode 6 by now one would hope that they
don't force version 2, but...

>
> 5. In XML applications that use regular expressions (e.g. XML Schema,
> XSLT), be careful about using regexes that contain regex categories
> such as Nd. The characters in those regex categories may vary
> depending on which version of Unicode an implementer supports. Thus,
> your application may execute without errors with one vendor's tool
> and fail on another.

It may do that anyway.
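
To make the version dependence in point 5 concrete, here is a small
sketch. It does not use an XSD or XSLT regex engine; as a stand-in it
uses Python's unicodedata module, which ships both its current Unicode
database and a frozen 3.2.0 one, and compares which code points each
places in general category Nd. The helper name decimal_digits is
invented for the example.

```python
import unicodedata

def decimal_digits(db):
    """Return the set of code points whose general category is Nd in db."""
    return {cp for cp in range(0x110000) if db.category(chr(cp)) == "Nd"}

# Two Unicode databases bundled with the same Python installation.
current_nd = decimal_digits(unicodedata)
legacy_nd = decimal_digits(unicodedata.ucd_3_2_0)

print("current database:", unicodedata.unidata_version, len(current_nd))
print("legacy database: ", unicodedata.ucd_3_2_0.unidata_version, len(legacy_nd))

# U+1B50 was assigned after Unicode 3.2, so the two databases disagree on it.
sample = "\u1b50"
print(unicodedata.category(sample), unicodedata.ucd_3_2_0.category(sample))
```

A pattern built on the Nd category would accept U+1B50 under the newer
database and reject it under the older one, which is the kind of
divergence between tools described above.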
>
> 6. CREPDL is a technology that allows you to precisely define the
> universe of characters that you want to allow in your XML documents.
>

I don't think CREPDL helps here, as its regexp syntax is explicitly
copied from XSD's and so character classes depend on Unicode in the same
way. In fact I don't just think that: the CREPDL spec says so explicitly:

Case 2: The semantics of regular expressions depends on the Unicode
version. Different conformant CREPDL processors may behave very
differently. For example, one may report "in", while another, "not-in".



> /Roger

David




