Re: Your XML documents may use different sets of characters,d
On 17/05/2011 15:49, Costello, Roger L. wrote:
> Hi Folks,
>
> Excellent feedback! Thanks!
>
> Here's a summary of what I've learned. Please tell me where I err.
>
> The following statements apply to "data" not to "markup" (i.e.,
> element names, attribute names).
>
> 1. Except for unpaired surrogate codepoints and a few control
> characters, you can use any character you want in XML documents.
>
> 2. The characters don't have to be defined in the Unicode
> specification.
>
> 3. For characters that don't have a visual representation or aren't
> in the Unicode character set, you can use them via XML's character
> entity mechanism, e.g., &#xffed;

&#xffed; does not use XML's entity mechanism; it is a numeric character
reference, not an entity reference. Also, whether you enter a character
directly or via a reference is totally unrelated to whether it has a
Unicode description or a visual representation.

>
> 4. Implementers of XML applications are free to choose which version
> of Unicode they will support. Thus, one implementer of an XML Schema
> validator may choose to support Unicode 2.0, while another
> implementer of an XML Schema validator may choose to support Unicode
> 2.1. One implementer of an XSLT processor may choose to support
> Unicode 2.0, while another implementer of an XSLT processor may
> choose to support Unicode 2.1.

It may be a user choice, or configurable, rather than an implementer
choice, and since we're at Unicode 6 by now, one would hope that they
don't force version 2, but...

>
> 5. In XML applications that use regular expressions (e.g. XML Schema,
> XSLT), be careful about using regexes that contain regex categories
> such as Nd. The characters in those regex categories may vary
> depending on which version of Unicode an implementer supports. Thus,
> your application may execute without errors with one vendor's tool
> and fail on another.

It may do that anyway.

>
> 6. CREPDL is a technology that allows you to precisely define the
> universe of characters that you want to allow in your XML documents.
>

I don't think CREPDL helps here, as its regexp syntax is explicitly
copied from XSD's, and so its character classes depend on Unicode in
the same way. In fact, I don't just think that: the CREPDL spec says so
explicitly:

  Case 2: The semantics of regular expressions depends on the Unicode
  version. Different conformant CREPDL processors may behave very
  differently. For example, one may report "in", while another,
  "not-in".

> /Roger

David
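Both technical points in the reply above can be checked with a minimal Python sketch that uses only the standard library. The first point is that a numeric character reference is not an entity reference. The second is that membership in a category such as Nd depends on the Unicode version.

The sketch assumes Python 3. Its `unicodedata` module carries the current Unicode Character Database plus a frozen Unicode 3.2.0 table, `unicodedata.ucd_3_2_0`. Unicode 2.0 and 2.1 tables are not available, so 3.2.0 stands in for "an older version" here.

Python's `re` module has no `\p{Nd}`, so the category is read straight from the database. This only approximates what an XSD or CREPDL `\p{Nd}` class would match; it is not an XSD regex engine. The helper names and the entity name `&undeclared;` are hypothetical, chosen for illustration.

```python
import unicodedata
import xml.etree.ElementTree as ET
from typing import List


def parse_text(xml: str) -> str:
    """Return the text content of a one-element XML document."""
    return ET.fromstring(xml).text


def undeclared_entity_fails() -> bool:
    """An entity reference needs a declaration (apart from the five
    predefined ones such as &amp;), so an undeclared one is a
    well-formedness error. &undeclared; is a hypothetical name."""
    try:
        ET.fromstring("<a>&undeclared;</a>")
    except ET.ParseError:
        return True
    return False


def nd_added_since_3_2() -> List[str]:
    """Characters that are Nd in the bundled (current) Unicode version
    but were not Nd in Unicode 3.2.0 (unassigned there reports 'Cn')."""
    old = unicodedata.ucd_3_2_0
    return [
        chr(cp)
        for cp in range(0x110000)
        if unicodedata.category(chr(cp)) == "Nd"
        and old.category(chr(cp)) != "Nd"
    ]


if __name__ == "__main__":
    # &#xffed; is a numeric character reference: the parser maps it
    # straight to U+FFED, exactly as if the character were typed literally.
    print(parse_text("<a>&#xffed;</a>") == parse_text("<a>\uffed</a>"))  # True
    print(undeclared_entity_fails())  # True

    added = nd_added_since_3_2()
    print(f"{len(added)} characters are Nd in Unicode "
          f"{unicodedata.unidata_version} but not in Unicode 3.2.0")
    print("\u07c0" in added)  # True: U+07C0 NKO DIGIT ZERO arrived in Unicode 5.0
```

The count printed on the third line depends on which Unicode version your Python build bundles. That dependence is exactly the portability hazard item 5 describes.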