At 5:18 PM +0800 6/24/01, Rick Jelliffe wrote:
> From: "John Cowan" <jcowan@r...>
>
>> All Unicode 3.1 code points, including the unassigned ones, are already
>> part of the XML document character set. (The trivial exceptions are
>> most of the C0 control characters, the surrogate space, and U+FFFE/FF.)
>> The issue here is the implicit NAMECHAR and NMSTCHAR declarations,
>> if I remember my SGML 8-letterisms correctly.
>
>The XML Second Edition only references Unicode 3.0, not 3.1.
>
>According to the Unicode.org site, Unicode 3.1 "adds a large number of coded
>characters."
>http://www.unicode.org/unicode/reports/tr27/
>

All true. Nonetheless, XML 1.0 (and likewise XML 1.0 Second Edition) does allow Unicode 3.1 characters in element content and attribute values. It also allows all characters that may be defined in the future, as well as code points that may never be defined. Production 2 is normative here:

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

Note especially [#x10000-#x10FFFF]. That range covers every character beyond the Basic Multilingual Plane.

--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@m... | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| The XML Bible, 2nd Edition (Hungry Minds, 2001) |
| http://www.ibiblio.org/xml/books/bible2/ |
| http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://www.cafeaulait.org/ |
| Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/ |
+----------------------------------+---------------------------------+
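To make production [2] concrete, here is a minimal Python sketch, not part of the original post, that tests a single code point against the Char production exactly as written. The function names `is_xml_char` and `all_xml_chars` are hypothetical. The check is purely range-based, so it accepts characters first assigned in Unicode 3.1 (for example U+1D11E MUSICAL SYMBOL G CLEF) and code points that are still unassigned. It rejects most C0 controls, the surrogate block, and U+FFFE/U+FFFF.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches XML 1.0 production [2] Char."""
    return (
        cp in (0x9, 0xA, 0xD)              # the only C0 controls allowed
        or 0x20 <= cp <= 0xD7FF            # BMP up to the surrogate block
        or 0xE000 <= cp <= 0xFFFD          # BMP after surrogates, minus FFFE/FFFF
        or 0x10000 <= cp <= 0x10FFFF       # everything beyond the BMP
    )


def all_xml_chars(text: str) -> bool:
    """Check every character of a Python str against production [2]."""
    return all(is_xml_char(ord(ch)) for ch in text)


if __name__ == "__main__":
    print(is_xml_char(0x1D11E))  # G clef, added in Unicode 3.1, outside the BMP
    print(is_xml_char(0x50000))  # unassigned code point in plane 5, still allowed
    print(is_xml_char(0xFFFE))   # excluded by the production
    print(is_xml_char(0xD800))   # surrogate, excluded
    print(is_xml_char(0x1))      # C0 control, excluded
```

Because the test looks only at numeric ranges and never consults a Unicode character database, it does not need updating when a new Unicode version assigns more characters. That property is the point made about production [2] above. Name characters (the NAMECHAR question) are a separate matter that this sketch does not address.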