With all due respect, if you had read the mail that prompted this
question, you would know that the proposal was to change the role of
character entitization.

We are trying to solve a problem, and that may mean getting rid of old
rules while staying backwards-compatible as far as possible and with as
much agreement as we can get (i.e., get a rule into XML 1.1).

Best regards
Michael

> -----Original Message-----
> From: Mike Brown [mailto:mike@s...]
> Sent: Tuesday, March 26, 2002 20:44 PM
> To: xml-dev@l...
> Subject: Re: MSXML DOM Special Chars Less Than 32
>
> James Clark wrote:
> > > How would you serialize a C# string that contains the sequence
> > > 0xD800,0xD800? If you serialize it as &#xD800;&#xD800;, then what
> > > happens if somebody writes &#xD800;&#xDC00;? Is that equivalent to
> > > &#x10000;?
>
> Michael Rys wrote:
> > This is basically a question of the encoding. If you use UTF-16 then
> > it's the parser's job to take &#xD800;&#xDC00; and map it into the
> > encoding of the target environment. If you use UCS-4 as the encoding,
> > then you probably did not generate &#xD800;&#xDC00; in the first place
> > but &#x10000;...
>
> With all due respect, this is nonsense.
>
> 1. A character reference is a lexical construct for representing a
> single Unicode character by its decimal or hexadecimal code point. It
> is not a generic mechanism for representing any code point, and it is
> not a mechanism for representing characters by their code values in
> some encoding form.
>
> A character reference must only reference a code point that corresponds
> to a Unicode character, and that character must be legal in XML.
> Character references have nothing to do with encoding, and I hope this
> discussion is not proposing that XML change in this regard.
>
> 2. Code points 0xD800-0xDFFF are *not* mapped to characters in Unicode /
> ISO/IEC 10646. That's why they're excluded from XML's Char production.
> There is no character number 0xD800, and there never will be.
>
> 3. The only way to use character references to represent character
> # 0x10000 is to write &#x10000; or &#65536;. "&#xD800;&#xDC00;" is not
> well-formed XML (see WFC: Legal Character in sec. 4.1 of the spec).
>
> The answer to James' first question is that if the C# "string" is
> actually a sequence of 16-bit values, and there are no guarantees that
> these values are going to conform to the rules of UTF-16 or some other
> predictable encoding, then it is wrong to blindly serialize that data
> with a mechanism that writes out the hex values preceded by "&#x" and
> followed by ";".
>
> Just as one would do with the 0x0000-0x0008 and other control
> characters, you look at it before you serialize it, and say "can I put
> this in XML or not", and if there is no way you can make a legal
> character out of it, then you ignore it, or you raise an exception, or
> (though I don't agree with this) you use "?". You don't emit malformed
> XML.
>
> I understand that Michael Kay is proposing that the well-formedness
> constraint be modified such that "&#0;" is legal but the bytewise
> encoded NUL is not, and perhaps the discussion above is based on "what
> if" that kind of thing were allowed. I have issues with his proposal as
> well, but at least the 0x0-0x1F code points do map to actual
> characters, whereas "�" and "�" and (as another example) "" do not.
>
> FWIW, I am strongly against changing the definition of a character
> reference in order to make "�" allowable. Let's not proceed with
> theoretical arguments that assume character references are more complex
> or flexible than they actually are.
>
> - Mike
> __________________________________________________________________________
> mike j. brown                   | xml/xslt: http://skew.org/xml/
> denver/boulder, colorado, usa   | resume: http://skew.org/~mike/resume/
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>
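
For readers who want the quoted "look at it before you serialize it"
rule in concrete form, here is a minimal sketch in Python. It is not
code from anyone in the thread. The function names are hypothetical,
the input is assumed to be a list of 16-bit code units, and of the
three options mentioned (ignore, raise an exception, or emit "?") it
picks raising an exception. The legality check follows the Char
production of XML 1.0.

```python
def is_xml_char(cp):
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def code_units_to_code_points(units):
    """Decode 16-bit code units as UTF-16 into a list of code points.

    Raises ValueError for an unpaired surrogate instead of passing it on.
    """
    points = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate is only meaningful with a low surrogate after it.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                low = units[i + 1]
                points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
                i += 2
                continue
            raise ValueError("unpaired high surrogate 0x%04X at index %d" % (unit, i))
        if 0xDC00 <= unit <= 0xDFFF:
            raise ValueError("unpaired low surrogate 0x%04X at index %d" % (unit, i))
        points.append(unit)
        i += 1
    return points


def to_char_refs(units):
    """Serialize 16-bit code units as hexadecimal character references.

    Every reference names a whole character that is legal in XML;
    anything else raises ValueError rather than producing malformed XML.
    """
    refs = []
    for cp in code_units_to_code_points(units):
        if not is_xml_char(cp):
            raise ValueError("code point 0x%X is not a legal XML character" % cp)
        refs.append("&#x%X;" % cp)
    return "".join(refs)
```

With this sketch, to_char_refs([0xD800, 0xDC00]) yields "&#x10000;",
while the sequence 0xD800,0xD800 from the original question raises
ValueError, as does a lone 0x0.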