Re: Doesn't the list of allowable characters shown in the XML s
Hey, Roger,

XML is a stream of XML characters (per the spec) or, to be more precise, codepoints. So there is no such thing as an XML document, post-parse, that is anything other than a stream or array of Unicode codepoints. A parser that accepts (one of) the EBCDIC encoding(s) as input converts the EBCDIC input to Unicode (either really, if it's running on a machine that uses a different codeset, or theoretically, to conform to the spec). Likewise, output is just serialization of the codepoints (either actual Unicode, or a platform-specific charset mapped to Unicode) to whatever the (supported) target encoding is.

But it's all defined as Unicode, so before you can reason about XML, you have to turn the (presumably serialized) stream of not-Unicode characters into Unicode. (Or you can have a platform-native XML tool, in some cases, but if it's an XML tool, it conceptually operates over Unicode codepoints.)

Amy!

On Thu, 15 Apr 2021 12:51:38 +0000, Roger L Costello wrote:
> Hi Folks,
>
> The XML specification says that these are the codepoints for the
> characters that are allowed in XML documents:
>
> Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
> [#x10000-#x10FFFF]
>
> But, but, but, ....
>
> Doesn't that list of codepoints assume the XML documents are encoded
> using a Unicode character encoding scheme?

It's not an assumption, it's a requirement.

> What if the XML documents aren't encoded using a Unicode character
> encoding scheme, then what are the allowable characters?
>
> For example, in Unicode the codepoint #x9 corresponds to the
> "horizontal tab" character but in EBCDIC hex 9 corresponds to the
> "begin superscript" character. Is the XML specification saying that
> an XML document using EBCDIC can use the invisible "begin
> superscript" character but not the "horizontal tab" character? Or, is
> it saying that I am expected, when using a character encoding scheme
> other than Unicode, to convert the above list of Unicode codepoints
> to the corresponding characters in the non-Unicode character encoding
> scheme? For example, in EBCDIC the "horizontal tab" character is 5.
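To make the "decode first, then apply the Char production" point concrete, here is a minimal Python sketch. It assumes the EBCDIC variant in question is IBM code page 037, which Python ships as the "cp037" codec; other EBCDIC code pages map some bytes differently. The helper names is_xml_char and check_ebcdic are made up for illustration and are not part of any XML library.

```python
def is_xml_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production.

    The test is on the Unicode codepoint, never on the encoded byte.
    """
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def check_ebcdic(raw: bytes, encoding: str = "cp037") -> list:
    """Decode EBCDIC bytes to Unicode, then test each character.

    Assumption: "cp037" stands in for "EBCDIC" here; a real parser
    would take the encoding from the XML declaration.
    """
    text = raw.decode(encoding)
    return [is_xml_char(ch) for ch in text]


# EBCDIC byte 0x05 is the horizontal tab, which decodes to U+0009.
assert b"\x05".decode("cp037") == "\t"

# Tab (0x05) and the letter 'A' (0xC1) are both allowed characters.
print(check_ebcdic(b"\x05\xc1"))

# NUL decodes to U+0000, which the Char production excludes.
print(check_ebcdic(b"\x00"))
```

So the byte value in the source encoding never enters the check: an EBCDIC tab is allowed because it decodes to #x9, not because of where it sits in the EBCDIC table.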