
Re: Doesn't the list of allowable characters shown in the XML s

  • From: Amelia A Lewis <amyzing@talsever.com>
  • To: Roger L Costello <costello@mitre.org>
  • Date: Thu, 15 Apr 2021 09:18:29 -0400

Hey, Roger,

XML is defined as a stream of XML characters (per the spec) or, to be 
more precise, Unicode codepoints. So, post-parse, there is no such thing 
as an XML document that is anything other than a stream or array of 
Unicode codepoints. A parser that accepts one of the EBCDIC encodings as 
input converts the EBCDIC input to Unicode, either actually (if it's 
running on a machine that uses a different codeset) or conceptually (to 
conform to the spec). Likewise, output is just serialization of those 
codepoints (whether held as actual Unicode or as a platform-specific 
charset mapped to Unicode) to whatever the supported target encoding is.
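
A minimal sketch of that decode-then-serialize round trip, assuming 
Python's built-in "cp037" codec as a stand-in for "one of the EBCDIC 
encodings". The sample text is made up for illustration and no real 
XML parser is involved.

```python
# Illustrative assumptions: the sample text and code page 037 are chosen
# for the example; a real parser would read the encoding from the document.
sample = "<a>\tb</a>"

# Pretend these bytes arrived from an EBCDIC platform.
ebcdic_bytes = sample.encode("cp037")

# Input side: the encoded bytes are mapped to Unicode codepoints.
decoded = ebcdic_bytes.decode("cp037")
codepoints = [ord(ch) for ch in decoded]

# Output side: the same codepoints are serialized to a target encoding.
utf8_bytes = decoded.encode("utf-8")

print([hex(b) for b in ebcdic_bytes])
print([hex(cp) for cp in codepoints])
print([hex(b) for b in utf8_bytes])
```

The byte sequences differ between the two encodings, but the codepoint 
list in the middle is the same either way, and that list is what the 
spec talks about.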

But it's all defined as Unicode, so before you can reason about XML, 
you have to turn the (presumably serialized) stream of non-Unicode 
characters into Unicode. In some cases you can have a platform-native 
XML tool, but if it's an XML tool, it conceptually operates over Unicode 
codepoints.
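
As a rough sketch of what "convert first, then reason" looks like, the 
Char production quoted below can be applied to the codepoints obtained 
after decoding. The helper names is_xml_char and check_bytes are 
hypothetical, and "cp037" is again just one assumed EBCDIC code page.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if codepoint cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def check_bytes(data: bytes, encoding: str) -> bool:
    """Decode first, then test every resulting codepoint against Char."""
    return all(is_xml_char(ord(ch)) for ch in data.decode(encoding))


# In cp037 the horizontal tab is byte 0x05, which decodes to U+0009.
# The check runs on the decoded codepoint, never on the raw byte value.
print(check_bytes(b"\x05", "cp037"))
```

The point of the design is that the Char ranges are never translated 
into EBCDIC byte values; the input is translated into Unicode instead.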

Amy!
On Thu, 15 Apr 2021 12:51:38 +0000, Roger L Costello wrote:
> Hi Folks,
> 
> The XML specification says that these are the codepoints for the 
> characters that are allowed in XML documents:
> 
> Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
> 
> But, but, but, ....
> 
> Doesn't that list of codepoints assume the XML documents are encoded 
> using a Unicode character encoding scheme? 

It's not an assumption, it's a requirement.

> What if the XML documents aren't encoded using a Unicode character 
> encoding scheme, then what are the allowable characters? 
> 
> For example, in Unicode the codepoint #x9 corresponds to the 
> "horizontal tab" character but in EBCDIC hex 9 corresponds to the 
> "begin superscript" character. Is the XML specification saying that 
> an XML document using EBCDIC can use the invisible "begin 
> superscript" character but not the "horizontal tab" character? Or, is 
> it saying that I am expected, when using a character encoding scheme 
> other than Unicode, to convert the above list of Unicode codepoints 
> to the corresponding characters in the non-Unicode character encoding 
> scheme? For example, in EBCDIC the "horizontal tab" character is hex 5.

