Re: Doesn't the list of allowable characters shown in the XML

  • From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
  • To: Roger L Costello <costello@mitre.org>,"xml-dev@l..." <xml-dev@l...>
  • Date: Thu, 15 Apr 2021 10:04:52 -0400

At 2021-04-15 12:51 +0000, Roger L Costello wrote:
The XML specification says that these are the codepoints for the characters that are allowed in XML documents:
Not quite.

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

But, but, but, ....

Doesn't that list of codepoints assume the XML documents are encoded using a Unicode character encoding scheme?
The specification says that a "parsed entity contains text, a sequence of characters", and "a character is an atomic unit of text as specified by ISO/IEC 10646. Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646".

https://www.w3.org/TR/xml/#charsets

Separately, 4.3.3 states "In the document entity, the encoding declaration is part of the XML declaration".

https://www.w3.org/TR/xml/#charencoding

What if the XML documents aren't encoded using a Unicode character encoding scheme, then what are the allowable characters?
The encoding of the document entity is independent of the repertoire of allowable characters. If the document entity expresses a character that is not in the list of allowable characters, then the document is not well-formed.
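
As a rough illustration of that independence, here is a minimal Python sketch that decodes the bytes first and only then tests each resulting character against the Char production. The function names are made up for this example, and cp037 is just one EBCDIC code page picked as an assumption (the thread does not name one). A real parser does far more than this.

```python
def is_xml_char(cp):
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def disallowed_chars(raw, encoding):
    """Decode raw bytes with the given encoding, then list the
    code points that fall outside the Char production."""
    text = raw.decode(encoding)
    return [f"U+{ord(ch):04X}" for ch in text if not is_xml_char(ord(ch))]


# The check runs on characters, so the same rule applies to any encoding.
print(disallowed_chars(b"a\tb", "utf-8"))          # tab is allowed
print(disallowed_chars(b"a\x0bb", "utf-8"))        # vertical tab is not
print(disallowed_chars(b"\x81\x05\x82", "cp037"))  # EBCDIC byte 05 decodes to tab
```

The byte values never enter into the test; only the characters they decode to do.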

For example, in Unicode the codepoint #x9 corresponds to the "horizontal tab" character but in EBCDIC hex 9 corresponds to the "begin superscript" character. Is the XML specification saying that an XML document using EBCDIC can use the invisible "begin superscript" character but not the "horizontal tab" character? Or is it saying that I am expected, when using a character encoding scheme other than Unicode, to convert the above list of Unicode codepoints to the corresponding characters in the non-Unicode character encoding scheme? For example, in EBCDIC the "horizontal tab" character is hex 5.
Neither. The specification is saying that a document entity has an encoding that is independent of the definition of the text allowed in XML parsed entities. To get the character you want in XML (as defined by Unicode) use the encoding you need in your document (as defined by the XML Declaration).
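
To make the tab example concrete, the following Python sketch encodes the one character U+0009 under three encodings and shows that only the bytes differ. The helper name is hypothetical, and cp037 again stands in for "EBCDIC" as an assumption, since code pages vary.

```python
TAB = "\t"  # the character U+0009, which the Char production allows


def tab_bytes(encoding):
    """Return the bytes (as hex) that represent U+0009 in the given encoding."""
    return TAB.encode(encoding).hex()


for enc in ("utf-8", "utf-16-be", "cp037"):
    raw = TAB.encode(enc)
    # Decoding always yields the same character, whatever the bytes were.
    assert raw.decode(enc) == TAB
    print(f"{enc:10} bytes={tab_bytes(enc)} char=U+{ord(raw.decode(enc)):04X}")
```

In each case the character is the same U+0009; what the encoding declaration tells the processor is how to get from those bytes back to it.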

If you try to say it using your own words, you may end up confusing the reader. I suggest you cite the specification.

I hope this is helpful.

. . . . . Ken


/Roger

--
Contact info, blog, articles, etc. http://www.CraneSoftwrights.com/x/ |
Check our site for free XML, XSLT, XSL-FO and UBL developer resources |
Streaming hands-on XSLT/XPath 2 training class @US$125 (5 hours free) |
Essays (UBL, XML, etc.) http://www.linkedin.com/today/author/gkholman |


