
Couldn't illegal XML characters be used simply by escaping them?

  • From: "Costello, Roger L." <costello@mitre.org>
  • To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
  • Date: Sat, 10 Nov 2012 13:08:29 +0000

Hi Folks,

This week I was in a discussion where the topic of illegal XML characters came up, and someone asked: "Couldn't illegal XML characters simply be escaped?"

Here is my response. Is it correct? Complete? Easy to understand?

We need to distinguish between a reserved XML character and an illegal character.

The '<' symbol is a reserved XML character. If data contains that symbol it will confuse an XML Parser because the Parser will think, "Oh, a new element is being started."

For example, consider this:

<Equation>if A < B then ...</Equation>

That '<' symbol needs to be escaped. We can escape it using the built-in &lt; entity or a character reference that gives the decimal or hexadecimal value of the symbol. Let's use the hexadecimal form:

<Equation>if A &#x3C; B then ...</Equation>

Now the XML Parser is not confused into thinking that the XML is trying to start a new element. Note that the XML Parser does resolve the character reference, and the output of the Parser is this:

<Equation>if A < B then ...</Equation>

We've made it past the Parser, so that '<' symbol is no longer a problem.
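The behavior described above can be checked with a small sketch. It assumes Python's standard-library ElementTree parser stands in for "the XML Parser"; the helper names element_text and is_well_formed are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def element_text(xml_source: str) -> str:
    """Parse a single-element document and return its character data."""
    return ET.fromstring(xml_source).text


def is_well_formed(xml_source: str) -> bool:
    """Report whether the parser accepts the document."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        return False


# The unescaped '<' is rejected by the parser.
print(is_well_formed("<Equation>if A < B then ...</Equation>"))

# The hexadecimal character reference is accepted and resolved to '<'.
print(element_text("<Equation>if A &#x3C; B then ...</Equation>"))

# escape() produces the equivalent built-in entity form.
print(escape("if A < B then ..."))
```

The application only ever sees the resolved '<'; whether the document used &lt;, &#60; or &#x3C; makes no difference once parsing is done.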

An important thing to note is that the '<' symbol is (obviously) a legal character.

The XML 1.0 specification lists those characters that may be used in an XML document (see below for a partial list). So some characters cannot be used in XML documents. For example, hex 0 (null) is not a legal XML character.

[Person I was talking to], your suggestion is to escape illegal characters like so:

<Test> Here is a null character: &#x0;</Test>

What will an XML Parser do with that character reference? Suppose it resolved it (let (null) represent the null character):

<Test> Here is a null character: (null)</Test>

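But now the output of the XML Parser would be an XML document that contains an illegal character. XML 1.0 rules this out: a character reference must itself refer to a legal XML character (the "Legal Character" well-formedness constraint), so the Parser reports an error.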
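Here is a minimal sketch of that failure, again assuming Python's standard-library ElementTree as the parser. The helper try_parse is a hypothetical name used only for this example, and the exact wording of the error message depends on the underlying parser.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_source: str):
    """Return (text, None) on success or (None, error message) on failure."""
    try:
        return ET.fromstring(xml_source).text, None
    except ET.ParseError as exc:
        return None, str(exc)


# A reference to the null character is rejected, escaped or not.
text, error = try_parse("<Test> Here is a null character: &#x0;</Test>")
print(text, error)

# A reference to a legal control character (tab, hex 9) is resolved normally.
text, error = try_parse("<Test>a&#x9;b</Test>")
print(repr(text), error)
```

The first call comes back with an error and no text, while the second returns the resolved tab character, which matches the distinction between illegal and merely reserved characters.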

Recap: reserved characters may be used, even where they would ordinarily cause confusion, by escaping them. Illegal characters may never be used, and escaping them does not help.

/Roger

Decimal value of
US-ASCII character |  Is an XML character?
------------------------------------------
    1              |  No
    2              |  No
    3              |  No
    4              |  No
    5              |  No
    6              |  No
    7              |  No
    8              |  No
    9              |  Yes
   10              |  Yes
   11              |  No
   12              |  No
   13              |  Yes
   14              |  No
   15              |  No
   16              |  No
   17              |  No
   18              |  No
   19              |  No
   20              |  No
   21              |  No
   22              |  No
   23              |  No
   24              |  No
   25              |  No
   26              |  No
   27              |  No
   28              |  No
   29              |  No
   30              |  No
   31              |  No
   32-127          |  Yes
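The table can be reproduced from the Char production in the XML 1.0 specification. The sketch below encodes that production as a range check; the function name is_xml_char is an assumption of this example, not something defined by the specification.

```python
def is_xml_char(code_point: int) -> bool:
    """Check a code point against the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


# Of the code points below 32, only tab, line feed and carriage return are legal.
legal_controls = [cp for cp in range(32) if is_xml_char(cp)]
print(legal_controls)  # [9, 10, 13]

# Every code point from 32 through 127 is a legal XML character.
print(all(is_xml_char(cp) for cp in range(32, 128)))  # True
```

A check like this is useful before serializing data: any character for which it returns False has to be removed or encoded some other way (for example Base64), because no escape makes it acceptable in XML 1.0.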

