
Re: Why isn't the semicolon a reserved character?

  • From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
  • To: "xml-dev@lists.xml.org" <xml-dev@lists.xml.org>
  • Date: Sat, 15 Mar 2014 19:33:31 -0400

At 2014-03-15 21:41 +0000, Costello, Roger L. wrote:
This XML document is not well-formed:

<Document>
]]>
</Document>

Why? Because the XML parser sees that and thinks that the > symbol marks the end of a CDATA section;
False. The "]]>" marks the end of a CDATA section:

http://www.w3.org/TR/2008/REC-xml-20081126/#NT-CDEnd

A simple ">" in parsed character data is not a problem when it is not preceded by two right square brackets. This comes up in my XML syntax class (which, since December, has been available for streaming on Pluralsight).

The following is well-formed as the simple greater-than symbol does not mark the end of a CDATA section:

<?xml version="1.0" encoding="UTF-8"?>
<doc>
This is a > greater-than symbol.
</doc>
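
The distinction above can be checked mechanically. What follows is only a minimal sketch, assuming Python's standard xml.etree.ElementTree (a non-validating parser built on expat) as the test harness; the helper name is_well_formed is hypothetical, and the XML declaration is omitted from the test strings for brevity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A lone ">" in character data is fine.
bare_gt = "<doc>\nThis is a > greater-than symbol.\n</doc>"

# The literal sequence "]]>" outside a CDATA section is not.
stray_cdend = "<Document>\n]]>\n</Document>"

print(is_well_formed(bare_gt))
print(is_well_formed(stray_cdend))
```

The first call should report the document as well-formed and the second should not, which is the behaviour the CDEnd production implies.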

the XML parser throws an error since there is no preceding <![[CDATA
To be precise, in a way that answers a later question below: it throws an error because, at the point where the end-of-CDATA delimiter was encountered, the parser was not in a CDATA section. BTW, you mistyped the opening delimiter ... the start of a CDATA section is <![CDATA[ per:

http://www.w3.org/TR/2008/REC-xml-20081126/#NT-CDStart

The > symbol must be escaped like so:

<Document>
]]&gt;
</Document>
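
As an illustration of the escaped form, the sketch below (again assuming Python's standard xml.etree.ElementTree as the parser; the variable names are hypothetical) parses the document and shows that the application still receives the three characters "]]>" as data. A numeric character reference for ">" is included as an equivalent spelling.

```python
import xml.etree.ElementTree as ET

# The built-in entity reference &gt; stands in for the ">" character.
escaped_named = "<Document>\n]]&gt;\n</Document>"
# A numeric character reference for the same character (decimal 62).
escaped_numeric = "<Document>\n]]&#62;\n</Document>"

text_named = ET.fromstring(escaped_named).text
text_numeric = ET.fromstring(escaped_numeric).text

# Both documents carry identical character data once parsed.
print(repr(text_named))
print(text_named == text_numeric)
```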

Now consider the ; symbol. It marks the end of an entity reference.

This is a well-formed XML document:

<Document>
A;B
</Document>

Why doesn't the XML parser see that and think that the ; marks the end of an entity reference; why doesn't the XML parser throw an error since there is no preceding & symbol?
Because an entity reference is not a "section" of parsed data ... it is a concise markup construct. It is easy to detect the end of an entity reference:

http://www.w3.org/TR/2008/REC-xml-20081126/#NT-EntityRef

Note how the content of an entity reference is a simple name.
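
To make the point concrete: the EntityRef production is '&' Name ';', so a semicolon only has meaning after an ampersand and a name have already been seen. The sketch below is an assumption-laden illustration, not a conforming tokenizer: the regular expression is an ASCII-only simplification of the Name production, and the helper names entity_refs and is_well_formed are hypothetical. Python's standard xml.etree.ElementTree is used to confirm the parser's view.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only approximation of: EntityRef ::= '&' Name ';'
# (the real Name production admits many more Unicode characters).
ENTITY_REF = re.compile(r"&[A-Za-z_:][A-Za-z0-9_:.\-]*;")


def entity_refs(text: str) -> list:
    """Return the substrings of `text` that look like entity references."""
    return ENTITY_REF.findall(text)


def is_well_formed(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# No "&" precedes the ";", so nothing here is markup.
print(entity_refs("A;B"))
# Only the "&amp;" is a reference; the later ";" is ordinary data.
print(entity_refs("A &amp; B; C"))

print(is_well_formed("<Document>A;B</Document>"))
# "&B;" is syntactically a reference, but B is not declared.
print(is_well_formed("<Document>A&B;</Document>"))
```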

The content of a CDATA section is far more complex and so is described using a wildcard:

http://www.w3.org/TR/2008/REC-xml-20081126/#NT-CData

Note the interesting quirk that within a CDATA section there is no such thing as an embedded CDATA section ... the following is well-formed:

<?xml version="1.0" encoding="UTF-8"?>
<doc>
This is a <![CDATA[ section <![CDATA[ <![CDATA[ <![CDATA[ <![CDATA[ ]]>
</doc>
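
A small sketch of this quirk, assuming Python's standard xml.etree.ElementTree as the parser (the variable and helper names are hypothetical): the extra "<![CDATA[" strings inside the section come back as plain text, and a second "]]>" is an error because the first one already closed the section.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Content of the one and only CDATA section: four further "openers" as data.
inner = " section " + "<![CDATA[ " * 4
nested_looking = "<doc>\nThis is a <![CDATA[" + inner + "]]>\n</doc>"

# The section's content is delivered verbatim as character data.
recovered_text = ET.fromstring(nested_looking).text
print(repr(recovered_text))

# There is no nesting: the first "]]>" ends the section, so the
# second "]]>" is a stray end delimiter in ordinary content.
two_closers = "<doc><![CDATA[ a <![CDATA[ b ]]> c ]]></doc>"
print(is_well_formed(two_closers))
```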

CDATA sections are not allowed in attributes, while entity references are.
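
The attribute case can be sketched the same way, assuming Python's standard xml.etree.ElementTree as the parser; the attribute name note and the helper is_well_formed are hypothetical. An entity reference in an attribute value is expanded, while an attempted CDATA section is rejected because "<" may not appear literally in an attribute value.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Entity references are recognized inside attribute values.
with_entities = '<doc note="a &lt; b &amp; c"/>'
note_value = ET.fromstring(with_entities).attrib["note"]
print(note_value)

# A CDATA section is not: the "<" that begins it is illegal here.
with_cdata = '<doc note="<![CDATA[a]]>"/>'
print(is_well_formed(with_cdata))
```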

Parsed character data and character data (CDATA) sections are simply "different" and so are treated differently when parsing.

Why isn't the ; symbol a reserved symbol?
What do you mean by "reserved"?

It isn't available as a built-in character entity because it isn't needed to disambiguate otherwise ambiguous strings found in parsed character data.

And it simply is that way: so it was in SGML, and so it is in XML.

I hope this helps.

. . . . . . Ken

--
Public XSLT, XSL-FO, UBL & code list classes: Melbourne, AU May 2014 |
Contact us for world-wide XML consulting and instructor-led training |
Free 5-hour lecture: http://www.CraneSoftwrights.com/links/udemy.htm |
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/x/ |
G. Ken Holman mailto:gkholman@CraneSoftwrights.com |
Google+ profile: http://plus.google.com/+GKenHolman-Crane/about |
Legal business disclaimers: http://www.CraneSoftwrights.com/legal |




