
Re: SGML query: SHUNCHAR


John Cowan <jcowan@r...> wrote:

| Am I right in thinking that a SHUNCHAR should not appear as-is in an 
| entity,

Right, unless it is also a function character (e.g. RS, RE and SEPCHAR,
which is how 0x0a, 0x0d and 0x09 can appear literally).

| but may be referred to by a character reference? 

Yes - any character in the document character set can be encoded as a
character reference and will be treated unconditionally as data at the
point of occurrence.  This *includes* NONSGML characters (the ones mapped
to UNUSED).  

For example, this document is valid when the document character set is
mapped to Latin-1, Unicode etc. (where decimal 128-159 are UNUSED):

  <!DOCTYPE foo [
      <!ELEMENT foo  - - (#PCDATA) >
      <!ENTITY  bar  CDATA  "&#156;" >
  ]>
  <foo>[&bar;]</foo>

The character reference is unconditionally data in the replacement text of
the entity declaration.  The CDATA keyword then says that the replacement
text is still treated as data wherever the entity reference occurs.  Take
out the CDATA keyword, and nsgmls will report an error like this:

nsgmls:ex.txt:5:11:E: non SGML character number 156

Note that the error is at the point where the entity reference occurs in
the instance (5:11), not in the entity declaration.

| If not, is there any way in the SGML declaration to specify characters 
| that have this property?

If you're asking whether there is a way to *require* that a character
always be entered as a character reference, then the answer would be to
make sure it's in the non-SGML character class (because then a literal
occurrence would throw an error).  But that isn't foolproof: if the
context gets reparsed for markup (as the entity replacement text in the
example above would be without the CDATA modifier), the character
reference itself would still get an error.
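
If you control the software that writes the documents, one way to stay
on the safe side is to emit a character reference for every non-SGML
character.  Here is a minimal sketch in Python.  It is not anything that
nsgmls provides: the names to_char_refs and NONSGML are made up for
illustration, and 128-159 is assumed to be UNUSED as in the example
document.  The caveat about contexts that get reparsed for markup still
applies to its output.

```python
# Assumption: decimal 128-159 are the non-SGML (UNUSED) characters, as in
# the Latin-1/Unicode example. Adjust to match your own SGML declaration.
NONSGML = frozenset(range(128, 160))


def to_char_refs(text, nonsgml=NONSGML):
    """Replace each non-SGML character with a numeric character reference."""
    return "".join(
        "&#%d;" % ord(ch) if ord(ch) in nonsgml else ch
        for ch in text
    )


# Character number 156 (0x9c) becomes the reference used in the example.
print(to_char_refs("[\x9c]"))  # prints [&#156;]
```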
 
| Also, how strong is that "should not appear" in practice?

Absolute.  13.1.2 "Non-SGML Character Identification" (p.455 in the
Handbook):

: Each _character number_[64] to which no meaning is assigned by the 
: _character set description_[73] is assigned to NONSGML, thereby
: identifying it as a non-SGML character.
: [...] 
: A shunned character must be identified as a non-SGML character, unless
: it is a significant SGML character.
