Re: SGML query: SHUNCHAR
John Cowan <jcowan@r...> wrote:

| Am I right in thinking that a SHUNCHAR should not appear as-is in an
| entity,

Unless it is a function character (e.g. RS, RE and SEPCHAR, which is how
0x0a, 0x0d and 0x09 can appear literally)

| but may be referred to by a character reference?

Yes - any character in the document character set can be encoded as a
character reference and will be treated unconditionally as data at the
point of occurrence. This *includes* NONSGML characters (the ones mapped
to UNUSED). For example, this document is valid with the doc char set
mapped to Latin-1, Unicode etc. (where decimal 128-159 are UNUSED):

    <!DOCTYPE foo [
    <!ELEMENT foo - - (#PCDATA) >
    <!ENTITY bar CDATA "&#156;" >
    ]>
    <foo>[&bar;]</foo>

The character reference is unconditionally data in the replacement text
of the entity declaration. The CDATA keyword now says that the
replacement text is still data anywhere the entity reference occurs.
Take out the CDATA modifier, and now nsgmls will throw an error like
this:

    nsgmls:ex.txt:5:11:E: non SGML character number 156

Note that the error is at the point where the entity reference occurs in
the instance (5:11), not in the entity declaration.

| If not, is there any way in the SGML declaration to specify characters
| that have this property?

If you're asking, is there a way to *require* that a character reference
always be used for a character, then the answer would be to ensure that
it's in the non-SGML character class (because then showing up directly
would throw an error.) But that isn't foolproof, because if the context
gets reparsed for markup (as the entity replacement text in the example
above would without the CDATA modifier) you would still get an error.

| Also, how strong is that "should not appear" in practice?

Absolute.
13.1.2 "Non-SGML Character Identification" (p.455 in the Handbook):

: Each _character number_[64] to which no meaning is assigned by the
: _character set description_[73] is assigned to NONSGML, thereby
: identifying it as a non-SGML character.
: [...]
: A shunned character must be identified as a non-SGML character, unless
: it is a significant SGML character.
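
As a rough illustration of the character-reference point, here is a minimal Python sketch (not from the original thread) that rewrites each character in an assumed non-SGML range as a decimal character reference before the text goes into an entity. The helper name escape_non_sgml and the range 128-159 are assumptions chosen to match the Latin-1 example above; a real SGML declaration may map other code points to UNUSED. As noted above, the result is only safe where it stays data (e.g. in a CDATA entity) and is not reparsed for markup.

```python
# Hypothetical helper: not from the original post or any SGML toolkit.
# Assumes code points 128-159 are the ones mapped to UNUSED (non-SGML),
# as in the Latin-1 example above.
NON_SGML = frozenset(range(128, 160))


def escape_non_sgml(text, non_sgml=NON_SGML):
    """Replace each non-SGML character with a decimal character reference."""
    return "".join(
        "&#%d;" % ord(ch) if ord(ch) in non_sgml else ch
        for ch in text
    )


if __name__ == "__main__":
    # Character 156 (0x9c) is the one used in the example entity.
    print(escape_non_sgml("[\x9c]"))  # prints [&#156;]
```

Characters outside the assumed range, including the function characters RS, RE and TAB, pass through unchanged.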