
Re: Switching off character entity resolution in XSL

Subject: Re: Switching off character entity resolution in XSL
From: Peter Flynn <pflynn@xxxxxx>
Date: Tue, 03 Feb 2004 12:23:17 +0000
On Tue, 2004-02-03 at 03:11, AHynes@xxxxxxxxxx wrote:
> Hello All,
> 
> Unlike what most people would use XSL for (i.e. conversion of XML to HTML
> or other output format), I have a requirement to transform from one XML
> structure to another (subsequent presentation rendering occurring way
> downstream). No big deal I guess, but the annoying thing here is that by
> the time an XML parser has done its job as per the XML specification, all
> those pesky character entities have been resolved (as defined in the DTD
> for the source document) and the output contains square brackets.
> 
> Example:
> source document contains:     &bull;
> After transformation:         [bull  ]    (of course, the entity declared
> in the DTD is this, i.e. <!ENTITY bull "[bull  ]">)
> What I would like:            &bull;

This looks like it's either an old DTD converted from SGML unedited,
or a DTD written by someone who was unaware that XML shouldn't need 
to use character entities. In practice there are always reasons: an
editor which cannot generate all the required characters is one
common problem.

> I really don't want to go messing with the DTD either, and I really don't
> think a parser would like there being unparsed entities within an entity
> declaration in a DTD, i.e. <!ENTITY bull &bull;> is illegal.

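So, alas, is a recursive reference like <!ENTITY bull "&#38;bull;">:
the &#38; is expanded when the declaration is read, so the replacement
text is &bull; again, and XML forbids an entity from referring to itself.
Saxon rejects it, and so will any conforming parser.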
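To see the same failure outside Saxon, here is a minimal sketch using Python's standard-library ElementTree (which sits on the expat parser). Python is only an assumption chosen because it is easy to run; RECURSIVE and parse_result are hypothetical names for illustration.

```python
import xml.etree.ElementTree as ET

# &#38; becomes "&" when the declaration is read, so the replacement
# text of bull is "&bull;": a reference to the entity itself.
RECURSIVE = """<?xml version="1.0"?>
<!DOCTYPE whatever [
<!ENTITY bull "&#38;bull;">
]>
<whatever>&bull;</whatever>"""


def parse_result(document: str) -> str:
    """Return "ok" if the document parses, else the parser's error text."""
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        return str(err)
    return "ok"


print(parse_result(RECURSIVE))
```

The error only appears when &bull; is actually referenced; the declaration on its own is accepted, because recursion is checked when the entity is expanded.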

> I realise there is some way of dealing with this with character
> substitutions before or after using something like sed, but this isn't
> really a great solution, particularly across platforms. Is there any way of
> manipulating the output using XSL, or alternatively switching off entity
> resolution in the parser? 

I don't think so, but you can add to the internal subset a 
declaration of the character entities you want output as something 
else, eg

<?xml version="1.0"?>
<!DOCTYPE whatever SYSTEM "some.dtd" [
<!ENTITY bull "&#x2022;">
]>

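This makes the parser deliver a "real" bullet (U+2022) in place of
"[bull  ]". On output it appears either as the literal character or,
if the output encoding cannot represent it (eg US-ASCII), as a
numeric character reference such as &#8226;.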
If you have copies of the character entity declaration files (eg
from the distribution of DocBook) you could reference them in the
internal subset instead, so that all the declarations override
any in the DTD.
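A minimal sketch of the override, assuming Python's standard-library expat bindings rather than an XSLT processor. SOME_DTD is a hypothetical stand-in for some.dtd and is served for any external subset; the parsed text is then serialized with ElementTree in US-ASCII and UTF-8 to show where a numeric character reference comes from.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat as expat

# Hypothetical stand-in for "some.dtd", declaring bull the way the
# original poster's DTD does.
SOME_DTD = '<!ENTITY bull "[bull  ]">'

# Document relying on the external DTD alone.
PLAIN = """<?xml version="1.0"?>
<!DOCTYPE whatever SYSTEM "some.dtd">
<whatever>&bull; item</whatever>"""

# Same document with an internal-subset declaration of bull.
OVERRIDDEN = """<?xml version="1.0"?>
<!DOCTYPE whatever SYSTEM "some.dtd" [
<!ENTITY bull "&#x2022;">
]>
<whatever>&bull; item</whatever>"""


def text_of(document: str) -> str:
    """Parse `document`, reading SOME_DTD as its external subset, and
    return the character data of the root element."""
    parser = expat.ParserCreate()
    # Ask expat to read the external DTD subset named in the DOCTYPE.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    chunks = []

    def load_external(context, base, system_id, public_id):
        # Serve SOME_DTD whatever the system identifier says.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(SOME_DTD, True)
        return 1  # non-zero tells expat the entity was handled

    parser.ExternalEntityRefHandler = load_external
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks)


# The internal subset is read before the external one, and the first
# declaration of an entity is binding, so OVERRIDDEN yields U+2022.
elem = ET.Element("whatever")
elem.text = text_of(OVERRIDDEN)
# US-ASCII cannot hold U+2022, so ElementTree writes &#8226; instead.
ASCII_OUT = ET.tostring(elem, encoding="us-ascii")
# In UTF-8 the bullet is written as the literal character.
UTF8_OUT = ET.tostring(elem, encoding="utf-8")
```

text_of(PLAIN) gives the "[bull  ]" text the original poster complained about, while text_of(OVERRIDDEN) gives the real bullet; neither output restores the &bull; entity reference itself.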

Is there a reason why your output should need to preserve the
character entity format?

///Peter



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
