Re: Maintaining character entities
> I've got XML documents, marked up to a DTD, and calling character entity
> sets. When I run through the XSLT processor (xalan) to output another XML
> file I find the entities have been converted to something different, and
> fairly inconsistently.

Entities are expanded by the XML parser (probably xerces in your case) before the XML application (xalan) sees the data. So they are all gone by the time your stylesheet starts, and nothing you can do will preserve them. This is intentional behaviour: entities are supposed to be an _authoring_ macro system, and the behaviour of the document is supposed to be the same whether the author uses the entity shorthand or the full form. Having the parser replace all of the entities at the start ensures consistent behaviour.

> What I would like to achieve is having &ldquo; &uuml; in my input xml, and
> these entities still being untouched in my output. Can anyone advise how I
> achieve this please?

You cannot do that, but you can control whether characters are output as themselves, as entity references, or as numeric character references. If you output as HTML then most XSLT systems will use &uuml; and friends on output whether or not the entity was used on input. In XML output, if your processor supports an output encoding (e.g. ASCII) that does not have the characters, then these characters will be output as numeric references (&#...;). Some processors have extension options that give more control; I'm not sure about xalan though.

> What I'm getting are (&amp;ldquo;, &amp;uuml;),

You should never get that from a single input character, only if you input that form (either as &amp;ldquo; or equivalently <![CDATA[&ldquo;]]>, which means the same thing).

> (ââ,B,Å? (Band Ã,CB¼(B),

That is UTF-8, which (unlike the entities or Latin-1) is understood by all XML processors, so this is actually the best, most portable output to get.

> (&#8220;

That is also portable and, as I say above, is the expected output if you specify an encoding that does not include the character.
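The two points above (the parser expands entities before the application sees them, and the output encoding decides how non-ASCII characters are written) can be illustrated with a minimal sketch. It uses Python's standard xml.etree.ElementTree instead of xerces and xalan, purely as an assumption for illustration; the sample document and its internal DTD subset are hypothetical, and a real XSLT processor may differ in detail.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the internal DTD subset declares the two entities,
# so the parser can resolve them without fetching an external entity set.
SOURCE = (
    '<!DOCTYPE doc [\n'
    '  <!ENTITY ldquo "&#8220;">\n'
    '  <!ENTITY uuml "&#252;">\n'
    ']>\n'
    '<doc>&ldquo;M&uuml;ller</doc>'
)

root = ET.fromstring(SOURCE)

# The parser has already expanded the entities: the application only sees
# plain characters and cannot tell which form the author typed.
text = root.text

# An output encoding that lacks the characters forces the serializer to
# fall back to numeric character references.
as_ascii = ET.tostring(root, encoding="us-ascii")

# UTF-8 can represent every character, so they are written as themselves.
as_utf8 = ET.tostring(root, encoding="utf-8")

print(repr(text))
print(as_ascii)
print(as_utf8)
```

In this sketch the entity names never reach the serializer, so the only choice left on output is between literal characters and numeric references, which is the same situation the stylesheet is in.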
Given that all XML processors are mandated to understand two of the three outputs that you say you got, why do you need the entities?

David

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list