
Re: Maintaining character entities

Subject: Re: Maintaining character entities
From: David Carlisle <davidc@xxxxxxxxx>
Date: Tue, 20 May 2003 10:04:33 +0100
  I've got XML documents, marked up to a DTD, and calling character entity
  sets. When I run through the XSLT processor (xalan) to output another XML
  file I find the entities have been converted to something different, and
  fairly inconsistently. 

Entities are expanded by the XML parser (probably xerces in your case)
before the XML application (xalan) sees the data, so they are all gone
by the time your stylesheet starts and nothing you can do there will
preserve them. This is intentional behaviour: entities are supposed to
be an _authoring_ macro system, and the document is supposed to behave
the same whether the author uses the entity shorthand or the full form.
Having the parser replace all of the entities at the start ensures that
consistency.
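
To see the expansion happen, here is a small sketch in Python, using the
standard library's expat-based ElementTree parser as a stand-in for
xerces (that substitution is an assumption; the setup in this thread is
xerces plus xalan). The element name and the inline DTD are made up for
illustration. Both documents reach the application as the same string.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the entity is declared in an internal DTD subset.
with_entity = (
    '<!DOCTYPE name [<!ENTITY uuml "&#252;">]>'
    "<name>M&uuml;ller</name>"
)
# The same document with the character typed directly.
with_literal = "<name>M\u00fcller</name>"

text_from_entity = ET.fromstring(with_entity).text
text_from_literal = ET.fromstring(with_literal).text

# The parser has already replaced &uuml;, so the application cannot
# tell which form the author used.
print(text_from_entity == text_from_literal)
```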


> What I would like to achieve is having &ldquo; &uuml; in my input xml, and
> these entities still being untouched in my output. Can anyone advise how I
> achieve this please?

You cannot do that, but you can control whether characters are output as
themselves, as entity references, or as numeric character references.

If you output as html then most xslt systems will use "&uuml;" and
friends on output, whether or not the entity was used on input.

In XML output, if your processor supports an output encoding (e.g. ascii)
that does not have the characters, then these characters will be output
as numeric character references (&#...;).
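
The effect of the output encoding can be sketched in Python with
ElementTree's serializer, which is only a stand-in here and not an XSLT
processor. In a stylesheet the corresponding request would be something
like <xsl:output method="xml" encoding="us-ascii"/>; whether a given
processor supports that encoding is something to check in its
documentation. The element name q is made up for the example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<q>\u201cM\u00fcller\u201d</q>")

# ASCII cannot represent these characters, so the serializer falls back
# to numeric character references.
as_ascii = ET.tostring(doc, encoding="us-ascii")

# UTF-8 can represent them, so they are written as themselves.
as_utf8 = ET.tostring(doc, encoding="utf-8")

print(as_ascii)
print(as_utf8)
```

Either output parses back to the same document, which is the point made
above about consistent behaviour.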

Some processors have extension options that give more control; I'm not
sure about xalan, though.

> What I'm getting are (&amp;ldquo;, &amp;uuml;),

You should never get that from a single input character, only if the
input itself contains that form (either as &amp;ldquo; or, equivalently,
as <![CDATA[&ldquo;]]>, which means the same thing).
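
A quick way to convince yourself that the two input forms are the same,
sketched in Python with the standard library parser (an assumption, used
here in place of xerces; the element name p is made up):

```python
import xml.etree.ElementTree as ET

escaped = ET.fromstring("<p>&amp;ldquo;</p>")
cdata = ET.fromstring("<p><![CDATA[&ldquo;]]></p>")

# Both inputs carry the literal seven-character string "&ldquo;",
# not a left double quotation mark.
print(escaped.text, cdata.text)

# Serializing that text has to escape the ampersand again.
print(ET.tostring(escaped, encoding="unicode"))
```

So if &amp;ldquo; shows up in the result, it is worth checking whether
the ampersand was already escaped, or wrapped in CDATA, somewhere
upstream of the stylesheet.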

>  (ââ,B,Å? (Band Ã,CB¼(B),
That is utf8, which (unlike the entities or latin-1) is understood by
all XML processors, so this is actually the best, most portable output
to get.
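
The odd-looking characters usually mean that correct utf8 bytes are
being displayed by something that assumes a single-byte encoding. A
minimal Python sketch of that effect, using the two characters from the
question (the viewer's encoding is assumed to be latin-1 here):

```python
original = "\u201c\u00fc"  # left double quote, u with diaeresis

utf8_bytes = original.encode("utf-8")

# Reading UTF-8 bytes as Latin-1 turns each byte into its own character.
misread = utf8_bytes.decode("latin-1")

# Reading the same bytes as UTF-8 recovers the original text.
recovered = utf8_bytes.decode("utf-8")

print(len(original), len(misread))
print(recovered == original)
```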

>  (&#8220;
That is also portable and, as I said above, is the expected output if
you specify an encoding that does not include the character.


Given that all XML processors are mandated to understand two of the
three outputs that you say you got, why do you need the entities?

David


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

