XSL-LIST Mailing List Archive

RE: character entities

From: Kevin Rodgers <kevin.rodgers@xxxxxxx>
Date: Thu, 28 Apr 2005 15:16:44 -0600
Edward Bryant writes:
> >The &amp; represents an ampersand; it's seen within the XSLT
> >stylesheet as a single character (for example, string-length() is 1);
> >but if you serialize the result as XML or HTML then it will be
> >output as &amp; because that's how an ampersand is represented in
> >XML and HTML.
> 
> I get that the "&amp;" is the unicode for an ampersand, but what do
> you mean by "serialize" ?

Think of it this way:

The XML input is parsed to yield a document tree of element, attribute,
text, processing instruction, comment, and namespace nodes.

The XSLT stylesheet is itself parsed as XML, yielding a document tree of
element nodes (declarations and instructions), etc.

The stylesheet is applied to the input tree according to the XSLT
processing model, yielding the result tree.

The result tree is output as a sequence of characters, according to the
xsl:output declaration.  This is called serialization (of the tree).
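
The same parse / transform / serialize pipeline can be sketched in Python
with the standard library's xml.etree.ElementTree. This is only an
analogy: ElementTree is not an XSLT processor, and the "transform" step
below is a hand-written stand-in for applying templates. It shows that
the "&amp;" in the markup is a single "&" character in the tree, and that
the XML serializer escapes it again on the way out.

```python
import xml.etree.ElementTree as ET

# 1. Parse: the reference "&amp;" in the input markup becomes a single
#    "&" character in the parsed element's text.
doc = ET.fromstring("<a>&amp;</a>")
assert doc.text == "&"
assert len(doc.text) == 1  # cf. string-length() returning 1 in XSLT

# 2. "Transform": build a result tree. Here we just copy the text into a
#    new element; a real XSLT processor would apply templates instead.
result = ET.Element("b")
result.text = doc.text

# 3. Serialize: write the result tree out as characters. The XML
#    serializer escapes the "&" again, giving "&amp;".
output = ET.tostring(result, encoding="unicode")
print(output)  # <b>&amp;</b>
```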

> So, if a character reference is in an XML source file it will show up
> as a reference in an XHTML output file (I got the impression from
> other posts that the XSLT would change the reference into the actual
> character)?

Right.  It doesn't really matter whether the markup character was
originally represented in the XML input as a character reference, an
entity reference, or a data character within a CDATA section: it is
changed into the actual character during parsing.  On output, the xml,
html, and xhtml output methods then serialize it as an entity (or
character) reference.  Any element content or attributes in the XSLT
stylesheet that are copied to the result tree are serialized in the
same way.
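
A small Python check of this, again using xml.etree.ElementTree (its
parser plays the role of the XML parser here; it is not an XSLT
processor): three different spellings of an ampersand in the input all
parse to the same one-character text, so they all serialize to the same
output and the original spelling is lost.

```python
import xml.etree.ElementTree as ET

# Three ways of writing an ampersand in the XML input.
inputs = [
    "<a>&#38;</a>",          # decimal character reference
    "<a>&amp;</a>",          # predefined entity reference
    "<a><![CDATA[&]]></a>",  # data character inside a CDATA section
]

# After parsing, all three trees hold the same single "&" character...
texts = [ET.fromstring(s).text for s in inputs]
print(texts)  # ['&', '&', '&']

# ...so all three serialize identically, as an escaped "&amp;".
outputs = [ET.tostring(ET.fromstring(s), encoding="unicode") for s in inputs]
print(outputs)  # ['<a>&amp;</a>', '<a>&amp;</a>', '<a>&amp;</a>']
```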

> >You can't create character references in an XSLT template.
> 
> So what is the accepted way to add character references to the output?
> Would I have to run some kind of find-and-replace script after the
> XSLT transformation? What do other people do?

Avoid generating entity or character references in the output.  If your
output encoding (e.g. UTF-8) can represent the entire XML character set
(Unicode), then any character other than markup characters such as "&"
and "<" can simply be output in that encoding (whether as a single- or
multi-byte sequence) and doesn't need to be escaped as a reference.
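
To illustrate the encoding point, here is a Python sketch that uses
xml.etree.ElementTree's tostring() as a stand-in serializer (an
assumption for illustration; an XSLT processor's behavior is governed by
its xsl:output encoding, although XSLT 1.0 likewise says a character the
output encoding can't represent should be written as a character
reference). With UTF-8 the curly quotes are written directly; with
US-ASCII they have to become references.

```python
import xml.etree.ElementTree as ET

# A result element containing curly quotes (U+201C and U+201D).
q = ET.Element("q")
q.text = "\u201cHello\u201d"

# UTF-8 can encode every Unicode character, so the quotes are written
# directly as multi-byte sequences; no references are needed.
utf8 = ET.tostring(q, encoding="utf-8")
print(utf8)  # b'<q>\xe2\x80\x9cHello\xe2\x80\x9d</q>'

# US-ASCII cannot encode them, so ElementTree falls back to numeric
# character references for just those characters.
ascii_out = ET.tostring(q, encoding="us-ascii")
print(ascii_out)  # b'<q>&#8220;Hello&#8221;</q>'
```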

> I came across "xmlchar" at XML.com. I didn't want to use it, but it changes
> an element into a character reference. Looking at their stylesheets, I don't
> understand how that works, so why won't my attempt to change <quote></quote>
> into the #8220 and #8221 entities work?

Why not just output those characters directly?

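But if you're bent on outputting the sequence of characters "&#8220;",
then maybe that string can be represented in your stylesheet as
"&amp;#8220;".  Note, though, that the xml and html output methods will
escape that ampersand again (producing "&amp;#8220;"), so this only works
with the text output method or with disable-output-escaping="yes".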

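The catch with that approach can be seen in a Python sketch that uses
xml.etree.ElementTree as a stand-in for the parser and serializer (an
illustration, not XSLT itself): the stylesheet text "&amp;#8220;" parses
to the seven characters "&#8220;", and an XML serializer escapes the
ampersand again. Only output that skips escaping, roughly what XSLT's
text output method or disable-output-escaping="yes" provide, produces a
literal "&#8220;".

```python
import xml.etree.ElementTree as ET

# The stylesheet text "&amp;#8220;" parses to the seven characters "&#8220;".
el = ET.fromstring("<p>&amp;#8220;</p>")
assert el.text == "&#8220;" and len(el.text) == 7

# Serializing as XML escapes the "&" again, so a browser or XML parser
# reading this output sees the literal text "&#8220;", not a curly quote.
xml_out = ET.tostring(el, encoding="unicode")
print(xml_out)  # <p>&amp;#8220;</p>

# Writing just the text content with no escaping (loosely analogous to
# the text output method) yields the characters "&#8220;" as intended.
text_out = "".join(el.itertext())
print(text_out)  # &#8220;
```

Keep in mind that XSLT 1.0 does not require processors to support
disable-output-escaping.
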
-- 
Kevin Rodgers
