
Re: Character entities in attribute values

Subject: Re: Character entities in attribute values
From: mark_fletcher@xxxxxxxxxxxxxx
Date: Wed, 23 Apr 2003 10:00:38 -0700
Hi Mike (and others who have responded),

First, I've found and fixed the problem.  I'm using Arbortext's E3 product
to do my processing and there was an instruction in their internal code to
write out non-ASCII characters as numeric character references.  So, that's
how the accented Unicode characters in the tag attributes became character
references.  Once I fixed that problem, the HTML output was fine, as there
were no ampersands in any of the attribute values.

However, it still sounds like you're all saying that even when a character
reference does exist in an attribute value, I should not be seeing escaped
ampersands when that attribute value is output as text.  Well, if anyone's
interested (and I'm not sure why you would be, at this point ;-) here's a
sample of my previous input and output data and my xsl code that
demonstrates the problem I was having:

source xml tag:

<xref linkend="i090f42a68009c2c9" book_code="cmkt" book_title="Guide
Marketing du syst&#xe8;me GRC de PeopleSoft, version 8.8"
chapter_title="D&#xe9;finition des entit&#xe9;s de l'application Marketing
de PeopleSoft"
XREF_type="3" target_title="D&#xe9;finition des entit&#xe9;s de
l'application Marketing de PeopleSoft"
chapter_type="Chapitre" file_name="cmkt03.htm"/>

xsl template for this element:

<xsl:template name="xref">
  <A HREF="../../{@book_code}/htm/{@file_name}#{@linkend}"><xsl:value-of
select="@target_title"/></A>
</xsl:template>

html output:

<A HREF="../../cmkt/htm/cmkt03.htm#i090f42a68009c2c9">D&amp;#xe9;finition
des entit&amp;#xe9;s de l'application Marketing de PeopleSoft</A>
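For comparison, here is a minimal Python sketch of what a conforming parse-and-serialize pipeline does with a shortened copy of the xref element above. It uses the standard library's xml.etree.ElementTree as a stand-in for the XSLT processor, which is an assumption made for illustration only; it is not Arbortext's pipeline and not an XSLT engine. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened version of the xref element from the message above.
source = (
    '<xref linkend="i090f42a68009c2c9" book_code="cmkt" '
    'file_name="cmkt03.htm" '
    'target_title="D&#xe9;finition des entit&#xe9;s"/>'
)

xref = ET.fromstring(source)

# The parser has already resolved the character references, so the
# attribute value holds real characters and no ampersand.
title = xref.get("target_title")

# Rough equivalent of the template: build an A element whose text is
# the attribute value, then serialize it.
a = ET.Element("A")
a.set("HREF", "../../{}/htm/{}#{}".format(
    xref.get("book_code"), xref.get("file_name"), xref.get("linkend")))
a.text = title

# Serialized as Unicode text, the accented characters appear as-is.
as_unicode = ET.tostring(a, encoding="unicode")

# Serialized to ASCII bytes, they become numeric character references,
# not escaped ampersands.
as_ascii = ET.tostring(a, encoding="us-ascii").decode("ascii")

print(as_unicode)
print(as_ascii)
```

In neither serialization does an escaped ampersand appear, which suggests the literal characters "&#xe9;" had been written into the tree before the serializer saw it.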




Mark Fletcher
PeopleSoft Language Engineering
925.694.3753
mark_fletcher@xxxxxxxxxxxxxx



From:     "Mike Brown" <mike@xxxxxxxx>
Sent by:  owner-xsl-list@xxxxxxxxxxxrrytech.com
To:       xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Date:     04/23/2003 06:05 AM
Subject:  Re: Character entities in attribute values
Please respond to xsl-list





mark_fletcher@xxxxxxxxxxxxxx wrote:
> the output text looks something like this: &amp;eacute; instead of this:
> &eacute;

First please realize that when you output XML or HTML, the XSLT processor is
(in effect, if not literally) running a node tree through a serializer, and
the serializer is what is escaping "&" and "<" and certain other characters
appearing in places where they would otherwise be confused with markup.

If you're getting &amp;eacute; in the output, then you must have put the 8
characters "&" "e" "a" "c" "u" "t" "e" ";" into an attribute node (or text
node, but you mentioned attribute) in your result tree, perhaps by copying
this text from the source tree. Since you told the processor you wanted the
*node* to contain those 8 characters, rather than 1 entity reference, it
serialized the node in such a way that you'd get the characters when the
output document is parsed. In other words, it preserved the semantics of
the data, clearly distinguishing between character data and the structures
implied by markup.
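The serializer behaviour described above can be sketched in a few lines of Python. This uses xml.etree.ElementTree from the standard library rather than an XSLT processor, so treat it as an analogy under that assumption; the element and attribute names are made up for the example.

```python
import xml.etree.ElementTree as ET

# A node whose text and attribute both hold the 8 literal characters
# & e a c u t e ;  (character data, not an entity reference).
node = ET.Element("p")
node.set("title", "&eacute;")
node.text = "&eacute;"

# The serializer escapes each ampersand so the output stays unambiguous.
serialized = ET.tostring(node, encoding="unicode")

# Parsing the output again recovers exactly the same 8 characters.
reparsed = ET.fromstring(serialized)
round_trip_text = reparsed.text
round_trip_attr = reparsed.get("title")

print(serialized)
```

The escaping is what lets the data survive a round trip unchanged: the string that went into the tree is the string that comes back out after parsing.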

Given that the XML parser feeding parsed data to the XSLT processor would
have interpreted "&eacute;" in your original source document as a reference
to the entity named eacute, there's no way the 8 characters could have ended
up in your source tree unless you did one of the following:
 - explicitly constructed that string in your stylesheet
 - copied text that was originally written like &amp;eacute;
 - copied text that was originally written like <![CDATA[&eacute;]]>
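A small Python check of the last two cases, again using the standard library's xml.etree.ElementTree as an assumed stand-in for the parser in front of the XSLT processor. A numeric character reference is used for the contrasting case because the named entity eacute would need a DTD declaration to parse.

```python
import xml.etree.ElementTree as ET

# Escaped ampersand: the parser yields the literal text "&eacute;".
escaped = ET.fromstring("<r>&amp;eacute;</r>").text

# CDATA section: same result, the content is plain character data.
cdata = ET.fromstring("<r><![CDATA[&eacute;]]></r>").text

# A real character reference, by contrast, yields one character.
reference = ET.fromstring("<r>&#xe9;</r>").text

print(repr(escaped), repr(cdata), repr(reference))
```

The first two inputs produce identical 8-character strings in the tree, while the third produces the single character U+00E9.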

The latter two mean exactly the same thing. Since one of the most common
FAQs on this list stems from mistaken assumptions about what CDATA sections
are, I'm going to guess that whoever made your XML decided to try to use it
as a transport for entity-laden, non-well-formed HTML, saying that this data
is just text, not markup. Then you tried to use XSLT to copy it through, and
were surprised to see that you can't use XSLT to pretend character data is
actually markup.

However, as others have mentioned, this is just a wild guess.
Explain more about what you're doing, with sample code (brief).


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list








