RE: Character entities in attribute values

Subject: RE: Character entities in attribute values
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Wed, 23 Apr 2003 19:17:28 +0100
It looks like a simple explanation - you were using a product with a
serious bug in it.

Michael Kay
Software AG
home: Michael.H.Kay@xxxxxxxxxxxx
work: Michael.Kay@xxxxxxxxxxxxxx 

> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx 
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of 
> mark_fletcher@xxxxxxxxxxxxxx
> Sent: 23 April 2003 18:01
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re:  Character entities in attribute values
> 
> 
> 
> Hi Mike (and others who have responded),
> 
> First, I've found and fixed the problem.  I'm using 
> Arbortext's E3 product to do my processing and there was an 
> instruction in their internal code to write out non-ASCII 
> characters as numeric character references.  So, that's how 
> the accented unicode characters in the tag attributes became 
> character references.  Once I fixed that problem, the HTML 
> output was fine, as there were no ampersands in any of the 
> attribute values.
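To illustrate the setting described above, here is a minimal sketch in Python, with the standard library's xml.etree.ElementTree standing in for the serializer (an assumption; this is not Arbortext's code, and the element is a reduced, hypothetical version of the xref in this thread). Writing to an ASCII encoding forces non-ASCII characters out as numeric character references, and parsing the result gives back the same single character, so the references by themselves should be harmless.

```python
import xml.etree.ElementTree as ET

# Hypothetical element, reduced from the xref example in this thread.
xref = ET.Element("xref", {"target_title": "D\u00e9finition"})

# ASCII output cannot hold U+00E9 directly, so the serializer
# emits a numeric character reference instead.
ascii_xml = ET.tostring(xref, encoding="us-ascii").decode("ascii")
print(ascii_xml)

# A conforming parser resolves the reference back to one character.
reparsed = ET.fromstring(ascii_xml)
print(reparsed.get("target_title") == "D\u00e9finition")
```

The attribute value in the reparsed tree contains no ampersand at all; the reference exists only in the serialized bytes.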
> 
> However, it still sounds like you're all saying that even 
> when a character reference does exist in an attribute value, 
> I should not be seeing escaped ampersands when that attribute 
> value is output as text.  Well, if anyone's interested (and 
> I'm not sure why you would be, at this point ;-) here's a 
> sample of my previous input and output data and my xsl code 
> that demonstrates the problem I was having:
> 
> source xml tag:
> 
> <xref linkend="i090f42a68009c2c9" book_code="cmkt" 
> book_title="Guide Marketing du syst&#xe8;me GRC de 
> PeopleSoft, version 8.8" chapter_title="D&#xe9;finition des 
> entit&#xe9;s de l'application Marketing de PeopleSoft" 
> XREF_type="3" target_title="D&#xe9;finition des entit&#xe9;s 
> de l'application Marketing de PeopleSoft" 
> chapter_type="Chapitre" file_name="cmkt03.htm"/>
> 
> xsl template for this element:
> 
> <xsl:template name="xref">
>   <A HREF="../../{@book_code}/htm/{@file_name}#{@linkend}">
>     <xsl:value-of select="@target_title"/>
>   </A>
> </xsl:template>
> 
> html output:
> 
> <A HREF="../../cmkt/htm/cmkt03.htm#i090f42a68009c2c9">D&amp;#xe9;finition
> des entit&amp;#xe9;s de l'application Marketing de PeopleSoft</A>
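For comparison, here is a sketch of what a conforming parse-and-serialize round trip does with the same kind of attribute. It uses Python's xml.etree.ElementTree as a stand-in for the XSLT processor (an assumption; no XSLT is executed, the template is mimicked by hand, and the title is abbreviated). The reference &#xe9; is resolved at parse time, so the attribute value holds no ampersand and the serializer has nothing to escape.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the source tag quoted above.
source = (
    '<xref linkend="i090f42a68009c2c9" book_code="cmkt" '
    'target_title="D&#xe9;finition des entit&#xe9;s" '
    'file_name="cmkt03.htm"/>'
)
xref = ET.fromstring(source)
title = xref.get("target_title")

# Mimic the template by hand: build <A HREF="...">title</A>.
href = "../../{}/htm/{}#{}".format(
    xref.get("book_code"), xref.get("file_name"), xref.get("linkend"))
a = ET.Element("A", {"HREF": href})
a.text = title

html = ET.tostring(a, encoding="unicode", method="html")
print(html)
```

If the output of a real pipeline shows &amp;#xe9; at this point, the ampersand must already have been a literal character in the tree, which is not what a conforming parser produces from the source shown.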
> 
> 
> 
> 
> Mark Fletcher
> PeopleSoft Language Engineering
> 925.694.3753
> mark_fletcher@xxxxxxxxxxxxxx
> 
> 
> 
> "Mike Brown" <mike@xxxxxxxx>
> Sent by: owner-xsl-list@xxxxxxxxxxxrrytech.com
> 04/23/2003 06:05 AM
> Please respond to xsl-list
> 
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> cc:
> Subject: Re:  Character entities in attribute values
> 
> mark_fletcher@xxxxxxxxxxxxxx wrote:
> > the output text looks something like this: &amp;eacute; instead of 
> > this: &eacute;
> 
> First, please realize that when you output XML or HTML, the 
> XSLT processor is (in effect, if not necessarily in fact) running a 
> node tree through a serializer, and the serializer is what is 
> escaping "&" and "<" and certain other characters appearing 
> in places where they would otherwise be confused with markup.
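The division of labour described above can be sketched in Python, again assuming the standard library's xml.etree.ElementTree as a stand-in for an XSLT processor's result tree and serializer (the element name and strings are made up for illustration). The tree holds plain characters, and escaping appears only when the tree is written out as markup.

```python
import xml.etree.ElementTree as ET

# The node tree holds plain characters; no escaping exists at this level.
p = ET.Element("p", {"title": "R&D"})
p.text = "AT&T: 1 < 2"

# Escaping happens only when the tree is written out as markup.
serialized = ET.tostring(p, encoding="unicode")
print(serialized)
# <p title="R&amp;D">AT&amp;T: 1 &lt; 2</p>
```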
> 
> If you're getting &amp;eacute; in the output, then you must 
> have put the 8 characters "&" "e" "a" "c" "u" "t" "e" ";" 
> into an attribute node (or text node, but you mentioned 
> attribute) in your result tree, perhaps by copying this text 
> from the source tree. Since you told the processor you wanted the
> *node* to contain those 8 characters, rather than 1 entity 
> reference, it serialized the node in such a way that you'd 
> get the characters when the output document is parsed. In 
> other words, it preserved the semantics of the data, clearly 
> distinguishing between character data and the structures 
> implied by markup.
> 
> Given that the XML parser feeding parsed data to the XSLT 
> processor would have interpreted "&eacute;" in your original 
> source document as a reference to the entity named eacute, 
> there's no way the 8 characters could have ended up in your 
> source tree unless you did one of the following:
>  - explicitly constructed that string in your stylesheet
>  - copied text that was originally written like &amp;eacute;
>  - copied text that was originally written like <![CDATA[&eacute;]]>
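The last two cases in the list above can be checked with a short Python sketch, assuming xml.etree.ElementTree as the parser and serializer (the element name t is made up). Both spellings parse to the same 8 literal characters, and a serializer has to escape the ampersand again to preserve them.

```python
import xml.etree.ElementTree as ET

escaped = ET.fromstring("<t>&amp;eacute;</t>")
cdata = ET.fromstring("<t><![CDATA[&eacute;]]></t>")

# Both spellings parse to the same 8 literal characters.
print(escaped.text == cdata.text == "&eacute;", len(escaped.text))

# Serializing either tree must escape the ampersand to preserve them.
reserialized = ET.tostring(cdata, encoding="unicode")
print(reserialized)
```

Once the text is in the tree, nothing records whether it came from a CDATA section or from &amp;, which is why copying it through cannot turn it back into an entity reference.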
> 
> The latter two mean exactly the same thing. One of the most 
> common misconceptions on this list concerns what CDATA 
> sections are, so I'm going to guess that whoever made your 
> XML decided to use it as a transport for entity-laden, 
> non-well-formed HTML, on the theory that this data is just 
> text, not markup. Then you tried to use XSLT to copy it 
> through, and were surprised to find that you can't use XSLT 
> to pretend character data is actually markup.
> 
> However, as others have mentioned, this is just a wild guess. 
> Explain more about what you're doing, with sample code (brief).
> 
> 
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

