
RE: Generating numeric character references

Subject: RE: Generating numeric character references
From: "Yates, Danny (ANTS)" <danny.yates@xxxxxxxxxx>
Date: Thu, 16 Jan 2003 09:57:44 -0000
Hi,

This won't work.

If you took the results of this transform and gave them to SAX
ContentHandler you'd get a 'characters' call with the string
"&#173;", not with the single character represented by U+00AD.

Also, if you re-serialised the result, you'd end up back where
you started: &amp;#173;
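
A minimal sketch of the point above, assuming Python's standard-library SAX parser stands in for whatever consumes the transform result (the class and function names are illustrative, not from the thread). It collects what the ContentHandler's characters() calls deliver for the double-escaped input, then escapes that text again the way a serializer would.

```python
import xml.sax
from xml.sax.saxutils import escape


class TextCollector(xml.sax.ContentHandler):
    """Collects everything the parser reports through characters()."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # A parser may deliver one text run in several pieces.
        self.chunks.append(content)


def parsed_text(xml_source: str) -> str:
    """Return the character data a SAX consumer would see."""
    collector = TextCollector()
    xml.sax.parseString(xml_source.encode("utf-8"), collector)
    return "".join(collector.chunks)


double_escaped = "<p>&amp;#173;</p>"

# The consumer sees the six characters & # 1 7 3 ; and not U+00AD.
text = parsed_text(double_escaped)

# Writing that text back out escapes the ampersand again.
reserialised = "<p>" + escape(text) + "</p>"
```

Here `text` is the string "&#173;" and `reserialised` is identical to the input, which is why an identity transform alone does not undo the double escaping.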

Dan.

-- 
Danny Yates
Technical Architect
Abbey National Treasury Services
E-mail: Danny.Yates@xxxxxxxxxx
Phone: +44 20 7756 5012
Fax: +44 20 7612 4342


-----Original Message-----
From: Andrew Welch [mailto:AWelch@xxxxxxxxxxxxxxx]
Sent: 16 January 2003 09:45
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: RE:  Generating numeric character references



I think the original poster had a problem of double escaping, such as

&amp;#173;

in their source, and they simply wanted to convert this to the correct
&#173;

Wouldn't running the source XML through an identity transform give the
desired result, with no need for string processing of any kind?

cheers
andrew


> -----Original Message-----
> From: Wendell Piez [mailto:wapiez@xxxxxxxxxxxxxxxx]
> Sent: 14 January 2003 21:55
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re:  Generating numeric character references
> 
> 
> Stuart,
> 
> The reason your task is proving difficult is that it's really not what
> it appears to be at first blush. There is a trap here, which you can
> recognize if you can clearly distinguish between XML-as-serialization
> format, and the XML document (a tree of nodes as described in the XPath
> spec) that an XSLT processor operates on.
> 
> Numeric character references may appear in XML-as-serialization; in the
> XPath tree (the "document" built by the parser and handed to the XSLT
> engine), however, these references never appear as such; rather, each
> has been converted into the character it represents.
> 
> So, for example, if your data has character reference &#x41;, your XSLT
> processor sees this as an "A". (It may come out the back as "&#x41;" if
> your serialization encoding happens not to be able to do a proper "A",
> but internally it's an "A"). Therefore, what's required with
> "&amp;#x41;" isn't to turn it into "&#x41;", but rather into "A". (Or,
> if you get my drift: you need to convert "&amp;#x41;" into "&#x41;"
> *before* your document is parsed, or an "&#x41;" into an "A" *after*
> your document is parsed.)
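
The distinction between the serialized form and the parsed tree can be illustrated with a short sketch, assuming Python's standard-library ElementTree as a stand-in for the parser and serializer in front of and behind an XSLT engine (the variable names are illustrative only).

```python
import xml.etree.ElementTree as ET

# A single-escaped reference is resolved by the parser: the tree holds "A".
single = ET.fromstring("<p>&#x41;</p>").text

# A double-escaped one is plain text: the six characters & # x 4 1 ;
double = ET.fromstring("<p>&amp;#x41;</p>").text

# On output the serializer decides how each character is written.
# U+00AD has no ASCII form, so an ASCII serialization falls back to a
# numeric character reference, while UTF-8 writes the character itself.
soft_hyphen = ET.fromstring("<p>&#173;</p>")
as_ascii = ET.tostring(soft_hyphen, encoding="us-ascii")
as_utf8 = ET.tostring(soft_hyphen, encoding="utf-8")
```

In both serializations the tree is the same; only the bytes on the wire differ, which is the sense in which the reference "never appears as such" inside the processor.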
> 
> You are currently trying to do the latter; and it can be done -- as
> you're discovering -- with recursive processing over text nodes,
> heuristics to recognize target substrings, and a table to map them. But
> it's not a job that XSLT lends itself towards, since XSLT is as ungainly
> for processing strings as it is slick for processing nodes. Far
> preferable would be to use Perl or something else with good support for
> string-handling and regular expressions, to do the former task (munge
> the &amp; entities before parsing).
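
As a rough sketch of the "before parsing" route, here is the same idea in Python rather than Perl, assuming the damaged file can be read as a string and that only numeric references were double-escaped. The pattern and function name are hypothetical, not taken from the thread.

```python
import re
import xml.etree.ElementTree as ET

# Matches a double-escaped numeric character reference such as
# &amp;#173; or &amp;#xAD; in the raw, unparsed markup.
DOUBLE_ESCAPED_NCR = re.compile(r"&amp;(#(?:[0-9]+|x[0-9A-Fa-f]+);)")


def restore_char_refs(raw_xml: str) -> str:
    """Turn &amp;#NNN; back into &#NNN; and leave everything else alone."""
    return DOUBLE_ESCAPED_NCR.sub(r"&\1", raw_xml)


raw = "<p>soft&amp;#173;hyphen, if x &lt; y, AT&amp;T</p>"
fixed = restore_char_refs(raw)

# Once parsed, the repaired reference is a real U+00AD in the tree.
text = ET.fromstring(fixed).text
```

Because the substitution only touches `&amp;` when it is followed by something shaped like a numeric reference, ordinary escapes such as `&lt;` and `AT&amp;T` survive intact. A reference to a code point that XML forbids (for example `&#0;`) would still make the repaired file not well-formed, so the result is worth validating.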
> 
> Yet -- and this is where one has to be *very* cautious -- XSLT does, at
> least in certain circumstances (i.e. with certain processors in certain
> operational contexts) give you *some* control over how a document, once
> processed, is serialized -- and *if your data is clean* this optional
> feature can be drafted into service to help with your problem. What I'm
> getting to, of course, is the dreaded disable-output-escaping....
> 
> That is, if your data is otherwise unproblematic, you can achieve your
> goal by running your document through a near-identity transform that
> disables output escaping on your text nodes. The document will emerge
> from the transform unchanged (at least as XPath sees it) but with
> "&amp;#x41;" represented as "&#x41;". This, *when parsed again*, will be
> seen as the "A" you really want.
> 
> Note that this is not (if we're really strict with our terms) a
> transformation in the XSLT sense. Rather, it's a tricky application of
> the serializer attached to most processors. It will sometimes break
> because it disables escaping on the wrong characters (so if you have any
> data such as "if x &lt; y", you're going to be in trouble unless you
> write string-processing code to catch and work around it), and it uses
> an optional feature of the language that restricts portability.
> 
> Please consider this only a golden-hammer solution (i.e. lacking a
> better tool to do the job), and keep in mind it's easy to bang your
> thumb this way (since any anomalies in the input will make your output
> not well-formed). It is in view of these limitations that this really
> should be done in a separate pass, if with XSLT at all.
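
The hazard described above can be shown with a toy emulation, assuming Python's ElementTree and a one-element document. This is not an XSLT processor: `reserialise_unescaped` is a hypothetical helper that merely mimics what writing text nodes with output escaping disabled does to the serialized result.

```python
import xml.etree.ElementTree as ET


def reserialise_unescaped(xml_source: str) -> str:
    """Emulate a near-identity pass with output escaping disabled:
    the text of the (single) root element is written out raw."""
    root = ET.fromstring(xml_source)
    return "<{0}>{1}</{0}>".format(root.tag, root.text or "")


def is_well_formed(xml_source: str) -> bool:
    """Report whether the string parses as XML."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        return False


# Clean data: the double-escaped reference becomes a real one.
clean = reserialise_unescaped("<p>soft&amp;#173;hyphen</p>")

# Data with a legitimate &lt;: the raw "<" breaks the output.
dirty = reserialise_unescaped("<p>if x &lt; y then &amp;#173;</p>")
```

With the clean input the second parse sees a soft hyphen; with the other input the output contains a bare `<` and can no longer be parsed, which is the thumb-banging case the warning refers to.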
> 
> Cheers,
> Wendell
> 
>   At 03:05 PM 1/14/2003, you wrote:
> >I'd like to transform specific text substrings into numeric character
> >references during an XSLT transformation. For example, I want to
> >transform all occurrences that look like "&amp;#173;" within a string
> >into "&#173;".
> >
> >Here's the back story. I have source XML that is generated automatically
> >from HTML by a third party. The third party incorrectly handles entity
> >references, so that "&#173;" in the original HTML becomes
> >"&amp;#173;" in the XML. I want to repair the damage done. To simplify
> >things, I am only interested in documents with ISO 8859-1 encoding.
> >
> >Below is a solution [1] that I am not pleased with. It is a named
> >template that recursively parses a string, trying to replace references.
> >This requires an <xsl:when> element for each value of numeric character
> >reference that might be encountered (see the "additional cases here"
> >comment). Problems with this include linear search of values, omitted
> >values, and opportunity for error in mismatched values.
> >
> >Can anyone suggest a better approach to generating numeric character
> >references? I would be fine restricting the solution to MSXML or
> >.NET's System.Xml.Xsl XSLT processors, if that is an issue.
> >
> >Many thanks!
> >
> >Cheers,
> >Stuart
> >
> >
> >
> >[1] A less than happy solution:
> >
> >   <xsl:template name="restoreNumCharRefs">
> >     <xsl:param name="string"/>
> >
> >     <xsl:choose>
> >       <xsl:when test="contains($string, '&amp;')">
> >         <!-- Split around the first ampersand and the next semicolon -->
> >         <xsl:variable name="head"
> >                       select="substring-before($string, '&amp;')"/>
> >         <xsl:variable name="remainder"
> >                       select="substring-after($string, '&amp;')"/>
> >         <xsl:variable name="reference"
> >                       select="substring-before($remainder, ';')"/>
> >
> >         <!-- Map each known reference to its character -->
> >         <xsl:variable name="entity">
> >           <xsl:choose>
> >             <xsl:when test="$reference='#167'">&#167;</xsl:when>
> >             <xsl:when test="$reference='#173'">&#173;</xsl:when>
> >
> >             <!-- additional cases here -->
> >
> >             <xsl:otherwise>&amp;<xsl:value-of
> >               select="$reference"/>;</xsl:otherwise>
> >           </xsl:choose>
> >         </xsl:variable>
> >
> >         <!-- Recurse on whatever follows the semicolon -->
> >         <xsl:variable name="tail">
> >           <xsl:call-template name="restoreNumCharRefs">
> >             <xsl:with-param name="string"
> >                             select="substring-after($remainder, ';')"/>
> >           </xsl:call-template>
> >         </xsl:variable>
> >
> >         <xsl:value-of select="concat($head, $entity, $tail)"/>
> >       </xsl:when>
> >       <xsl:otherwise>
> >         <xsl:value-of select="$string"/>
> >       </xsl:otherwise>
> >     </xsl:choose>
> >
> >   </xsl:template>
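
For comparison, the "after parsing" conversion that this template attempts can be sketched outside XSLT without a lookup table, assuming the text has already been extracted from the parsed document as a Python string. The function name mirrors the template's but is otherwise hypothetical.

```python
import re

# A numeric character reference in decimal (&#173;) or hex (&#xAD;) form.
NUM_CHAR_REF = re.compile(r"&#(?:([0-9]+)|x([0-9A-Fa-f]+));")


def restore_num_char_refs(text: str) -> str:
    """Replace each reference-shaped substring with its character."""

    def to_char(match):
        decimal, hexadecimal = match.groups()
        code_point = int(decimal) if decimal else int(hexadecimal, 16)
        # Leave values outside the Unicode range untouched.
        if code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)

    return NUM_CHAR_REF.sub(to_char, text)


restored = restore_num_char_refs("a&#167;b&#173;c&#xAD;d")
```

The code point is computed from the digits, so there is no per-value case to omit or mismatch. Text that only resembles a reference, such as a lone `&` or `AT&T;`, does not match the pattern and is passed through unchanged.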
> >
> >
> >  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
   Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================





