
Re: tool, library to replace pseudo escaped entities with real

  • From: Rick Jelliffe <rjelliffe@allette.com.au>
  • To: Uche Ogbuji <uche@ogbuji.net>
  • Date: Mon, 3 Nov 2014 15:13:53 +1100


Do you want to repair the file? Perhaps this could work:

Make an XSLT 2.0 null (identity) transform.
Make a template for the description element.
In that template, do a text substitution on the data content to replace "&amp;" with some unlikely single character, e.g. &#x4000;. Convert the result to a sequence of codepoints with string-to-codepoints(), and put that into a variable.
Iterate over each codepoint in the variable, outputting it as a character. When you find 0x4000, output an ampersand instead, in xsl:text with disable-output-escaping="yes".
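
For readers without an XSLT 2.0 processor, the same placeholder idea can be sketched in Python. This is only a rough analogue of the steps above, not the stylesheet itself: the function name repair_with_sentinel and the default tag argument are made up for illustration, and it assumes the text to repair sits directly inside the description element and that U+4000 does not occur anywhere in the input.

```python
import xml.etree.ElementTree as ET

# The "unlikely single character" from the steps above (assumed absent from the data).
SENTINEL = "\u4000"


def repair_with_sentinel(xml_source, tag="description"):
    """Emit every '&' found in the parsed text of <tag> as a raw '&' on output."""
    if SENTINEL in xml_source:
        raise ValueError("sentinel character already occurs in the input")
    root = ET.fromstring(xml_source)
    for elem in root.iter(tag):
        if elem.text:
            # After parsing, '&amp;nbsp;' is the literal text '&nbsp;'.
            elem.text = elem.text.replace("&", SENTINEL)
    serialized = ET.tostring(root, encoding="unicode")
    # Swapping the sentinel back after serialization bypasses output escaping.
    return serialized.replace(SENTINEL, "&")
```

As with disable-output-escaping, the result contains references such as &nbsp; that are only usable if something declares those entities, and a lone &amp; in the source comes out as a bare ampersand.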

On 30/10/2014 11:28 PM, "Uche Ogbuji" <uche@ogbuji.net> wrote:
On Thu, Oct 30, 2014 at 5:16 AM, Gareth Oakes <goakes@g...> wrote:
>I'm sure someone must have written a nice little python script or
>something similar to do this sometime, anyway I have some XML with
>stuff like
>
><description>PJ&amp;nbsp;72 fra &amp;Ouml;rsj&amp;ouml; Belysning er
>en funktionel lampe&amp;nbsp;som kan justeres efter eget behov.
>Fremstillet af lakeret metal og&amp;nbsp;f&aring;s i mange
>farver.&amp;nbsp;I serien f&amp;aring;s skrivebordslamper, gulvlamper,
>loftslamper.&amp;nbsp;&amp;nbsp;</description>
>
>anyway, rather than sitting down and writing a solution for this
>problem I am supposing someone has written it in the past, and I can
>just use that.

I'm guessing you want the &amp;s to become ampersands? I'm pretty sure the
substitution s/&amp;/&/g would work in most environments.

That could be dangerous, because a plain old &amp; would reduce to a bare ampersand after that transform, which is a well-formedness (WF) error, and those are pretty common. Unless, that is, you know that &amp; has itself been "psychoescaped" to &amp;amp;. I can't tell from the sample given.
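
A small Python check makes the risk concrete. It is only an illustration: naive_unescape and is_well_formed are hypothetical helper names, and the standard-library ElementTree parser stands in for whatever parser would consume the file.

```python
import xml.etree.ElementTree as ET


def naive_unescape(xml_source):
    """Blindly apply the s/&amp;/&/g substitution to the raw XML source."""
    return xml_source.replace("&amp;", "&")


def is_well_formed(xml_source):
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        return False
```

With these helpers, "<d>Salt &amp; pepper</d>" parses before the substitution and fails after it, whereas "<d>Salt &amp;amp; pepper</d>" survives, which is exactly the distinction the sample does not settle.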

In other words, the problem is too underspecified for an off-the-shelf solution; it depends on knowing the original escaping pattern reliably, so it might indeed be that writing a bit of code is best.
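
If a bit of code is the answer, one possible shape for it in Python is sketched below. It rests on assumptions the thread does not confirm: that every entity reference in the description text was escaped exactly one level too many, that the names are the HTML 4 ones listed in the standard library's html.entities.name2codepoint, and that the text sits directly inside the description element. The names ENTITY, unescape_once and repair are invented for this sketch. Rather than re-emitting &nbsp; and friends, which plain XML does not declare, it resolves them to the characters they name.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# A complete reference: decimal, hexadecimal, or named, ending in ';'.
ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def _resolve(match):
    body = match.group(1)
    try:
        if body[:2] in ("#x", "#X"):
            return chr(int(body[2:], 16))
        if body.startswith("#"):
            return chr(int(body[1:]))
    except (ValueError, OverflowError):
        return match.group(0)  # number outside the Unicode range
    if body in name2codepoint:
        return chr(name2codepoint[body])
    return match.group(0)  # unknown name: leave the text untouched


def unescape_once(text):
    """Resolve one level of entity references found in already-parsed text."""
    return ENTITY.sub(_resolve, text)


def repair(xml_source, tag="description"):
    """Parse the document, fix the text of each <tag>, and re-serialize."""
    root = ET.fromstring(xml_source)
    for elem in root.iter(tag):
        if elem.text:
            elem.text = unescape_once(elem.text)
    # The serializer re-escapes any '&' that remains, keeping the output well-formed.
    return ET.tostring(root, encoding="unicode")
```

Because the serializer does the escaping, a doubly escaped &amp;amp; comes out as &amp;, and an unrecognized name is left alone rather than producing a bare ampersand. The sketch does not check that numeric references name characters XML actually permits, and it requires the input to parse in the first place.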


--
Uche Ogbuji                                       http://uche.ogbuji.net
Founding Partner, Zepheira                  http://zepheira.com
Author, _Ndewo, Colorado_                 http://uche.ogbuji.net/ndewo/
Founding editor, Kin Poetry Journal      http://wearekin.org
http://copia.ogbuji.net    http://www.linkedin.com/in/ucheogbuji    http://twitter.com/uogbuji

