
Re: Dealing mixed content with invalid node-like text

Subject: Re: Dealing mixed content with invalid node-like text
From: Marc <Marc.Liste@xxxxxxx>
Date: Sun, 04 Dec 2011 20:36:48 +0100
I think you will have trouble processing a file like this with XSLT, because it is not well-formed XML, no?

On 04/12/2011 20:15, Karlmarx R wrote:


I have a situation in which I need to convert mixed-content text, which also contains text within angle brackets, to XML output. For example, text like:

"Sometext<xx>within valid node</xx>  and like<II .>  Title etc"
"Sometext like<1a .>  Title etc,<xx>within<b>something</b>  valid node</xx>  etc".

Now, the output has to be like:

<nodename>Sometext<xx>within valid node</xx>  and like&lt;II .&gt; Title etc</nodename>
<nodename>Sometext like&lt;1a .&gt; Title etc,<xx>within<b>something</b>  valid node</xx>  etc</nodename>

At present I do not get things like <br/>, but if I did, it being valid, I should treat it as a node. The point I am trying to make is that non-node things like <II .> and <1a .> need to have their angle brackets escaped to keep the XML well-formed. Currently I use analyze-string with a regex to deal with this, which does not work correctly (the regex has mistakes). I would like to know whether there is a good standard solution for this sort of text. At present each line of text is passed to this template and processed like this:

<xsl:template name="tag-text">
  <xsl:param name="unparsed" required="yes"/>
  <xsl:analyze-string select="$unparsed" regex="^(.*?)&lt;(.+)&gt;(.*)&lt;/(.+)&gt;(.*?)$">    <!-- this regex has flaws, in that it fails to treat those invalid nodes -->
    <xsl:matching-substring><!-- process the match and, if necessary, recursively call this template again --></xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

I suspect there could be a better regex to get the result I want, but I am not sure whether XSLT itself has a better way to deal with this. Please can you suggest possible solutions (including a better regex, if any of you have used one successfully)?
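One possible direction, shown here only as a rough sketch outside XSLT, is a pre-processing pass that leaves tag-like tokens alone and escapes every other angle bracket before the text reaches the stylesheet. The Python below is an illustration under assumptions that are not part of the original question: the function names `escape_stray_brackets` and `wrap` are hypothetical, a "valid" tag is taken to be a plain name with no attributes, and the input is assumed to contain no pre-escaped entities (a literal `&` would be escaped again).

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumption: a tag-like token is <name>, </name> or <name/> with no attributes.
# The name must start with a letter or underscore, so <1a .> and <II .> do not match.
TAG = re.compile(r"</?[A-Za-z_][\w.\-]*\s*/?>")


def escape_stray_brackets(text):
    """Escape &, < and > everywhere except inside tag-like tokens."""
    parts = []
    pos = 0
    for match in TAG.finditer(text):
        # Text between the previous tag and this one is escaped.
        parts.append(escape(text[pos:match.start()]))
        # The tag-like token itself is copied through unchanged.
        parts.append(match.group(0))
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)


def wrap(text, name="nodename"):
    """Wrap the cleaned text in an element and check that it parses."""
    xml = "<{0}>{1}</{0}>".format(name, escape_stray_brackets(text))
    # Raises xml.etree.ElementTree.ParseError if the kept tags are unbalanced.
    ET.fromstring(xml)
    return xml


if __name__ == "__main__":
    print(wrap("Sometext<xx>within valid node</xx>  and like<II .>  Title etc"))
```

The pattern only checks that a token looks like a tag; it does not check that open and close tags pair up, which is why `wrap` parses the result as a final check. The same pattern could in principle be used in the `regex` attribute of `xsl:analyze-string`, with `<` and `>` written as `&lt;` and `&gt;`, but turning the matched strings back into real nodes inside XSLT 2.0 needs a further step that is not shown here.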

Thanks in advance,
