Re: Turning escaped mixed content back to XML

Subject: Re: Turning escaped mixed content back to XML
From: Graydon <graydon@xxxxxxxxx>
Date: Fri, 28 Mar 2014 15:17:53 -0400
On Fri, Mar 28, 2014 at 12:02:11PM -0700, Martin Holmes scripsit:
> On 14-03-28 11:32 AM, Graydon wrote:
> >On Fri, Mar 28, 2014 at 11:12:37AM -0700, Martin Holmes scripsit:
> >[getting escaped text back into parsed content]
> >>     <xsl:template match="text:p" exclude-result-prefixes="#all">
> >>         <xsl:variable name="unparsed">
> >>             <xsl:copy-of select="*|text()"/>
> >>         </xsl:variable>
> >
> >$unparsed is going to be item()* instead of string if it's formed like
> >that, and I don't think saxon:parse will work on item()* as input, it
> >wants a single string.
> 
> That's why I'm trying to use saxon:serialize to feed into saxon:parse.
> But even if I feed the string-joined text nodes directly into
> saxon:parse(), it fails; I get a "Content not allowed in prolog"
> error, presumably because there's no containing root element in the
> unparsed string. If I try to add that:

Yes.  serialize() and parse() want a well-formed document, I think the
phrase is; something that could be a document if it were off by itself.

parse-fragment-string() doesn't, and it might be a better bet for the
data you've got.
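
A minimal sketch of the fragment route, with some assumptions stated up
front: it uses the standard XPath 3.0 function parse-xml-fragment()
rather than a Saxon extension, so it needs an XSLT 3.0 processor; and the
namespace URI bound to text: is a guess (the OpenDocument one), so swap
in whatever the input actually declares.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    exclude-result-prefixes="#all">

  <!-- Assumed namespace: the ODF text namespace; adjust to match the input. -->
  <xsl:template match="text:p">
    <!-- string(.) is the paragraph's text with one level of escaping undone. -->
    <xsl:variable name="unparsed" select="string(.)"/>
    <p>
      <!-- parse-xml-fragment() does not require a single root element;
           it returns a document node whose children are copied here. -->
      <xsl:copy-of select="parse-xml-fragment($unparsed)"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

The fragment still has to be well-formed inside, so unbalanced tags or
bare ampersands in the escaped text will fail here too.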

>     <xsl:template match="text:p" exclude-result-prefixes="#all">
> 
>         <xsl:variable name="unparsed" select="concat('&lt;p&gt;',
> string-join(//text(), ''), '&lt;/p&gt;')"/>
>         <xsl:variable name="parsed" select="saxon:parse($unparsed)"/>
>          <xsl:copy-of select="$parsed" exclude-result-prefixes="#all"/>
> 
>     </xsl:template>
> 
> I get "The entity name must immediately follow the '&' in the entity
> reference," which is a bit puzzling...

Is it possible you've got &amp; entities (or other default XML entities)
in the data?  Those tend to make this whole serialization/parse process
really unpleasant.
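
The reason is that an &amp; in the source file is a bare & in the text
node's string value, so the string handed to the parser is no longer
well-formed.  One possible workaround, sketched below, is to re-escape
ampersands that do not start a reference before parsing.  The function
name and its namespace are hypothetical, and the pattern only knows the
five predefined entities plus numeric character references; treat it as
a heuristic, not a general repair.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:local-functions"
    exclude-result-prefixes="#all">

  <!-- Hypothetical helper: re-escape ampersands that do not begin a
       character reference or one of the five predefined entities. -->
  <xsl:function name="f:escape-bare-ampersands" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <!-- Step 1: escape every ampersand in the string. -->
    <xsl:variable name="all-escaped"
        select="replace($s, '&amp;', '&amp;amp;')"/>
    <!-- Step 2: undo step 1 wherever the ampersand began a real reference. -->
    <xsl:sequence select="replace($all-escaped,
        '&amp;amp;(#[0-9]+|#x[0-9A-Fa-f]+|lt|gt|amp|quot|apos);',
        '&amp;$1;')"/>
  </xsl:function>

</xsl:stylesheet>
```

Called on the string <p>a & b</p> this yields <p>a &amp; b</p>, while
references that were already intact are left alone.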

If not, use xsl:message or xsl:sequence to dump the value you're trying
to parse and see what it really looks like.  One of the other problems
with markup escaped as text is that nothing parses it until you try, so
it can lose angle brackets, gain spaces in bad places, and so on, and
there often isn't any good automated way to fix that.
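
A sketch of that kind of dump, assuming the same wrapper-element
approach as in the quoted template; the text: namespace URI is again an
assumption, and the paragraph is passed through unchanged so the
transform still completes while you look at the messages.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    exclude-result-prefixes="#all">

  <!-- Assumed namespace: the ODF text namespace; adjust to match the input. -->
  <xsl:template match="text:p">
    <xsl:variable name="unparsed"
        select="concat('&lt;p&gt;', string(.), '&lt;/p&gt;')"/>
    <!-- Goes to the message output, not the result tree. -->
    <xsl:message>
      <xsl:text>About to parse: [</xsl:text>
      <xsl:value-of select="$unparsed"/>
      <xsl:text>]</xsl:text>
    </xsl:message>
    <!-- Pass the paragraph through untouched while investigating. -->
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Some processors serialize message content as XML, so the angle brackets
may show up re-escaped in the log; the square brackets at least make
stray leading or trailing whitespace visible.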

-- Graydon
