
Re: Turning escaped mixed content back to XML

Subject: Re: Turning escaped mixed content back to XML
From: Martin Holmes <mholmes@xxxxxxx>
Date: Fri, 28 Mar 2014 14:06:30 -0700
I spoke too soon. Passing this:

contains a single TEI-conformant document, comprising a TEI header and a text, either in isolation or as part of a &lt;gi&gt;teiCorpus&lt;/gi&gt; element.

into parse-xml-fragment() gets this fatal error:

FODC0006: First argument to parse-xml-fragment() is not a well-formed and namespace-well-formed XML fragment. XML parser reported: I/O error reported by XML parser processing file:/home/mholmes/Documents/tei/council/translation/new_translations_into_specs.xsl: 404 Not Found for: http://www.saxonica.com/parse-xml-fragment/actual.xml

This is with Saxon 9.1.5.3 PE.

I must be missing something here. The default namespace is tei, the xpath-default-namespace is tei, and all the other namespaces have defined prefixes (tei has tei: too).

Cheers,
Martin

On 14-03-28 12:09 PM, Martin Holmes wrote:
That's what I needed: parse-xml-fragment(). This seems to work:

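<xsl:template match="text:p" exclude-result-prefixes="#all">

  <!-- Earlier attempt with the Saxon extension function:
       <xsl:variable name="unparsed" select="concat('&lt;p&gt;', string-join(.//text(), ''), '&lt;/p&gt;')"/>
       <xsl:variable name="parsed" select="saxon:parse($unparsed)"/>
       <xsl:copy-of select="$parsed" exclude-result-prefixes="#all"/> -->
  <xsl:if test="string-length(.) gt 0">
    <tei:p>
      <!-- .//text() keeps the join inside this paragraph (//text() would take
           every text node in the document); copy-of keeps the parsed elements,
           where value-of would flatten them back to a string. -->
      <xsl:copy-of select="parse-xml-fragment(string-join(.//text(), ''))"/>
    </tei:p>
  </xsl:if>
</xsl:template>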

for most cases. I do have some horrible edge-cases though:

<text:p>a start-tag, with delimiters &lt; and &gt; is intended</text:p>

I should be able to pre-process the input text for angle brackets in the
context of spaces and swap them out for something else temporarily though.
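A minimal sketch of that pre-processing step, under some assumptions: the function name my:protect-stray-lt and its namespace are made up for illustration, and the text: and tei: namespace URIs are the same placeholders used in the example quoted below. The idea is to re-escape any "<" that cannot be the start of a tag before handing the string to parse-xml-fragment(). A bare ">" is legal in XML character data, so only "<" should need this treatment.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   xmlns:text="http://example.com"
   xmlns:tei="http://example.com/tei"
   xmlns:my="http://example.com/my-functions"
   exclude-result-prefixes="#all"
   version="3.0">

  <!-- Hypothetical helper: turn a "<" back into the text "&lt;" when it is
       not followed by a letter, "_", "/", "!" or "?", or when it ends the string. -->
  <xsl:function name="my:protect-stray-lt" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence select="replace($s, '&lt;([^A-Za-z_/!?]|$)', '&amp;lt;$1')"/>
  </xsl:function>

  <xsl:template match="text:p[string-length(.) gt 0]">
    <tei:p>
      <!-- Parse only after the stray delimiters have been protected. -->
      <xsl:copy-of select="parse-xml-fragment(my:protect-stray-lt(string(.)))"/>
    </tei:p>
  </xsl:template>

</xsl:stylesheet>
```

With the edge case above, "delimiters < and > is intended" should become "delimiters &lt; and > is intended" before parsing. This is a heuristic, not a complete fix: two adjacent "<" characters, or a stray "<" directly followed by a letter, would still slip through.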

Thanks,
Martin

On 14-03-28 11:35 AM, Martin Honnen wrote:
Martin Holmes wrote:

I'm trying to process an ODS spreadsheet which has <text:p> nodes which
contain embedded mixed-content markup in escaped form:

<text:p>indicates the amount by which this zone has been rotated
clockwise, with respect to the normal orientation of the parent
&lt;gi&gt;surface&lt;/gi&gt; element as implied by the dimensions given
in the &lt;gi&gt;msDesc&lt;/gi&gt; element or by the coordinates of the
&lt;gi&gt;surface&lt;/gi&gt; itself. The orientation is expressed in arc
degrees.</text:p>

I need to turn this back into parsed XML for insertion into XML
documents. I'm using Saxon 9.4 with XSLT 2 (and I can use 3 if
necessary).

I tried


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:text="http://example.com"
   xmlns:tei="http://example.com/tei"
   version="3.0">

<xsl:template match="text:p">
   <tei:p>
     <xsl:copy-of select="parse-xml-fragment(.)"/>
   </tei:p>
</xsl:template>

</xsl:stylesheet>

with Saxon 9.5 PE and got


<?xml version="1.0" encoding="UTF-8"?><tei:p xmlns:text="http://example.com" xmlns:tei="http://example.com/tei">indicates the amount by which this zone has been rotated clockwise, with respect to the normal orientation of the parent <gi>surface</gi> element as implied by the dimensions given in the <gi>msDesc</gi> element or by the coordinates of the <gi>surface</gi> itself. The orientation is expressed in arc degrees.</tei:p>

That has XML elements rather than escaped markup, so it should do. You will
need to change the namespaces and maybe use exclude-result-prefixes.
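
One possible way to handle the namespace change, sketched under assumptions rather than taken from the thread: wrap the escaped string in a p element that declares the TEI namespace (http://www.tei-c.org/ns/1.0) as the default, then parse it with parse-xml(). The embedded elements such as gi then come out in the TEI namespace instead of in no namespace. The text: URI is still the placeholder from the example above and would need to be replaced with the real one from the spreadsheet.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:text="http://example.com"
   exclude-result-prefixes="text"
   version="3.0">

  <xsl:template match="text:p">
    <!-- Build a well-formed document string whose root declares the
         TEI namespace as the default namespace. -->
    <xsl:variable name="wrapped"
       select="concat('&lt;p xmlns=&quot;http://www.tei-c.org/ns/1.0&quot;&gt;', ., '&lt;/p&gt;')"/>
    <!-- parse-xml() returns a document node; copy its root p element. -->
    <xsl:copy-of select="parse-xml($wrapped)/*"/>
  </xsl:template>

</xsl:stylesheet>
```

Because the wrapper is parsed on its own, it carries only the default namespace declaration, so the text: prefix should not leak into the copied result.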
