Re: Dealing mixed content with invalid node-like text

Subject: Re: Dealing mixed content with invalid node-like text
From: Graydon <graydon@xxxxxxxxx>
Date: Sun, 4 Dec 2011 15:31:19 -0500
On Sun, Dec 04, 2011 at 03:00:36PM -0500, Syd Bauman scripsit:
[parsing a string containing an imbalanced XML fragment into nodes]
> In which case, someone who knows more about such things will need to
> answer, as I don't think I know how to convert a string to a sequence
> of nodes or a result tree fragment. I'm not really sure why one would
> want to do such a thing, 

Sometimes you get mixed content that needs to be wrapped on delimiters
in the string -- think of a comma-separated list of links with
associated ancillary text, where you want to have output that replaces
the comma delimiters with a wrapper element but keeps the link elements
in the output.  The best way I know of to do this is to serialize the
whole chunk of input, tokenize on the delimiter pattern, and convert the
results back into nodes.
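
A minimal sketch of the first two steps (serialize, then tokenize), assuming XSLT 2.0 and a hypothetical input wrapper element named links. The d: namespace URI, the d:serialize mode name, and the wrapped and chunk output elements are all made up for illustration. The serializer is deliberately naive: it ignores namespaces, comments, and escaping, and a comma inside an attribute value would split in the wrong place.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:d="http://example.com/ns/d"
    exclude-result-prefixes="d">

  <!-- Step 1: write each element back out as markup text.
       Text nodes fall through to the built-in template, which copies them. -->
  <xsl:template match="*" mode="d:serialize">
    <xsl:value-of select="concat('&lt;', name())"/>
    <xsl:for-each select="@*">
      <xsl:value-of select="concat(' ', name(), '=&quot;', ., '&quot;')"/>
    </xsl:for-each>
    <xsl:text>&gt;</xsl:text>
    <xsl:apply-templates mode="d:serialize"/>
    <xsl:value-of select="concat('&lt;/', name(), '&gt;')"/>
  </xsl:template>

  <!-- "links" is a hypothetical element holding the comma-separated mixed content -->
  <xsl:template match="links">
    <xsl:variable name="asText">
      <xsl:apply-templates mode="d:serialize"/>
    </xsl:variable>
    <wrapped>
      <!-- Step 2: split the serialized string on the delimiter pattern -->
      <xsl:for-each select="tokenize(string($asText), ',\s*')">
        <chunk><xsl:value-of select="."/></chunk>
      </xsl:for-each>
    </wrapped>
  </xsl:template>

</xsl:stylesheet>
```

Each chunk here still holds its piece as a plain string; the third step is to hand that string to something that turns it back into nodes instead of emitting it with xsl:value-of.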

In XSLT 2.0, you can do the node reconstitution using a recursive function:

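<xsl:function as="node()*" name="d:parseFragmentString">
  <xsl:param as="xs:string" name="instring"/>
  <!-- The archived copy lost some lines; anything marked [reconstructed] is a best guess. -->
  <xsl:choose>
    <xsl:when test="not(normalize-space($instring))">
      <!-- stop; we're out of string -->
    </xsl:when>
    <xsl:when test="matches($instring,'^&lt;\p{L}')">
      <!-- we start with an element tag; figure out what it is, create it, and call again on the element
           contents and everything after the element -->
      <xsl:variable name="eName">
        <xsl:choose>
          <xsl:when test="matches($instring,'^&lt;\w+&gt;')">
            <!-- no attributes; [reconstructed] select -->
            <xsl:sequence select="substring-after(substring-before($instring,'&gt;'),'&lt;')"/>
          </xsl:when>
          <xsl:when test="matches($instring,'^&lt;\w+/&gt;')">
            <!-- no attributes, empty element; [reconstructed] select -->
            <xsl:sequence select="substring-after(substring-before($instring,'/&gt;'),'&lt;')"/>
          </xsl:when>
          <xsl:otherwise>
            <!-- attributes -->
            <xsl:sequence select="replace(substring-before($instring,' '),'^&lt;','')"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:variable>
      <xsl:variable name="attribString">
        <xsl:choose>
          <xsl:when test="matches($instring,'^&lt;\w+&gt;')">
            <xsl:sequence select="()"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:sequence
              select="substring-after(substring-before($instring,'&gt;'),' ')"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:variable>
      <xsl:variable name="closeTag" select="concat('&lt;/',$eName,'&gt;')"/>
      <!-- construct the element, its attributes if any, and call again on its contents -->
      <xsl:element name="{$eName}">
        <!-- normalize-space() because the variable is a temporary tree, which is always true as a boolean -->
        <xsl:if test="normalize-space($attribString)">
          <xsl:variable name="attribList" select="tokenize($attribString,'\s+')"/>
          <xsl:for-each select="$attribList">
            <xsl:variable name="name" select="substring-before(.,'=')"/>
            <!-- [reconstructed] select: the value with its double quotes stripped -->
            <xsl:variable name="value" select="replace(substring-after(.,'='),'&quot;','')"/>
            <xsl:attribute name="{$name}">
              <xsl:value-of select="$value"/>
            </xsl:attribute>
          </xsl:for-each>
        </xsl:if>
        <!-- before the close tag but after the first > which closes this initial element; [reconstructed] call -->
        <xsl:sequence
          select="d:parseFragmentString(substring-before(substring-after($instring,'&gt;'),$closeTag))"/>
      </xsl:element>
      <!-- everything after the element; [reconstructed] call -->
      <xsl:if test="substring-after($instring,$closeTag)">
        <xsl:sequence select="d:parseFragmentString(substring-after($instring,$closeTag))"/>
      </xsl:if>
    </xsl:when>
    <xsl:when test="matches($instring,'^&lt;/')">
      <!-- we've made it down to a close tag; if there's anything after it, process that -->
      <xsl:if test="normalize-space(substring-after($instring,'&gt;'))">
        <xsl:sequence select="d:parseFragmentString(substring-after($instring,'&gt;'))"/>
      </xsl:if>
    </xsl:when>
    <xsl:when test="matches($instring,'^&lt;\?')">
      <!-- oh look a processing instruction; [reconstructed] instruction name -->
      <xsl:processing-instruction
        name="{substring-after(substring-before($instring,' '),'&lt;?')}"
        select="substring-after(substring-before($instring,'?&gt;'),' ')"/>
      <xsl:sequence select="d:parseFragmentString(substring-after($instring,'?&gt;'))"/>
    </xsl:when>
    <xsl:when test="matches($instring,'^[^&lt;]')">
      <!-- it's not a delimited node; emit it as a text node, and call again on everything after
           the first < if we have one -->
      <xsl:choose>
        <xsl:when test="contains($instring,'&lt;')">
          <xsl:value-of select="substring-before($instring,'&lt;')"/>
          <!-- [reconstructed] recursive call on the remainder, starting at the < -->
          <xsl:sequence select="d:parseFragmentString(concat('&lt;',substring-after($instring,'&lt;')))"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- nothing but a string, but it can have escaped XML entities in it
               which we need to unescape -->
          <xsl:value-of select="d:unEscapeXMLEntities($instring)"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:when test="matches($instring,'^&lt;$')">
      <!-- we have a wandering less-than sign -->
      <xsl:value-of select="$instring"/>
    </xsl:when>
    <xsl:otherwise>
      <!-- [reconstructed] wrapper; the archived copy shows only the two lines inside it -->
      <xsl:message>
        <xsl:text>NO MATCH!&#x000A;</xsl:text>
        <xsl:value-of select="$instring"/>
      </xsl:message>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>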

The above works in its context. I should not care to assert that it was
fully general, but it ought to at least present a notion of how to
approach the problem.
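
The function calls d:unEscapeXMLEntities, which is not shown in this message. What follows is only a guess at what such a helper might look like, assuming that nothing beyond the five predefined XML entities needs handling (numeric character references are left alone). The d: namespace URI is a placeholder.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:d="http://example.com/ns/d"
    exclude-result-prefixes="xs d">

  <!-- Hypothetical helper: turn the five predefined entity references,
       present as literal text in the string, back into the characters
       they stand for. The ampersand entity is replaced last so that a
       doubly escaped less-than comes out singly escaped, not as a bare <. -->
  <xsl:function name="d:unEscapeXMLEntities" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence select="
      replace(replace(replace(replace(replace($s,
        '&amp;lt;',   '&lt;'),
        '&amp;gt;',   '&gt;'),
        '&amp;quot;', '&quot;'),
        '&amp;apos;', ''''),
        '&amp;amp;',  '&amp;')"/>
  </xsl:function>

</xsl:stylesheet>
```

The patterns look doubly escaped because they sit inside an XML attribute: the XML parser resolves one level of escaping before the XPath expression is evaluated, so the regex actually searches for the literal entity text.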

-- Graydon
