Re: Dealing mixed content with invalid node-like text
On Sun, Dec 04, 2011 at 03:00:36PM -0500, Syd Bauman scripsit:
[parsing a string containing an imbalanced XML fragment into nodes]
> In which case, someone who knows more about such things will need to
> answer, as I don't think I know how to convert a string to a sequence
> of nodes or a result tree fragment. I'm not really sure why one would
> want to do such a thing,

Sometimes you get mixed content that needs to be wrapped on delimiters
in the string -- think of a comma-separated list of links with associated
ancillary text, where you want output that replaces the comma delimiters
with a wrapper element but keeps the link elements.

The best way I know of to do this is to serialize the whole chunk of
input, tokenize on the delimiter pattern, and convert the results back
into nodes.

In XSLT 2.0, you can do the node reconstitution using a recursive
function:

<xsl:function as="node()*" name="d:parseFragmentString">
  <xsl:param as="xs:string" name="instring"/>
  <xsl:choose>
    <xsl:when test="not(normalize-space($instring))">
      <!-- stop; we're out of string -->
    </xsl:when>
    <xsl:when test="matches($instring,'^&lt;\p{L}')">
      <!-- we start with an element tag; figure out what it is, create it,
           and call again on the element contents and everything after
           the element -->
      <xsl:variable name="eName">
        <xsl:choose>
          <xsl:when test="matches($instring,'^&lt;\w+>')">
            <!-- no attributes -->
            <xsl:sequence select="replace(substring-before($instring,'>'),'^&lt;','')"/>
          </xsl:when>
          <xsl:when test="matches($instring,'^&lt;\w+/>')">
            <!-- no attributes, empty element -->
            <xsl:sequence select="replace(substring-before($instring,'/>'),'^&lt;','')"/>
          </xsl:when>
          <xsl:otherwise>
            <!-- attributes -->
            <xsl:sequence select="replace(substring-before($instring,' '),'^&lt;','')"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:variable>
      <xsl:variable name="attribString">
        <xsl:choose>
          <xsl:when test="matches($instring,'^&lt;\w+>')">
            <xsl:sequence select="()"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:sequence select="substring-after(substring-before($instring,'>'),' ')"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:variable>
      <xsl:variable name="closeTag" select="concat('&lt;/',$eName,'>')"/>
      <!-- construct the element, its attributes if any, and call again
           on its contents -->
      <xsl:element name="{$eName}">
        <xsl:if test="$attribString">
          <xsl:variable name="attribList" select="tokenize($attribString,'\s+')"/>
          <xsl:for-each select="$attribList">
            <xsl:variable name="name" select="substring-before(.,'=')"/>
            <xsl:variable name="value"
                          select="substring-before(substring-after(.,'&quot;'),'&quot;')"/>
            <xsl:attribute name="{$name}">
              <xsl:value-of select="$value"/>
            </xsl:attribute>
          </xsl:for-each>
        </xsl:if>
        <!-- before the close tag but after the first > which closes this
             initial element -->
        <xsl:sequence
            select="d:parseFragmentString(substring-after(substring-before($instring,$closeTag),'>'))"/>
      </xsl:element>
      <!-- everything after the element -->
      <xsl:if test="substring-after($instring,$closeTag)">
        <xsl:sequence select="d:parseFragmentString(substring-after($instring,$closeTag))"/>
      </xsl:if>
    </xsl:when>
    <xsl:when test="matches($instring,'^&lt;/')">
      <!-- we've made it down to a close tag; if there's anything after
           it, process that -->
      <xsl:if test="normalize-space(substring-after($instring,'>'))">
        <xsl:sequence select="d:parseFragmentString(substring-after($instring,'>'))"/>
      </xsl:if>
    </xsl:when>
    <xsl:when test="matches($instring,'^&lt;\?')">
      <!-- oh look a processing instruction -->
      <xsl:processing-instruction
          name="{substring-after(substring-before($instring,' '),'&lt;?')}"
          select="substring-after(substring-before($instring,'?>'),' ')"/>
      <xsl:sequence select="d:parseFragmentString(substring-after($instring,'?>'))"/>
    </xsl:when>
    <xsl:when test="matches($instring,'^[^&lt;]')">
      <!-- it's not a delimited node; emit it as a text node, and call
           again on everything after the first < if we have one -->
      <xsl:choose>
        <xsl:when test="contains($instring,'&lt;')">
          <xsl:value-of select="substring-before($instring,'&lt;')"/>
          <xsl:sequence
              select="d:parseFragmentString(concat('&lt;',substring-after($instring,'&lt;')))"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- nothing but a string, but it can have escaped XML entities
               in it which we need to unescape -->
          <xsl:value-of select="d:unEscapeXMLEntities($instring)"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:when test="matches($instring,'^&lt;$')">
      <!-- we have a wandering less-than sign -->
      <xsl:value-of select="$instring"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:message>
        <xsl:text>NO MATCH!&#10;</xsl:text>
        <xsl:text>:|</xsl:text>
        <xsl:value-of select="$instring"/>
        <xsl:text>|:&#10;</xsl:text>
      </xsl:message>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

The above works in its context. I should not care to assert that it is
fully general, but it ought at least to present a notion of how to
approach the problem.

-- Graydon
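
The function above calls d:unEscapeXMLEntities without showing it. What follows is only a guess at what such a helper might look like, assuming it needs to handle nothing beyond the five predefined XML entities. The namespace URI bound to d: (urn:example:d) is a placeholder, since the post never shows the real binding, and the function body is not taken from the original stylesheet.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:d="urn:example:d">
  <!-- urn:example:d is a placeholder; the post does not show the binding for d: -->

  <!-- Hypothetical helper: turn the five predefined entity references
       back into the characters they stand for. -->
  <xsl:function name="d:unEscapeXMLEntities" as="xs:string">
    <xsl:param name="instring" as="xs:string"/>
    <!-- The ampersand replacement runs last, so text that has already
         been unescaped is not unescaped a second time. -->
    <xsl:sequence select="
      replace(replace(replace(replace(replace($instring,
        '&amp;lt;',   '&lt;'),
        '&amp;gt;',   '>'),
        '&amp;quot;', '&quot;'),
        '&amp;apos;', ''''),
        '&amp;amp;',  '&amp;')"/>
  </xsl:function>
</xsl:stylesheet>
```

Numeric character references (&#38; and the like) are not handled by this sketch; if the serialized input can contain them, the helper would need another pass.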