
RE: Parsing complex line (mixed text and markup)

Subject: RE: Parsing complex line (mixed text and markup)
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 14 Feb 2008 23:14:30 -0000
This problem has come up in the past and it's not particularly easy. There
seem to be two main approaches:

(a) convert the string delimiters into element markup, and then use grouping
facilities (xsl:for-each-group) to analyze the overall structure

(b) convert the markup into string delimiters, and then use
xsl:analyze-string.

Both work, but I think (a) is probably a bit easier. 
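
For comparison, here's a rough (and equally untested) sketch of (b) for
your xref case. It assumes the only markup is empty xref elements and
that "%" never occurs in the text - pick any marker character you know
is safe in your data:

<xsl:template match="tbentry">
  <!-- Flatten the mixed content into one string: text nodes are kept,
       each xref becomes a %linkend% marker -->
  <xsl:variable name="flat">
    <xsl:for-each select="node()">
      <xsl:choose>
        <xsl:when test="self::xref">
          <xsl:value-of select="concat('%', @linkend, '%')"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="."/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:variable>
  <row>
    <!-- Split on commas, then turn each %linkend% marker back into an
         xref element -->
    <xsl:for-each select="tokenize($flat, ',')">
      <entry>
        <xsl:analyze-string select="." regex="%([^%]*)%">
          <xsl:matching-substring>
            <xref linkend="{regex-group(1)}"/>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <xsl:value-of select="."/>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </entry>
    </xsl:for-each>
  </row>
</xsl:template>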

Do all the delimiters (commas) occur in top-level text nodes, or can they
occur nested within elements? I'll assume the former.

Start by making a copy of the data in which the commas are replaced by
<comma/> elements:

<xsl:template match="tbentry">
  <xsl:variable name="temp">
    <xsl:apply-templates mode="replace-commas"/>
  </xsl:variable>
  ..[G]..
</xsl:template>

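<!-- Copy element children through whole; commas nested inside them are
     deliberately left alone -->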
<xsl:template match="*" mode="replace-commas">
  <xsl:copy-of select="."/>
</xsl:template>

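<!-- Split each top-level text node on commas, emitting a <comma/> marker
     for each one -->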
<xsl:template match="text()" mode="replace-commas">
  <xsl:analyze-string select="." regex=",">
    <xsl:matching-substring><comma/></xsl:matching-substring>
    <xsl:non-matching-substring><xsl:value-of select="."/></xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

Then (at [G] above) process the new tbentry using grouping:

  <row>
    <xsl:for-each-group select="$temp/child::node()" group-starting-with="comma">
      <!-- each group begins with a <comma/> marker (except possibly the
           first); drop the marker itself when copying the group -->
      <entry><xsl:copy-of select="current-group()[not(self::comma)]"/></entry>
    </xsl:for-each-group>
  </row>
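
If I've got it right, your sample input should then come out roughly as:

  <row>
    <entry>Some text</entry>
    <entry> Some more text <xref linkend="somelink"/> even more text </entry>
    <entry> </entry>
    <entry> </entry>
    <entry/>
  </row>

(the trailing ", , ," produces empty entries; add a filter if you don't
want those).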

Not tested!

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Ilya Lifshits [mailto:chehlo@xxxxxxxxx] 
> Sent: 14 February 2008 22:38
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject:  Parsing complex line (mixed text and markup)
> 
> Hello experts,
> 
> I'm using XSLT 2.0 processors, both Saxon and Altova.
> 
> I'm trying to parse a complex line like:
> <tbentry>Some text, Some more text <xref linkend="somelink"/>
> even more text , , ,</tbentry>
> 
> and get the following output:
> 
> <row>
>         <entry>Some text</entry>
>         <entry>Some more text <xref linkend="somelink"/>
> even more text</entry>
> </row>
> 
> The number of entries is not constant.
> 
> I easily found a solution for this without mixing the
> text and markup, by using the tokenize function.
> But I failed to separate text and markup using this approach.
> Example can be found here : http://pastebin.com/m40fd204f
> 
> To formalize the goal: I want to simplify the life of our tech
> writers by creating wrappers on top of DocBook that will
> help transform from my own syntax to standard DocBook code.
> So if there is another, more appropriate way (which is not a WYSIWYG
> editor) to achieve this, I can completely change the source line:
> <tblrow>Some text, Some more text <xref linkend="somelink"/>
> even more text</tblrow> as long as it's still easy to write
> :). The only solution I found is to pass the linkend as an
> attribute on tblrow, plus another attribute which specifies
> the entry number.
> But this is a very limited solution and will not allow me to
> use xref in two entries, for example.
> An additional note: I'm an absolute newbie in XML.
> 
> Thanks in advance,
>  Ilya.
