Subject: Re: Processing mixed content. [Was: Parsing complex line (mixed text and markup)]
From: "Manfred Staudinger" <manfred.staudinger@xxxxxxxxx>
Date: Sun, 17 Feb 2008 14:38:09 +0100
On 16/02/2008, Ilya Lifshits <chehlo@xxxxxxxxx> wrote:
> I wonder whether Michael's first suggestion has disadvantages in your opinion that
> you are trying to improve on, or whether this is just another possible solution?
I would think this solution is more general, but I had hoped to get
Michael to comment on that. Certainly it's easy to implement in XSLT
1.0.
Anyway, here is a _corrected_ version of the above, tested with Saxon 9.0:

<xsl:template match="tbentry">
	<xsl:copy>
		<xsl:apply-templates select="@*"/>
		<xsl:variable name="curr" select="."/>
		<xsl:variable name="temp">
			<xsl:apply-templates select="node()" mode="text"/>
		</xsl:variable>
		<xsl:for-each select="tokenize($temp, ',')">
			<entry>
				<!-- split the current entry (the context item), not the whole of $temp -->
				<xsl:for-each select="tokenize(., '@xy')">
					<xsl:choose>
						<xsl:when test="starts-with(., 'xy')">
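							<!-- current() is the token, e.g. 'xy2'; the xs prefix must be bound to http://www.w3.org/2001/XMLSchema -->
							<xsl:apply-templates
								select="$curr/node()[xs:integer(substring(current(), 3))]"/>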
						</xsl:when>
						<xsl:otherwise>
							<xsl:value-of select="."/>
						</xsl:otherwise>
					</xsl:choose>
				</xsl:for-each>
			</entry>
		</xsl:for-each>
	</xsl:copy>
</xsl:template>
<xsl:template match="*" mode="text">
	<xsl:value-of select="concat('@xyxy', position(), '@xy')"/>
</xsl:template>
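
For the XSLT 1.0 route mentioned above, the string processing that tokenize() does in 2.0 would be done by a recursive named template. The following is only a minimal sketch of that idea, not the stylesheet actually used: the template name "split" and its parameter "str" are hypothetical, and it handles only the comma splitting on the string value of tbentry, without the placeholder step that re-inserts child elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

	<!-- Hypothetical helper: emits one <entry> per comma-separated token of $str -->
	<xsl:template name="split">
		<xsl:param name="str"/>
		<xsl:choose>
			<xsl:when test="contains($str, ',')">
				<!-- output the text before the first comma, then recurse on the rest -->
				<entry><xsl:value-of select="substring-before($str, ',')"/></entry>
				<xsl:call-template name="split">
					<xsl:with-param name="str" select="substring-after($str, ',')"/>
				</xsl:call-template>
			</xsl:when>
			<xsl:otherwise>
				<!-- no comma left: the remainder is the last token -->
				<entry><xsl:value-of select="$str"/></entry>
			</xsl:otherwise>
		</xsl:choose>
	</xsl:template>

	<xsl:template match="tbentry">
		<xsl:copy>
			<xsl:call-template name="split">
				<!-- string(.) flattens the content; child element markup is lost in this sketch -->
				<xsl:with-param name="str" select="string(.)"/>
			</xsl:call-template>
		</xsl:copy>
	</xsl:template>
</xsl:stylesheet>
```

Like tokenize(), this produces an empty entry for every trailing comma, so input ending in ", , ," would still need a test on normalize-space() before the entry is written.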

Manfred

On 16/02/2008, Ilya Lifshits <chehlo@xxxxxxxxx> wrote:
> I'm a complete newbie, so I'm not in a position to comment on whether this
> solution is valid. I wonder whether Michael's first suggestion has
> disadvantages in your opinion that you are trying to improve on, or whether
> this is just another possible solution?
> From my newbie point of view, Michael's suggestion is more straightforward
> and clearer.
>
> Ilya.
>
>
> On Feb 15, 2008 10:43 PM, Manfred Staudinger
> <manfred.staudinger@xxxxxxxxx> wrote:
> > Hi All,
> >
> > I would like to propose a third variant and to get your comments about it.
> >
> > On 15/02/2008, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> > > On 14/02/2008, Ilya Lifshits <chehlo@xxxxxxxxx> wrote:
> > > > I'm using XSLT 2.0 processors, both Saxon and Altova.
> > > >
> > > > I'm trying to parse complex line like:
> > > > <tbentry>Some text, Some more text <xref linkend="somelink"/>
> > > > even more text , , ,</tbentry>
> > > >
> > > > and get following output :
> > > >
> > > > <row>
> > > > <entry>Some text</entry>
> > > > <entry>Some more text <xref
> > > > linkend="ut_man_related_docs"/> and even more text </entry> </row>
> > > >
> > > > Number of entries is not constant.
> > > >
> > > > I easily found a solution for the case without mixed
> > > > text and markup by using the tokenize function,
> > > > but failed to separate text and markup using this approach.
> > > > Example can be found here : http://pastebin.com/m40fd204f
> > > >
> > > > To formalize the goal: I want to simplify the life of our tech
> > > > writers by creating wrappers on top of DocBook that will
> > > > help transform my own syntax into standard DocBook code.
> > > > So if there is another, more appropriate way (that is not a WYSIWYG
> > > > editor) to achieve this, I can completely change the source line:
> > > > <tblrow>Some text, Some more text <xref linkend="somelink"/>
> > > > even more text </tblrow> as long as it's still easy to write
> > >
> > > This problem has come up in the past and it's not particularly easy. There
> > > seem to be two main approaches:
> > >
> > > (a) convert the string delimiters into element markup, and then use grouping
> > > facilities (xsl:for-each-group) to analyze the overall structure
> > >
> > > (b) convert the markup into string delimiters, and then use
> > > xsl:analyze-string.
> > >
> > > Both work, but I think (a) is probably a bit easier.
> > >
> > > Do all the delimiters (commas) occur in top-level text nodes, or can they
> > > occur nested within elements? I'll assume the former.
> > >
> > > Start by making a copy of the data in which the commas are replaced by
> > > <comma/> elements:
> > >
> > > <xsl:template match="tbentry">
> > > <xsl:variable name="temp">
> > > <xsl:apply-templates mode="replace-commas"/>
> > > </xsl:variable>
> > > <xsl:for-each-group select="$temp/child::node()"
> > > group-starting-with="comma">
> > > <entry><xsl:copy-of select="current-group()[not(self::comma)]"/></entry>
> > > </xsl:for-each-group>
> > > </xsl:template>
> > >
> > > <xsl:template match="*" mode="replace-commas">
> > > <xsl:copy-of select="."/>
> > > </xsl:template>
> > >
> > > <xsl:template match="text()" mode="replace-commas">
> > > <xsl:analyze-string select="." regex=",">
> > > <xsl:matching-substring><comma/></xsl:matching-substring>
> > > <xsl:non-matching-substring><xsl:value-of
> > > select="."/></xsl:non-matching-substring>
> > > </xsl:analyze-string>
> > > </xsl:template>
> > >
> >
> > (c) convert the elements into strings which contain the position()
> > of the element. After processing the string, reinsert those elements.
> >
> > Let's assume the document does not contain 'xy'. Then
> > <xsl:template match="tbentry">
> > <xsl:variable name="temp">
> >    <xsl:apply-templates mode="text"/>
> > </xsl:variable>
> > <xsl:for-each select="tokenize($temp, ',')">
> >    <entry>
> >       <xsl:for-each select="tokenize(., '@xy')">
> >          <xsl:choose>
> >             <xsl:when test="starts-with(., 'xy')">
> > <!-- A -->   <xsl:apply-templates
> > select="/node()[xs:integer(substring(., 3))]"/>
> >             </xsl:when>
> >             <xsl:otherwise>
> >                <xsl:value-of select="."/>
> >             </xsl:otherwise>
> >          </xsl:choose>
> >       </xsl:for-each>
> >    </entry>
> > </xsl:for-each>
> > </xsl:template>
> >
> > <xsl:template match="*" mode="text">
> >         <xsl:value-of select="concat('@xyxy', position(), '@xy')"/>
> > </xsl:template>
> > <xsl:template match="text()" mode="text">
> >         <xsl:value-of select="."/>
> > </xsl:template>
> >
> > Not tested and I'm uncertain about (A), but a very similar solution
> > works fine in XSLT 1.0, where the processing of the string is done by
> > recursive templates.
> >
> > Thanks in advance,
> >
> > Manfred
> > http://documenta.rudolphina.org/Indices/Index.html
