Re: HTML text extraction
Hope this helps:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0"
      encoding="UTF-8" indent="yes"/>

  <!-- Assumes the p elements are children of a root element. -->
  <xsl:template match="/root">
    <root>
      <!-- Select every heading, not only 'Heading 1' and 'Heading 2',
           so that 'Heading 3' and later headings get a Subject too. -->
      <xsl:for-each select="p[starts-with(., 'Heading')]">
        <Subject>
          <xsl:value-of select="."/>
          <xsl:text>&#10;</xsl:text>
          <xsl:variable name="p-id" select="generate-id()"/>
          <content>
            <!-- Non-heading siblings whose nearest preceding heading
                 is the current one. -->
            <xsl:for-each
                select="following-sibling::p
                        [generate-id(preceding-sibling::p[starts-with(., 'Heading')][1]) = $p-id]
                        [not(starts-with(., 'Heading'))]">
              <xsl:value-of select="."/>
              <xsl:text>&#10;</xsl:text>
            </xsl:for-each>
          </content>
        </Subject>
      </xsl:for-each>
    </root>
  </xsl:template>
</xsl:stylesheet>
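
The same grouping can also be written with xsl:key, which avoids re-evaluating the preceding-sibling test for every heading. This is only a sketch under a few assumptions: the key name "by-heading" is made up for illustration, the input is still wrapped in a root element as in the stylesheet above, and normalize-space() is borrowed from the template in the original question so that leading whitespace does not hide a heading.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Index each non-heading p by the generated id of the nearest
       heading p that precedes it. "by-heading" is a hypothetical name. -->
  <xsl:key name="by-heading"
      match="p[not(starts-with(normalize-space(.), 'Heading'))]"
      use="generate-id(preceding-sibling::p[starts-with(normalize-space(.), 'Heading')][1])"/>

  <!-- Assumes the p elements are children of a root element. -->
  <xsl:template match="/root">
    <root>
      <xsl:for-each select="p[starts-with(normalize-space(.), 'Heading')]">
        <Subject>
          <xsl:value-of select="normalize-space(.)"/>
          <xsl:text>&#10;</xsl:text>
          <content>
            <!-- Look up the paragraphs indexed under this heading. -->
            <xsl:for-each select="key('by-heading', generate-id())">
              <xsl:value-of select="."/>
              <xsl:text>&#10;</xsl:text>
            </xsl:for-each>
          </content>
        </Subject>
      </xsl:for-each>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input from the question (wrapped in a root element), the paragraphs bbb and aaa should land under Heading 1, and aaa and ccc under Heading 2. Paragraphs that come before the first heading are indexed under an empty key value and are never looked up, so they are dropped.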
Regards,
Mukul
--- Myron Bennet <vbj34@xxxxxxxxx> wrote:
> Hello,
>
> I am using XSL to extract text from HTML pages into
> XML. I get all the text between predefined delimiter
> keywords such as Heading 1 and Heading 2. The problem
> I am having is that the template continues matching past
> the delimiter keywords. (For example, I want to match
> between Headings 1 and 2 only, but the template
> matches between Headings 1-2 plus everything else
> after Heading 2.) Example input/output and the
> recursive template I use are shown below. I would
> appreciate any input on this. Thanks.
>
>
> INPUT HTML:
>
> <p>Heading 1</p>
> <p>bbb</p>
> <p>aaa</p>
> <p>Heading 2</p>
> <p>aaa</p>
> <p>ccc</p>
> <p>Heading 3</p>
> ...
>
>
> OUTPUT XML:
>
> <Subject>
> Heading 1
> <content>
> bbb
> aaa
> </content>
> </Subject>
> <Subject>
> Heading 2
> <content>
> aaa
> ccc
> </content>
> </Subject>
> <Subject>
> Heading 3
> <content>
> ...
> </content>
> </Subject>
>
>
> RECURSIVE TEMPLATE:
> <xsl:template
>     match="//p[starts-with(normalize-space(.),'Heading')]">
>   <Subject>
>     <xsl:value-of select="."/>
>     <content>
>       <xsl:variable name="next"
>           select="following-sibling::*[not(starts-with(normalize-space(.), 'Heading'))]"/>
>       <xsl:if test="$next">
>         <xsl:apply-templates select="$next" mode="getContent"/>
>       </xsl:if>
>     </content>
>   </Subject>
> </xsl:template>
>
> <xsl:template name="getContent">
>   <xsl:value-of select="."/>
>   <xsl:variable name="next"
>       select="following-sibling::*[not(starts-with(normalize-space(.), 'Heading'))]"/>
>   <xsl:if test="$next">
>     <xsl:apply-templates select="$next" mode="getContent"/>
>   </xsl:if>
> </xsl:template>