
Re: Changing a from unstructured HTML to XML

Subject: Re: Changing a from unstructured HTML to XML
From: Martin Honnen <Martin.Honnen@xxxxxx>
Date: Tue, 21 Sep 2010 15:29:44 +0200
Evan Leibovitch wrote:

I am working with an HTML input file, and I'd like to group things
better by sections (ultimately, with the intent of using
xsl:result-document to create a new file for each section).

What I have is not uncommon:

<h1 class="section">Section Name</h1>
<h1 class="headline">Headline name</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 2</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 3</h1>
[... assorted HTML marked up text ...]
<h1 class="section">Section 2</h1>
<h1 class="headline">Headline 4</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 5</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 6</h1>
[... assorted HTML marked up text ...]

and so on.

What I'd like to end up with, if possible, is:

<section id="Section Name">
  <headline id="Headline name">
     [...marked up text...]
  </headline>
  <headline id="Headline 2">
     [...marked up text...]
  </headline>
  <headline id="Headline 3">
     [...marked up text...]
  </headline>
</section>

XSLT 2.0 can do that with nested xsl:for-each-group instructions using group-starting-with, e.g.


<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

<xsl:output method="xml" indent="yes" version="1.0"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="body">
    <xsl:copy>
      <!-- outer grouping: each group starts at a section heading -->
      <xsl:for-each-group select="node()" group-starting-with="h1[@class = 'section']">
        <!-- skip any leading group that does not start with a section heading -->
        <xsl:if test="self::h1[@class = 'section']">
          <section id="{.}">
            <!-- inner grouping: split the rest of the section at each headline -->
            <xsl:for-each-group select="current-group() except ." group-starting-with="h1[@class = 'headline']">
              <xsl:if test="self::h1[@class = 'headline']">
                <headline id="{.}">
                  <xsl:apply-templates select="current-group() except ."/>
                </headline>
              </xsl:if>
            </xsl:for-each-group>
          </section>
        </xsl:if>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>


</xsl:stylesheet>

will turn

<body>
<h1 class="section">Section Name</h1>
<h1 class="headline">Headline name</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 2</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 3</h1>
[... assorted HTML marked up text ...]
<h1 class="section">Section 2</h1>
<h1 class="headline">Headline 4</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 5</h1>
[... assorted HTML marked up text ...]
<h1 class="headline">Headline 6</h1>
[... assorted HTML marked up text ...]
</body>

into

<body>
   <section id="Section Name">
      <headline id="Headline name">
[... assorted HTML marked up text ...]
</headline>
      <headline id="Headline 2">
[... assorted HTML marked up text ...]
</headline>
      <headline id="Headline 3">
[... assorted HTML marked up text ...]
</headline>
   </section>
   <section id="Section 2">
      <headline id="Headline 4">
[... assorted HTML marked up text ...]
</headline>
      <headline id="Headline 5">
[... assorted HTML marked up text ...]
</headline>
      <headline id="Headline 6">
[... assorted HTML marked up text ...]
</headline>
   </section>
</body>
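
As for the xsl:result-document part of the question: the same grouping could be wrapped so that each section is written to its own file. What follows is only a minimal sketch. The file names section1.xml, section2.xml and so on are an assumption made for illustration (the original post does not say how the files should be named), and the number is computed by counting the preceding section headings rather than with position(), because a whitespace text node before the first heading would form a group of its own.

```xslt
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- identity template, as in the stylesheet above -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="body">
    <xsl:for-each-group select="node()" group-starting-with="h1[@class = 'section']">
      <xsl:if test="self::h1[@class = 'section']">
        <!-- assumed naming scheme: section1.xml, section2.xml, ... -->
        <xsl:result-document
            href="section{count(preceding-sibling::h1[@class = 'section']) + 1}.xml">
          <section id="{.}">
            <xsl:for-each-group select="current-group() except ."
                                group-starting-with="h1[@class = 'headline']">
              <xsl:if test="self::h1[@class = 'headline']">
                <headline id="{.}">
                  <xsl:apply-templates select="current-group() except ."/>
                </headline>
              </xsl:if>
            </xsl:for-each-group>
          </section>
        </xsl:result-document>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

With the sample input above this should produce two files, one per section. A relative href is resolved against the base output URI, so the files end up next to the main result document, which in this sketch contains only whatever lies outside the body element.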


--


	Martin Honnen
	http://msmvps.com/blogs/martin_honnen/
