Subject: Re: Move elements to preceding parent
From: Israel Viente <israel.viente@xxxxxxxxx>
Date: Mon, 15 Jun 2009 15:59:37 +0300
Hi Martin,
Thank you for this. It looks very elegant.
Can you please explain the idea behind this template:
> <xsl:template match="p[preceding-sibling::p[1][span[@class ne 'chapter']
> and not(matches(span[@class ne 'chapter'][last()], '[.?&quot;!]$'))]]"/>
Does it remove any p whose immediately preceding p has no ending
character at the end of its last span?
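The empty template is the second half of a pair. The first template copies an "open" p (one whose last non-chapter span has no closing punctuation) together with the children of the following p; the empty template then writes nothing for that following p, so its content does not appear twice. The pairing is easier to see when the shared test is factored into a stylesheet function. The sketch below is an untested refactoring of Martin's templates along those lines; the f prefix, its namespace URI and f:is-open are hypothetical names.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:f="urn:example:functions"
  xpath-default-namespace="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="xs f"
  version="2.0">

  <xsl:output method="xhtml"/>

  <!-- Hypothetical helper: true if the p has a span whose class is not
       'chapter' and the last such span does not end in . ? " or ! -->
  <xsl:function name="f:is-open" as="xs:boolean">
    <xsl:param name="p" as="element()"/>
    <xsl:sequence select="exists($p/span[@class ne 'chapter'])
      and not(matches($p/span[@class ne 'chapter'][last()], '[.?&quot;!]$'))"/>
  </xsl:function>

  <!-- Identity template: copy everything else unchanged. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- An open p: copy it and append the children of the next p. -->
  <xsl:template match="p[f:is-open(.)]">
    <xsl:copy>
      <xsl:apply-templates select="@* | node() | following-sibling::p[1]/node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The p right after an open p: its children were already copied by
       the template above, so this empty template writes nothing for it. -->
  <xsl:template match="p[preceding-sibling::p[1][f:is-open(.)]]"/>

</xsl:stylesheet>
```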
I tried it with a more complete example like the following:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
<title/>
<link href="test.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<p dir="rtl">
<span class="chapter">line1</span>
</p>
<p dir="rtl"> <br />
<span class="regular">line3.</span>
<span class="italic">line4</span>
<span class="regular">line5."</span>
</p>
<p dir="rtl"> <br />
<span class="regular">line6.</span>
<br />
<span class="regular">line7</span>
</p>
<p dir="rtl"> <br />
<span class="regular">line8.</span>
<span class="regular">line9.</span>
</p>
</body>
</html>
The output was:
<?xml version="1.0" encoding="UTF-8"?><html
xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="en"
version="-//W3C//DTD XHTML 1.1//EN">
<head profile="">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title></title>
<link href="test.css" rel="stylesheet" type="text/css"
xml:space="preserve" />
</head>
<body xml:space="preserve">
<p dir="rtl" xml:space="preserve">
<span class="chapter" xml:space="preserve">line1</span>
</p>
<p dir="rtl" xml:space="preserve"> <br xml:space="preserve" />
<span class="regular" xml:space="preserve">line3.</span>
<span class="italic" xml:space="preserve">line4</span>
<span class="regular" xml:space="preserve">line5."</span>
</p>
<p dir="rtl" xml:space="preserve"> <br xml:space="preserve" />
<span class="regular" xml:space="preserve">line6.</span>
<br xml:space="preserve" />
<span class="regular" xml:space="preserve">line7</span>
<br xml:space="preserve" />
<span class="regular" xml:space="preserve">line8.</span>
<span class="regular" xml:space="preserve">line9.</span>
</p>
</body>
</html>
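For this input only one following p had to be pulled in. If several consecutive p elements all lack closing punctuation, the two templates in Martin's stylesheet can lose content: the second open p matches both templates with the same default priority, XSLT 2.0 error recovery picks the one declared last (the empty one), so that p is dropped, and the p after it is then dropped as well without its children ever being copied. A possible alternative, sketched below and untested, is xsl:for-each-group with group-starting-with, which gathers a whole run of open paragraphs into one group. It assumes body contains only p elements (plus whitespace), and f:is-open and its namespace are hypothetical names. Following the requirement to move the span elements, absorbed paragraphs contribute only their span children, so the leading br of a merged p is not copied, as in the desired output.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:f="urn:example:functions"
  xpath-default-namespace="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="xs f"
  version="2.0">

  <xsl:output method="xhtml"/>

  <!-- Hypothetical helper (same test as Martin's patterns): the p has a
       non-chapter span and the last one does not end in . ? " or ! -->
  <xsl:function name="f:is-open" as="xs:boolean">
    <xsl:param name="p" as="element()"/>
    <xsl:sequence select="exists($p/span[@class ne 'chapter'])
      and not(matches($p/span[@class ne 'chapter'][last()], '[.?&quot;!]$'))"/>
  </xsl:function>

  <!-- Identity template. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumes body holds only p elements. A new group starts at every p
       whose preceding p is not open, so a run of open p elements and the
       p that finally closes it end up in the same group. -->
  <xsl:template match="body">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each-group select="p"
          group-starting-with="p[not(preceding-sibling::p[1][f:is-open(.)])]">
        <!-- Context item is the first p of the group: keep its attributes
             and children, then append only the spans of the absorbed p's. -->
        <xsl:copy>
          <xsl:apply-templates select="@*, node(),
              subsequence(current-group(), 2)/span"/>
        </xsl:copy>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```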
How can I remove the following:
1. The extra xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" and
version="-//W3C//DTD XHTML 1.1//EN" attributes on the html element.
2. The extra profile="" attribute on the head element.
3. The extra xml:space="preserve" attributes on the body, link, p, span
and br elements.
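These attributes are not in the input, so a likely explanation (an assumption, not verified) is that the parser filled them in as defaults from the XHTML 1.1 DTD named in the DOCTYPE. One way to deal with them is to filter them in the stylesheet. In the untested sketch below, copy-namespaces="no" on xsl:copy leaves out the unused xsi namespace, an empty template matches the unwanted attributes, and doctype-public/doctype-system on xsl:output restore the DOCTYPE, which is missing from the output above. The media-type setting is a further assumption about the changed meta element: with the xhtml output method the serializer writes its own Content-Type meta from media-type, whose default is text/html.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xpath-default-namespace="http://www.w3.org/1999/xhtml"
  version="2.0">

  <!-- Put back the XHTML 1.1 DOCTYPE and request the original media type
       for the Content-Type meta element the serializer writes. -->
  <xsl:output method="xhtml"
              doctype-public="-//W3C//DTD XHTML 1.1//EN"
              doctype-system="http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"
              media-type="application/xhtml+xml"/>

  <!-- Identity for elements; copy-namespaces="no" omits in-scope
       namespaces the element does not use, such as xmlns:xsi. -->
  <xsl:template match="*">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Identity for attributes, text, comments and processing instructions. -->
  <xsl:template match="@* | text() | comment() | processing-instruction()">
    <xsl:copy/>
  </xsl:template>

  <!-- Attributes assumed to be DTD defaults: write nothing for them. -->
  <xsl:template match="@xml:space | html/@version | head/@profile[. = '']"/>

  <!-- Martin's two p templates, with copy-namespaces="no" on the merged p. -->
  <xsl:template match="p[span[@class ne 'chapter'] and
      not(matches(span[@class ne 'chapter'][last()], '[.?&quot;!]$'))]">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@* | node() | following-sibling::p[1]/node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p[preceding-sibling::p[1][span[@class ne 'chapter']
      and not(matches(span[@class ne 'chapter'][last()], '[.?&quot;!]$'))]]"/>

</xsl:stylesheet>
```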
Thanks, Viente
On Sun, Jun 14, 2009 at 6:50 PM, Martin Honnen <Martin.Honnen@xxxxxx> wrote:
> Israel Viente wrote:
>
>> My input is something like the following:
>> <?xml version="1.0" encoding="UTF-8"?>
>> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
>> <html xmlns="http://www.w3.org/1999/xhtml">
>> <body>
>> <p dir="rtl">
>> <span class="chapter">line1</span>
>> </p>
>> <p dir="rtl"> <br />
>> <span class="regular">line3.</span>
>> <span class="italic">line4</span>
>> <span class="regular">line5."</span>
>> </p>
>> <p dir="rtl"> <br />
>> <span class="regular">line6.</span>
>> <br />
>> <span class="regular">line7</span>
>> </p>
>> <p dir="rtl"> <br />
>> <span class="regular">line8.</span>
>> <span class="regular">line9.</span>
>> </p>
>> </body>
>> </html>
>>
>>
>> The resulting output should be:
>> <?xml version="1.0" encoding="UTF-8"?>
>> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
>> <html xmlns="http://www.w3.org/1999/xhtml">
>> <body>
>> <p dir="rtl">
>> <span class="chapter">line1</span>
>> </p>
>> <p dir="rtl"> <br />
>> <span class="regular">line3.</span>
>> <span class="italic">line4</span>
>> <span class="regular">line5."</span>
>> </p>
>> <p dir="rtl"> <br />
>> <span class="regular">line6.</span>
>> <br />
>> <span class="regular">line7</span>
>> <span class="regular">line8.</span>
>> <span class="regular">line9.</span>
>> </p>
>> </body>
>> </html>
>>
>> For every p that contains span elements whose class is not 'chapter',
>> verify that the text of the last such span ends with one of the
>> paragraph-ending characters . ? " !
>> If it does, copy the p to the output as is.
>> Otherwise, move the span elements from the next p into the current one
>> and remove the next p completely.
>
> Here is an attempt at solving that with XSLT 2.0:
>
> <xsl:stylesheet
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> xpath-default-namespace="http://www.w3.org/1999/xhtml"
> version="2.0">
>
> <xsl:output method="xhtml"/>
>
> <xsl:template match="@* | node()">
> <xsl:copy>
> <xsl:apply-templates select="@* | node()"/>
> </xsl:copy>
> </xsl:template>
>
> <xsl:template match="p[span[@class ne 'chapter'] and
> not(matches(span[@class ne 'chapter'][last()], '[.?&quot;!]$'))]">
> <xsl:copy>
> <xsl:apply-templates select="@* | node() |
> following-sibling::p[1]/node()"/>
> </xsl:copy>
> </xsl:template>
>
> <xsl:template match="p[preceding-sibling::p[1][span[@class ne 'chapter']
> and not(matches(span[@class ne 'chapter'][last()], '[.?&quot;!]$'))]]"/>
>
> </xsl:stylesheet>
>
> With Saxon 9 it produces the described output for the posted input, but I
> have not tested it with other inputs.
>
> --
>
> Martin Honnen
> http://msmvps.com/blogs/martin_honnen/