

Subject: Re: How to copy attribute value to text? (Suspected bug involving supplementary characters)
From: "Kenneth Reid Beesley krbeesley@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 7 Jul 2016 18:54:30 -0000
From: Kenneth Reid Beesley <krbeesley@xxxxxxxxx>
Subject: Re: [XSL-List: The Open Forum on XSL] Digest for 2016-07-06
Date: July 7, 2016 at 12:43:54 PM EDT
To: "XSL-List: The Open Forum on XSL"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>


Many thanks to Martin Honnen for his response below.  I add more comments
below (suspected bug in Saxon).


> On 7Jul2016, at 05:28, XSL-List: The Open Forum on XSL
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> From: Martin Honnen <martin.honnen@xxxxxx>
> Subject: Re:  How to copy attribute value to text?
> Date: 7 July 2016 at 00:43:37 MDT
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
>
>
> On 07.07.2016 07:22, Kenneth Reid Beesley krbeesley@xxxxxxxxx wrote:
>> If I start with an input XML document that contains mixed text with <word>
>> elements like this:
>>
>> 	… this is just <word correction="too">to</word> funny
>>
>> I'd like to write an XSLT stylesheet that yields as output
>>
>> 	… this is just <word origerror="to">too</word> funny
>>
>> So in the output I effectively want (in the same <word> element) to
>>
>> 	1.  Set the value of a new attribute to the original text() value, and
>> 	2.  Reset the text() value to be the value of the original @correction
>> 	    attribute
>>
>> I've tried many variants of the following, so far without success.  I'm
>> using SaxonHE9-7-0-6J;
>> it runs, but the results are not as expected/hoped.
>
>> I've tried matching the text() in a separate template, but I can't seem
>> to reference the attribute values of the parent node (i.e., <word>) of the
>> text() and the parent node's attributes.  E.g, the following doesn't work
>> for me, failing somehow in the
>> select="../@correction"  reference.
>>
>> <xsl:template match="word[@correction]/text()">
>> 	<xsl:value-of select="../@correction"/>
>> </xsl:template>
>
>
> You can use
>
> 	<xsl:template match="@* | node()">
> 		<xsl:copy>
> 			<xsl:apply-templates select="@* | node()"/>
> 		</xsl:copy>
> 	</xsl:template>
>
> 	<xsl:template match="word[@correction]/text()">
> 		<xsl:value-of select="../@correction"/>
> 	</xsl:template>
>
> 	<xsl:template match="word/@correction">
> 		<xsl:attribute name="origerror" select=".."/>
> 	</xsl:template>

Your solution looks perfect and appears to work perfectly for ASCII-based XML
input examples like the following

<?xml version="1.0" encoding="UTF-8"?>

<foo>
  <bar>this is just <word correction="too">to</word> funny</bar>
</foo>

yielding the correct/desired output

<?xml version="1.0" encoding="UTF-8"?>
<foo>
  <bar>this is just <word origerror="to">too</word> funny</bar>
</foo>


I now see that some of my own attempts also worked, on the same ASCII-based
example.
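
As an engine-independent cross-check of the intended result, the same swap can be sketched outside XSLT. The following is a minimal sketch, assuming Python 3 and its standard xml.etree.ElementTree module; the function name swap_correction is hypothetical and is not part of the stylesheet above. It produces a reference output that a suspect transformation result can be compared against.

```python
import xml.etree.ElementTree as ET


def swap_correction(xml_text):
    """Move each <word>'s text into @origerror and put @correction into the text.

    Hypothetical helper for cross-checking; it mirrors the intent of the
    stylesheet above but does not use XSLT at all.
    """
    root = ET.fromstring(xml_text)
    for word in root.iter("word"):
        correction = word.attrib.pop("correction", None)
        if correction is None:
            continue  # <word> elements without @correction are left untouched
        word.set("origerror", word.text or "")
        word.text = correction
    # encoding="unicode" returns a str with no XML declaration
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = '<foo><bar>this is just <word correction="too">to</word> funny</bar></foo>'
    print(swap_correction(sample))
```

Python strings are sequences of code points, so this sketch treats supplementary characters the same way as ASCII ones.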

*****  Suspected bug involving supplementary characters *****

But my real task involves an input XML document, in UTF-8 encoding, that
consists of Deseret Alphabet characters, which are encoded in the
supplementary area.  In such a case, the resulting text content in the <word>
element, copied from an original attribute value, is corrupted.  I saw such
corruption in my own attempts, and couldn't understand what was happening.

Using the following input document (the Deseret Alphabet characters may not
display correctly for you)

<?xml version="1.0" encoding="UTF-8"?>

<foo>
  <bar>pp.p p.p p>p2pp; <word
correction="p;p-">pp/p	p.</word> pp2pp.</bar>
</foo>

the output, using your script, is corrupted.  The text() value in the output
is not the same as the original @correction value.  Extra characters (just one
in this case) are inserted.  The longer the original attribute value, the more
extra characters are inserted.

<?xml version="1.0" encoding="UTF-8"?>
<foo>
  <bar>pp.p p.p p>p2pp; <word
origerror="pp/p	p.">p;p;p-</word> pp2pp.</bar>
</foo>
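
To pin down exactly which characters were inserted, and where, it can help to compare the expected and actual strings code point by code point. The following is a minimal sketch, assuming Python 3; the helper names and the sample strings are hypothetical (two arbitrary Deseret letters, not the data from the document above). It also reports the UTF-16 length, since each supplementary character occupies two UTF-16 code units (a surrogate pair) in a Java string.

```python
def code_points(text):
    """Return the code points of text as 'U+XXXX' labels."""
    return ["U+%04X" % ord(ch) for ch in text]


def utf16_units(text):
    """Number of UTF-16 code units needed for text (2 per supplementary char)."""
    return len(text.encode("utf-16-le")) // 2


def first_difference(expected, actual):
    """Index of the first differing code point, or None if the strings are equal."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


if __name__ == "__main__":
    # Hypothetical sample values: paste the real @correction value and the
    # real output text() here instead.
    expected = "\U00010414\U00010407"
    actual = "\U00010414\U00010414\U00010407"
    print(code_points(expected), utf16_units(expected))
    print(code_points(actual), utf16_units(actual))
    print("first difference at code point index:", first_difference(expected, actual))
```

A listing like this, together with the smallest input that still shows the problem, makes a compact bug report.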

This kind of corruption is exactly what I was seeing using my own scripts,
leading me to bother the group.

I suspect a bug in the XSLT engine involving supplementary characters.  Again,
I'm using SaxonHE9-7-0-6J.

What's my next step?

Thanks,

Ken

********************************
Kenneth R. Beesley, D.Phil.
PO Box 540475
North Salt Lake UT 84054
USA