Re: Why does the tokenize() function behave strangely

Subject: Re: Why does the tokenize() function behave strangely when I use ENTITIES and variables?
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 7 Apr 2016 14:02:16 -0000

It's called attribute value normalization, and is described in the XML
specification. It's part of the bizarreness of XML not being able to define
consistently whether and when whitespace is significant. If you write a
newline explicitly as a character reference in an attribute value, then it
decides you probably intended it, but if a newline gets in there by expanding
an entity reference, it decides that you probably didn't, and replaces it
with a space.
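
A minimal sketch of the difference, assuming an XSLT 2.0 processor and a conforming XML parser. The variable names `direct` and `via-entity` and the entry template `main` are illustrative only, not part of the original stylesheet:

```xml
<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
  <!ENTITY line-separator '&#x0A;'>
]>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Character references written directly in the attribute:
       the parser keeps them as newlines (codepoint 10). -->
  <xsl:variable name="direct" select="'&#x0A;&#x0A;'"/>

  <!-- The same characters arriving via an entity reference:
       attribute value normalization turns each into a space (codepoint 32). -->
  <xsl:variable name="via-entity"
      select="'&line-separator;&line-separator;'"/>

  <!-- Hypothetical entry point: prints the codepoints of each variable. -->
  <xsl:template name="main">
    <xsl:value-of select="string-to-codepoints($direct)"/>
    <xsl:text>&#x0A;</xsl:text>
    <xsl:value-of select="string-to-codepoints($via-entity)"/>
  </xsl:template>
</xsl:stylesheet>
```

Run with the named template as the entry point (with Saxon, for instance, via the `-it:main` option), this should print `10 10` on the first line and `32 32` on the second.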

When I do this kind of thing I'm increasingly inclined to use
codepoints-to-string():

> <xsl:variable name="rule-separator" select="codepoints-to-string((10, 10))"/>

That's much more robust against entity-expansion and transcoding glitches.
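
As a rough illustration of that approach, assuming XSLT 2.0: the literal sample text below is a hypothetical stand-in for the contents of the text file, and the template name `main` is likewise only for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Two newline characters, built from codepoints rather than from
       entities or character references, so the XML parser never sees them. -->
  <xsl:variable name="rule-separator"
      select="codepoints-to-string((10, 10))"/>

  <!-- Hypothetical sample input standing in for the real text file. -->
  <xsl:variable name="text-file"
      select="concat('rule one', $rule-separator, 'rule two')"/>

  <!-- Hypothetical entry point: wraps each token in brackets. -->
  <xsl:template name="main">
    <xsl:for-each select="tokenize($text-file, $rule-separator)">
      <xsl:value-of select="concat('[', ., ']')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With this sample input the output should be `[rule one][rule two]`, showing that the delimiter really is two newlines rather than two spaces.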

Michael Kay
Saxonica

> On 7 Apr 2016, at 14:40, Costello, Roger L. costello@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Folks,
>
> I have a stylesheet which reads a text file and tokenizes it. The token
> delimiter is two consecutive newline characters (hex 0A, hex 0A).
>
> If I use the tokenize() function like this:
>
> 	tokenize($text-file, '&#x0A;&#x0A;')
>
> then the text file is correctly tokenized.
>
> But if I create an entity:
>
> <!DOCTYPE xsl:stylesheet [
>    <!ENTITY line-separator     '&#x0A;'>
> ]>
>
> and a variable whose value is two line-separators:
>
> <xsl:variable name="rule-separator" select="'&line-separator;&line-separator;'"/>
>
> and then use the variable with the tokenize() function:
>
> 	tokenize($text-file, $rule-separator)
>
> then the text file is not tokenized correctly. Specifically, the XSLT
> processor uses two consecutive space characters (hex 20, hex 20) as the token
> delimiter rather than two consecutive newline characters (hex 0A, hex 0A).
>
> Do you know why this is happening? How do I fix it?
>
> /Roger

