Re: Why does the tokenize() function behave strangely

Subject: Re: Why does the tokenize() function behave strangely when I use ENTITIES and variables?
From: "G. Ken Holman g.ken.holman@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 7 Apr 2016 14:20:40 -0000
At 2016-04-07 13:40 +0000, Costello, Roger L. costello@xxxxxxxxx wrote:
> I have a stylesheet which reads a text file and tokenizes it. The token delimiter is two consecutive newline characters (hex 0A, hex 0A).
>
> If I use the tokenize() function like this:
>
> tokenize($text-file, '&#x0A;&#x0A;')
>
> then the text file is correctly tokenized.
>
> But if I create an entity:
>
> <!DOCTYPE xsl:stylesheet [
>     <!ENTITY line-separator     '&#x0A;'>
>
> and a variable whose value is two line-separators:
>
> <xsl:variable name="rule-separator" select="'&line-separator;&line-separator;'"/>
>
> and then use the variable with the tokenize() function:
>
> tokenize($text-file, $rule-separator)
>
> then the text file is not tokenized correctly. Specifically, the XSLT processor uses two consecutive space characters (hex 20, hex 20) as the token delimiter rather than two consecutive newline characters (hex 0A, hex 0A) as the token delimiter.
>
> Do you know why this is happening?

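Attribute-value normalization (XML 1.0, section 3.3.3):

  Step 3, bullet 1 states that a character reference is
  appended to the normalized value as the referenced character.
  Step 3, bullet 2 states that an entity reference is processed
  by recursively applying step 3 to the entity's replacement text.
  Step 3, bullet 3 states that any white-space character
  found in the attribute value is normalized to a space.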

The numeric character reference in your first example is simply appended. In your second example the replacement text of the entity is a literal newline, which is a white-space character, so it is normalized to a space (hex 20).
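As a quick cross-check outside XSLT, here is a minimal sketch that feeds both forms to Python's expat-based xml.etree.ElementTree parser. It assumes that parser applies the same XML 1.0 attribute-value normalization as the parser underneath an XSLT processor. The element and attribute names (r, direct, via-entity) are hypothetical, chosen only for illustration. The sketch prints the code points each attribute ends up with.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document mirroring the question: "direct" uses
# character references; "via-entity" uses an entity declared with the same
# character reference as its value.
DOC = """<!DOCTYPE r [
  <!ENTITY line-separator '&#x0A;'>
]>
<r direct="&#x0A;&#x0A;" via-entity="&line-separator;&line-separator;"/>"""

root = ET.fromstring(DOC)

for name in ("direct", "via-entity"):
    # Code points delivered by the parser after attribute-value normalization.
    print(name, [ord(c) for c in root.get(name)])
# direct [10, 10]
# via-entity [32, 32]
```

The same newline survives when written as a character reference directly in the attribute. It turns into a space when it arrives through the entity.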

> How do I fix it?

There is no way to preserve a numeric character reference, as a reference, in the value of an entity:

  "An entity reference refers to the *content* of a named entity."
  (my emphasis)
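One way to see this is to ask the parser for each entity's replacement text. The minimal sketch below uses Python's xml.parsers.expat and its EntityDeclHandler callback, which receives the value of each internal entity after character references in the literal have been expanded. The two declarations are the ones used in the ent.xsl example in this message. The empty r element is only a placeholder assumption.

```python
import xml.parsers.expat

# Placeholder document carrying the two entity declarations from ent.xsl.
DOC = """<!DOCTYPE r [
  <!ENTITY line-separator1 '&#x0A;'>
  <!ENTITY line-separator2 '&#38;#x0A;'>
]>
<r/>"""

replacement_text = {}

def on_entity_decl(name, is_parameter_entity, value, base,
                   system_id, public_id, notation_name):
    # For internal entities, expat reports the replacement text: the
    # character references in the declared literal are already expanded.
    replacement_text[name] = value

parser = xml.parsers.expat.ParserCreate()
parser.EntityDeclHandler = on_entity_decl
parser.Parse(DOC, True)

for name, value in replacement_text.items():
    print(name, repr(value))
# line-separator1 '\n'
# line-separator2 '&#x0A;'
```

The first entity's content is already a newline character. Only the escaped form leaves a character reference behind for later parsing.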

But you can escape the ampersand so that the entity's content is the character reference itself, which is then decoded when the entity is referenced in the attribute:

t:\>type ent.xsl
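<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
    <!ENTITY line-separator1 '&#x0A;'>
    <!ENTITY line-separator2 '&#38;#x0A;'>
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output method="text"/>

<!--1: entity whose content is a newline; 2: entity whose content is the
    character reference itself; 3: a character reference used directly-->
<xsl:template match="/">
  <xsl:value-of select="'1=',string-to-codepoints('&line-separator1;'),
                        '&#x0A;2=',string-to-codepoints('&line-separator2;'),
                        '&#x0A;3=',string-to-codepoints('&#x0A;')"/>
</xsl:template>

</xsl:stylesheet>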

t:\>xslt2 ent.xsl ent.xsl
1= 32
2= 10
3= 10

The content of line-separator2 is the character reference itself, which is parsed when the entity is referenced, creating the newline you need.
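To tie this back to the original tokenize() call, here is a minimal Python sketch, again assuming expat behaves like the XSLT processor's parser. It builds the separator from the escaped entity and splits some sample text with re.split. That function is used here only as a rough stand-in for tokenize(), which also treats its second argument as a regular expression. The attribute name sep and the sample text are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document: the attribute "sep" plays the role of the
# select="..." literal that defines $rule-separator.
DOC = """<!DOCTYPE r [
  <!ENTITY line-separator '&#38;#x0A;'>
]>
<r sep="&line-separator;&line-separator;"/>"""

rule_separator = ET.fromstring(DOC).get("sep")  # two newline characters

# Sample text standing in for $text-file (an assumption for illustration).
text_file = "rule one\nstill rule one\n\nrule two\n\nrule three"

tokens = re.split(rule_separator, text_file)
print(tokens)
# ['rule one\nstill rule one', 'rule two', 'rule three']

# With the space-normalized separator from the question, nothing matches,
# so the whole text comes back as a single token.
print(len(re.split("  ", text_file)))
# 1
```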

I hope this helps.

. . . . . . Ken

Check our site for free XML, XSLT, XSL-FO and UBL developer resources |
Streaming hands-on XSLT/XPath 2 training @US$45: http://goo.gl/Dd9qBK |
Crane Softwrights Ltd. _ _ _ _ _ _ http://www.CraneSoftwrights.com/s/ |
G Ken Holman _ _ _ _ _ _ _ _ _ _ mailto:gkholman@xxxxxxxxxxxxxxxxxxxx |
Google+ blog _ _ _ _ _ http://plus.google.com/+GKenHolman-Crane/posts |
Legal business disclaimers: _ _ http://www.CraneSoftwrights.com/legal |
