Re: tokenize
At 2011-10-14 13:51 +0100, Peter Flynn wrote:
> It's either my brain slowing down, or the fact that it's nearly the
> weekend, or my lack of sleep and coffee, but I can't understand this:
>
> I need to break up the content of a td element which represents a Unix
> filepath, tokenizing on slashes, and getting rid of bogus visual
> formatting:

The above could simply be:

  <xsl:variable name="uri" select="translate(h:td[@class='x1'],' &#xa;','')"/>

... because you were creating a temporary tree of a root node and a text node when all you need is a string, thus needing only to use the select= on the <xsl:variable>.

> <xsl:variable name="urifrag" select="tokenize($uri,'/')"/>
> <xsl:text>"</xsl:text>
> <xsl:value-of select="$urifrag[1]"/>
> <xsl:text>" </xsl:text>
> <xsl:text>&#xa;</xsl:text>
> ...
> </xsl:template>

That surprises me ... I would have expected "" because tokenize produces an empty string in front of the first "/".

If you look on pages 300 and 303 of my XSLT book here you will see that tokenize() produces a non-matching substring before the first match:

  http://www.CraneSoftwrights.com/training/#ptux

When covering this in the classroom, I have to point out the nuance of the first non-matching string. Here is an example from page 303:

  tokenize(" a ","\s+")

produces the three strings "", "a", ""

This is illustrated by doing the following with your string:

~/t/ftemp $ cat peter.xsl
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:template match="/">
  <xsl:variable name="in">
    <xsl:value-of select="' &#xa;&#xa; /researchprofiles/A015/pcrowley/'"/>
  </xsl:variable>
  <xsl:variable name="uri" select="translate($in,' &#xa;','')"/>
  <xsl:variable name="urifrag" select="tokenize($uri,'/')"/>
  Tokens:
  <xsl:for-each select="$urifrag">
    <xsl:value-of select="concat('*',.,'* ')"/>
  </xsl:for-each>
  End
</xsl:template>

</xsl:stylesheet>
~/t/ftemp $ xslt2 peter.xsl peter.xsl
<?xml version="1.0" encoding="UTF-8"?>
  Tokens:
  ** *researchprofiles* *A015* *pcrowley* **
  End
~/t/ftemp $

> In other words, not only has it not tokenized the string, but something
> has gobbled the trailing slash from the input content. I suspected that
> there was some character encoding error (slashes except the final one
> not being real slashes, perhaps) but they are all genuine.

You don't say which processor you are using ... I'm using Saxon above.

> I have clearly misunderstood how tokenize works (except that I have been
> using it perfectly happily elsewhere for years). The variable $urifrag
> seems to be returning the entire string rather than breaking it up,
> except for the trailing slash, which means it is actually splitting the
> string on its final slash only, instead of on all slashes.

I can't see it either because I cannot reproduce your results. Even when I use the wasteful tree version of your variable I get the same results.

Please try the above stylesheet in your environment and see if you get the same results.

I hope this helps.

. . . . . . . . . . . Ken

--
Contact us for world-wide XML consulting and instructor-led training
Crane Softwrights Ltd.            http://www.CraneSoftwrights.com/s/
G. Ken Holman                   mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Google+ profile: https://plus.google.com/116832879756988317389/about
Legal business disclaimers:    http://www.CraneSoftwrights.com/legal
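
A footnote to the Saxon output above: the leading and trailing ** are the zero-length tokens that tokenize() returns before the first "/" and after the last one. If those are not wanted, one option is to filter them out with a predicate. What follows is only a minimal sketch, assuming an XSLT 2.0 processor; the hard-coded path and the text output method are chosen here for illustration and are not taken from the original stylesheet.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<!-- Assumption: plain text output, to keep the result easy to read -->
<xsl:output method="text"/>

<xsl:template match="/">
  <!-- Illustrative input: the path from the test above, already
       stripped of spaces and newlines -->
  <xsl:variable name="uri" select="'/researchprofiles/A015/pcrowley/'"/>

  <!-- tokenize() yields "", "researchprofiles", "A015", "pcrowley", "";
       the predicate keeps only the non-empty tokens -->
  <xsl:variable name="urifrag" select="tokenize($uri,'/')[. ne '']"/>

  <xsl:for-each select="$urifrag">
    <xsl:value-of select="concat('*',.,'* ')"/>
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

With the filter in place, $urifrag[1] is "researchprofiles" rather than the empty string, and the loop writes only the three path steps, each wrapped in asterisks.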