On Mon, Nov 23, 2015 at 07:04:32PM -0000, Rick Quatro rick@xxxxxxxxxxxxxx scripsit:
> I have a series of strings that I need to split if they are longer than a
> particular length, say 30 characters. But I need to make the split at the
> previous space. Here is an example string:
>
> This is a long line that I want to split at a space.
>
> The 30th character is in the middle of a word, so I need to do the split at
> the previous space. I am using XSLT/XPath 2.0. I am having trouble
> developing a good algorithm for this. Any pointers would be appreciated.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
    exclude-result-prefixes="xs xd" version="2.0">
  <xsl:output method="text" />
  <xsl:variable name="input">I am a large string which needs to be broken at the last space on or before character thirty-one</xsl:variable>
  <xsl:template match="/">
    <xsl:variable name="cutLength" select="30" />
    <xsl:variable name="tokens" select="tokenize($input, '\p{Zs}')" />
    <!-- \p{Zs} because someone might have provided an unusual space -->
    <xsl:variable as="element(bucket)" name="candidates">
      <!-- we can't use one sequence for this and 2.0 hasn't got maps or arrays -->
      <bucket>
        <xsl:for-each select="1 to count($tokens)">
          <candidate>
            <xsl:value-of select="string-join($tokens[position() le current()], ' ')" />
          </candidate>
        </xsl:for-each>
      </bucket>
    </xsl:variable>
    <xsl:value-of select="$candidates/candidate[string-length() le $cutLength][last()]" />
  </xsl:template>
</xsl:stylesheet>
returns
"I am a large string which"
It's not as compact as the regexp solution from David Carlisle, and it asks a lot of the optimizer if the input line is really, really big, since every candidate is rebuilt from the first token onward. The pattern does generalize fairly well to making substrings by rule rather than by character position.
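If you need every line rather than just the first, the same candidate idea can be applied recursively. What follows is only a sketch under some assumptions: the function name local:wrap and its namespace URI are made up for illustration, runs of spaces are collapsed by tokenizing on '\p{Zs}+', and a single word longer than the cut length is emitted on its own line rather than broken.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="urn:example:local"
    exclude-result-prefixes="xs local" version="2.0">
  <xsl:output method="text" />
  <xsl:variable name="input" as="xs:string">I am a large string which needs to be broken at the last space on or before character thirty-one</xsl:variable>

  <!-- local:wrap is a hypothetical helper: it returns one string per output line -->
  <xsl:function name="local:wrap" as="xs:string*">
    <xsl:param name="tokens" as="xs:string*" />
    <xsl:param name="cutLength" as="xs:integer" />
    <xsl:if test="exists($tokens)">
      <!-- every token count whose space-joined prefix still fits in $cutLength -->
      <xsl:variable name="fits" as="xs:integer*"
          select="(1 to count($tokens))[string-length(string-join(subsequence($tokens, 1, .), ' ')) le $cutLength]" />
      <!-- always take at least one token so an overlong word cannot stall the recursion -->
      <xsl:variable name="take" as="xs:integer" select="max((1, $fits))" />
      <xsl:sequence select="string-join(subsequence($tokens, 1, $take), ' ')" />
      <!-- recurse on whatever tokens are left over -->
      <xsl:sequence select="local:wrap(subsequence($tokens, $take + 1), $cutLength)" />
    </xsl:if>
  </xsl:function>

  <xsl:template match="/">
    <xsl:value-of select="local:wrap(tokenize($input, '\p{Zs}+'), 30)" separator="&#10;" />
  </xsl:template>
</xsl:stylesheet>
```

For the sample input and a cut length of 30 this should give four lines: "I am a large string which", "needs to be broken at the last", "space on or before character" and "thirty-one". It rebuilds the prefixes on every pass just as the stylesheet above does, so the same caveat about very long input applies.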
-- Graydon