
Re: Implementation Advice: Grouping Strings by Charact

Subject: Re: Implementation Advice: Grouping Strings by Character Range in XSLT 2
From: "Eliot Kimber ekimber@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 29 Apr 2016 18:38:05 -0000
I have my generated analyze-string approach generally working. However, some
of my regular expressions are not matching when I would expect them to.

For example, given this @regex value:

        
regex="'([&#xa9;&#xae;&#x2120;&#x2122;]+)|([&#xa6;&#xb2;&#xb3;&#xb9;&#xbc;&#xbd;&#xbe;&#xd0;&#xd7;&#xdd;&#xde;&#xf0;&#xfd;&#xfe;&#x160;&#x161;&#x2202;&#x220f;&#x2211;&#x2212;&#x222b;&#x2260;&#x2264;&#x2265;]+)|([&#x27a4;]+)'"
>

And this text:

"&#x00A9;&#x00AE;"

The regular expression does not match, even though the first group clearly
matches on \uA9 and \uAE.


However, this text:

"&#x00DD;&#x00DE;" 

does match (second group).
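One way to narrow this down is to test the pattern exactly as it appears in the attribute value, apostrophes included. The Python sketch below does that. Python's `re` is not the XPath regex dialect, but alternation, capturing groups, and plain character classes like these behave the same way in both. The sketch rests on one fact and one assumption. The fact: `xsl:analyze-string/@regex` is an attribute value template, not an XPath expression, so nothing strips the outer `'...'`. The assumption: the generated attribute contains those apostrophes exactly as shown above. If both hold, the apostrophes are literal pattern characters. The first alternative then needs a `'` before the ©/® run, and the third needs one after the ➤ run, while the second alternative stands alone. The helper `first_match()` is hypothetical.

```python
import re

# Character classes copied from the @regex value above, with each XML
# character reference (&#x...;) rewritten as a Python \u escape.
GROUP1 = "[\u00a9\u00ae\u2120\u2122]+"
GROUP2 = ("[\u00a6\u00b2\u00b3\u00b9\u00bc\u00bd\u00be\u00d0\u00d7\u00dd"
          "\u00de\u00f0\u00fd\u00fe\u0160\u0161\u2202\u220f\u2211\u2212"
          "\u222b\u2260\u2264\u2265]+")
GROUP3 = "[\u27a4]+"
BODY = f"({GROUP1})|({GROUP2})|({GROUP3})"

# Apostrophes kept as literal pattern characters: they bind to the first
# and last alternatives only, not to the alternation as a whole.
with_quotes = re.compile("'" + BODY + "'")
# The same alternation without the apostrophes.
without_quotes = re.compile(BODY)

def first_match(pattern, text):
    """Return (number of the group that matched, matched text), or None."""
    m = pattern.search(text)
    return (m.lastindex, m.group(m.lastindex)) if m else None

for text in ("\u00a9\u00ae", "\u00dd\u00de"):
    # ascii() keeps the console output free of non-ASCII characters.
    print(ascii((text, first_match(with_quotes, text), first_match(without_quotes, text))))
```

Run as is, the quoted pattern finds no match in ©® but matches ÝÞ as group 2, which is the behavior described above. The unquoted pattern matches both. That would also explain why copying the regex (or a single group) into Oxygen works, provided the copy left out the apostrophes.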

If I copy the entire regex, or any single group, from the @regex value and try
it in Oxygen against the same text, I get the expected matches.

Have I made a stupid syntax mistake in my regular expression? Is there
some subtlety to matching groups that makes XSLT different from what
Oxygen is doing? I can't see any obvious syntax error in the regular
expression.

Thanks,

Eliot


----
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com




On 4/29/16, 11:54 AM, "Eliot Kimber ekimber@xxxxxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

>Dimitre,
>
>I see how that can work.
>
>Cheers,
>
>E.
>
>On 4/29/16, 11:38 AM, "Dimitre Novatchev dnovatchev@xxxxxxxxx"
><xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>>I am at work and don't have the time for a complete, tested
>>implementation, but one can use the function string-to-codepoints()
>>and then apply this to the result:
>>
>><xsl:for-each-group select="$theCodepoints"
>>                     group-adjacent="f:codepointToRange(.)">
>>
>> . . . . . . . .
>></xsl:for-each-group>
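A minimal Python sketch of this suggestion (not XSLT). `ord()` plays the role of string-to-codepoints(), and `itertools.groupby` merges adjacent items with equal keys the way group-adjacent does. `codepoint_to_range()` is a hypothetical stand-in for f:codepointToRange(). Here it is a binary search over a sorted table of non-overlapping codepoint intervals, built from the "abcdefg" example in the original question quoted below. The table and function names are assumptions for illustration.

```python
from bisect import bisect_right
from html import escape
from itertools import groupby

# Hypothetical range table: (first codepoint, last codepoint, class name),
# sorted by first codepoint and non-overlapping.
RANGES = [(ord("c"), ord("e"), "R1"), (ord("g"), ord("g"), "R2")]
STARTS = [lo for lo, _, _ in RANGES]

def codepoint_to_range(cp):
    """Stand-in for f:codepointToRange(): the range name, or None if unranged."""
    i = bisect_right(STARTS, cp) - 1          # last interval starting at or before cp
    if i >= 0 and cp <= RANGES[i][1]:
        return RANGES[i][2]
    return None

def mark_ranges(text):
    codepoints = [ord(ch) for ch in text]     # like string-to-codepoints()
    parts = []
    # groupby() merges adjacent codepoints with equal keys, like group-adjacent.
    for name, group in groupby(codepoints, key=codepoint_to_range):
        chunk = escape("".join(chr(cp) for cp in group))  # like codepoints-to-string()
        parts.append(chunk if name is None else f'<span class="{name}">{chunk}</span>')
    return "".join(parts)

print(mark_ranges("abcdefg"))
```

For "abcdefg" this prints `ab<span class="R1">cde</span>f<span class="R2">g</span>`. One XSLT-specific difference: group-adjacent requires each grouping key to be a single atomic value (an empty sequence raises XTTE1100). So a real f:codepointToRange() would need to return something like `''` for characters outside every range, rather than nothing.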
>>
>>Cheers,
>>Dimitre
>>
>>On Fri, Apr 29, 2016 at 8:04 AM, Eliot Kimber ekimber@xxxxxxxxxxxx
>><xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>> Using XSLT 2, I have a requirement to take text and wrap contiguous
>>> sequences of characters in markup according to the character range
>>> the characters are in. This is to support the application of
>>> range-specific fonts to text in HTML.
>>>
>>> I have a static definition of the character ranges for a given national
>>> language, and there shouldn't be any overlap between ranges. Given this
>>> static definition, I'm generating XSLT code to operate on text nodes in
>>> order to apply the range markup.
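Because the XSLT is generated from a static range definition, the generator can also emit the `regex` attribute for xsl:analyze-string directly. The Python sketch below assumes a hypothetical table `RANGES` (range name to member characters) and hypothetical helpers `class_member()` and `regex_attribute()`. It writes one capturing group per range, in definition order, so that regex-group(n) corresponds to the n-th range. Each character is written as an XML character reference. A character reference is resolved before the regex is compiled, so it does not protect regex metacharacters or attribute-value-template braces; the sketch escapes those separately.

```python
# Hypothetical static range definition for one language:
# range name -> characters in that range (ranges must not overlap).
RANGES = {
    "R1": "\u00a9\u00ae\u2120\u2122",
    "R2": "\u00dd\u00de\u0160\u0161",
    "R3": "\u27a4",
}

def class_member(ch):
    """One character of a [...] class, written as an XML character reference."""
    ref = f"&#x{ord(ch):x};"
    if ch in "{}":
        return ref * 2        # @regex is an AVT: a literal brace must be doubled
    if ch in "\\[]^-":
        return "\\" + ref     # metacharacters inside [...] need a backslash escape
    return ref

def regex_attribute(ranges):
    """Value for xsl:analyze-string/@regex: one capturing group per range."""
    groups = ("([" + "".join(class_member(ch) for ch in chars) + "]+)"
              for chars in ranges.values())
    # No surrounding quotes: @regex is an attribute value template, not an
    # XPath expression, so quotes would become part of the pattern itself.
    return "|".join(groups)

print(f'<xsl:analyze-string select="." regex="{regex_attribute(RANGES)}">')
```

Keeping the group order tied to the table order means the same table can drive the generated xsl:when tests that map a group number back to a range name.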
>>>
>>> For example, given the text string "abcdefg", where range "R1" is "cde"
>>> and range "R2" is "g", the marked-up result should be:
>>>
>>> ab<span class="R1">cde</span>f<span class="R2">g</span>
>>>
>>> My initial approach is to generate a template that takes the current
>>> language and the text node and then applies templates in a
>>> language-specific mode.
>>>
>>> For each language I'm then generating a template to do the range
>>> matching.
>>>
>>> My question: once I'm in a language-specific template for a text node,
>>> what is the most efficient and/or easiest-to-code way to map the string
>>> to ranges? Since I'm generating the code, it doesn't have to be concise.
>>>
>>> I'm thinking along the lines of using xsl:analyze-string to match on any
>>> of the groups and then, within the matching-substring clause, using an
>>> xsl:choose to determine which range actually matched. But it feels like
>>> I'm missing a more elegant way to determine the actual range.
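With one capturing group per range in a fixed order, the number of the group that matched already identifies the range. In XSLT 2 that number can be found inside matching-substring by testing `regex-group(n) != ''` for each n in a generated xsl:choose, because a group that did not take part in the match returns a zero-length string. The Python sketch below illustrates the same idea. There, `Match.lastindex` reports the matched group directly, and a list lookup replaces the choose. The range table and function name are hypothetical.

```python
import re
from html import escape

# Hypothetical range table in a fixed order: capturing group n is range n.
RANGES = [("R1", "cde"), ("R2", "g")]
PATTERN = re.compile("|".join(f"([{re.escape(chars)}]+)" for _, chars in RANGES))

def mark_ranges_by_regex(text):
    out, pos = [], 0
    for m in PATTERN.finditer(text):
        out.append(escape(text[pos:m.start()]))  # like non-matching-substring
        name = RANGES[m.lastindex - 1][0]        # matched group number -> range name
        out.append(f'<span class="{name}">{escape(m.group())}</span>')
        pos = m.end()
    out.append(escape(text[pos:]))               # trailing unmatched text
    return "".join(out)

print(mark_ranges_by_regex("abcdefg"))
```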
>>>
>>> Or maybe there's a clearer/simpler/more efficient way using tail
>>> recursion?
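For comparison, here is a tail-recursive formulation. It takes the longest prefix whose characters share the first character's range, wraps it, and recurses on the rest with an accumulator. In XSLT 2 this would be a recursive named template or xsl:function, a shape that processors such as Saxon can optimize. The Python below only illustrates the recursion. The lookup table `RANGE_OF` and the function name are hypothetical. CPython does not eliminate tail calls, so recursion depth here grows with the number of runs.

```python
from html import escape

# Hypothetical lookup: character -> range name (unlisted characters have no range).
RANGE_OF = {"c": "R1", "d": "R1", "e": "R1", "g": "R2"}

def mark_ranges_recursive(text, acc=""):
    """Wrap the longest same-range prefix, then recurse on the remainder."""
    if not text:
        return acc
    name = RANGE_OF.get(text[0])
    n = 1
    while n < len(text) and RANGE_OF.get(text[n]) == name:
        n += 1
    run = escape(text[:n])
    piece = run if name is None else f'<span class="{name}">{run}</span>'
    return mark_ranges_recursive(text[n:], acc + piece)  # call in tail position

print(mark_ranges_recursive("abcdefg"))
```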
>>>
>>> Thanks,
>>>
>>> Eliot
>>>
>>> 
>>
>>
>>
>>-- 
>>Cheers,
>>Dimitre Novatchev
>>---------------------------------------
>>Truly great madness cannot be achieved without significant intelligence.
>>---------------------------------------
>>To invent, you need a good imagination and a pile of junk
>>-------------------------------------
>>Never fight an inanimate object
>>-------------------------------------
>>To avoid situations in which you might make mistakes may be the
>>biggest mistake of all
>>------------------------------------
>>Quality means doing it right when no one is looking.
>>-------------------------------------
>>You've achieved success in your field when you don't know whether what
>>you're doing is work or play
>>-------------------------------------
>>To achieve the impossible dream, try going to sleep.
>>-------------------------------------
>>Facts do not cease to exist because they are ignored.
>>-------------------------------------
>>Typing monkeys will write all Shakespeare's works in 200 yrs. Will they
>>write all patents, too? :)
>>-------------------------------------
>>Sanity is madness put to good use.
>>-------------------------------------
>>I finally figured out the only reason to be alive is to enjoy it.
