Re: Seeking a smarter tokenize for augmented text

Subject: Re: Seeking a smarter tokenize for augmented text
From: "Trevor Nicholls trevor@xxxxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 7 May 2021 09:41:15 -0000
I have made some progress on this. It isn't working yet, but I'm more
confident than I was, so thanks to all for the helpful suggestions. I also
found some hints in a Stack Overflow answer of Martin Honnen's, which
reinforced the advice to tackle this by adding a line marker element and
using grouping.

The original statement of the requirement was a bit vague, and the content
model currently in use is a bit too flexible. So I think I can stipulate that
inline elements will not run across line breaks (and if they do I should be
able to run a pre-fix which splits them), nor will the content include any
nested inline elements.

At the moment I'm assuming that in the step where I insert line marker
elements, I also have to use modal templates to insert inline element markers,
then run another pass to restore the inline elements. Something like this,
correct?

  <xsl:variable name="brokenlines">
    <xsl:element name="textlines">
      <xsl:element name="linemarker"/>
      <xsl:analyze-string select="." regex="(\r\n?|\n\r?)">
        <xsl:matching-substring>
          <xsl:element name="linemarker"/>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:apply-templates mode="break"/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:element>
  </xsl:variable>
  <xsl:variable name="textlines">
    <xsl:call-template name="rebuild">
      <xsl:with-param name="lines" select="$brokenlines"/>
    </xsl:call-template>
  </xsl:variable>
  <!-- $textlines/textlines is now the original textlines with line children -->
  ...

  <xsl:template match="textlines/*/text()" mode="break">
    <xsl:value-of select="concat('[[{', name(..), '}', ., ']]')" />
  </xsl:template>

  <xsl:template name="rebuild">
    <xsl:param name="lines" as="document-node()" />
    <xsl:element name="textlines">
      <xsl:for-each select="$lines/textlines">
        <xsl:for-each-group select="node()" group-starting-with="linemarker">
          <xsl:element name="line">
            <xsl:apply-templates select="current-group()[not(self::linemarker)]" mode="rebuild" />
          </xsl:element>
        </xsl:for-each-group>
      </xsl:for-each>
    </xsl:element>
  </xsl:template>

  <xsl:template match="text()" mode="rebuild">
    <xsl:analyze-string select="." regex="something matching [[{name}content]]">
      <xsl:matching-substring>
        <xsl:element name="the name in the regex">
          the content in the regex
        </xsl:element>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="." />
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
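
A minimal end-to-end sketch of how the two passes might fit together, assuming
XSLT 2.0 and the stipulations above (no inline element crosses a line break,
none are nested). It also assumes that "]]" never occurs in the content. The
variable name marker-regex is made up for illustration. One difference from
the draft: inside xsl:analyze-string the context item is a string, so there is
no node to apply templates to; here each text node is analysed on its own and
the inline elements get a separate template in the "break" mode.

  <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Regex for [[{name}content]]. Held in a variable because the regex
         attribute is an attribute value template, where bare braces would
         be read as an XPath expression. -->
    <xsl:variable name="marker-regex"
                  select="'\[\[\{([^}]+)\}(.*?)\]\]'"/>

    <xsl:template match="textlines">
      <!-- Pass 1: insert line markers, flatten inline elements to strings -->
      <xsl:variable name="brokenlines">
        <textlines>
          <linemarker/>
          <xsl:apply-templates select="node()" mode="break"/>
        </textlines>
      </xsl:variable>
      <!-- Pass 2: one line element per marker, inline elements restored -->
      <textlines>
        <xsl:for-each-group select="$brokenlines/textlines/node()"
                            group-starting-with="linemarker">
          <line>
            <xsl:apply-templates
                select="current-group()[not(self::linemarker)]"
                mode="rebuild"/>
          </line>
        </xsl:for-each-group>
      </textlines>
    </xsl:template>

    <!-- Text directly under textlines: emit a marker at each line break -->
    <xsl:template match="textlines/text()" mode="break">
      <xsl:analyze-string select="." regex="\r\n?|\n\r?">
        <xsl:matching-substring>
          <linemarker/>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:template>

    <!-- Inline element: flatten to the string [[{name}content]] -->
    <xsl:template match="textlines/*" mode="break">
      <xsl:value-of select="concat('[[{', name(), '}', ., ']]')"/>
    </xsl:template>

    <!-- Turn each [[{name}content]] back into an element; this fails if
         the captured name is not a valid element name -->
    <xsl:template match="text()" mode="rebuild">
      <xsl:analyze-string select="." regex="{$marker-regex}">
        <xsl:matching-substring>
          <xsl:element name="{regex-group(1)}">
            <xsl:value-of select="regex-group(2)"/>
          </xsl:element>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:template>

  </xsl:stylesheet>

Attributes on the inline elements are dropped by this flattening, so it only
suits content where the element name and text are all that matter.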

Am I going along the right lines? I'd prefer to be set straight sooner rather
than later!

Cheers
T

-----Original Message-----
From: Michael Müller-Hillebrand mmh@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Friday, 7 May 2021 20:26
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re:  Seeking a smarter tokenize for augmented text

Hi,
