Subject: Re: XSLT match with regex what's the best current solution?
From: Gunther Schadow <gunther@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 14 Jan 2002 22:11:34 -0500
Steven Noels wrote:

as you can read in the regular expression thread
http://www.biglist.com/lists/xsl-list/archives/200201/msg00488.html and
further on, we are working on a tool which might be helpful for your
purposes.

It is some mixture between regexes and an XSLT-like language, and we
have called it regexslt. ...


Steven, thanks for your hint to the thread. I knew there was something.
But I'm really looking for something that is XSLT (extension) rather
than just 'like' xslt.

Of course I would not use regexes to match for XML/HTML tags, but
I would want to use regexes in template match tests.

My model is really AWK, and it's interesting how the principal
approach of XSLT is very similar to AWK (forget about global
variables and the sequential flow of AWK rules for a moment).

A common example would be:

Some heading, with subphrases:
  An item without a bullet.
    Name = value pair.
    Property: value.
    Score = 7 (a = 1, b =3, c = 4).
    A full sentence that has so many words that it spans
        multiple lines.
    Sometimes we can't even trust whether people get the
indention consistent.

This should be marked up as

<entry>
  <heading>
     Some heading
     <subheading>with subphrases:</subheading>
  </heading>
  <item>
    <heading>An item without a bullet.</heading>
       <pair name='name' value='value pair.'/>
       <pair name='property' value='value.'/>
       <pair name='Score' value='7'>
           <pair name='a' value='1'/>
           <pair name='b' value='3'/>
           <pair name='c' value='4'/>
       </pair>
       <sentence>A full sentence that has so many words that it spans
        multiple lines.</sentence>
       <sentence>Sometimes we can't even trust whether people get the
indention consistent.</sentence>
  </item>
</entry>

In AWK I have line-by-line matching (or record-by-record, for whatever
RS is set to) to find the colons and indentation. I guess I could prime
the XSLT process by using sed to add record delimiters:

<rec>Some heading, with subphrases:</rec>
<rec>  An item without a bullet.</rec>
<rec>    Name = value pair.</rec>
<rec>    Property: value.</rec>
<rec>    Score = 7 (a = 1, b =3, c = 4).</rec>
<rec>    A full sentence that has so many words that it spans </rec>
<rec>        multiple lines.</rec>
<rec>    Sometimes we can't even trust whether people get the</rec>
<rec>indention consistent.</rec>
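
The priming step could be sketched in a few lines. This is a minimal Python sketch rather than the sed command itself; the function name wrap_records and the <recs> root element are assumptions added for illustration (a single root is needed for the result to be well-formed XML). Markup characters in the text are escaped so that the output still parses.

```python
from xml.sax.saxutils import escape

def wrap_records(text, root="recs"):
    """Wrap every line of `text` in a <rec> element, keeping leading whitespace.

    `root` is a hypothetical wrapper element, added only so that the
    result is a single well-formed XML document.
    """
    lines = text.splitlines()
    # Escape &, < and > so that stray markup characters stay plain text.
    recs = ["<rec>%s</rec>" % escape(line) for line in lines]
    return "<%s>\n%s\n</%s>" % (root, "\n".join(recs), root)

if __name__ == "__main__":
    sample = "Some heading, with subphrases:\n  An item without a bullet.\n"
    print(wrap_records(sample))
```

Leading whitespace is kept inside each <rec> on purpose, since the indentation is the only hint about nesting that later steps have.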

And then I could use templates just like AWK rules:

<xsl:template match="rec[regex:test(text(),'^\(.+\), \(.+\):$')]">
  <entry>
     <heading>
        <xsl:value-of select='$1'/>
        <subheading><xsl:value-of select='$2'/></subheading>
     </heading>
     <xsl:apply-templates/>
  </entry>
</xsl:template>

O.K., that wouldn't work, because apply-templates could not go
beyond the first line to pull the following lines into the content
of the entry element that I just synthesized. So I guess maybe I'm
on the completely wrong track now.
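
For comparison, the matching half of the heading rule above is easy to write outside XSLT. The following is a minimal Python sketch, not an XSLT extension; the name heading_rule is made up for illustration, and the pattern is the one from the template, rewritten in Python regex syntax. It does nothing about the nesting problem just described; it only shows the AWK-style "pattern with capture groups, then action" part.

```python
import re
from xml.sax.saxutils import escape

# Same pattern as in the template above, in Python syntax:
# group 1 is the heading, group 2 the subheading (without the colon).
HEADING = re.compile(r"^(.+), (.+):$")

def heading_rule(line):
    """Return <heading> markup for a heading line, or None if it does not match."""
    m = HEADING.match(line)
    if m is None:
        return None
    return "<heading>%s<subheading>%s</subheading></heading>" % (
        escape(m.group(1)), escape(m.group(2)))

if __name__ == "__main__":
    print(heading_rule("Some heading, with subphrases:"))
    print(heading_rule("  An item without a bullet."))
```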

Maybe the initial <rec> elements put me on the wrong track. But
if you have read this far you might see what I'm doing wrong.
Maybe I should just stick with AWK, or maybe I should do some
call-out from XSLT to AWK.

What I want to do is incremental structure induction, i.e. the
first run might only find the entries (e.g., blocks separated
by blank lines), the next run would find the items, the next
run would find the pairs, and the next the pairs in the parentheses
of a pair's value, etc.

So, the AWK-callouts would work on the text nodes of certain
elements (beginning with the full file, then with the entries,
items, pairs, etc.)
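
Two of these passes could look roughly like the following. This is a minimal Python sketch under stated assumptions, not AWK and not an XSLT call-out: the names split_entries and parse_pair and the exact pair pattern are invented for illustration. Dropping the trailing period from the value is a choice made here, and items and multi-line sentences are left out entirely.

```python
import re

def split_entries(text):
    """First pass: blocks separated by blank lines become entries."""
    blocks = re.split(r"\n\s*\n", text.strip("\n"))
    return [block for block in blocks if block.strip()]

# name, then '=' or ':', then a non-empty value, then an optional
# parenthesized list of sub-pairs and an optional trailing period.
PAIR = re.compile(r"^\s*([^=:()]+?)\s*[=:]\s*([^()]+?)\s*(?:\((.*)\))?\.?\s*$")

def parse_pair(line):
    """Later pass: turn 'Name = value (a = 1, b = 2).' into nested tuples.

    Returns (name, value, children), or None when the line is not a pair.
    """
    m = PAIR.match(line)
    if m is None:
        return None
    name, value, inner = m.group(1), m.group(2), m.group(3)
    children = []
    if inner:
        # Innermost pass: the comma-separated pairs inside the parentheses.
        for part in inner.split(","):
            child = parse_pair(part)
            if child is not None:
                children.append(child)
    return (name, value, children)

if __name__ == "__main__":
    print(split_entries("A\nB\n\nC\n"))
    print(parse_pair("    Score = 7 (a = 1, b =3, c = 4)."))
```

Each pass only looks at the text of the units found by the previous pass, which is the incremental part; a heading line ending in a colon is not mistaken for a pair because the value must be non-empty.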

any more ideas appreciated,
-Gunther


PS: I looked at this OmniMark thing, and I'm a bit turned away by how different it is from anything that we know from sed, awk, perl, lex, yacc, etc.

--
Gunther Schadow, M.D., Ph.D.                    gschadow@xxxxxxxxxxxxxxx
Medical Information Scientist      Regenstrief Institute for Health Care
Adjunct Assistant Professor        Indiana University School of Medicine
tel:1(317)630-7960                         http://aurora.regenstrief.org



XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list

