Re: XSLT match with regex what's the best current solu
Steven Noels wrote:
> as you can read in the regular expression thread
> http://www.biglist.com/lists/xsl-list/archives/200201/msg00488.html
> and further on, we are working on a tool which might be helpful
> for your purposes.

Steven,

thanks for your hint to the thread. I knew there was something. But I'm really looking for something that is XSLT (an extension) rather than just 'like' XSLT. Of course I would not use regexes to match XML/HTML tags, but I would want to use regexes in template match tests. My model is really AWK, and it's interesting how the approach of XSLT is in principle very similar to AWK's (forget about global variables and the sequential flow of AWK rules for a moment).

A common example would be:

    Some heading, with subphrases:
      An item without a bullet.
        Name = value pair.
        Property: value.
        Score = 7 (a = 1, b =3, c = 4).
        A full sentence that has so many words that it spans
        multiple lines.
        Sometimes we can't even trust whether people get the
    indention consistent.

This should be marked up as

    <entry>
      <heading>
        Some heading
        <subheading>with subphrases:</subheading>
      </heading>
      <item>
        <heading>An item without a bullet.</heading>
        <pair name='name' value='value pair.'/>
        <pair name='property' value='value.'/>
        <pair name='Score' value='7'>
          <pair name='a' value='1'/>
          <pair name='b' value='3'/>
          <pair name='c' value='4'/>
        </pair>
        <sentence>A full sentence that has so many words that it spans multiple lines.</sentence>
        <sentence>Sometimes we can't even trust whether people get the indention consistent.</sentence>
      </item>
    </entry>

In AWK I have line-by-line matching (or record-by-record, whatever RS is set to) to find the colons and indentations.
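The AWK style of line-by-line matching mentioned above can be sketched as a small rule table. This is only an illustration in Python, not code from this thread: the names `RULES` and `classify` and all three patterns are assumptions chosen to fit the sample text, and real input would likely need more rules.

```python
import re

# Hypothetical AWK-style rule table: (pattern, kind). The first matching rule wins.
# The patterns are guesses at the conventions visible in the sample text.
RULES = [
    (re.compile(r"^(?P<name>[^=:(]+?)\s*=\s*(?P<value>.+)$"), "pair"),   # Name = value
    (re.compile(r"^(?P<name>[^=:(]+?):\s+(?P<value>.+)$"), "pair"),      # Property: value
    (re.compile(r"^(?P<head>.+), (?P<sub>.+):$"), "heading"),            # Head, sub:
]

def classify(line):
    """Return (indent, kind, fields) for one input line, like one AWK rule firing."""
    text = line.strip()
    indent = len(line) - len(line.lstrip())  # leading whitespace width
    for pattern, kind in RULES:
        m = pattern.match(text)
        if m:
            return indent, kind, m.groupdict()
    # No rule matched: treat the line as plain text (e.g. part of a sentence).
    return indent, "text", {"text": text}
```

Calling `classify` on each line yields the indentation plus the matched fields, which is roughly the information an AWK rule would have in `$0` and its captured groups. Deciding how lines nest from the indentation is left out here.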
I guess I could prime the XSLT process by sed-ing in record terminators, as in:

    <rec>Some heading, with subphrases:</rec>
    <rec> An item without a bullet.</rec>
    <rec> Name = value pair.</rec>
    <rec> Property: value.</rec>
    <rec> Score = 7 (a = 1, b =3, c = 4).</rec>
    <rec> A full sentence that has so many words that it spans </rec>
    <rec> multiple lines.</rec>
    <rec> Sometimes we can't even trust whether people get the</rec>
    <rec>indention consistent.</rec>

And then I could use templates just like AWK rules:

    <xsl:template match="rec[regex:test(text(),'^\(.+\), \(.+\):$')]">
      <entry>
        <heading>
          <xsl:value-of select='$1'/>
          <subheading><xsl:value-of select='$2'/></subheading>
        </heading>
        <xsl:apply-templates/>
      </entry>
    </xsl:template>

O.K., that wouldn't work, because apply-templates could not reach beyond the first line to pull the following records into the content of the entry element that I just synthesized.

So I guess maybe I'm on the completely wrong track now. Maybe the initial <rec> elements put me on the wrong track. But if you have read this far, you might see what I'm doing wrong. Maybe I should just stick with AWK, or maybe I should do some call-out from XSLT to AWK.

What I want to do is incremental structure induction, i.e. the first run might only find the entries (e.g., blocks separated by blank lines), the next run would find the items, the next run would find the pairs, the next the pairs in the parentheses of a pair's value, etc. So the AWK call-outs would work on the text nodes of certain elements (beginning with the full file, then the entries, items, pairs, etc.).

Any more ideas appreciated,
-Gunther
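The incremental structure induction described above, where each run refines only the text left by the previous run, can be sketched as a chain of passes over an element tree. This is a minimal Python illustration under assumptions, not a solution from this thread: the function names, the `<entries>` wrapper element and the patterns are hypothetical, and the item and sentence passes are omitted.

```python
import re
import xml.etree.ElementTree as ET

def split_entries(text):
    """Pass 1: wrap each blank-line-separated block in an <entry> element."""
    root = ET.Element("entries")  # hypothetical wrapper element
    for block in re.split(r"\n\s*\n", text.strip()):
        ET.SubElement(root, "entry").text = block
    return root

def split_pairs(root):
    """Pass 2: inside each <entry>, turn 'name = value' lines into <pair> elements."""
    pair = re.compile(r"^\s*([^=:(]+?)\s*=\s*(.+)$")
    for entry in root.findall("entry"):
        lines = entry.text.splitlines()
        entry.text = None  # the raw text is replaced by child elements
        for line in lines:
            m = pair.match(line)
            if m:
                ET.SubElement(entry, "pair", name=m.group(1), value=m.group(2))
            else:
                ET.SubElement(entry, "rec").text = line.strip()
    return root

def split_nested(root):
    """Pass 3: move 'k = v' items in parentheses of a pair's value into child <pair>s."""
    paren = re.compile(r"^(.*?)\s*\((.*)\)\.?$")
    for p in root.findall(".//pair"):
        m = paren.match(p.get("value"))
        if m:
            p.set("value", m.group(1))
            for item in m.group(2).split(","):
                name, _, value = item.partition("=")
                ET.SubElement(p, "pair", name=name.strip(), value=value.strip())
    return root
```

Chaining the passes as `split_nested(split_pairs(split_entries(text)))` mirrors the run-after-run refinement. Each pass only looks at the text or attributes produced by the one before it, so further passes could be added without changing the earlier ones.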
-- 
Gunther Schadow, M.D., Ph.D.                    gschadow@xxxxxxxxxxxxxxx
Medical Information Scientist      Regenstrief Institute for Health Care
Adjunct Assistant Professor        Indiana University School of Medicine
tel:1(317)630-7960                         http://aurora.regenstrief.org

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list