Subject: RE: How to parse text into words, phrases, clauses, sentences, and paragraphs
From: mark bordelon <markcbordelon@xxxxxxxxx>
Date: Thu, 7 Jun 2007 07:04:41 -0700 (PDT)
--- Michael Kay <mike@xxxxxxxxxxxx> wrote:
> You don't really make it clear where you are having
> difficulty. There seem
> to be four separate problems here:

Mike, thanks for helping me break this down. This
is definitely something I can and want to do myself;
I just need the initial hints.

> (a) translating your concepts, such as "words" and
> "sentences" into precise
> specifications
> (b) translating these specifications into regular
> expressions

Got these. 
E.g. the specification for "word" could be [^ '-]*
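
A hedged aside on that word pattern: as written, [^ '-]* excludes the
apostrophe and the hyphen, so it would appear to split "unravish'd" and
"foster-child", whereas the stated rule allows internal punctuation. One
possible alternative, sketched below for XSLT 2.0, is to split on
whitespace and then strip punctuation only from the edges of each token.
The <text> input element and the <words>/<word> output names are
assumptions made for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed input: a <text> element holding the raw string -->
  <xsl:template match="text">
    <words>
      <!-- Split on runs of whitespace; drop empty tokens that
           leading or trailing whitespace would produce -->
      <xsl:for-each select="tokenize(., '\s+')[. ne '']">
        <!-- Remove punctuation at the start or end of the token only,
             so internal apostrophes and hyphens survive -->
        <xsl:variable name="word"
            select="replace(., '^\p{P}+|\p{P}+$', '')"/>
        <xsl:if test="$word ne ''">
          <word><xsl:value-of select="$word"/></word>
        </xsl:if>
      </xsl:for-each>
    </words>
  </xsl:template>
</xsl:stylesheet>
```

With this approach "quietness," becomes "quietness" while "unravish'd"
is left intact. A token with a genuine trailing apostrophe would lose
it, which may or may not be acceptable.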

> 
> (c) using these regular expressions within a
> stylesheet, for example as an
> argument to the tokenize() function or the
> xsl:analyze-string instruction.
> 

This is my first problem: how to apply a template
match using the tokenize() function, and in which order
to work (from paragraph -> word or from word ->
paragraph).
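
A minimal XSLT 2.0 sketch of one way this could go, working from the
outside in (paragraph -> sentence -> clause -> phrase), since each
level's result is the input to the next. It uses tokenize() for the
paragraph split and nested xsl:analyze-string instructions for the
inner levels, because tokenize() discards the delimiter while a
matching regex can keep the punctuation with the text before it. The
<text> input element is an assumption, and the regular expressions are
only one possible reading of the delimiters listed in the original
message.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed input: a <text> element holding the raw text -->
  <xsl:template match="text">
    <!-- Paragraphs: split on line breaks, skip blank lines -->
    <xsl:for-each select="tokenize(., '\n')[normalize-space(.)]">
      <para>
        <!-- Sentences: text up to and including . ? or ! -->
        <xsl:analyze-string select="." regex="[^.?!]+[.?!]*">
          <xsl:matching-substring>
            <!-- Ignore whitespace-only matches -->
            <xsl:if test="normalize-space(.)">
              <sent>
                <!-- Clauses: text up to and including : or ; -->
                <xsl:analyze-string select="." regex="[^:;]+[:;]?">
                  <xsl:matching-substring>
                    <clause>
                      <!-- Phrases: text up to and including a comma -->
                      <xsl:analyze-string select="." regex="[^,]+,?">
                        <xsl:matching-substring>
                          <phrase><xsl:value-of select="."/></phrase>
                        </xsl:matching-substring>
                      </xsl:analyze-string>
                    </clause>
                  </xsl:matching-substring>
                </xsl:analyze-string>
              </sent>
            </xsl:if>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </para>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Note that this sketch keeps every delimiter inside the innermost
phrase, so a closing colon ends up within the last <phrase> of its
clause rather than after it. Abbreviations such as "e.g." would also
be treated as sentence ends.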

> (d) doing the output numbering.

I haven't a clue how this would be done, either way.
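
One possible approach, sketched here under assumptions, is to treat
numbering as a second pass: first build the un-numbered
para/sent/clause/phrase tree, then copy it while adding an id attribute
with xsl:number. The $numbering parameter and its two values are
hypothetical names chosen for this illustration; level="single" counts
among siblings (restarting in each parent), and level="any" counts all
earlier elements of the same name in the document.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical switch: 'nested' restarts at 1 inside each parent,
       'continuous' runs on through the whole document -->
  <xsl:param name="numbering" select="'nested'"/>

  <!-- Assumed input: an un-numbered tree of these four elements -->
  <xsl:template match="para | sent | clause | phrase">
    <xsl:copy>
      <xsl:attribute name="id">
        <xsl:choose>
          <xsl:when test="$numbering = 'continuous'">
            <!-- Counts this element plus all earlier elements
                 with the same name -->
            <xsl:number level="any"/>
          </xsl:when>
          <xsl:otherwise>
            <!-- Counts this element plus its preceding siblings
                 with the same name -->
            <xsl:number level="single"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:attribute>
      <!-- Text nodes are copied by the built-in template rule -->
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Keeping the numbering separate from the splitting avoids relying on
position() inside xsl:analyze-string, where the context position also
counts non-matching substrings.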

> 
> The fourth problem seems quite unrelated to the
> others. Of the other three,
> I'm reluctant to launch into answering without
> knowing which of the three
> steps you need help with. (Generally I think most
> people answering on this
> list adopt the approach of trying to help you solve
> your problem, rather
> than doing the work for you.)

After some initial hints, I should be able to
do the rest of the work myself.

> 
> Incidentally, regular expressions are an XSLT 2.0
> feature so I assume you're
> looking for XSLT 2.0 solutions.
> 

That is an issue. Is there any way to do this without
regular expressions?
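
For what it is worth, a regex-free split is possible in XSLT 1.0 with a
recursive named template built on contains(), substring-before() and
substring-after(). The sketch below handles a single delimiter (the
comma, for phrases); the template name "split", its parameters, and the
<text> input element are all assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed input: a <text> element holding the raw string -->
  <xsl:template match="text">
    <phrases>
      <xsl:call-template name="split">
        <xsl:with-param name="string" select="string(.)"/>
      </xsl:call-template>
    </phrases>
  </xsl:template>

  <!-- Hypothetical helper: emits one <phrase> per chunk, keeping the
       delimiter at the end of the chunk it terminates -->
  <xsl:template name="split">
    <xsl:param name="string"/>
    <xsl:param name="delimiter" select="','"/>
    <xsl:choose>
      <xsl:when test="contains($string, $delimiter)">
        <phrase>
          <xsl:value-of
              select="concat(substring-before($string, $delimiter),
                             $delimiter)"/>
        </phrase>
        <!-- Recurse on whatever follows the first delimiter -->
        <xsl:call-template name="split">
          <xsl:with-param name="string"
              select="substring-after($string, $delimiter)"/>
          <xsl:with-param name="delimiter" select="$delimiter"/>
        </xsl:call-template>
      </xsl:when>
      <!-- Last chunk: emit it unless it is empty or whitespace -->
      <xsl:when test="normalize-space($string)">
        <phrase><xsl:value-of select="$string"/></phrase>
      </xsl:when>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Levels with several possible delimiters (period, question mark,
exclamation mark) need more work in 1.0, for example finding which
delimiter occurs first at each step, so this is only a starting point.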


> Michael Kay
> http://www.saxonica.com/
> 
> > -----Original Message-----
> > From: mark bordelon [mailto:markcbordelon@xxxxxxxxx]
> > Sent: 06 June 2007 22:52
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: How to parse text into words, phrases,
> > clauses, sentences, and paragraphs
> > 
> > Hey XML gurus,
> >
> > Still somewhat new to XML/XSL and need some help getting
> > started on how to use regular expressions and tokens in
> > English text to transform it into an XML document marked up for:
> > 
> > 1. words (delimited by WS, excluding any external
> >    punctuation, but allowing internal punctuation)
> > 2. phrases (delimited by the comma)
> > 3. clauses (delimited by colon or semicolon)
> > 4. sentences (delimited by the period, question mark,
> >    or exclamation mark)
> > 5. paragraphs (delimited by a line break)
> > 
> > Also ideal would be to assign sequenced ids to every tag,
> > either in a running consecutive style from beginning to end,
> > or repeating from 1 for every level of nesting.
> > 
> > In more concrete terms,
> >
> > To transform this text:
> >
> > THOU still unravish'd bride of quietness,  Thou foster-child
> > of Silence and slow Time, Sylvan historian, who canst thus
> > express  A flowery tale more sweetly than our rhyme:
> > What leaf-fringed legend haunts about thy shape  Of deities or
> > mortals, or of both,  In Tempe or the dales of Arcady?
> >  What men or gods are these? What maidens loth?
> > What mad pursuit? What struggle to escape?
> >  What pipes and timbrels? What wild ecstasy?
> >
> > into this XML: (using indexing that renumbers for each
> > sub-group)
> > 
> > <para id="1">
> >  <sent id="1">
> >   <clause id="1">
> >    <phrase id="1">THOU still unravish'd bride of quietness,</phrase>
> >    <phrase id="2">Thou foster-child of Silence and slow Time,</phrase>
> >    <phrase id="3">Sylvan historian,</phrase>
> >    <phrase id="4"> who canst thus express A flowery tale more sweetly than our rhyme</phrase>:
> >   </clause>
> >   <clause id="2">
> >    <phrase id="1">What leaf-fringed legend haunts about thy shape Of deities or mortals,</phrase>
> >    <phrase id="2"> or of both,</phrase>
> >    <phrase id="3"> In Tempe or the dales of Arcady?</phrase>
> >   </clause>
> >  </sent>
> >  <sent id="2">What men or gods are these?</sent>
> >  <sent id="3">What maidens loth?</sent>
> >  <sent id="4">What mad pursuit?</sent>
> >  <sent id="5">What struggle to escape?</sent>
> >  <sent id="6">What pipes and timbrels?</sent>
> >  <sent id="7">What wild ecstasy?</sent>
> > </para>
> > 
> > 
> > or into this XML: (using indexing that is continuous per tag)
> >
> > <para id="1">
> >  <sent id="1">
> >   <clause id="1">
> >    <phrase id="1">THOU still unravish'd bride of quietness,</phrase>
> >    <phrase id="2">Thou foster-child of Silence and slow Time,</phrase>
> >    <phrase id="3">Sylvan historian,</phrase>
> >    <phrase id="4"> who canst thus express A flowery tale more sweetly than our rhyme</phrase>:
> >   </clause>
> >   <clause id="2">
> >    <phrase id="5">What leaf-fringed legend haunts about thy shape Of deities or mortals,</phrase>
> >    <phrase id="6"> or of both,</phrase>
> >    <phrase id="7"> In Tempe or the dales of Arcady?</phrase>
> >   </clause>
> >  </sent>
> >  <sent id="2">What men or gods are these?</sent>
> >  <sent id="3">What maidens loth?</sent>
> >  <sent id="4">What mad pursuit?</sent>
> >  <sent id="5">What struggle to escape?</sent>
> >  <sent id="6">What pipes and timbrels?</sent>
> >  <sent id="7">What wild ecstasy?</sent>
> > </para>
> > 
> > Surely this has been done before. I have searched through
> > archives and have not found anything, probably since I am
> > searching using the wrong terminology.
> >
> > Would really appreciate the help as it would give me insight
> > into using regular expressions and sequencing in XSL.
> >
> > Thanks in advance
> >
> > Mark Bordelon
