RE: How to parse text into words, phrases, clauses, s
--- Michael Kay <mike@xxxxxxxxxxxx> wrote:

> You don't really make it clear where you are having difficulty. There seem
> to be four separate problems here:

Mike,

Thanks for helping me even break this down. This is definitely something I can and want to do myself. I just need the initial hints.

> (a) translating your concepts, such as "words" and "sentences" into precise
> specifications
> (b) translating these specifications into regular expressions

Got these. E.g. the specification for "word" could be [^ '-]*

> (c) using these regular expressions within a stylesheet, for example as an
> argument to the tokenize() function or the xsl:analyze-string instruction.

This is my first problem: how to apply a template match using the tokenize() function, and in which order to apply it (from paragraph -> word or word -> paragraph).

> (d) doing the output numbering.

I haven't a clue how this would be done, either way.

> The fourth problem seems quite unrelated to the others. Of the other three,
> I'm reluctant to launch into answering without knowing which of the three
> steps you need help with. (Generally I think most people answering on this
> list adopt the approach of trying to help you solve your problem, rather
> than doing the work for you.)

After any initial hints, I would be able to do the rest of the work myself.

> Incidentally, regular expressions are an XSLT 2.0 feature so I assume you're
> looking for XSLT 2.0 solutions.

That is an issue. Is there any way to do this without regular expressions?
> Michael Kay
> http://www.saxonica.com/
>
> > -----Original Message-----
> > From: mark bordelon [mailto:markcbordelon@xxxxxxxxx]
> > Sent: 06 June 2007 22:52
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: How to parse text into words, phrases,
> > clauses, sentences, and paragraphs
> >
> > Hey XML gurus,
> >
> > Still somewhat new to XML/XSL and need some help getting
> > started on how to use regular expressions and tokens in
> > English text to transform it into an XML document marked up for:
> >
> > 1. words (delimited by WS, excluding any external
> >    punctuation, but allowing internal punctuation)
> > 2. phrases (delimited by the comma)
> > 3. clauses (delimited by colon or semicolon)
> > 4. sentences (delimited by the period, question-mark, or exclamation mark)
> > 5. paragraphs (delimited by a line break)
> >
> > Also ideal would be to assign sequenced id's to every tag,
> > either in a running consecutive style from beginning to end,
> > or repeating from 1 for every level of nesting.
> >
> > In more concrete terms,
> >
> > To transform this text:
> >
> > THOU still unravish'd bride of quietness, Thou foster-child
> > of Silence and slow Time, Sylvan historian, who canst thus
> > express A flowery tale more sweetly than our rhyme:
> > What leaf-fringed legend haunts about thy shape Of deities or
> > mortals, or of both, In Tempe or the dales of Arcady?
> > What men or gods are these? What maidens loth?
> > What mad pursuit? What struggle to escape?
> > What pipes and timbrels? What wild ecstasy?
> >
> > into this XML: (using indexing that renumbers for each
> > sub-group)
> >
> > <para id=1>
> >   <sent id=1>
> >     <clause id=1>
> >       <phrase id=1>THOU still unravish'd bride of quietness,</phrase>
> >       <phrase id=2>Thou foster-child of Silence and slow Time,</phrase>
> >       <phrase id=3>Sylvan historian,</phrase>
> >       <phrase id=4> who canst thus express A flowery tale more sweetly than our rhyme</phrase>:
> >     </clause>
> >     <clause id=2>
> >       What leaf-fringed legend haunts about thy shape Of deities or mortals,</phrase>
> >       <phrase id=1> or of both,</phrase>
> >       <phrase id=2> In Tempe or the dales of Arcady?
> >     </clause>
> >   </sent>
> >   <sent id=2>What men or gods are these?</sent>
> >   <sent id=3>What maidens loth?</sent>
> >   <sent id=4>What mad pursuit?</sent>
> >   <sent id=5>What struggle to escape?</sent>
> >   <sent id=6>What pipes and timbrels?</sent>
> >   <sent id=7>What wild ecstasy?</sent>
> > </para>
> >
> > or into this XML: (using indexing that is continuous per tag)
> >
> > <para id=1>
> >   <sent id=1>
> >     <clause id=1>
> >       <phrase id=1>THOU still unravish'd bride of quietness,</phrase>
> >       <phrase id=2>Thou foster-child of Silence and slow Time,</phrase>
> >       <phrase id=3>Sylvan historian,</phrase>
> >       <phrase id=4> who canst thus express A flowery tale more sweetly than our rhyme</phrase>:
> >     </clause>
> >     <clause id=2>
> >       What leaf-fringed legend haunts about thy shape Of deities or mortals,</phrase>
> >       <phrase id=5> or of both,</phrase>
> >       <phrase id=6> In Tempe or the dales of Arcady?
> >     </clause>
> >   </sent>
> >   <sent id=2>What men or gods are these?</sent>
> >   <sent id=3>What maidens loth?</sent>
> >   <sent id=4>What mad pursuit?</sent>
> >   <sent id=5>What struggle to escape?</sent>
> >   <sent id=6>What pipes and timbrels?</sent>
> >   <sent id=7>What wild ecstasy?</sent>
> > </para>
> >
> > Surely this has been done before.
> > I have searched through
> > archives and have not found anything, probably since I am
> > searching using the wrong terminology.
> >
> > Would really appreciate the help as it would give me insight
> > into using regular expressions and sequencing in XSL.
> >
> > Thanks in advance
> >
> > Mark Bordelon
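
As a starting hint for problems (c) and (d) discussed in this message, the following is a minimal XSLT 2.0 sketch, not a complete solution. It works top-down (paragraph -> word) with nested tokenize() calls and uses position() for ids that restart at 1 inside each parent. The <text> input wrapper and the <doc> and <word> output elements are assumptions made for illustration; they do not come from the original post. Note also that tokenize() discards the delimiters, so the punctuation shown in the desired output is dropped here; keeping it would probably call for xsl:analyze-string instead.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed input: <text>raw text, one paragraph per line</text> -->
  <xsl:template match="/text">
    <doc>
      <!-- Paragraphs: split on line breaks, skip blank pieces -->
      <xsl:for-each select="tokenize(., '\n')[normalize-space(.)]">
        <para id="{position()}">
          <!-- Sentences: split on period, question mark, exclamation mark -->
          <xsl:for-each select="tokenize(., '[.?!]')[normalize-space(.)]">
            <sent id="{position()}">
              <!-- Clauses: split on colon or semicolon -->
              <xsl:for-each select="tokenize(., '[:;]')[normalize-space(.)]">
                <clause id="{position()}">
                  <!-- Phrases: split on comma -->
                  <xsl:for-each select="tokenize(., ',')[normalize-space(.)]">
                    <phrase id="{position()}">
                      <!-- Words: collapse whitespace, then split on single spaces -->
                      <xsl:for-each select="tokenize(normalize-space(.), ' ')">
                        <word id="{position()}">
                          <xsl:value-of select="."/>
                        </word>
                      </xsl:for-each>
                    </phrase>
                  </xsl:for-each>
                </clause>
              </xsl:for-each>
            </sent>
          </xsl:for-each>
        </para>
      </xsl:for-each>
    </doc>
  </xsl:template>

</xsl:stylesheet>
```

Because position() is evaluated against each filtered token sequence, every level renumbers from 1 within its parent, which corresponds to the first numbering style asked for. Continuous numbering per tag would need a different technique, for example a second pass over the generated tree. XSLT 1.0 has no tokenize(), so a regex-free version would typically rely on a recursive named template built on substring-before() and substring-after().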