Subject: Re: segmenting a paragraph
From: David Carlisle <davidc@xxxxxxxxx>
Date: Tue, 2 Oct 2007 13:47:14 +0100
> This is essentially a variant of the approach using saxon:serialize(), which
> inserts the strings <note> and </note> instead of @@@@@ and !!!!!.
Yes, but it's for the simpler case (which I think is the one we are in) where you can
remove the entire note element and put it back later. saxon:serialize
would not only put <note> there, it would flatten out all its contents as
well (which is why you'd need saxon:parse to reconstruct it).
You need the serialize version (or something like it) if a regexp has to
start outside the note and finish inside it,
but that can't happen here. If you had
Blah blah <note> blah. Blah blah </note> blah.
then serialising it to the string
"Blah blah <note> blah. Blah blah </note> blah."
would let you find the two sentences with a regexp, but you wouldn't be
able to simultaneously wrap the sentences in <seq> and reconstruct the
note element (unless you are Jeni and using LMNL rather than XML, of
course).
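To make that concrete, here is a small sketch in Python with ElementTree standing in for XSLT/Saxon. The function names are hypothetical and the regex `[^.]*\.` is an assumed, deliberately naive sentence pattern. It takes the flattened string, wraps each '.'-terminated run in <seq>, and checks whether the result still parses.

```python
import re
import xml.etree.ElementTree as ET

def wrap_serialised(flat: str) -> str:
    """Wrap each '.'-terminated run of a flattened markup string in <seq>."""
    # Naive sentence pattern (an assumption): everything up to the next '.'.
    sentences = re.findall(r"[^.]*\.", flat)
    return "".join(f"<seq>{s}</seq>" for s in sentences)

def is_well_formed(xml_text: str) -> bool:
    """True if xml_text parses as a single XML element."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

if __name__ == "__main__":
    flat = "Blah blah <note> blah. Blah blah </note> blah."
    wrapped = wrap_serialised(flat)
    print(wrapped)
    # The <seq> boundaries overlap <note>, so this prints False.
    print(is_well_formed("<p>" + wrapped + "</p>"))
```

The regexp finds both sentences. However, the first <seq> closes while <note> is still open, so the markup can't be turned back into a tree.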
So rather than serialising the note to <note> blah. Blah blah
</note>, I serialise it to @@@@@1!!!!!, losing all its content for
the regex processing, and then just copy the node back from the original
source at the end.
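Here is a minimal sketch of that placeholder idea. It is again written in Python with ElementTree rather than XSLT, so it is not the stylesheet itself. It assumes every child element of the paragraph is an opaque note, that sentences end at '.', and that the placeholder text never occurs in real content. The name `segment` is hypothetical.

```python
import copy
import re
import xml.etree.ElementTree as ET

# Placeholder format from the post; the captured number indexes the saved note.
PLACEHOLDER = re.compile(r"@@@@@(\d+)!!!!!")

def segment(p: ET.Element) -> ET.Element:
    """Wrap the sentences of paragraph p in <seq>, keeping child elements intact."""
    # 1. Replace each child with a numbered placeholder (which contains no '.'),
    #    remembering the original node so it can be copied back later.
    notes = []
    parts = [p.text or ""]
    for child in p:
        notes.append(child)
        parts.append(f"@@@@@{len(notes)}!!!!!")
        parts.append(child.tail or "")
    flat = "".join(parts)

    # 2. Naive sentence split on the flattened text. The note content is gone,
    #    so a '.' inside a note can no longer end a sentence.
    out = ET.Element(p.tag, p.attrib)
    for sentence in re.findall(r"[^.]*\.|[^.]+$", flat):
        seq = ET.SubElement(out, "seq")
        # split() with one group yields [text, index, text, index, text, ...]
        pieces = PLACEHOLDER.split(sentence)
        seq.text = pieces[0]
        # 3. Put a copy of the original note back where its placeholder was.
        for index, after in zip(pieces[1::2], pieces[2::2]):
            note = copy.deepcopy(notes[int(index) - 1])
            note.tail = after
            seq.append(note)
    return out

if __name__ == "__main__":
    para = ET.fromstring("<p>One <note>a. b.</note> two. Three.</p>")
    print(ET.tostring(segment(para), encoding="unicode"))
```

The full stops inside the note don't split anything, and the note comes back unchanged inside the first <seq>.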
David
________________________________________________________________________
The Numerical Algorithms Group Ltd is a company registered in England
and Wales with company number 1249803. The registered office is:
Wilkinson House, Jordan Hill Road, Oxford OX2 8DR, United Kingdom.
________________________________________________________________________