
Re: Aligning/merging two sequences

Subject: Re: Aligning/merging two sequences
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Thu, 30 Sep 2010 18:08:32 +0100
I don't think it's straightforward at all - people have spent years perfecting algorithms for finding diffs between two sequences. I'm no expert on this area, but if I had the problem I would start by searching for appropriate algorithms before even thinking about writing an XSLT implementation. Presumably there's a trade-off between the time spent and the perfection of the result.

Michael Kay
Saxonica

On 30/09/2010 5:51 PM, Markus Flatscher wrote:
I'm banging my head against a sequence alignment problem. I have a feeling that this is straightforward, but I can't put my finger on what's missing from my attempts.

Suppose I have two inputs like so, where input1//w is always a subset of input2//w:

<input1>
<w n="1">I</w>
<w n="2">am</w>
<w n="3">a</w>
<w n="4">sequence</w>
</input1>

<input2>
<w>I</w>
<w>am</w>
<w>a</w>
<w>longer</w>
<w>longer</w>
<w>sequence</w>
</input2>

I'd like to get output like so:

<output>
<w n="1">I</w>
<w n="2">am</w>
<w n="3">a</w>
<w n="skipped">longer</w>
<w n="skipped">longer</w>
<w n="4">sequence</w>
</output>

I.e., for each input1//w, @n should be copied to the nearest following <w> in input2 whose text matches it; <w>s in input2 that have no counterpart in input1 should be flagged with n="skipped".

P.S.: The use case is aligning an imperfect but timestamped transcription of an audio file (input1, machine-generated) with a perfect but not-timestamped one (input2, human-generated).

Thanks much for any help,

Markus
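
For reference, the rule described in the question (copy @n to the nearest following match, flag everything else as "skipped") can be sketched as a single forward scan. This is only an illustration in Python, not XSLT and not a solution proposed in the thread. The function name `align_words` and the plain-list representation of the `<w>` elements are assumptions made for the example. It is correct only when input1 really is an in-order subsequence of input2.

```python
def align_words(timed, words):
    """Greedy forward scan over two word sequences.

    timed: list of (n, word) pairs taken from input1, in document order.
    words: list of words taken from input2, in document order.

    Returns (aligned, leftover). `aligned` holds one (n, word) pair per
    input2 word, with n set to "skipped" where no input1 word was consumed.
    `leftover` is the tail of `timed` that never found a match.
    """
    aligned = []
    i = 0  # index of the next unconsumed input1 word
    for word in words:
        if i < len(timed) and timed[i][1] == word:
            # Nearest following match for timed[i]: carry its @n over.
            aligned.append((timed[i][0], word))
            i += 1
        else:
            aligned.append(("skipped", word))
    return aligned, timed[i:]


# The sample data from the question, flattened to plain lists.
input1 = [("1", "I"), ("2", "am"), ("3", "a"), ("4", "sequence")]
input2 = ["I", "am", "a", "longer", "longer", "sequence"]

aligned, leftover = align_words(input1, input2)
for n, word in aligned:
    print(f'<w n="{n}">{word}</w>')
```

A non-empty `leftover` means input1 was not a clean subsequence of input2, which is likely with an imperfect machine transcription. In that case the greedy scan is not enough and a proper diff algorithm, as suggested in the reply, is the better starting point; Python's standard `difflib.SequenceMatcher` is one readily available implementation to experiment with.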
