Aligning/merging two sequences

Subject: Aligning/merging two sequences
From: Markus Flatscher <markus.flatscher@xxxxxxxxxxxx>
Date: Thu, 30 Sep 2010 12:51:00 -0400
I'm banging my head against a sequence alignment problem. I have a feeling that this is straightforward, but I can't put my finger on what's missing from my attempts.

Suppose I have two inputs like so, where input1//w is always a subset of input2//w:

<input1>
 <w n="1">I</w>
 <w n="2">am</w>
 <w n="3">a</w>
 <w n="4">sequence</w>
</input1>

<input2>
 <w>I</w>
 <w>am</w>
 <w>a</w>
 <w>longer</w>
 <w>longer</w>
 <w>sequence</w>
</input2>

I'd like to get output like so:

<output>
 <w n="1">I</w>
 <w n="2">am</w>
 <w n="3">a</w>
 <w n="skipped">longer</w>
 <w n="skipped">longer</w>
 <w n="4">sequence</w>
</output>

I.e., for each input1//w, @n should be copied to the nearest following <w> in input2 whose text matches that of the current word (.); any <w> in input2 that has no counterpart in input1 should be flagged as "skipped".
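
To make the rule above concrete, here is a minimal sketch of it in Python (not XSLT), using only the standard library. It assumes that the input1 words occur in input2 in the same order, and that a greedy left-to-right match is acceptable, so a repeated word is always paired with its earliest unmatched occurrence. The function name align and the variables INPUT1 and INPUT2 are made up for illustration.

```python
import xml.etree.ElementTree as ET

INPUT1 = """<input1>
 <w n="1">I</w>
 <w n="2">am</w>
 <w n="3">a</w>
 <w n="4">sequence</w>
</input1>"""

INPUT2 = """<input2>
 <w>I</w>
 <w>am</w>
 <w>a</w>
 <w>longer</w>
 <w>longer</w>
 <w>sequence</w>
</input2>"""


def align(timed_root, full_root):
    """Walk input2 once, consuming input1 words in order (greedy match)."""
    timed = list(timed_root.iter("w"))
    out = ET.Element("output")
    i = 0  # index of the next input1 word still waiting for a match
    for w in full_root.iter("w"):
        new = ET.SubElement(out, "w")
        new.text = w.text
        if i < len(timed) and timed[i].text == w.text:
            # Same text as the pending input1 word: copy its @n and advance.
            new.set("n", timed[i].get("n"))
            i += 1
        else:
            # No pending input1 word matches here: flag as skipped.
            new.set("n", "skipped")
    return out


output = align(ET.fromstring(INPUT1), ET.fromstring(INPUT2))
result = [(w.get("n"), w.text) for w in output]
print(ET.tostring(output, encoding="unicode"))
```

The same single pass could presumably be written in XSLT 2.0 as a recursive function that carries the remaining input1 words as a parameter. If input1 can contain words that are missing from input2, the greedy pass stalls at the first such word and flags everything after it as skipped, so that case would need a proper alignment algorithm instead.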

P.S.: The use case is aligning an imperfect but timestamped transcription of an audio file (input1, machine-generated) with a perfect but not-timestamped one (input2, human-generated).

Thanks much for any help,

Markus

--
Markus Flatscher, Project Editor
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville VA 22904, USA
Courier: 211 Emmet Street South, Charlottesville VA 22903, USA
Email: markus.flatscher@xxxxxxxxxxxx
Web: http://rotunda.upress.virginia.edu/
