Re: Text based stage play scripts to XML

Subject: Re: Text based stage play scripts to XML
From: Liam R E Quin <liam@xxxxxx>
Date: Mon, 24 Jan 2011 13:05:34 -0500
On Mon, 2011-01-24 at 14:37 +0200, Jacobus Reyneke wrote:

> Take any input file and output a similar output file. While doing so
> however, look for text located between identifiable patterns. Surround
> this text with tags.
> 
> If input file contains:
> a b c d e f g h i j
> 
> Pattern description:
> any string that follow after the string "c d" and is followed by the
> string "g h"
> 
> If pattern found:
> Surround with <found-you>
> 
> Result:
> a b c d<found-you> e f </found-you>g h i j

Others have mentioned some XSLT approaches, and that's generally a good
way to go.  Of course, if you don't mind learning a programming
language, Perl is the king (or at least a princess) of transformations
where you don't yet have XML, but want to add markup. Use XML-aware
tools as early in the process as possible, though!

while (<>) { # for each line of input
    s{c d\K e f (?=g h)}{ # replace with the value of...:
	element(
	    "found-you",  # element name
	    $&,           # what was matched (" e f " here)
            # optional attributes:
	    "rule" => "31",
	    "before" => "c d"
	)
    }e;  # "e" flag means the replacement is an expression, not text

    print; # print the line whether or not it was changed
}

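Given the input
a b c d e f g h
this produces
a b c d<found-you rule="31" before="c d"> e f </found-you>g h
(the two attributes may come out in either order, because they are
passed through a Perl hash, which does not preserve order).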

To process a whole file at once, you can use the rather odd Perl idiom,
my $text;
{
    local $/; # slurp mode: no input record separator inside this block
    $text = <>;
}

# and then do the substitution:
$text =~ s{as before}{as before}gme;

At that point you might (or might not) want to use \s+ rather than a
space between the tokens in the input, to match one or more whitespace
characters.  Start by normalizing the text though -- look for lines
ending with spaces, for example, and trim them.
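A minimal sketch of that normalize-then-match step, using the same "c d" ... "g h" example.  The names normalize_text and mark_found are made up for illustration, and the tag is written out literally rather than built with element() so that the snippet stands alone.  It needs Perl 5.10 or later for \K.

```perl
use strict;
use warnings;

# Hypothetical helper: trim trailing spaces and tabs from every line.
sub normalize_text {
    my ($text) = @_;
    $text =~ s/[ \t]+$//mg;   # /m lets $ match at the end of each line
    return $text;
}

# Hypothetical helper: wrap whatever sits between "c d" and "g h",
# allowing any run of whitespace (including newlines) between tokens.
sub mark_found {
    my ($text) = @_;
    $text =~ s{c\s+d\K(\s+e\s+f\s+)(?=g\s+h)}{<found-you rule="31">$1</found-you>}g;
    return $text;
}

print mark_found(normalize_text("a b c d   \ne  f g\th i j\n"));
```

Because \s also matches newlines, the pattern still matches when the tokens are split across lines, which is usually what you want once the whole file is in one string.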

Adding an attribute showing which pattern put a tag in place can
considerably aid debugging the process.  It also helps to be consistent
in your markup, e.g. *always* use double quotes for attribute values.

A simple definition of the "element" function follows at the end of
this message - I have tried to avoid "clever" Perl, and I have left a
couple of items in place that help debugging.  For production it should
probably also escape the special characters & < > in content; it
already escapes " in attribute values.
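As a rough sketch of the escaping mentioned above (not part of the script in this message): escape_content and escape_attribute are hypothetical helper names, and the sketch assumes the input text does not already contain entity references, since those would be escaped a second time.

```perl
use strict;
use warnings;

# Hypothetical helper: escape text for use as element content.
sub escape_content {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # must come first, or it would re-escape the others
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: escape a value and wrap it in double quotes.
sub escape_attribute {
    my ($value) = @_;
    $value = escape_content($value);
    $value =~ s/"/&quot;/g;
    return '"' . $value . '"';
}

print escape_content("Tom & Jerry <aside>"), "\n";
print escape_attribute('say "hi" & wave'), "\n";
```

The ampersand has to be handled before the other characters; otherwise the & in each freshly inserted &lt; would itself be escaped.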

It's relatively straightforward using this approach to get files that
can be processed further with XML tools, although even then I sometimes
use Perl, e.g. because of its more powerful regular expressions, or
because I can more easily check for filenames...

You could also keep the patterns in a separate file, load them, and
match against each one in turn.  On Linux (or anywhere Perl is
installed), run the command perldoc perlre for documentation on Perl
regular expressions.
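One way the separate pattern file might look, as a sketch only.  The tab-separated rule format and the names load_rules and apply_rules are assumptions made for this example, and the patterns are trusted as regular expressions exactly as written.  The demo reads the rules from an in-memory string instead of a real file so that it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical rule format, one rule per line, tab-separated:
#   rule-id <TAB> element-name <TAB> before-regex <TAB> after-regex
sub load_rules {
    my ($fh) = @_;
    my @rules;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*#/ or $line !~ /\S/;  # skip comments, blank lines
        my ($id, $name, $before, $after) = split /\t/, $line;
        push @rules, {
            id     => $id,
            name   => $name,
            before => qr/$before/,
            after  => qr/$after/,
        };
    }
    return @rules;
}

# Wrap the text found between each rule's "before" and "after" patterns.
sub apply_rules {
    my ($text, @rules) = @_;
    foreach my $rule (@rules) {
        my ($name, $id, $before, $after) = @{$rule}{qw(name id before after)};
        $text =~ s{$before\K(.*?)(?=$after)}
                  {<$name rule="$id">$1</$name>}gs;
    }
    return $text;
}

my $rules_text = "# id\tname\tbefore\tafter\n31\tfound-you\tc d\tg h\n";
open my $fh, '<', \$rules_text or die "cannot open in-memory rules: $!";
my @rules = load_rules($fh);
close $fh;

print apply_rules("a b c d e f g h i j\n", @rules);
```

Rules are applied in file order, so a later rule sees the tags added by earlier ones; whether that is wanted depends on the patterns.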

Liam

#! /usr/bin/perl -w
use warnings;
use strict;

sub element($$;%)
{
    my ($name, $content, %attributes) = @_;

    sub quotedattvalue($$)
    {
	my ($name, $value) = @_;

	# print STDERR "q $name, $value\n";
	$value =~ s/"/\&quot;/g; # so we can safely use quotes
	return '"' . $value . '"';
    }

    # make a list of att="value" pairs, each with a leading space:
    # (could use join and map to do this too more succinctly,
    # see perldoc -f map)
    my $atts = "";
    if (%attributes) {
	foreach (keys %attributes) {
	    $atts .= " " .
	        $_ . '=' .  quotedattvalue($_, $attributes{$_})
	    ;
	}
    }

    return "<${name}${atts}>${content}</${name}>";
}

my $text;
{
    local $/;
    $text = <>;
};

$text =~ s{c d\K e f (?=g h)}{
	element(
	    "found-you",
	    $&,
	    "rule" => "31",
	    "before" => "c d"
	)
    }gme;
print $text;

# end
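
The join-and-map form mentioned in the comments above might look like the sketch below.  element_sorted is a hypothetical name, not part of the script; it takes the same arguments as element() but emits the attributes in sorted name order, so the output is the same on every run.

```perl
use strict;
use warnings;

# Hypothetical variant of element(): same arguments, but attributes are
# written in sorted name order so the result is reproducible.
sub element_sorted {
    my ($name, $content, %attributes) = @_;
    my $atts = join "", map {
        (my $value = $attributes{$_}) =~ s/"/&quot;/g;  # escape on a copy
        qq{ $_="$value"};
    } sort keys %attributes;
    return "<${name}${atts}>${content}</${name}>";
}

print element_sorted("found-you", " e f ", rule => "31", before => "c d"), "\n";
```

With the arguments shown this prints <found-you before="c d" rule="31"> e f </found-you>, with "before" ahead of "rule" because of the sort.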

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org
