A utility to make msxsl more useful
I wrote a small Perl script that can be used to preprocess XML files before sending them to msxsl. Why might you want to do this? So you can expand ENTITY references and do something like

    <INCLUDE HREF="included_file.xml"/>

It's very basic and very small, so I just attached it to this message for anyone who's interested. Here's the syntax for using it from the DOS command prompt:

    C:\<your path to Perl>\Perl.exe expand.pl myfile.xml > temp.xml
    msxsl -i temp.xml -s myfile.xsl -o output.html

myfile.xml can define entities in its internal and external DTD by saying

    <!ENTITY entityname 'VALUE'>

or

    <!ENTITY entityname SYSTEM 'filepath'>

You can use single or double quotes. I also made it so you can include a file by saying

    <INCLUDE HREF="filetoinclude"/>

Basically, I'm trying to find ways to make msxsl usable now. I was sort of hoping some Java programmers would leap to the rescue and turn msxml (or some equivalent parser) into a type of preprocessor for msxsl but, failing that, I worked up a quick and dirty way to do what I want. Hopefully someone else will find it useful.

main();

sub main {
    $xml = (&readFile($ARGV[0]));
    %externalEntities = &parseExternalDTD($xml);
    %internalEntities = &parseInternalDTD($xml);
    my($moreToGo) = (1);
    # Keep going until a full pass makes no replacements.
    # Bitwise | (rather than ||) so both passes run on every iteration.
    while ($moreToGo) {
        $moreToGo = &expandEntities(%externalEntities, %internalEntities) | &expandLinks(%externalEntities, %internalEntities);
    }
    print $xml;
}
# $_[0] = file name or path
# returns full text of file
sub readFile {
    my($contents);
    my(@fileInfo) = stat($_[0]);
    open(F, $_[0]) or die "Couldn't open $_[0]\n";
    read F, $contents, $fileInfo[7];
    close(F);
    return $contents;
}
# $_[0] full text of an XML document
# returns hash of external entities and what they reference
sub parseExternalDTD {
    # Looking for... <!DOCTYPE foo SYSTEM 'bar.dtd'>
    unless ($_[0] =~ /<!DOCTYPE\s+\w+\s+SYSTEM\s+['"]([^"']+)/) {
        # No external DTD: return an empty list, not a hash reference
        return ();
    }
    my($dtdPath) = ($1);
    my($dtd) = &readFile($dtdPath);
    my(%entities) = (&extractEntities($dtd));
    return %entities;
}
# $_[0] full text of XML document
# returns hash of internally defined entities and what they reference
sub parseInternalDTD {
    my(%entities) = (&extractEntities($_[0]));
    return %entities;
}
# $_[0] text, possibly containing <!ENTITY> declarations
# returns entity hash of names and values
sub extractEntities {
    my($text) = $_[0];
    my(%entities);
    my($entityName, $entityPath);
    # Looking for <!ENTITY foo 'bar'> or <!ENTITY foo SYSTEM 'bar'>
    while ($text =~ /<!ENTITY/) {
        if ($text =~ s/<!ENTITY\s+(\w+)\s+['"]([^'"]*)['"]>//s) {
            $entities{$1} = $2;
        } elsif ($text =~ s/<!ENTITY\s+(\w+)\s+SYSTEM\s+['"]([^'"]+)['"]>//s) {
            ($entityName, $entityPath) = ($1, $2);
            $entities{$entityName} = &readFile($entityPath);
        } else {
            # Skip a declaration we can't parse (e.g. a parameter entity)
            # so the loop always terminates
            $text =~ s/<!ENTITY//;
        }
    }
    return %entities;
}
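# @_ is a hash of entities and what they expand to
# works on global variable $xml searching for &foo; references
# returns true if it was able to make any replacements
sub expandEntities {
    my(%entities) = @_;
    my($gotOne) = (0);
    # Match only declared names, so references we know nothing about
    # (such as the predefined &amp; and &lt;) are left in place
    my($names) = join('|', map { quotemeta($_) } keys %entities);
    return 0 if $names eq '';
    while ($xml =~ s/&($names);/$entities{$1}/) {
        $gotOne = 1;
    }
    return $gotOne;
}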
sub expandLinks {
    my($gotOne) = (0);
    # We're looking for... <INCLUDE HREF="foo"/>
    # This is not a complete implementation! A real XML processor would
    # look for any type of link that's defined to have SHOW="EMBED" and ACTUATE="AUTO"
    # ...but that's too much work for what I'm after
    while ($xml =~ s/<INCLUDE\s+HREF=["']([^"']+)["']\/>/&readFile($1)/se) {
        $gotOne = 1;
    }
    return $gotOne;
}
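
To try the entity expansion without any files on disk, here is a minimal self-contained sketch of the same idea working on an in-memory string. It is not part of the attached script: the sample document, the sub names extract_entities and expand_entities, and the $max_passes limit (a guard against entities that refer to each other in a cycle) are all assumptions made for illustration, and it only handles internal, non-SYSTEM entities.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input; the entity names and text are made up.
my $xml = <<'END_XML';
<?xml version="1.0"?>
<!DOCTYPE doc [
<!ENTITY company 'Example Corp'>
<!ENTITY notice "Copyright &company;">
]>
<doc><footer>&notice; &amp; friends</footer></doc>
END_XML

# Collect internal <!ENTITY name 'value'> declarations into a hash.
# The closing quote must match the opening one (\2).
sub extract_entities {
    my ($text) = @_;
    my %entities;
    while ($text =~ /<!ENTITY\s+(\w+)\s+(['"])(.*?)\2\s*>/gs) {
        $entities{$1} = $3;
    }
    return %entities;
}

# Replace &name; references for declared names only. Repeat until a
# pass changes nothing or $max_passes is reached (assumed safety limit).
sub expand_entities {
    my ($text, $entities, $max_passes) = @_;
    my $names = join '|', map { quotemeta($_) } keys %$entities;
    return $text if $names eq '';
    for (1 .. $max_passes) {
        my $count = ($text =~ s/&($names);/$entities->{$1}/g);
        last unless $count;
    }
    return $text;
}

my %entities = extract_entities($xml);
print expand_entities($xml, \%entities, 10);
```

With this input the footer comes out as "Copyright Example Corp &amp; friends": the nested &company; reference is resolved on the second pass, and &amp; is left alone because it was never declared.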

-- Andrew

Andrew Bunner
President, Founder
Mass Quantities, Inc.
Professional Supplements for the Perfect Physique
http://www.massquantities.com