Re: grouping + global variable (?) (was re: regexs,
Your project sounds very ambitious. Up-conversion is a challenging and fascinating business, which we're all going to learn much more about. You have several conference papers' worth of material here, I bet.
At 08:15 PM 8/12/2004, you wrote:
But I've been thinking, based on the comments from the list, that a better process might be eliminating the perl script entirely.
Maybe: but you'll need something at least as good to do the work it's doing, and Perl is really good at regular expressions and string processing generally.
(Personally I might have tried it in Python, but that's mainly because I can count the lines of Perl I've written in my life on one hand. Of course, I can count in binary on my hands, which gets me higher than five.)
Now that it has some regexp support, XSLT 2.0 should be at least a credible option here, but its features have yet to be stress-tested TMK, and tool support is still somewhat up in the air. (I believe Mike Kay is speaking on this very topic at XML 2004 this November in Washington DC.)
A split-down-the-middle option could be to write a little function library in the language of your choice to do the upconversion string-processing, and call out to it from your XSLT using extension functions. (This is what I kind of imagined would happen five years ago, but it turns out processor-dependent extension functions are unfashionable these days.)
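To give a feel for what one entry in such a function library might look like, here is a minimal sketch in Python. The convention it handles (underscores around a run of text meaning emphasis) and the names `upconvert_line` and `emph` are purely hypothetical, not taken from your data; how the function would be bound as an extension function depends on the processor and is not shown.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical manuscript convention: _some words_ marks emphasis.
_EMPH = re.compile(r"_(.+?)_")

def upconvert_line(text: str) -> str:
    """Escape XML special characters, then wrap _..._ runs in <emph> tags.

    The <emph> element name is an assumption made for illustration only.
    """
    escaped = escape(text)  # turns &, <, > into entities; underscores untouched
    return _EMPH.sub(r"<emph>\1</emph>", escaped)
```

Note that a naive pattern like this would also fire on identifiers such as snake_case_name, which is exactly the kind of edge case that makes up-conversion interesting.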
I'm not sure I'd want to eliminate the intermediate XML file, though. There have been times when I've needed to tweak it. For example, I have old files with smart quotes not saved in UTF-8, and the perl script barfs on UTF-8 files, so I do the XML conversion, open the file and re-save the XML as UTF-8.
I think having the intermediate format will prove to be good design in any case. I was just reading that the complexity of a solution to a problem generally increases in proportion to the square of the size of the problem space, which is why breaking problems down into pieces works so well. (Don't ask me why those guys think this: it didn't say.)
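The manual open-and-re-save step is also the sort of thing the intermediate file makes easy to automate. A minimal sketch in Python, assuming the old files are in Windows-1252 (a common source of non-UTF-8 smart quotes, but only a guess about your files); the function name `resave_as_utf8` is made up for the example.

```python
from pathlib import Path

def resave_as_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Read src in a legacy encoding and write the same text to dst as UTF-8.

    source_encoding defaults to cp1252, which is an assumption about the
    old files, not something known from the thread.
    """
    raw = Path(src).read_bytes()
    text = raw.decode(source_encoding)   # raises UnicodeDecodeError on undefined bytes
    Path(dst).write_bytes(text.encode("utf-8"))
```

If the file carries an XML declaration naming the old encoding, that declaration would need updating as well; the sketch does not touch it.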
Option 3 seems to be ruled out based on my current toolchain (apache-FOP), which probably eliminates #2 as well. (I could easily be wrong on this)
Apache Xalan-J has support for a node-set function, so you could use option 2 if you wanted. It will even recognize it in the exslt.org namespace, which is nice.
Options 1 and 4 seem most like what the current process is. Currently, a new XML file is generated only if the existing one's timestamp is older than that of the text file it's transformed from.
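That timestamp check is the familiar make-style rule, and it carries over to whichever pipeline option is chosen. A minimal sketch in Python, with the function name `needs_rebuild` and its arguments invented for the example:

```python
import os

def needs_rebuild(source_path: str, target_path: str) -> bool:
    """Return True if the target is missing or older than its source."""
    if not os.path.exists(target_path):
        return True  # nothing generated yet
    # Compare modification times; an older target is stale.
    return os.path.getmtime(target_path) < os.path.getmtime(source_path)
```

A driver script would call this once per text file and run the conversion only when it returns True.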
Saxon is well-liked by developers (it runs well, it's conformant, and it has good error messages), and can be switched in for Xalan in your toolchain if you prefer it. Saxon also supports exslt:node-set, so you can use option #2 with it as well.
As I mentioned, it has an extension attribute, saxon:next-in-chain, that can be invoked for pipelining. IIRC it passes SAX events between processor invocations (Mike?), so it's much faster than writing a file and reparsing, though perhaps not quite as fast as passing unserialized trees, as options 2 and 3 would do.
I am reasonably sure Xalan offers similar features, however, or the Cocoon framework does.
I'd also eventually like to get a decent RTF output. Standard manuscript prose is not terribly complex, so something that supported basic features should suffice for that. Unfortunately, the commercial options are too expensive for the intended audience. Is jfop likely to be my best available option?
I'd be interested to hear from the list on this question myself. I haven't yet seen a really nice route to RTF. I think a two-pass approach (analogous to the way IBM deployed a "TeXML" which could be targeted as a route to TeX) might be the best way to do it: have yet another tag set that describes only the formatting primitives supported by RTF, and a utility stylesheet to make RTF out of that. Or use XSL-FO, if any of the formatters can make decent RTF yet.
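To make the second pass concrete, here is a rough sketch of the "utility" step, written in Python rather than as a stylesheet simply to keep it short. The tag set (`doc`, `p`, `i`, `b`) is entirely hypothetical, and the RTF emitted is about the barest that word processors generally accept; treat it as an illustration of the idea, not a tested converter.

```python
import xml.etree.ElementTree as ET

def rtf_escape(text: str) -> str:
    """Escape RTF special characters; emit non-ASCII as \\uN? escapes."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # RTF \uN takes a signed 16-bit value per UTF-16 code unit,
            # followed by a fallback character for old readers.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                n = int.from_bytes(units[i:i + 2], "little")
                if n > 32767:
                    n -= 65536
                out.append("\\u%d?" % n)
    return "".join(out)

def inline(elem) -> str:
    """Convert mixed content, mapping <i> and <b> to RTF character groups."""
    parts = [rtf_escape(elem.text or "")]
    for child in elem:
        body = inline(child)
        if child.tag == "i":
            parts.append("{\\i " + body + "}")
        elif child.tag == "b":
            parts.append("{\\b " + body + "}")
        else:
            parts.append(body)  # unknown inline element: keep its text only
        parts.append(rtf_escape(child.tail or ""))
    return "".join(parts)

def to_rtf(xml_text: str) -> str:
    """Turn the hypothetical primitive tag set into a small RTF document."""
    root = ET.fromstring(xml_text)
    paras = ["{\\pard " + inline(p) + "\\par}" for p in root.iter("p")]
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}"
    return header + "\n" + "\n".join(paras) + "\n}"
```

The point of the intermediate tag set is that everything hard (deciding what is a paragraph, what is emphasis) happens upstream, so this last step stays a dumb one-to-one mapping.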
I hope this helps! Wendell