Re: grouping + global variable (?) (was re: regexs,
I do hope this thread provokes others to contribute.
At 11:05 PM 8/13/2004, you wrote:
Quite frankly, I hadn't realized we were so cutting edge. :)
Well, up-conversion has been going on as long as e-text has. But XML is only now beginning to catch up to where Omnimark and even Perl were years ago. And doing it in public and sharing the results is a new thing, since it's both difficult and profitable enough that it is best understood by the data-conversion vendors, who know a lot about it but whose methods and technologies tend to be proprietary for understandable reasons (it's their bread and butter).
Also, it is a tough enough business in the general case that it's often dealt with by throwing people at it rather than machines, or both together. In many cases, when the input is underspecified or unconstrainable, this cannot be avoided.
Ultimately, my goal is to provide an application that offers integration between the text file (written using the user's text processor of choice).
User wants to submit a manuscript, then the application performs all the necessary generation of the document (including cover letter) using user-specific information about how they want the document to appear, including any market- or genre-specific styles. Press a button, out pops the PDF or RTF. For now, I'll settle for PDF. :)
Good choice. The long-term challenge of these systems is "round-tripping" but you may want to avoid that for now; an opaque format like PDF helps force users to edit their original input, not the system's output (which then has to become input again).
I didn't write the perl script, thus my frustration (as a Python person). My partner-in-crime and I have come at the problem from entirely different directions.
This can be useful.
> Now it has some regexp support, XSLT 2.0 should be at least a credible
> option here, but its features have yet to be stress-tested TMK and
> tools support is still somewhat up in the air. (I believe Mike Kay is
> speaking on this very topic at XML 2004 this November in Washington
> DC.)
Saxon 8 is available but other vendors are standing in the wings (where they're hard to see). Only when we have a range of tools will it become clear (IMHO) how well the spec is designed. (For example the fact that W3C XML Schema implementations differ on details of implementation compromises the use of Schema generally, since its portability is impaired. This is a shame, though getting the spec right the first time in every detail on something like Schema is near-impossible; over time we can hope this situation will improve.)
> A split-down-the-middle option could be to write a little function
> library in the language of your choice to do the upconversion
> string-processing, and call out to it from your XSLT using extension
> functions. (This is what I kind of imagined would happen five years
> ago, but it turns out processor-dependent extension functions are
> unfashionable these days.)
For this to work the text has to start life as some kind of XML, though that could be nothing but a dumb wrapper. Then you'd need a processor whose API allows you to return node-sets from functions.
Also, don't forget that XSLT 2 gives user-defined functions, so for many things it may be possible to avoid the external language altogether.
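To make the "dumb wrapper" idea concrete, here is a minimal sketch in Python of what such a little function library might look like. It is only an illustration under stated assumptions: the element names (doc, para, emph) and the *word* emphasis convention are hypothetical, not anything your scripts actually use, and the hook-up to an XSLT processor is left out entirely since that part is processor-dependent.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical plain-text convention (an assumption): *word* marks emphasis.
EMPH = re.compile(r"\*([^*\n]+)\*")


def wrap_plain_text(text):
    """Wrap raw text in a 'dumb' XML wrapper: one <para> per blank-line-separated block."""
    root = ET.Element("doc")
    for block in re.split(r"\n\s*\n", text.strip()):
        para = ET.SubElement(root, "para")
        # Collapse internal line breaks and runs of whitespace.
        para.text = " ".join(block.split())
    return root


def mark_emphasis(para):
    """Replace *...* runs in a paragraph's text with <emph> child elements."""
    source = para.text or ""
    para.text = ""
    last = None  # most recent <emph>; text that follows it goes in its tail
    pos = 0
    for match in EMPH.finditer(source):
        before = source[pos:match.start()]
        if last is None:
            para.text += before
        else:
            last.tail = before
        last = ET.SubElement(para, "emph")
        last.text = match.group(1)
        pos = match.end()
    rest = source[pos:]
    if last is None:
        para.text += rest
    else:
        last.tail = rest


def upconvert(text):
    """Plain text in, element tree out: wrap first, then add inline markup."""
    root = wrap_plain_text(text)
    for para in root.iter("para"):
        mark_emphasis(para)
    return root


if __name__ == "__main__":
    tree = upconvert("It was a *dark* night.\n\nThe end.")
    print(ET.tostring(tree, encoding="unicode"))
```

The point of the two-step shape is that the wrapper step alone already yields well-formed XML, so everything after it (in Python or in XSLT) can work on nodes rather than on raw strings.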
99% of the problem comes from documents saved in the native platform that aren't correctly tagged. I'm not quite certain what to do about this so that the editing is transparent. Yet.
I think this is the most difficult problem. This is why XML's well-formedness rules constitute its secret weapon. (Felt only when they chafe, this set of rules makes all downstream issues much easier to deal with, so XML developers can be quite unconscious of how much we don't have to think about.)
You need a way to trap and fix bad incoming tagging before it gets into your system, where it's expensive to deal with.
A plain-text editing window is appealing (many writers like their keyboards), but you're going to need at least a "galley" preview on input, before commit, or you're going to go insane. A real grammar for your syntax would be even better.
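Short of a real grammar, one cheap gate of this kind is a well-formedness check run before commit. The sketch below is an assumption about how you might wire it up (the function name check_well_formed is made up for illustration); it uses only Python's standard xml.etree.ElementTree parser and reports where parsing failed so the writer can be pointed at the spot.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, message).

    The message includes the line and column reported by the parser.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return False, f"line {line}, column {column}: {err}"
    return True, None


if __name__ == "__main__":
    ok, problem = check_well_formed("<doc><para>oops</doc>")
    if not ok:
        print("Rejected before commit:", problem)
```

This only catches tagging that is not well-formed; tagging that is well-formed but wrong for your syntax still needs a schema or grammar behind it.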
I feel moderately confident that this might make it a more contiguous process, which would also require fewer installed pieces in order to work.
> I'd be interested to hear myself from the list on this question. I haven't
> yet myself seen a really nice route to RTF. I think two passes to this
> (analogous to the way IBM deployed a "TeXML" which could be targeted as a
> route to TeX) might be the best way to do it: have yet another tag set that
> describes only the formatting primitives supported by RTF and a utility
> stylesheet to make RTF out of that. Or use XSL-FO, if any of the formatters
> can make decent RTF yet.
An indication that the problem of generating nice RTF is harder than it may first appear.
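For what it's worth, the second pass of the two-pass idea quoted above can be sketched in a few lines of Python, as long as expectations are kept low. Everything about the intermediate tag set here (rtfdoc, par, b, i) is hypothetical, and the output is about the barest RTF there is: one font, paragraphs, bold and italic. Styles, page layout, headers and the like are where "nice" RTF gets hard, and none of that is attempted.

```python
import xml.etree.ElementTree as ET

# Hypothetical intermediate tag set (an assumption): <rtfdoc>, <par>, <b>, <i>.
CONTROL_WORDS = {"b": "\\b", "i": "\\i"}


def escape_rtf(text):
    """Escape backslash and braces; emit non-ASCII as \\uN? escapes."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # RTF \uN takes a signed 16-bit value per UTF-16 code unit,
            # followed by a fallback character for older readers.
            data = ch.encode("utf-16-le")
            for k in range(0, len(data), 2):
                unit = int.from_bytes(data[k:k + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append(f"\\u{unit}?")
    return "".join(out)


def inline_to_rtf(elem):
    """Serialize an element's mixed content (text, <b>, <i>) to an RTF fragment."""
    parts = [escape_rtf(elem.text or "")]
    for child in elem:
        inner = inline_to_rtf(child)
        word = CONTROL_WORDS.get(child.tag)
        if word:
            parts.append("{" + word + " " + inner + "}")
        else:
            parts.append(inner)  # unknown tags pass their content through
        parts.append(escape_rtf(child.tail or ""))
    return "".join(parts)


def to_rtf(xml_text):
    """Turn a document in the primitive tag set into a minimal RTF string."""
    root = ET.fromstring(xml_text)
    body = "".join(inline_to_rtf(par) + "\\par\n" for par in root.iter("par"))
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n"
    return header + body + "}"


if __name__ == "__main__":
    print(to_rtf("<rtfdoc><par>A <b>bold</b> and <i>quiet</i> night</par></rtfdoc>"))
```

Since this is callable as a function rather than run as a standalone application, it would at least fit the API-access requirement; the first pass (manuscript markup down to these primitives) is where a stylesheet would do the real work.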
I should add that I *do* need API access rather than a standalone application.
If it were me I'd be inclined to see how far I could go with XSLT 2. But then, I like XSLT. I am actually fairly hopeful that XSLT 2 processors will be strong contenders in this space.
"Thus I make my own use of the telegraph, without consulting
the directors, like the sparrows, which I perceive use it
extensively for a perch." -- Thoreau