RE: How to mark every 5th output record.
Hello.

Thank you for a long and considerate post.

It's true that taming the spec in any language would be a challenge. It's a myriad of special cases and exceptions, but it's also unfortunately a standard.

In retrospect it would have been a lot easier to do it in C++, especially since we have access to the source code of the application that exports the XML. XSLT was chosen as a stress test to validate the XML schema and to prove to third parties that they could use XSLT to implement their own file converters. In other words: "If we can export to *that* format using XSLT, then our customers can export to any file format".

Regarding your defense of XSLT, I'm not trying to force XSLT to do something it wasn't designed to do. I'm simply trying to find the path of least resistance to accomplish that last 0.05% to meet spec compliance.

Patrick Bergeron

-----Original Message-----
From: Wendell Piez [mailto:wapiez@xxxxxxxxxxxxxxxx]
Sent: Tuesday, March 11, 2008 10:55 AM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: RE: How to mark every 5th output record.

Patrick,

Just because your logic is currently at 2900 lines of code doesn't mean it has to be. In fact, if its approach to processing is as imperative as what you've suggested you "should" be able to do, chances are reasonably good that someone who's familiar and comfortable with the XSLT processing model could reduce it radically by refactoring.

Nor is pipelining (the term of art for processing your output as input) inherently such a bad thing. Indeed, in XSLT 2.0, it can be done transparently in one stylesheet. Depending on your architecture and implementation, it need not be inefficient.

As Mike said, the details of what you are trying to do are critical. For one thing, if your logic is complex, that's an indication that the process you are designing involves upconversion. If so, you should tell us right off whether you can use XSLT 2.0 or whether you are limited to 1.0.
XSLT 1.0 wasn't designed for upconversion (its general assumption is that the dataset is clean and optimally structured and ordered going in, and transformations are geared mainly to presentation not data processing), which isn't to say that it can't be done. Rather, it's to say that when consulting the experts on how to do things, you will constantly hear the refrain "It's easier in 2.0".

As you have learned, XSLT is declarative and functional, not imperative. Variables are variables in the sense they are in algebra -- values defined in relation to other values in a processing context -- not just labels for memory registers, which you can reassign at will (a dangerous and destructive practice, since this means that any bug is at risk of infecting parts of the system far beyond where it does its immediate damage). While for you at this moment, this fact may present an impediment to using XSLT well, it's still not really a problem, as it offers numerous advantages at many layers of the system including yours (once you know how to take advantage of it), especially as complexity scales up.

I know this is a defense, not a solution. But if your platform resources are really so tight, maybe you need something with a different processing model than XSLT (maybe a SAX filter or series of them, or a Perl or Python script), at least for part of your problem.

If things are that difficult, there's a reason. Either you are trying to use the language for something it wasn't designed for and doesn't do well, or you are approaching it wrong. Or both. My guess, from your description, is that the specification itself is a monster, and that taming it would be difficult in any language.

As far as that goes, in general, there's filtering, grouping and sorting. Sometimes any or all of these require additional processing to determine criteria for them.
Also, sometimes sorting has to happen before grouping (that is, logically prior if not necessarily temporally), sometimes after -- that is, both are reordering or rearranging operations (as is filtering, strictly speaking).

In my experience, the sequence (1) data analysis followed by (2) filtering followed by (3) reordering has made sense. Often (1) and (2) can be collapsed. If (1) is done well, usually (3) can be done in one pass.

Your requirement is tricky because you want grouping to occur after filtering and sorting, which is often (though not always) impractical in one pass.

As Mike indicated before, XSLT 2.0 provides features that make necessary facilities for (1) (in the general case) available during later operations, which frequently reduces the need for pipelining since analysis can be done on the fly. On the other hand, when you need to pipeline, XSLT 2.0 makes that easier too.

Cheers,
Wendell

At 10:03 AM 3/11/2008, you wrote:
>As I said the rules under which I process my list are quite complex. So much
>so that my XSLT stylesheet is over 2900 lines of code (and yes, that's just
>nuts).
>
>Different records (and types of records) are processed using different
>rules, other records are deferred for later processing, others merged
>together to produce a final one, some are skipped altogether, some complex
>operations are performed on yet another set of records, etc. The output file
>format is crazy, and the spec for the file format is about as obscure and
>obtuse as I have ever seen in 20 years programming.
>
>But in the end, I end up with a text file that has 1 line per "output
>record", but these "output records" have almost nothing to do with the input
>records, and I need to separate them with a marker every 5th.
>
>I can't really do (position() mod 5) on my original input data because it
>has no correlation to the order of the output records, and it's impossible
>to create an expression that would select them properly in the order I need.
>
>Is my only option to create another tree that contains all of my output
>record results, and then iterate over that tree once again, and output the
>same data verbatim, only this time insert a marker every 5th?
>
>Gheesh, talk about using a tank to shoot a bird.
>
>I'm trying to avoid doing this for other reasons:
>
>1) My input data set is quite large.
>2) The XSLT processor is running on an embedded platform with limited
>memory.
>3) I'm already paying the price of doing a copy of the data in an earlier
>pass, I'd like to not pay the price again.
>
>Is there really, really, really _any_ other way of doing this without making
>a 3rd copy of my data set?

======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
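
The Python script mentioned above as an alternative to a further XSLT pass could look something like the following. It is only a minimal sketch, assuming the stylesheet's output is a text stream with one output record per line (as described in the quoted message) and that the marker occupies a line of its own. The marker text `-----`, the names `mark_every_nth` and `filter_stream`, and the decision to place the marker between groups of five rather than after the final record are all assumptions made for illustration; the thread does not specify them, nor does it say whether Python is available on the embedded platform.

```python
from typing import Iterable, Iterator, TextIO

# Hypothetical marker text; the real file format's marker is not given in the thread.
MARKER = "-----"


def mark_every_nth(records: Iterable[str], n: int = 5,
                   marker: str = MARKER) -> Iterator[str]:
    """Yield records unchanged, inserting `marker` between groups of n records."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for index, record in enumerate(records):
        # Emit the separator before records n+1, 2n+1, ... (1-based),
        # so that no marker trails the final record.
        if index > 0 and index % n == 0:
            yield marker
        yield record


def filter_stream(src: TextIO, dst: TextIO, n: int = 5,
                  marker: str = MARKER) -> None:
    """Copy src to dst line by line, adding a marker line between groups of n."""
    # Only the current line is held in memory, so no extra copy of the
    # data set is built.
    stripped = (line.rstrip("\n") for line in src)
    for line in mark_every_nth(stripped, n, marker):
        dst.write(line + "\n")
```

Run as a filter on the stylesheet's output, this would be called as `filter_stream(sys.stdin, sys.stdout)` after `import sys`. Because it works on the serialized text rather than on a tree, the record order is whatever order the stylesheet emitted, which sidesteps the `position() mod 5` problem on the input data.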