Re: Re: XML/XHTML fragment to text
Hi Alain,
You find yourself in a typical legacy entanglement. This is the kind of trouble that old legacy systems give us, and it costs companies zillions in time and material. See my comments below.

Cheers,
-- Abel

Alain wrote:
indeed.
You are mixing things up a bit. If you want your apps to run at dazzling speed, you should code in C++, or ASM for that matter. But that's not what you are doing. You are using XSLT, and that is an interpreted language. In terms of speed, Saxon-J runs much faster than Xalan-C. It might be that Xalan-J runs a bit slower than Xalan-C, but only marginally so (and if it is not marginal, then the port has been done badly).

Yes, starting the JVM has a cost. If you have many small batches, then that's a problem. If they are large, then it is negligible. But it is easy to work around: let the JVM stay in memory and you are done. But this is all a useless discussion, of course, if "authorization" by the AIX team is an issue.

If you can use any XSLT 2.0 processor, it is likely that your speed increases by an order of magnitude (I'm not talking percentages, I am talking factors). The reason I dare say that is that you seem to use many recursive templates that are called repeatedly. If you want me to help you port it (once you've convinced the team that using the JVM on AIX for XSLT will increase the batches' speed by an order of magnitude), you can contact me off-list.

> The goal is to be able to run a 5 million base customer, so we have to count every second in our batch process.

Just for comparison: I've done a job for KPN (the largest phone company in Holland), which sends 8 million invoices each month in 14 batches. Each batch processes between 2 and 4 GB of data. Using XSLT 1.0 this was a nightmare, with a batch taking up to 14 hours. Using XSLT 2.0 this has become a breeze, and a batch runs in about one to two hours (there's more to it than this, of course: another process creates the AFP files for the printer, and PDF is output for WORM tape, all in the same time). If you have to code for speed, there's no other option than to switch to XSLT 2.0 and the JVM.
so, what are you waiting for? Let it run Saxon as well ;)
In XSLT 2.0 you can do:

    $myString, for $i in 1 to $FieldLen - string-length($myString) return ' '

(the comma is intentional) or anything similar. But you are right, the concat-trick is just as easy.

> I think I saw a padding function in EXSLT, but it doesn't seem to have been made standard in 2.0

Indeed, it is not.
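For readers stuck on XSLT 1.0, the concat-trick mentioned above can be sketched roughly as follows. This is only an illustration: the template name pad-right and its parameter names are made up for this example, and it assumes the run of spaces is at least as long as the widest field. Note that, unlike the XSLT 2.0 expression, this version also truncates values that are longer than the field.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: right-pads $string with spaces to $width characters.
       Values longer than $width are cut off at $width. -->
  <xsl:template name="pad-right">
    <xsl:param name="string"/>
    <xsl:param name="width"/>
    <!-- A run of spaces; it must be at least as long as the widest field. -->
    <xsl:variable name="spaces" select="'                                        '"/>
    <xsl:value-of select="substring(concat($string, $spaces), 1, $width)"/>
  </xsl:template>

  <!-- Demo: pads 'abc' to 8 characters and brackets it to make the spaces visible. -->
  <xsl:template match="/">
    <xsl:text>[</xsl:text>
    <xsl:call-template name="pad-right">
      <xsl:with-param name="string" select="'abc'"/>
      <xsl:with-param name="width" select="8"/>
    </xsl:call-template>
    <xsl:text>]</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this writes abc followed by five spaces between the brackets. Like the XSLT 2.0 expression, it counts characters, not bytes.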
I have them on the shelf, I use them regularly. If you are interested.... ;)
aha, of course. The eternal legacy problem: back in the 70s they didn't think international yet...
This is practically impossible, because you don't know exactly how the serializer will serialize, i.e., which of the equivalent forms of a character such as < (for example the entity &lt;) it will write. Furthermore, Unicode can represent a single character in more than one way (composed or decomposed), and each form has a different length in UTF-8. In XSLT 2.0 you can control this with the normalization-form attribute of xsl:output; in XSLT 1.0 you cannot, and I haven't found a note on how to treat it. If you have XSLT 1.0 and you want to know the exact size in bytes, use UTF-32 and you can (almost) be certain of the correct length (apart from escapes such as &lt; and &quot;). The drawback is the almost 4-fold increase in size (you can use UTF-16 if all you need are characters from the first plane, the Basic Multilingual Plane).

> [...] or is there a function I didn't notice that can return a string length in bytes and not in chars ?

Yes and no. But there's a simple trick, and I believe it will solve your problems 100%, as long as you can get your bosses to move to Saxon, because that's the only processor I found that can do it correctly. Forget serializing and reading back as unparsed-text; use this instead:

    <xsl:output name="output-def" encoding="UTF-8" normalization-form="NFD" omit-xml-declaration="yes" />
    <snip ... />
    <xsl:variable name="serialized" select="saxon:serialize($my-result-tree, 'output-def')" />
    <xsl:variable name="hexBin" select="saxon:string-to-hexBinary($serialized, 'UTF-8')" />
    <xsl:variable name="length" select="string-length(xs:string($hexBin)) div 2" />

I tested it, and it works so well that it even returns different amounts when you choose different normalization forms (i.e., composed and decomposed forms give radically different results). It also correctly counts &lt; as 4 characters when it is part of a text node or an attribute. It *does not* correctly interpret cdata-section-elements on the xsl:output definition, but that's only a minor inconvenience (and an insignificant little bug in Saxon); it does correctly interpret omit-xml-declaration yes/no. You must be careful that the selected encodings match. If they don't, the result of the string-to-hexBinary function will be misleading (logically so).

All in all, this is by far the easiest way to calculate the length of a node in bytes. And you can put the resulting string into your fixed-length system as you want:

    <xsl:function name="f:padding" as="xs:string">
      <xsl:param name="string" as="xs:string" />
      <xsl:param name="width" as="xs:integer" />
      <xsl:value-of select="$string, for $i in 1 to $width - string-length($string) return ' ' " separator="" />
    </xsl:function>
    <snip ... />
    <xsl:sequence select="f:padding($columnData1, 20)" />
    <xsl:sequence select="f:padding($columnData2, 4)" />
    <xsl:sequence select="f:padding($serialized,4096)" />
    <xsl:sequence select="f:padding($columnData3, 400)" />
    <xsl:sequence select="f:padding($columnData4, 2)" />
    <xsl:sequence select="f:padding($columnData5, 12)" />
    ..... etc

Convinced that things *can* be easier in XSLT 2.0? And I only showed you very few XSLT 2.0-specific things. Your major gain from switching to Saxon is that you can use the saxon:serialize() function. Otherwise it will be quite hard to guarantee that your recursive templates are correct (I think it is not so hard to prove that they are incorrect, unless you really rewrite the serialization algorithm of your processor in XSLT 1.0).
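The byte count above relies on Saxon extension functions. As a rough alternative sketch (not the method described above), the UTF-8 length of a string can also be derived in plain XSLT 2.0 from its code points, since UTF-8 uses 1, 2, 3 or 4 bytes depending on the code point range. The function names f:utf8-length and f:pad-bytes and the namespace urn:example:functions are made up for this illustration. The sketch counts the bytes of the string it is given, so escaping and normalization only show up if they have already been applied to that string, and it assumes the processor supports the NFD form in normalize-unicode(), which the specification does not require.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs f">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: number of bytes $string occupies when encoded as UTF-8. -->
  <xsl:function name="f:utf8-length" as="xs:integer">
    <xsl:param name="string" as="xs:string"/>
    <xsl:sequence select="
      xs:integer(sum(
        for $cp in string-to-codepoints($string)
        return if ($cp lt 128) then 1
               else if ($cp lt 2048) then 2
               else if ($cp lt 65536) then 3
               else 4))"/>
  </xsl:function>

  <!-- Hypothetical helper: pads with spaces until the UTF-8 size reaches $width bytes.
       Strings that are already too long are returned unchanged. -->
  <xsl:function name="f:pad-bytes" as="xs:string">
    <xsl:param name="string" as="xs:string"/>
    <xsl:param name="width" as="xs:integer"/>
    <xsl:sequence select="
      string-join(
        ($string, for $i in 1 to $width - f:utf8-length($string) return ' '),
        '')"/>
  </xsl:function>

  <!-- Demo: byte length of the same word in composed and decomposed form. -->
  <xsl:template match="/">
    <xsl:variable name="word" select="'caf&#xE9;'"/>
    <!-- NFC keeps U+00E9 as one code point: 1 + 1 + 1 + 2 = 5 bytes -->
    <xsl:value-of select="f:utf8-length(normalize-unicode($word, 'NFC'))"/>
    <xsl:text> </xsl:text>
    <!-- NFD splits it into U+0065 and U+0301: 1 + 1 + 1 + 1 + 2 = 6 bytes -->
    <xsl:value-of select="f:utf8-length(normalize-unicode($word, 'NFD'))"/>
    <xsl:text> [</xsl:text>
    <!-- Padded to 8 bytes: the 5-byte word followed by 3 spaces -->
    <xsl:value-of select="f:pad-bytes(normalize-unicode($word, 'NFC'), 8)"/>
    <xsl:text>]</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the $serialized string from the snippet above, f:utf8-length would count the escapes that the serializer actually wrote. The byte-based padding matters when the fixed-length target measures fields in bytes, because f:padding counts characters.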
See above: using the right tools for the job, you will not need these hard-to-maintain solutions anymore.

> But as I'm on holidays now, I'll have to check the project status when I'm back in September !

Enjoy your holidays!

Cheers,
-- Abel Braaksma