Re: Re: XML/XHTML fragment to text
Hi Abel,
You've got it 99% right! From A to D, your understanding is perfect, and your wording is much clearer than mine! [English is not my mother tongue.]

> E. You seem to have a preference for Xalan-C (but the above is *much* easier with Saxon 8.9!!!)

This one is not quite right. Let me discuss it, as you suggested.

Personally, I would prefer Saxon: XSLT 2.0 makes things so much easier. And my advice would be to buy the schema-aware version because, for example, it would be stupid to write code to check the input when relying on schema validation is so much safer and easier to maintain.

For my personal sites, I generate the HTML and XHTML versions from XML source using non-validating Saxon (at home I don't bother with schemas). I first used Xalan, but then I had to write a make.bat and separate stylesheets to build the whole site. So I switched to Saxon, and it made things so much better. Typically, it takes 7 seconds to generate 400 HTML pages on my laptop, but at home I don't care whether it is 7 seconds or the 4.5 seconds it was with Xalan, because I'm not regenerating my sites every other day!

But at work, the only thing that has been authorized for now is Xalan-C. It runs in batch jobs on AIX machines. The reason they are not considering another transformation engine at the moment is performance. Even for a small transformation, if you run Saxon or Xalan-J you have to set up and run a JVM in your Unix batch. Launching the JVM has a cost in memory and time. And even if you don't count the JVM cost, Saxon is Java code, so it has to pay the Java overhead compared to code written in C++... Although Saxon may perform faster on some specific templates where it has better optimisations, on an "average" template it will still be slower because it's Java versus C++. The goal is to be able to run a base of 5 million customers, so we have to count every second in our batch process. But things might change a little bit now.
For SEPA (Single Euro Payments Area), they chose to write a Java program and run it in our main batch to transform the XML files (defined by ISO 20022) into a fixed-length format that our legacy Cobol programs can understand. So they are definitely running a JVM inside the main batch, and I will point that out to the people in charge of choosing standard software. The cost of maintaining a Java program for such a transformation might also be higher than having just an XSL stylesheet where possible (I'm not on the SEPA project, so I haven't looked at whether the ISO 20022 XML can be transformed easily).

> For instance, making the records fixed length is as easy as 1-2-3 (where you would need recursive templates in XSLT 1.0)

I must have read only part of the XSLT 2.0 documentation then... because it still looks *not* so easy to me, even with XSLT 2.0. Even for a string, you still have to write things like:

    substring(concat(myString,$padding),1,$N)

to pad it correctly... which you could already do with 1.0 (with no recursion, provided $padding is long enough). I think I saw a padding function in EXSLT, but it doesn't seem to have been made standard in 2.0. Of course XSLT 1.0 is even worse. For example, you don't have any function to handle dates, and that's painful. Or we could probably write (or buy) "generic" patterns to transform to fixed length.

> 3. When the HTML field passes by, use unparsed-text(...temp filehere...) to include the textized HTML data

I haven't used this function yet. It looks like a very elegant way to solve the problem, except for one little thing (but I'll check the documentation).

The last bit of headache is the "UTF-8" problem! Because fixed-length is fixed-length in *bytes*. That is fine with ISO 8859-1, where char = byte. But as we run internationally and also have Greece, Russia, etc., and could run in China, we need UTF-8. For that, with XSLT 1.0, I agree with you: I had to build insane recursive templates to calculate the length in bytes of a UTF-8 string.
It's so insane that I tend to think at this point we would rather write a Java or even a Cobol program. With XSLT 2.0 it's easier, but still difficult, because you are doing things only the serializer should do... or is there a function I didn't notice that can return a string's length in bytes rather than in characters?

> Not sure if you mean if this is already dropped by your team.

The last memo I read suggested that we could do things differently, because mixing fixed-length and variable text doesn't look like a good architecture to start with. You came to the same conclusion, your advice being to separate the variable part (e.g. HTML) into a temporary file, even if your templates are smarter, and to put every piece together again. But as I'm on holiday now, I'll have to check the project status when I'm back in September!

Thanks again for all your help.
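
A note on the byte-length question above: XSLT 2.0 does not appear to offer a built-in function that returns a string's length in bytes, but string-to-codepoints() allows the UTF-8 length to be computed without recursion. The following is only a minimal sketch under that assumption. The my: namespace, the function names my:utf8-length and my:pad-bytes, and the template name main are all made up for illustration, and raising an error on overflow (rather than truncating) is just one possible choice.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:fixed-length"
    exclude-result-prefixes="xs my">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical helper: number of bytes $s occupies when encoded as UTF-8.
       Code points below 128 take 1 byte, below 2048 take 2 bytes,
       below 65536 take 3 bytes, and everything above takes 4 bytes. -->
  <xsl:function name="my:utf8-length" as="xs:integer">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence select="
      sum(for $cp in string-to-codepoints($s)
          return if ($cp lt 128) then 1
                 else if ($cp lt 2048) then 2
                 else if ($cp lt 65536) then 3
                 else 4)"/>
  </xsl:function>

  <!-- Hypothetical helper: right-pad $s with spaces to exactly $n bytes.
       A value that is already too long raises an error instead of being
       truncated, so a multi-byte character is never cut in half. -->
  <xsl:function name="my:pad-bytes" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:param name="n" as="xs:integer"/>
    <xsl:variable name="len" select="my:utf8-length($s)"/>
    <xsl:sequence select="
      if ($len le $n)
      then concat($s, string-join(for $i in 1 to $n - $len return ' ', ''))
      else error((), concat('Value exceeds ', $n, ' bytes: ', $s))"/>
  </xsl:function>

  <!-- Illustrative entry point: writes one 12-byte field followed by '|'. -->
  <xsl:template name="main">
    <xsl:value-of select="my:pad-bytes('Αθήνα', 12)"/>
    <xsl:text>|&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

With an XSLT 2.0 processor started at the named template main, the field should contain the five Greek letters (10 bytes in UTF-8) followed by two spaces. The character-based substring(concat(...)) idiom would instead have to pad to a number of characters, which only matches the byte count when every character is plain ASCII.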