RE: Java Technology and XML : API benchmark
That was precisely the kind of pinch of salt I was thinking about. Don't expect to extract deep facts about reality from micro-benchmarks or informal tests. I think benchmarking is rocket science, because it's very difficult to design benchmarks that really mean something. But that shouldn't prevent us from reading micro-benchmark reports. Anyway, the document is worth reading:

- See for example the impact of validation. It's surprising that, due to problems in parser implementations, a document with a schema reference shows a "non-validation" overhead even when validation is not enabled. I still don't understand why validation is not performed in a SAX filter rather than in the parsers. Parsers like Xerces have grown dramatically in size and have performance problems because validation is built into the parser. To me, parsing and validating are two different activities. They may have been integrated for the performance of the parsing+validation pipeline, but I'd still like to have a clean, high-performance parsing pipeline into which I could plug any kind of validator (e.g. Sun MSV). Take, for example, the bug that causes Xerces 2 to dereference and parse schemas even when validation is disabled. I hope it has been fixed since, but if Xerces 2 did not mix so many APIs and technologies (parsing, validating, building DTD-specific DOMs, etc.) in a monolithic way, the bug would never have appeared in the first place.

- The getElementsByTagName result is also interesting, because it clearly shows that even if the functional behaviour of the DOM is defined by the API, its performance behaviour is totally undefined. Either the code for this method in Crimson is totally dumb, or there is a CPU/memory tradeoff in the other DOMs (only an examination of the different source codes can tell).
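For what it's worth, the javax.xml.validation API that later appeared in JAXP 1.3 implements exactly this kind of separation: the schema is compiled once into a ValidatorHandler, which sits as a plain SAX stage between a non-validating parser and the application's ContentHandler. A minimal sketch, with a toy schema and document inlined for brevity:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class PluggableValidation {
    public static void main(String[] args) throws Exception {
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "  <xs:element name='greeting' type='xs:string'/>"
          + "</xs:schema>";
        String xml = "<greeting>hello</greeting>";

        // Compile the schema once; the parser itself never sees it.
        SchemaFactory sf =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        ValidatorHandler validator =
            sf.newSchema(new StreamSource(new StringReader(xsd)))
              .newValidatorHandler();

        // The application's handler sits downstream of the validation stage.
        validator.setContentHandler(new DefaultHandler());

        // A plain, non-validating SAX parser feeds the pipeline.
        SAXParserFactory pf = SAXParserFactory.newInstance();
        pf.setNamespaceAware(true);
        XMLReader reader = pf.newSAXParser().getXMLReader();
        reader.setContentHandler(validator);
        reader.parse(new InputSource(new StringReader(xml)));
        System.out.println("document is valid");
    }
}
```

With no ErrorHandler set, a validity error simply surfaces as a SAXException from parse(), so invalid documents fail fast while the parser itself stays schema-free.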
I have always suspected that getElementsByTagName performed poorly (because of the instantiation of immutable NodeList instances), so I never used it anyway, but this definitely confirms my feeling that getElementsByTagName is inherently bad. It's in the top 10 of the "do not use this" list we have in my team. We use a finely tuned homegrown selector library, or XPath expressions with Jaxen when the selectors are too complicated. Those selectors and XPath expressions are built once and used many times, without causing unnecessary instantiations, and thus have fairly predictable performance.

- I found Figure 9 particularly interesting, though it is related not to XML but to Java. Hotspot optimization can sometimes take a long time to kick in, and the funny thing here is that it kicks in too late to have an impact on the test. I have tried many times to benchmark a particular piece of code and found it extremely difficult because of the variations the GC and Hotspot can introduce into the system. Benchmarking Java code is truly not a simple task :).

Regards,
Nicolas

>-----Original Message-----
>From: Daniel Veillard [mailto:veillard@r...]
>Sent: Wednesday, March 13, 2002 14:26
>To: Nicolas LEHUEN
>Cc: 'xml-dev@l...'
>Subject: Re: Java Technology and XML : API benchmark
>
>
>On Wed, Mar 13, 2002 at 01:52:48PM +0100, Nicolas LEHUEN wrote:
>> Like all benchmarks made by any given "vendor" (the quotes are here
>> because the different APIs are free), this should be taken with a
>> pinch of salt. It is still interesting to read, though.
>>
>> http://developer.java.sun.com/developer/technicalArticles/xml/JavaTechandXML_part2/
>
> Did they give the input for their tests? I don't think so. What would
>become really fun would be to see the results of processing those data
>without having to run through the Java stuff, i.e. reporting side by
>side what the MSXML or libxml2/libxslt results would be.
>It has been a long time since any XSLTMark benchmark was produced ...
>
> Benchmarks are statistics, and hence show only a few facets of the
>real object. In this case the goal seems to be more to compare various
>processing costs in the Java environment than to make a roundup of the
>set of tools available, but releasing the sources would still allow
>others to scope those results better and give more weight to their
>analysis. As they state, the results are to "be considered as
>micro-benchmarks".
>
> Also, any "single shot" run in a Java-based environment doesn't give
>good results (it takes time to find the "Hot Spot" needing compilation);
>this is interestingly shown in their "Comparing Different JVM Versions"
>part.
>
>Daniel
>
> http://www.datapower.com/XSLTMark/
>
>--
>Daniel Veillard | Red Hat Network https://rhn.redhat.com/
>veillard@r... | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
>http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
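PS: to make the warm-up effect concrete, here is the kind of crude harness I resort to when timing a piece of Java code (a minimal sketch; work() is just a placeholder for the code under test): run a number of untimed warm-up iterations so Hotspot gets a chance to compile the hot method, then time many batches and keep the best one to dampen GC noise.

```java
public class WarmupBench {
    // Placeholder workload; substitute the code being measured.
    static long work() {
        long sum = 0;
        for (int i = 0; i < 100000; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Untimed warm-up so Hotspot can detect and compile the hot method.
        for (int i = 0; i < 1000; i++) {
            work();
        }

        // Timed batches; keep the best one to reduce GC-induced variance.
        long best = Long.MAX_VALUE;
        for (int run = 0; run < 20; run++) {
            long t0 = System.nanoTime();
            for (int i = 0; i < 1000; i++) {
                work();
            }
            long elapsed = System.nanoTime() - t0;
            if (elapsed < best) {
                best = elapsed;
            }
        }
        System.out.println("best of 20 batches: " + best + " ns");
    }
}
```

Reporting the best batch rather than the average is itself a debatable choice, of course, which only reinforces the point: even the harness design affects the numbers.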