Re: XML Performance in a Transaction
Michael Kay wrote:

>>My expectation is that XML parsing can be significantly sped up with ...
>>
>
>I think that UTF-8 decoding is often the bottleneck and the obvious way to
>speed that up is to write the whole thing in assembler. I suspect the only
>way of getting a significant improvement (i.e. more than a doubling) in
>parser speed is to get closer to the hardware. I'm surprised no-one has done
>it. Perhaps no-one knows how to write assembler any more (or perhaps, like
>me, they just don't enjoy it).
>

Yes.* The technique using C++ intrinsics (which is assembler in disguise) that I gave in my blog (URL in previous post) gives a *four to five* times speed increase compared to fairly tight C++ code, for the libxml UTF-8 to UTF-16 transcoder, on ASCII-valued data. The fact that this gives such a large increase in what is sometimes deemed to be a bottleneck augurs well for XML speed-ups generally. *If* the dynamic for optimization were there.

Unfortunately, I expect that (MS excepted) the people most active in internationalization libraries (like Mark Davis' wonderful ICU project at IBM, which feeds into Java for example, or libxml) are more keen on cross-platform portability (in C++ and Java) than on exploiting the fast instructions of particular CPUs.

With modern CPUs and things like SSE2, even the best C++ (without intrinsics) or Java code simply cannot compete with C++ with intrinsics or with assembler. Java has certainly caught up with C++, but the introduction of intrinsics has shifted the goalposts for C++ far beyond what Java can cope with. The simple reason is that you need to use particular data structures (e.g. 16-byte arrays) and particular (non-stalling) algorithms, and you need some kind of pragma or tag to tell the compiler that your code is SSE friendly: it is just too hard, hence the rise of intrinsics and the competitive fall of Java/C# etc.
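The blog code referred to above is not reproduced here, so the following is not that code. It is a minimal sketch, under stated assumptions, of the general idea: an SSE2 fast path for UTF-8 to UTF-16 transcoding that tests 16 bytes at once for a set high bit (`_mm_movemask_epi8`) and, when the whole block is ASCII, zero-extends it into sixteen UTF-16 code units with `_mm_unpacklo_epi8`/`_mm_unpackhi_epi8`. The function name `utf8_to_utf16` is hypothetical (it is not libxml's API), it assumes an x86/x86-64 target with SSE2, and the scalar fallback does only basic validation (it does not reject overlong encodings or encoded surrogates).

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_loadu_si128, _mm_movemask_epi8, ...
#include <cstddef>
#include <string>

// Hypothetical helper (not libxml's API): UTF-8 -> UTF-16 with an SSE2
// fast path for runs of ASCII. Returns false on malformed input.
// Sketch only: overlong forms and encoded surrogates are not rejected.
bool utf8_to_utf16(const unsigned char* src, std::size_t len, std::u16string& out) {
    out.clear();
    out.reserve(len);  // UTF-16 output never has more units than UTF-8 input bytes
    const __m128i zero = _mm_setzero_si128();
    std::size_t i = 0;
    while (i < len) {
        // Fast path: 16 bytes per iteration while every byte is ASCII (< 0x80).
        while (i + 16 <= len) {
            __m128i bytes = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
            if (_mm_movemask_epi8(bytes) != 0) break;  // some byte has its high bit set
            // Zero-extend each byte to a 16-bit code unit (x86 is little-endian).
            __m128i lo = _mm_unpacklo_epi8(bytes, zero);  // bytes 0..7
            __m128i hi = _mm_unpackhi_epi8(bytes, zero);  // bytes 8..15
            std::size_t pos = out.size();
            out.resize(pos + 16);
            _mm_storeu_si128(reinterpret_cast<__m128i*>(&out[pos]), lo);
            _mm_storeu_si128(reinterpret_cast<__m128i*>(&out[pos + 8]), hi);
            i += 16;
        }
        if (i >= len) break;

        // Scalar path: decode a single code point.
        unsigned char c = src[i];
        char32_t cp;
        std::size_t n;
        if (c < 0x80)                { cp = c;        n = 1; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; n = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; n = 3; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; n = 4; }
        else return false;                          // invalid lead byte
        if (n > len - i) return false;              // truncated sequence
        for (std::size_t k = 1; k < n; ++k) {
            unsigned char cc = src[i + k];
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp > 0x10FFFF) return false;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {                                    // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += n;
    }
    return true;
}
```

The design choice is the usual one for this trick: ASCII runs take the vector path, and anything else drops to a byte-at-a-time decoder, so correctness never depends on the fast path. Any speed-up on a particular machine would need to be measured rather than assumed.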
Because Open Source XML software is generally either Java or cross-platform C++, there hasn't been the dynamic there to add processor-dependent optimizations or algorithms. I hope that will change. But the change will happen faster with user demand: in particular, from the organizations with high transaction needs who are currently doing such a convincing impersonation of passive beached whales. They need to be more proactive in stimulating open source development of more efficient XML; it directly addresses their business requirements for high transaction rates. Hence my suggestion for a consortium offering a cash prize.

There are still a lot of CPUs without SSE out there. I also develop digital audio filters for modular synthesizers as a hobby, and I am surprised at the number of older machines still out there. But they are disappearing fast.

Cheers
Rick Jelliffe

* Yes for getting closer to the hardware. But assembler is not necessary IMHO: compilers generate great code. The big performance potential is from utilizing the pipelining/streaming instructions on small arrays, in particular SSE2 on x86. You don't need to resort to assembler because C++ intrinsic functions are available.

I was surprised by them: I had thought C and C++ had stopped evolving in any interesting way in the early 90s and that the action was in Java/C#. But these intrinsics utterly change the tradeoffs for Java versus C++, back in favour of C++. I (and I think a lot of my generation of programmers who switched to Java in the 90s) missed the advent of intrinsics this decade.

I think Java still wins for WORA Java versus WORA C++, but for optimised code it is no contest: Java simply cannot compete with C++ with intrinsics, because Java can only make use of SSE2 instructions piecemeal, just as normal instructions, and has no capability of making use of their pipelining or virtual parallelism.
In DSP, compiling the same C++ code to use SSE instructions usually results in a 30% increase in performance. Small, but good. C++ compilers with "generate SSE" ticked, or Java compilers, can get this kind of advantage now. But the real speed-up, the speed-up I am talking about, is to use the SSE(2) code for pipelining, which requires that you structure your algorithm in a way that suits the CPU. That is beyond what we can expect a Java compiler to do unaided. So please, no flames or pointers to benchmarks of Java versus C++ based on code that is not written to utilize intrinsics: they are so 90s!

Java needs to add the equivalent of intrinsics (i.e. in the way that it has the system-dependent System.arraycopy method) to catch up with C++. There may be some way to make it slightly more CPU independent (e.g. make the array size of SIMD data such as __m128i into a System constant).
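The point about structuring the algorithm to suit the CPU can be illustrated with a DSP case. This is not the author's filter code; it is a hypothetical sketch, assuming an x86 target with SSE and four-channel float audio stored interleaved. A recursive filter such as a one-pole low-pass, y[n] = y[n-1] + a(x[n] - y[n-1]), depends on its own previous output, so a compiler cannot simply spread one channel across vector lanes. Rearranging the data so that four independent channels sit side by side lets each lane of a 128-bit register carry one channel. The names `one_pole_4ch` and `one_pole_scalar` are made up for the example.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_add_ps, ...
#include <cstddef>

// Hypothetical example: one-pole low-pass y[n] = y[n-1] + a*(x[n] - y[n-1]).
// The recursion on y[n-1] stops one channel being split across SIMD lanes,
// so the data is restructured instead: four independent channels are
// interleaved (frame n holds channels 0..3 at in[4n..4n+3]) and each SSE
// lane runs one channel.
void one_pole_4ch(const float* in, float* out, std::size_t frames,
                  float a, float state[4]) {
    const __m128 coef = _mm_set1_ps(a);
    __m128 y = _mm_loadu_ps(state);              // previous output of each channel
    for (std::size_t n = 0; n < frames; ++n) {
        __m128 x = _mm_loadu_ps(in + 4 * n);     // one sample from each channel
        y = _mm_add_ps(y, _mm_mul_ps(coef, _mm_sub_ps(x, y)));
        _mm_storeu_ps(out + 4 * n, y);
    }
    _mm_storeu_ps(state, y);                     // carry state into the next block
}

// Scalar reference for a single channel `ch` of the same interleaved layout.
void one_pole_scalar(const float* in, float* out, std::size_t frames,
                     float a, float& state, std::size_t ch) {
    float y = state;
    for (std::size_t n = 0; n < frames; ++n) {
        y = y + a * (in[4 * n + ch] - y);
        out[4 * n + ch] = y;
    }
    state = y;
}
```

The scalar version is there only as a reference to check the vector result against; in this sketch it is the change of data layout, more than the intrinsics themselves, that makes the parallelism available.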