Re: XML Performance in a Transacation
Wolfgang Hoschek wrote:

> This is even though the conversion routines are highly optimized,
> taking full advantage of pure or partial ASCII valued data, similar
> in spirit to the technique your blog mentions (except that it's in Java).

Oh, if it is Java it is not really similar in spirit. The point of the blog is not scanning ahead for non-ASCII codes but taking advantage of the parallelism/pipelining features of current CPUs, as exposed by C++ intrinsic functions, to speed things up. I apologize for not being clear.

> I do have some hope that future VMs with better dynamic optimization
> logic for memory prefetching, bulk operations, etc. could make more
> of a difference here, though. Care to explain why a dynamic optimizer
> couldn't get close to what those handcoded assembler routines do, in
> particular considering modern memory latencies?

It is highly unlikely that a programmer would write code that is readily parallelizable* into optimal SSE2 instructions unless they knew SSE2's constraints in the first place. They have to process data in 128-bit chunks. The data has to be aligned on certain memory boundaries. Only some kinds of data are allowed. Only some kinds of operations are available. Almost any call to a function or method will break the pipeline. Expressions have to be written with certain variable-writing constraints to prevent pipeline stalls. Expressions have to be written to interleave use of the different execution units in the CPU.

(*I say parallelizable because the intrinsics make pipelined instructions look to the programmer like parallel instructions.)

The reason that current C++ compilers don't attempt anything sophisticated with parallelization is that it is too hard and too easily defeated. Providing built-in intrinsic functions which act on special built-in 128-bit data types has turned out to be workable instead.
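
To make the constraints above concrete, here is a minimal sketch of the kind of routine under discussion: finding the first non-ASCII byte in a buffer with SSE2 intrinsics. It is an illustration only, not the code from the blog. The names `first_non_ascii` and `HAVE_SSE2` are hypothetical, and the sketch assumes an x86 compiler that provides `<emmintrin.h>`; elsewhere it falls back to the plain byte loop. It uses an unaligned load to sidestep the alignment constraint, which may cost some speed on older CPUs.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: these macros identify an SSE2-capable x86 target on
// GCC, Clang and MSVC. On other targets only the scalar loop is built.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics and the __m128i type
#define HAVE_SSE2 1
#endif

// Returns the index of the first byte whose high bit is set (non-ASCII),
// or len if every byte in the buffer is ASCII.
std::size_t first_non_ascii(const unsigned char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::size_t i = 0;

#ifdef HAVE_SSE2
    // Process the buffer in 128-bit (16-byte) chunks.
    for (; i + 16 <= len; i += 16) {
        // Unaligned load, so the caller need not align the buffer.
        __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));

        // Collect the top bit of each of the 16 bytes into a 16-bit mask.
        // A non-zero mask means at least one byte is >= 0x80.
        int mask = _mm_movemask_epi8(chunk);
        if (mask != 0) {
            // The lowest set bit corresponds to the first non-ASCII byte.
            std::size_t offset = 0;
            while ((mask & 1) == 0) {
                mask >>= 1;
                ++offset;
            }
            return i + offset;
        }
    }
#endif

    // Scalar tail (and the whole buffer when SSE2 is unavailable).
    for (; i < len; ++i) {
        if (data[i] & 0x80) {
            return i;
        }
    }
    return len;
}
```

Note that the inner loop contains no function calls other than the intrinsics themselves, and one load plus one mask extraction tests sixteen bytes at a time. A caller would typically copy or widen the ASCII prefix in bulk and hand the remainder to a general decoder.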
I think Java's best hopes are:

* add little optimizations like mine to the x86 version of the Java libraries, and call them as native code;
* add more functions to System that can use SSE2, but hide it. For example, a function to scan a byte array and detect the location of the first non-ASCII code value, like my example. But the Java designers could only do this *after* it becomes clear what the useful functions are, and this will only happen *after* programmers have explored using the SSE2 instructions for non-mathematical uses like parsing;
* add some kinds of annotations and datatypes to support small-grain parallelized/pipelined code, generalizing SSE2 or perhaps even just having direct equivalents to the SSE intrinsics: @parallel(128) ?

> On the standard textual XML front: As has been noted, Xerces and
> woodstox can be made to run quite fast, but in practise, few people
> know how do configure them accordingly, and to do so reliably, and
> without conformance compromises.

A red herring. Xerces' defaults are an issue unrelated to the merits of encouraging software developers to use modern C++ features instead of sticking to slow '90s features. (In any case, these optimisations are potentially applicable to binary XML parsing as well as to real XML processing.)

> Most users can't afford to study the complex reliability vs.
> performance interactions of myriads of more or less static tuning knobs.

Same fish.

Cheers
Rick Jelliffe