Re: XML Performance in a Transaction

To: xml-dev@l...
Subject: Re: XML Performance in a Transaction
From: Rick Jelliffe <rjelliffe@a...>
Date: Fri, 24 Mar 2006 17:33:30 +1100
In-reply-to: <444C61A3-72FB-44D6-9ECC-2B5839CFA713@m...>
References: <200603231325.k2NDPBG8029517@m...> <44236A54.4060302@a...> <444C61A3-72FB-44D6-9ECC-2B5839CFA713@m...>
User-agent: Mozilla Thunderbird 0.6 (X11/20040502)

Wolfgang Hoschek wrote:

> This is even though the conversion routines are highly optimized, 
> taking full advantage of pure or partial ASCII valued data, similar 
> in spirit to the technique your blog mentions (except that it's in Java).

Oh, if it is Java it is not really similar in spirit. The point of the 
blog is not scanning ahead for non-ASCII codes but taking advantage of 
the parallelism/pipelining functions in current CPUs, as exposed by C++ 
intrinsic functions, to speed things up.  I apologize for not being 
clear.

> I do have some hope that future VMs with better dynamic optimization 
> logic for memory prefetching, bulk operations, etc. could make more 
> of a difference here, though. Care to explain why a dynamic optimizer 
> couldn't get close to what those handcoded assembler routines do, in 
> particular considering modern memory latencies?

It is highly unlikely that a programmer would write code that is readily 
parallelizable* into optimal SSE2 instructions unless they knew SSE2's 
constraints in the first place.  They have to process data in 128-bit 
chunks. The data has to be aligned on certain memory boundaries. Only 
some kinds of data are allowed. Only some kinds of operations are 
available. Almost any call to a function or method will break the 
pipeline.  Expressions have to be written with certain variable-writing 
constraints to prevent pipeline stalling. Expressions have to be written 
to interleave use of different execution units in the CPU.

(*I say parallelizable, because the intrinsics make pipelined 
instructions look to the programmer like parallel instructions.)
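
To make those constraints concrete, here is a minimal sketch (not the 
code from the blog) of widening ASCII bytes to 16-bit code units with 
SSE2 intrinsics. The name widen_ascii is hypothetical, and the sketch 
assumes the caller already knows every byte is below 0x80. It uses 
unaligned loads and stores to sidestep the alignment requirement; the 
aligned forms (_mm_load_si128, _mm_store_si128) need 16-byte-aligned 
addresses. A scalar loop handles the tail and any non-SSE2 build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1
#endif

// Hypothetical helper: widen n ASCII bytes into 16-bit code units.
// Assumption: every input byte is < 0x80 (checked elsewhere).
void widen_ascii(const unsigned char* src, std::uint16_t* dst, std::size_t n) {
    assert((src != nullptr && dst != nullptr) || n == 0);
    std::size_t i = 0;
#ifdef HAVE_SSE2
    const __m128i zero = _mm_setzero_si128();
    // The loop body is straight-line code with no function calls:
    // one 128-bit chunk (16 bytes) in, two 128-bit chunks out.
    for (; i + 16 <= n; i += 16) {
        __m128i bytes =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        // Interleaving with zero bytes turns each 8-bit value into a
        // 16-bit value (x86 is little-endian, so the zero is the high byte).
        __m128i lo = _mm_unpacklo_epi8(bytes, zero);  // bytes 0..7
        __m128i hi = _mm_unpackhi_epi8(bytes, zero);  // bytes 8..15
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), lo);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i + 8), hi);
    }
#endif
    // Scalar tail: fewer than 16 bytes left, or no SSE2 available.
    for (; i < n; ++i) dst[i] = src[i];
}
```

Note how the shape of the loop is dictated by the instruction set: the 
16-byte stride, the fixed data width and the small menu of operations 
are all visible in the source.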

The reason that current C++ compilers don't attempt anything 
sophisticated with parallelization is that it is too hard and too 
easily defeated. Providing built-in intrinsic functions that act on 
special built-in 128-bit data types has turned out to be workable instead.

I think Java's best hopes are
  * add little optimizations like mine to the x86 version of the Java 
libraries, and call them as native code;

  * add more functions to System that can use SSE2, but hide it. For 
example, a function to scan a byte array and detect the location of the 
first non-ASCII code value, as in my example. But the Java designers could 
only do this *after* it becomes clear what the useful functions are, and 
this will only happen *after* programmers have explored using the SSE2 
instructions for non-mathematical uses like parsing;

  * add some kinds of annotations and datatypes to support small-grain 
parallelized/pipelined code, generalizing SSE2 or perhaps even just 
having direct equivalents to SSE intrinsics:   @parallel(128)  ?
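
As an illustration of the second option, the kind of native routine 
such a library function might wrap could look like the sketch below. 
This is an assumption-laden sketch, not the code from the blog and not 
any existing Java or C++ API: the name first_non_ascii is hypothetical. 
It tests 16 bytes per iteration by collecting the top bit of each byte 
with _mm_movemask_epi8, and falls back to a scalar loop for the tail 
or when SSE2 is unavailable.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1
#endif

// Hypothetical function: index of the first byte >= 0x80,
// or n if the whole buffer is ASCII.
std::size_t first_non_ascii(const unsigned char* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::size_t i = 0;
#ifdef HAVE_SSE2
    for (; i + 16 <= n; i += 16) {
        __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        // Bit k of the mask is the top bit of byte k; a non-ASCII byte
        // is exactly one whose top bit is set.
        int mask = _mm_movemask_epi8(chunk);
        if (mask != 0) {
            // Locate the lowest set bit (portable loop; a compiler
            // builtin could replace it).
            std::size_t k = 0;
            while ((mask & 1) == 0) {
                mask >>= 1;
                ++k;
            }
            return i + k;
        }
    }
#endif
    // Scalar tail: fewer than 16 bytes left, or no SSE2 available.
    for (; i < n; ++i) {
        if (data[i] >= 0x80) return i;
    }
    return n;
}
```

A Java-side wrapper would only need to expose the returned index; the 
caller never sees the 128-bit types or the alignment rules.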

> On the standard textual XML front: As has been noted, Xerces and 
> woodstox can be made to run quite fast, but in practice, few people 
> know how to configure them accordingly, and to do so reliably, and 
> without conformance compromises.

A red herring.  Xerces' defaults are an issue unrelated to the merits of 
stimulating software developers to use modern C++ features instead of 
sticking to slow '90s features.

(In any case, these optimisations are potentially applicable to 
binary XML parsing as well as to real XML processing.)

> Most users can't afford to study the complex reliability vs. 
> performance interactions of myriads of more or less static tuning knobs.

Same fish.

Cheers
Rick Jelliffe

