RE: XML Schema: "Best used with the ______ tool"

From: "Michael Kay" <mike@s...>
To: "'Boris Kolpackov'" <boris@c...>
Date: Fri, 5 Dec 2008 09:33:47 -0000

> 
> The test runs 100 queries with various max_age values. I was 
> hoping you would implement the same test in Java using Saxon 
> instead of simply running the query from the command line. I 
> don't think it is meaningful to compare the result of the 
> benchmark to these numbers since they are representing only 
> one query and are missing a substantial part of the 
> benchmark, notably interfacing with the non-XML data sources.

Sorry, but I couldn't find a description of the non-XML data sources.
Perhaps it was buried inside your C++ code - I'm afraid I can't read C++
without great difficulty.

And I'm talking about writing applications that use XML end-to-end; to do
that, I need to see a description of the application in terms of its user
input and user output. Of course converting the results to Java or C will
add cost and overhead, which is why I advise against doing it.

I didn't see any advantage in writing a Java harness for this query when I
could get all the numbers I needed running it from the command line (which
itself of course just invokes an existing Java harness).
> 
> Even if we ignore all this (as well as that you have a faster 
> CPU), your average query execution time is 15.0ms vs 0.9ms 
> for data binding, which makes data binding over 15 times 
> faster on this benchmark.

That's one of the numbers. But I also showed that the execution time was
trivially small compared with the data loading time, meaning that this
difference is irrelevant. How long did it take to load the data in the
data-binding version of the code?
> 
> > I haven't tried making the query schema-aware, but I did try modifying
> > it to do an integer comparison on age ($x/xs:integer(@age) < $age) and
> > this reduces the execution times marginally, to 12.0ms and 17.2ms
> > respectively.
> 
> Do you expect an average user to do this kind of optimizations?

Well, I certainly don't expect the average user to be prepared to write this
in C++! No, clearly, most users having written a one-liner for this query
that executes in under 30ms will not attempt to optimize it any further. I
was just exploring what further improvements were possible.
> 
> The test executes 100 queries with varying genders and max 
> ages. The time measurement excludes XML parsing and includes 
> passing input parameters to the query and extracting the 
> result. Check the xquery.cxx file for details.
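For readers who, like Michael, would rather not read xquery.cxx, here is a rough Java approximation of the benchmark shape Boris describes: 100 queries with varying gender and max age, with data loading excluded from the timing and parameter passing plus result extraction included. It is a hypothetical sketch, not a translation of Boris's code. The `Person` record, the data size, and the gender and age values are all assumptions, and the in-memory list only stands in for whatever object model a data-binding tool would generate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class QueryBench {
    // Hypothetical bound type standing in for a generated data-binding class.
    record Person(String name, String gender, int age) {}

    // One query: people of the given gender whose age is strictly below maxAge.
    // Building the result list counts as "extracting the result".
    static List<Person> query(List<Person> people, String gender, int maxAge) {
        List<Person> result = new ArrayList<>();
        for (Person p : people) {
            if (p.gender().equals(gender) && p.age() < maxAge) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the loading/parsing phase, which is NOT timed.
        Random rnd = new Random(42);
        List<Person> people = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            people.add(new Person("p" + i, rnd.nextBoolean() ? "male" : "female", rnd.nextInt(100)));
        }

        // Timed phase: 100 queries with varying parameters.
        int runs = 100;
        long totalResults = 0;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            String gender = (i % 2 == 0) ? "male" : "female";
            int maxAge = 20 + i % 60;
            totalResults += query(people, gender, maxAge).size();
        }
        long elapsed = System.nanoTime() - start;
        System.out.printf("avg %.3f ms per query (%d results total)%n",
                elapsed / 1e6 / runs, totalResults);
    }
}
```

A serious JVM measurement would also add untimed warm-up iterations so that JIT compilation does not land inside the timed loop.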

As I say, C++ is totally impenetrable to me.
> 
> That would be the case if we ran only one query. But 
> remember, we are testing repetitive access to the data.

Well, you didn't actually say that. Clearly, the longer you keep the data
around and the more you use it, the more investment it is worth putting into
loading it into a suitable form for querying, building indexes, etc. But how
typical is this? Most scenarios I see have two kinds of data - persistent
data that lives in a database, and transient data that arrives in an input
message, is transformed, and results in an output message. Yes, there are
some in-memory lookup datasets as well, but I've never worked on an
application where they were critical to the performance. In fact, most
people are quite happy with the performance they can get accessing such data
using SQL.
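To make the point about "loading it into a suitable form for querying, building indexes" concrete, here is a hypothetical Java sketch that appears in neither correspondent's code. It pays an O(n log n) cost once at load time to group people by gender and sort each group by age. After that, each "younger than maxAge" query is a binary search plus a sublist view, instead of a full scan. The `Person` record and the `AgeIndex` class are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AgeIndex {
    // Hypothetical record type; not from either benchmark.
    record Person(String name, String gender, int age) {}

    // Built once at load time: for each gender, people sorted by ascending age.
    private final Map<String, List<Person>> byGender = new HashMap<>();

    AgeIndex(List<Person> people) {
        for (Person p : people) {
            byGender.computeIfAbsent(p.gender(), g -> new ArrayList<>()).add(p);
        }
        for (List<Person> list : byGender.values()) {
            list.sort(Comparator.comparingInt(Person::age));
        }
    }

    // Per query: binary-search for the first entry with age >= maxAge and
    // return everything before it (all entries with age < maxAge).
    List<Person> youngerThan(String gender, int maxAge) {
        List<Person> list = byGender.getOrDefault(gender, List.of());
        int lo = 0, hi = list.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (list.get(mid).age() < maxAge) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return list.subList(0, lo);
    }
}
```

The trade-off is the one described above. The index only pays for itself if enough queries run against the same loaded data to amortize its construction.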

> 
> Surely applications that run multiple queries will notice the 
> 15 times speedup.

Only if the query time is a significant part of the total, which I think is
unlikely. Even serializing the results for display will take much longer
than this.
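The disagreement can be written as a simple cost model. This is an illustrative framing that neither correspondent wrote down. Assume the data is loaded once and n queries run against it, each of which also pays an output cost t_out, such as serializing the results. The per-query averages are the 15.0ms (XQuery) and 0.9ms (data binding) figures quoted earlier. The load times are not given in this message, so they stay symbolic.

```latex
T_{\text{total}}(n) = T_{\text{load}} + n\,\bigl(t_{\text{query}} + t_{\text{out}}\bigr)

\text{query time for } n = 100:\quad
100 \times 15.0\,\text{ms} = 1500\,\text{ms} \;\;\text{vs.}\;\;
100 \times 0.9\,\text{ms} = 90\,\text{ms}

\text{query share} = \frac{n\,t_{\text{query}}}{T_{\text{load}} + n\,\bigl(t_{\text{query}} + t_{\text{out}}\bigr)}
```

The 1410ms difference dominates only when the query share is large, that is, when T_load and n·t_out are small relative to n·t_query. This is why the question "how long did it take to load the data?" matters.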
> 
> As far as I remember, I stated that in a scenario with 
> repetitive access to most of the data, data binding will have 
> an advantage. You asked for evidence and I believe I have 
> shown that it can certainly be the case. Here are the relevant quotes:
> 
You also said there would be an advantage for a single query that accesses
most of the data.
> 
> Michael Kay <mike@s...> writes:
> 
> > Boris Kolpackov <boris@c...> writes:
> >
> > > I agree with Dennis here in that XQuery can be usable when
> > > you need to access a small subset of an XML document.
> > > However, when one needs to access most of the data, or,
> > > worse, access the same data many times, data binding will
> > > have speed/memory advantage.
> >
> > Evidence please! I don't see any reason why it should.
> 
> 
> Boris
> 
> -- 
> Boris Kolpackov, Code Synthesis Tools   
> http://codesynthesis.com/~boris/blog
> Open source XML data binding for C++:   
> http://codesynthesis.com/products/xsd
> Mobile/embedded validating XML parsing: 
> http://codesynthesis.com/products/xsde

Follow-Ups:
- Re: XML Schema: "Best used with the ______ tool"
  - From: "Mukul Gandhi" <gandhi.mukul@g...>

References:
- Re: XML Schema: "Best used with the ______ tool"
  - From: Boris Kolpackov <boris@c...>
- RE: XML Schema: "Best used with the ______ tool"
  - From: "Michael Kay" <mike@s...>
- Re: XML Schema: "Best used with the ______ tool"
  - From: Boris Kolpackov <boris@c...>
