
Re: XML-enabled databases, XQuery APIs


Hi Ken,

You've found some quite interesting references there.

One point I would like to pick up is from the Python link
(http://www.oreillynet.com/pub/wlg/6291), which mentions the
assumption that parsers handle 8-bit text documents, when
Unicode docs may be the norm in the future.

As an old assembly language programmer, I can't help
but rub my hands with joy as this seems to be a sort of 32-bit
challenge.

Unicode text is (according to my understanding) usually stored as UTF-8, which 
is an 8-bit escaping system. That is, if a character falls outside the ASCII 
range, it is written into a second, third and then fourth byte as required.
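
The byte-spilling behaviour described above is that of the UTF-8 encoding. A minimal sketch in Python (the language of the linked benchmark) that prints how many bytes a few sample characters occupy; the sample characters and the helper name utf8_width are my own choices for illustration:

```python
# Four sample characters: ASCII, Latin-1 range, rest of the BMP, beyond the BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001d11e"]

def utf8_width(ch: str) -> int:
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

for ch in samples:
    # Show the code point next to its encoded width.
    print(f"U+{ord(ch):04X} -> {utf8_width(ch)} byte(s)")
```

The widths come out as 1, 2, 3 and 4 bytes, so a parser walking the raw bytes cannot jump to the n-th character without scanning everything before it.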

What a pain to chomp on a 32-bit processor..

So to do really *fast* Unicode stuff, ideally, the in-memory view wouldn't 
store the characters as 8-bit units, but as fixed-width 32-bit (4-byte) or 
64-bit (8-byte) values.
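
As a rough sketch of that idea (not how any particular parser does it), the bytes can be decoded once into an array holding one fixed-width integer per character. The function name to_code_points is hypothetical, and the "I" typecode is an assumption: it is 4 bytes wide on common platforms, but Python only guarantees at least 2.

```python
from array import array

def to_code_points(utf8_bytes: bytes) -> array:
    """Decode UTF-8 bytes into an array with one fixed-width integer per character."""
    text = utf8_bytes.decode("utf-8")
    # Typecode "I" is an unsigned int; check .itemsize if the exact width matters.
    return array("I", (ord(ch) for ch in text))

doc = "<a>\u20ac</a>".encode("utf-8")   # 10 bytes on the wire
points = to_code_points(doc)            # 8 characters in memory

# Indexing is now constant time: the 4th character is the euro sign.
print(len(doc), len(points), hex(points[3]), points.itemsize)
```

The trade-off is memory: plain ASCII markup grows roughly fourfold in exchange for constant-time indexing by character.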

Of course, I think this is a few months away. But true 32-bit in memory 
processing of XML might be fun.

Maybe we're in for a year of benchmarks and crunch testing.

Btw, if you have 2500 documents to crunch, you can add machines to your LAN 
and spread the work across them. It doesn't have to be a purely sequential 
process.
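
A small sketch of the idea, with one big caveat: it uses a pool of worker processes on a single machine as a stand-in for several machines on a LAN, which is an assumption made to keep the example self-contained. It parses with Expat through Python's standard xml.parsers.expat module; the function names count_elements and count_all are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor
import xml.parsers.expat

def count_elements(xml_bytes: bytes) -> int:
    """Parse one document with Expat and return its number of start tags."""
    count = 0

    def on_start(name, attrs):
        nonlocal count
        count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return count

def count_all(documents, workers=4):
    """Spread the documents over a pool of worker processes (stand-in for machines)."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize batches documents so each hand-off carries more work.
        return list(pool.map(count_elements, documents, chunksize=64))
```

Each document parses independently, so in the ideal case N workers bring the 125-second estimate quoted below down to about 125/N seconds. Real runs will fall short of that because of the cost of shipping documents to the workers.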

Best regards

David

On Tue, 19 Apr 2005 5:44 pm, Ken North wrote:
> > Michael Kay wrote:
> > > I would be very surprised if XML parsing contributes anything
> > > noticeable to the cost of a database load (in shredding mode).
>
> These benchmarks were run on different testbeds so this isn't an
> apples-to-apples comparison.
>
> This parser performance test of Expat (C, SAX) reported a best time of 0.05
> sec to parse an 884K document with 32K nodes. For 2500 documents, that
> would be approx. 125 seconds.
> http://okmij.org/ftp/Scheme/SSAX-benchmark-1.html
>
> This benchmark compared two SQL APIs. It was written in C and executed in a
> client-server mode, so there was network latency. It used an SQL INSERT,
> not a bulk load.
> http://www.datadirect.com/techres/odbc/docs/wp_odbcvsoci.pdf
>
> The average time to INSERT 2500 rows with ODBC was 23.53 seconds. The
> minimum execution time for an SQL SELECT query to return 2500 rows was 0.05
> sec.
>
> This single-cpu Java benchmark parsed simpler documents than the Expat test
> and the data was closer to the tables in the SQL API test. It took about 2
> seconds to parse 10,000 records using SAX, or about 0.5 seconds to parse
> 2500 records.
> http://www.devsphere.com/xml/benchmark/method.html
>
> This Python benchmark took between 2.32 and 3.97 seconds for a 3.3 MB
> document. http://www.oreillynet.com/pub/wlg/6291
>
> My guess is if these benchmarks were all run on the same testbed under the
> same conditions, we'd see a repeatable pattern:
>
> The parsing overhead for loading a database is negligible if we're
> processing simple documents, but becomes more significant as the documents
> increase in size.
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://www.oasis-open.org/mlmanage/index.php>

-- 
Computergrid : The ones with the most connections win.
