XML-DEV Mailing List Archive

Re: XML-enabled databases, XQuery APIs
Hi Ken,

You've found some quite interesting references there. One point I would like to make comes from the Python link (http://www.oreillynet.com/pub/wlg/6291), which mentions the assumption of parsing 8-bit text documents when Unicode documents may be the norm in the future.

As an old assembly language programmer, I can't help but rub my hands with joy, as this seems to be a sort of 32-bit challenge. Unicode is (according to my understanding) an 8-bit escaping system. That is, if the character is extended, it is written into a second, a third, and then further consecutive bytes if required. What a pain to chomp on a 32-bit processor.

So to do really *fast* Unicode stuff, ideally the in-memory view wouldn't store the characters as 8-bit units, but as 32-bit (4-byte) or 64-bit (8-byte) strings. Of course, I think this is a few months away. But true 32-bit in-memory processing of XML might be fun. Maybe we will be having a year of benchmarks and crunch testing.

Btw, if you have 2500 documents to crunch, you can add machines to your LAN and crunch them on multiple machines. It's not a purely linear process.

Best regards
David

On Tue, 19 Apr 2005 5:44 pm, Ken North wrote:
>
> Michael Kay wrote:
> > > I would be very surprised if XML parsing contributes anything
> > > noticeable to the cost of a database load (in shredding mode).
>
> These benchmarks were run on different testbeds so this isn't an
> apples-to-apples comparison.
>
> This parser performance test of Expat (C, SAX) reported a best time of 0.05
> sec to parse an 884K document with 32K nodes. For 2500 documents, that
> would be approx. 125 seconds.
> http://okmij.org/ftp/Scheme/SSAX-benchmark-1.html
>
> This benchmark compared two SQL APIs. It was written in C and executed in a
> client-server mode, so there was network latency. It used an SQL INSERT,
> not a bulk load.
> http://www.datadirect.com/techres/odbc/docs/wp_odbcvsoci.pdf
>
> The average time to INSERT 2500 rows with ODBC was 23.53 seconds.
> The minimum execution time for an SQL SELECT query to return 2500 rows
> was 0.05 sec.
>
> This single-CPU Java benchmark parsed simpler documents than the Expat test
> and the data was closer to the tables in the SQL API test. It took about 2
> seconds to parse 10,000 records using SAX, or about 0.5 seconds to parse
> 2500 records.
> http://www.devsphere.com/xml/benchmark/method.html
>
> This Python benchmark took between 2.32 and 3.97 seconds for a 3.3 MB
> document. http://www.oreillynet.com/pub/wlg/6291
>
> My guess is if these benchmarks were all run on the same testbed under the
> same conditions, we'd see a repeatable pattern:
>
> The parsing overhead for loading a database is negligible if we're
> processing simple documents, but becomes more significant as the documents
> increase in size.
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://www.oasis-open.org/mlmanage/index.php>

--
Computergrid : The ones with the most connections win.
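
As a rough illustration of the storage point raised in the reply above: the byte-by-byte extension described there corresponds to UTF-8, which is one encoding of Unicode, while a fixed 4-bytes-per-character layout corresponds to UTF-32. The following is only a minimal Python sketch; the names `samples` and `encoded_sizes` are invented here for illustration and do not come from the thread or from any of the benchmarks it cites.

```python
# Compare per-character storage in UTF-8 (variable width) and UTF-32 (fixed width).
# `samples` and `encoded_sizes` are hypothetical names used only for this sketch.

# One character from each UTF-8 length class: U+0041, U+00E9, U+20AC, U+1D11E.
samples = ["A", "\u00e9", "\u20ac", "\U0001d11e"]


def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-32 byte count) for a single character."""
    # "utf-32-be" is used so that no byte-order mark is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-32-be"))


for ch in samples:
    utf8_len, utf32_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: {utf8_len} byte(s) in UTF-8, {utf32_len} byte(s) in UTF-32")
```

UTF-8 uses between one and four bytes per character, so locating the n-th character means scanning from the start of the string, whereas UTF-32 allows direct indexing at the cost of four bytes for every character, including plain ASCII.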