
Re: XML Performance question

  • From: Marcelo Cantos <marcelo@m...>
  • To: xml-dev@i...
  • Date: Tue, 6 Apr 1999 10:26:44 +1000

On Mon, Apr 05, 1999 at 09:25:48AM -0400, Lippmann, Jens wrote:
> Following the XML for the last couple month, I am surprised how little
> attention is paid to performance. My  optimistic personality leads me to the
> conclusion that performance is not an issue. :) 
> 
> However, I would be very interested on an expert's guess on the following
> problem:
> 
> Assume the following XML document:
> 
> <PORTFOLIO>
>    <ACCOUNT MANAGER="Joe Smith" ID="000001">
>       <AUDIT DATE="03/31/1999">
>          <SECURITYDESC>
>             <SECURITY>
>                <CUSIP>0815</CUSIP>
>                <PRICE CURRENCY="US">4289.23</PRICE>
>                <TRADEDSHARES>4289.23</TRADEDSHARES>
>             </SECURITY>
>          </SECURITYDESC>
>       </AUDIT>
>    </ACCOUNT>
> </PORTFOLIO>
>  
> 
> Each document will contain about 10^4 <SECURITY> elements each will contain
> between 10 - 10^2 child tags, and I have to handle about 10^2 documents a
> day, i.e. we're dealing with 10^7 to 10^8 tags. So far, the benchmarks I've
> got are pretty devastating.  I have to visit every sub-element
> of  <SECURITY> at least once during the number crunching and I cannot keep
> everything in memory. I am considering one of the XML repositories to help
> me with the job.

I just ran one million elements through SP with a scripting language
on top of it.  The run took 7m 15s.  This extrapolates to 12 hours for
10^8 tags.  This could easily be sped up by:

  1. Using expat instead of SP (this makes a _big_ difference).
  2. Accessing the data from C++ rather than a script language.
  3. Shortening your element names (currently they overload
     the data; they seem to incur roughly a 12% performance hit, and
     this would get much worse if you were looking for specific
     elements during parsing).

I ran some brief tests, handling 10^6 elements with no processing
(beyond parsing, that is), using expat in C.  It completed in just
under 2 minutes.  This would suggest that 100 of the largest possible
documents would take approximately 3h 20m.
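
Both extrapolations above are plain linear scaling, and can be checked
with a few lines of Python.  This is only a sketch: the helper names
(`extrapolate_seconds`, `as_hours_minutes`) are made up for
illustration, and the expat timing is assumed to be exactly 120 seconds
even though the measured run was "just under 2 minutes".

```python
def extrapolate_seconds(measured_seconds, measured_elements, target_elements):
    """Scale a measured run time linearly to a larger element count."""
    return measured_seconds * target_elements / measured_elements


def as_hours_minutes(seconds):
    """Format a duration in seconds as 'Hh MMm', rounded to the nearest minute."""
    total_minutes = round(seconds / 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


# 100 documents, each with 10^4 SECURITY elements of up to 10^2 child tags.
TARGET = 10**8

# SP plus a scripting language: 10^6 elements in 7m 15s.
sp_script = extrapolate_seconds(7 * 60 + 15, 10**6, TARGET)

# expat in C: 10^6 elements in (assumed) 120 seconds.
expat_c = extrapolate_seconds(120, 10**6, TARGET)

print(as_hours_minutes(sp_script))
print(as_hours_minutes(expat_c))
```

Linear scaling is itself an assumption; it ignores startup cost and any
effect of document size on cache or I/O behaviour.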

This extremely rough analysis suffices to establish some idea of the
lower bound for your problem.  It doesn't address the full
complexity of your situation, since we don't know the specifics of
what you are trying to achieve.

Also note that these figures were acquired using an event model,
rather than a parse tree.  This can have a significant impact on the
performance.  It may well be that your processing requirements don't
permit an event-based approach, in which case the above figures are
meaningless (this situation is less likely than is commonly perceived,
however).
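
To make the event model concrete, here is a minimal sketch using
Python's standard `xml.parsers.expat` binding (the tests above used
expat from C, not this binding).  It visits every <SECURITY> and totals
the <PRICE> values without ever building a tree.  The function name
`sum_prices` and the choice of "sum the prices" as the number crunching
are assumptions made purely for illustration.

```python
import xml.parsers.expat


def sum_prices(xml_bytes):
    """Stream a PORTFOLIO document; return (security count, total of PRICE values)."""
    state = {"in_price": False, "text": [], "total": 0.0, "securities": 0}

    def start(name, attrs):
        if name == "SECURITY":
            state["securities"] += 1
        elif name == "PRICE":
            state["in_price"] = True
            state["text"] = []

    def chars(data):
        # Character data may be delivered in several pieces, so collect it.
        if state["in_price"]:
            state["text"].append(data)

    def end(name):
        if name == "PRICE":
            state["total"] += float("".join(state["text"]))
            state["in_price"] = False

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return state["securities"], state["total"]


SAMPLE = b"""<PORTFOLIO>
   <ACCOUNT MANAGER="Joe Smith" ID="000001">
      <AUDIT DATE="03/31/1999">
         <SECURITYDESC>
            <SECURITY>
               <CUSIP>0815</CUSIP>
               <PRICE CURRENCY="US">4289.23</PRICE>
               <TRADEDSHARES>4289.23</TRADEDSHARES>
            </SECURITY>
         </SECURITYDESC>
      </AUDIT>
   </ACCOUNT>
</PORTFOLIO>"""

print(sum_prices(SAMPLE))
```

Only the handler state is held in memory, so the size of the document
does not matter much.  For a file too large to read in one go, the same
parser object can be fed with `ParseFile` on an open binary file instead
of a single `Parse` call.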

Finally, note that this was all done in one thread (a 333 UltraSPARC).
Multiple threads could potentially improve this figure substantially.
Spreading the second test across 2 cpus brought the time down to 70
seconds (2 hours for 100 documents).  Of course, this depends on your
hardware.
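
One way to spread the work, sketched below, is to treat each document as
an independent task and hand the tasks to a pool of workers.  This is
not necessarily how the two-cpu run above was split; it assumes the
documents can be processed independently, and the names
`count_elements` and `count_all` are hypothetical.

```python
import xml.parsers.expat
from concurrent.futures import ThreadPoolExecutor


def count_elements(xml_bytes):
    """Parse one document with expat and return its number of start tags."""
    count = 0

    def start(name, attrs):
        nonlocal count
        count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(xml_bytes, True)
    return count


def count_all(documents, executor):
    """Run count_elements on every document via the executor and sum the results."""
    return sum(executor.map(count_elements, documents))


# Two tiny stand-in documents; real input would be the daily PORTFOLIO files.
DOCS = [b"<A><B/><C/></A>", b"<A><B><C/></B><D/></A>"]

with ThreadPoolExecutor(max_workers=2) as pool:
    total = count_all(DOCS, pool)

print(total)
```

The executor is passed in so that it can be swapped.  In CPython a
thread pool is unlikely to use two cpus for parsing, because the
handlers run under the interpreter lock; a
`concurrent.futures.ProcessPoolExecutor` is the more likely choice
there, at the cost of copying each document to a worker process.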


Cheers,
Marcelo

-- 
http://www.simdb.com/~marcelo/


