
RE: XML Performance in a Transacation

  • To: Michael Champion <michael.champion@h...>, d_a_carver@y..., xml-dev@l...
  • Subject: RE: XML Performance in a Transacation
  • From: Tatu Saloranta <cowtowncoder@y...>
  • Date: Wed, 22 Mar 2006 15:07:45 -0800 (PST)
  • In-reply-to: <BAY114-W1EBB1B3A0DB6B09329E9E99D90@p...>

--- Michael Champion <michael.champion@h...>
wrote:

> > Date: Wed, 22 Mar 2006 16:19:56 -0500
> > From: d_a_carver@y...
> > To: xml-dev@l...
> > Subject: XML Performance in a Transacation
> >
> > I've been requested to provide some numbers to show that
> > actual XML validation results and parsing are a small
> > portion of the overall transaction process, when dealing
> > with XML in a B2B process.  Any information that can be
> > provided would be appreciated.
>
> See
> http://lists.w3.org/Archives/Public/www-ws/2004Oct/att-0032/MNicola_CIKM_2003_1_.pdf
> "XML Parsing - A Threat to Database Performance."
> Be forewarned that the conclusion may be
> unpalatable:
>  
> "We reported real-world experiences of using XML with databases
> where XML parsing was the main performance bottleneck. This
> motivated an analysis of the cost of SAX parsing and DTD &
> XML schema validation. We find that parsing even small XML
> documents without validation can increase the CPU cost of a
> relational database transaction by 2 to 3 times or more.
> Parsing with schema validation and without grammar caching can
> increase transaction cost by 10 times or more. This is a
> serious problem for high performance transaction oriented
> database applications which intend to use XML"

As everyone else has said, it all depends on your
usage. But I personally think the above conclusion is
not closely tied to current reality. I did a quick test
on my development system (see below for details), and
_raw_ parsing speed (a Java streaming parser that scans
through the whole doc, just collecting stats such as
lengths) was as follows:

* 43 MBps for a big XML export/import file (1 MB, no
  namespaces, but namespace-aware parser)
  [file parsed from disk 1093 times over 30 seconds
  ~= 30 milliseconds per parse]
* 37 MBps for a big StarOffice XML content file (500 kB,
  fully namespaced, lots of attributes)
  [2182 reads over 30 seconds ~= 15 milliseconds per parse]
* 8 MBps for a small SOAP request (718 bytes)
  [322,000 reads over 30 seconds ~= 0.1 milliseconds per
  parse]; the lower throughput is probably due to the
  constant overhead of instantiating the parser instance.
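
For anyone who wants to repeat this kind of measurement, here
is a minimal sketch of such a test. It is not the code behind
the numbers above: it assumes the standard StAX API
(javax.xml.stream), a hypothetical class name ParseThroughput,
and that the file is read into memory once instead of being
re-read from disk on every pass.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.ByteArrayInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

class ParseThroughput {
    // Scans the whole document; returns {elements, attributes, text chars}.
    static long[] scan(XMLInputFactory factory, byte[] xml) throws XMLStreamException {
        long elements = 0, attributes = 0, textChars = 0;
        // A new reader per pass, so per-document setup cost is included.
        XMLStreamReader reader =
                factory.createXMLStreamReader(new ByteArrayInputStream(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    elements++;
                    attributes += reader.getAttributeCount();
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    textChars += reader.getTextLength();
                }
            }
        } finally {
            reader.close();
        }
        return new long[] { elements, attributes, textChars };
    }

    // Throughput in MB/s (1 MB = 1,000,000 bytes, an assumption).
    static double megabytesPerSecond(long bytesPerParse, long parses, double seconds) {
        return bytesPerParse * (double) parses / seconds / 1_000_000.0;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = Files.readAllBytes(Paths.get(args[0]));
        XMLInputFactory factory = XMLInputFactory.newInstance();

        // 10 second warm-up on the same document, as in the test described.
        long warmUpStart = System.nanoTime();
        while (System.nanoTime() - warmUpStart < 10_000_000_000L) {
            scan(factory, xml);
        }

        // 30 second single-threaded measurement.
        long start = System.nanoTime();
        long parses = 0;
        while (System.nanoTime() - start < 30_000_000_000L) {
            scan(factory, xml);
            parses++;
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d parses, %.1f MB/s%n",
                parses, megabytesPerSecond(xml.length, parses, seconds));
    }
}
```

The helper megabytesPerSecond is the same arithmetic as in the
list above: 718 bytes * 322,000 reads / 30 seconds is roughly
7.7 MB/s, which rounds to the 8 MBps quoted for the SOAP request.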

XML content was read from a file, although in practice
Linux caches repeated disk access, so this is equivalent
to parsing from memory (meaning I/O should not matter
much). The system is a plain old 3 GHz single-CPU Intel
Linux workstation with a reasonably fast SCSI disk.
The test was single-threaded, with a 10 second warm-up
period for the parser (parsing the same file as during
the test).

For comparison, simply scanning the file from Java
(reading it without parsing) is less than 50% faster
than XML parsing.

For my purposes, at least, XML parsing itself is not
the most significant performance overhead: the cost is
in all the XML processing above and beyond parsing. But
I just parse XML content and use it, with no validation.
(Just as one data point: DTD validation seems to add 50%
overhead for me [-> 35% lower throughput] when DTD
caching is enabled.)
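
To make the relation between "overhead" and "lower throughput"
concrete, here is a small arithmetic sketch. The class and
method names are hypothetical, and it assumes overhead means
extra time per parse. Exactly 50% extra time gives about 33%
lower throughput, and 35% lower throughput corresponds to about
54% extra time, so the two rounded figures are consistent.

```java
class OverheadMath {
    // Fraction of throughput lost when each parse takes (1 + overhead) times as long.
    static double throughputLoss(double overhead) {
        return 1.0 - 1.0 / (1.0 + overhead);
    }

    // Extra time per parse implied by a given fractional throughput loss.
    static double overheadFor(double loss) {
        return 1.0 / (1.0 - loss) - 1.0;
    }

    public static void main(String[] args) {
        System.out.printf("50%% overhead -> %.1f%% lower throughput%n",
                100 * throughputLoss(0.50));
        System.out.printf("35%% lower throughput -> %.1f%% overhead%n",
                100 * overheadFor(0.35));
    }
}
```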

Your mileage may vary. Specifically, if you have to
use an in-memory document model (DOM etc.), be prepared
for throughput to drop by half an order of magnitude
(compared to the simple streaming use case).
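
For contrast with the streaming case, a minimal sketch of the
in-memory approach, assuming the standard JAXP DOM API
(javax.xml.parsers) and a hypothetical class name DomCount. The
whole tree is built before anything can be counted, which is
where the extra cost comes from. I have not timed this code;
it only illustrates the usage pattern.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;

class DomCount {
    // Builds a full DOM tree for the document, then counts its elements.
    static int countElements(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml));
        // "*" matches every element in the tree.
        return doc.getElementsByTagName("*").getLength();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(args[0]));
        System.out.println(countElements(xml) + " elements");
    }
}
```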

-+ Tatu +-

