RE: XML Performance in a Transaction

To: Michael Champion <michael.champion@h...>, d_a_carver@y..., xml-dev@l...
Subject: RE: XML Performance in a Transaction
From: Tatu Saloranta <cowtowncoder@y...>
Date: Wed, 22 Mar 2006 15:07:45 -0800 (PST)
In-reply-to: <BAY114-W1EBB1B3A0DB6B09329E9E99D90@p...>

--- Michael Champion <michael.champion@h...> wrote:

> > Date: Wed, 22 Mar 2006 16:19:56 -0500
> > From: d_a_carver@y...
> > To: xml-dev@l...
> > Subject: XML Performance in a Transaction
> >
> > I've been requested to provide some numbers to show that actual XML
> > validation results and parsing are a small portion of the overall
> > transaction process, when dealing with XML in a B2B process. Any
> > information that can be provided would be appreciated.
>
> See
> http://lists.w3.org/Archives/Public/www-ws/2004Oct/att-0032/MNicola_CIKM_2003_1_.pdf
> "XML Parsing - A Threat to Database Performance."
> Be forewarned that the conclusion may be unpalatable:
>
> "We reported real-world experiences of using XML with databases
> where XML parsing was the main performance bottleneck. This
> motivated an analysis of the cost of SAX parsing and DTD &
> XML schema validation. We find that parsing even small XML
> documents without validation can increase the CPU cost of a relational
> database transaction by 2 to 3 times or more. Parsing with
> schema validation and without grammar caching can increase
> transaction cost by 10 times or more. This is a serious problem for
> high performance transaction oriented database applications which
> intend to use XML"

As everyone else has said, it all depends on your
usage. But I personally think the comment above is not
closely tied to current reality. I did a quick test on
my development system (see below for details), and
_raw_ parsing speed (a Java streaming parser that scans
through the whole doc, just counting stats for lengths
etc.) was as follows:

* 43 MBps for a big XML export/import file (1 MB, no
  namespaces, but namespace-aware parser)
  [file parsed 1093 times during 30 seconds, from disk
  ~= 30 milliseconds per parse]
* 37 MBps for a big StarOffice XML content file (500 kB,
  fully namespaced, lots of attributes)
  [2182 reads over 30 seconds ~= 15 milliseconds per parse]
* 8 MBps for a small SOAP request (718 bytes)
  [322,000 times over 30 seconds ~= 0.1 milliseconds per parse];
  the lower throughput is probably due to the constant
  overhead of instantiating the parser instance.
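
The per-parse times above follow directly from the run counts. A small sketch of that arithmetic (the class and method names are made up for illustration; the counts, the 30-second window and the 718-byte size are from the list). It gives roughly 27 ms, 14 ms and 0.09 ms per parse, which the list rounds to 30, 15 and 0.1, and about 7.7 MB/s for the SOAP request. The two larger files are only given as approximate sizes, so their throughput is not recomputed here.

```java
// Illustrative helper, not part of the original test code.
final class ParseStats {
    /** Average wall-clock milliseconds spent on one parse. */
    static double millisPerParse(long parses, double seconds) {
        return seconds * 1000.0 / parses;
    }

    /** Throughput in decimal megabytes per second (1 MB = 1,000,000 bytes). */
    static double megabytesPerSecond(long bytesPerDoc, long parses, double seconds) {
        return bytesPerDoc * (double) parses / seconds / 1_000_000.0;
    }

    public static void main(String[] args) {
        System.out.printf("export file:  %.2f ms per parse%n", millisPerParse(1093, 30));
        System.out.printf("StarOffice:   %.2f ms per parse%n", millisPerParse(2182, 30));
        System.out.printf("SOAP request: %.3f ms per parse%n", millisPerParse(322_000, 30));
        System.out.printf("SOAP request: %.1f MB/s%n", megabytesPerSecond(718, 322_000, 30));
    }
}
```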

XML content was read from a file, although in practice
Linux caches repeated disk access, so it's equivalent
to parsing from memory (meaning I/O should not matter
a lot). The system is a plain old 3 GHz single-CPU Intel
Linux workstation with a reasonably fast SCSI disk.
The test was single-threaded, with a 10-second warm-up
period for the parser (parsing the same file as during
the test).
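
For anyone who wants to repeat this kind of measurement, here is a rough sketch of such a test loop. It is not the code used for the numbers above: it assumes the standard StAX API (javax.xml.stream) with whatever parser implementation the JDK provides, reads the file into memory once instead of re-reading it from disk, and all class and method names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical benchmark sketch: single-threaded, warm-up followed by a timed run.
final class StreamingParseBench {
    // The factory is reused; a new reader is still created for every document.
    private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    /** Scans the whole document once; returns {elements, attributes, text chars}. */
    static long[] scan(byte[] xml) {
        long elements = 0, attributes = 0, textChars = 0;
        try {
            XMLStreamReader r = FACTORY.createXMLStreamReader(new ByteArrayInputStream(xml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    elements++;
                    attributes += r.getAttributeCount();
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    textChars += r.getTextLength();
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return new long[] {elements, attributes, textChars};
    }

    /** Parses the document repeatedly for about the given time; returns the parse count. */
    static long parseFor(byte[] xml, long millis) {
        long deadline = System.nanoTime() + millis * 1_000_000L;
        long count = 0;
        while (System.nanoTime() < deadline) {
            scan(xml);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java StreamingParseBench <file.xml>");
            return;
        }
        byte[] xml = Files.readAllBytes(Paths.get(args[0]));
        parseFor(xml, 10_000);               // 10-second warm-up on the same file
        long parses = parseFor(xml, 30_000); // 30-second measured run
        double mbps = xml.length * (double) parses / 30.0 / 1_000_000.0;
        System.out.printf("%d parses, %.1f MB/s%n", parses, mbps);
    }
}
```

Absolute numbers from a loop like this depend heavily on the parser implementation and JVM, so they are only comparable within one setup.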

For comparison, simply scanning the file from Java is
less than 50% faster than XML parsing.

For my purposes, at least, XML parsing itself is not
the most significant performance overhead: it's all the
XML processing above and beyond parsing. But I just
parse XML content and use it, with no validation. (DTD
validation seems to add 50% overhead for me [-> 35%
lower throughput], if DTD caching is enabled; just
one data point.)
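
To relate the two figures in the parenthetical: extra time per document converts to a throughput drop of 1 - 1/(1 + overhead). A tiny sketch of that conversion (the names are invented for the example). For a 50% overhead it comes out at about 33%, so the 35% quoted above is best read as a rounded measurement rather than an exact derivation.

```java
// Illustrative arithmetic only; not taken from the original test.
final class OverheadMath {
    /** Fractional throughput drop caused by a fractional increase in time per document. */
    static double throughputDrop(double overhead) {
        return 1.0 - 1.0 / (1.0 + overhead);
    }

    public static void main(String[] args) {
        System.out.printf("50%% more time per document = %.1f%% lower throughput%n",
                100.0 * throughputDrop(0.5));
    }
}
```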

Your mileage may vary. Specifically, if you have to
use an in-memory document model (DOM etc.), prepare to
reduce throughput by half an order of magnitude
(compared to the simple streaming use case).
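
To make the contrast concrete, this is roughly what the in-memory route looks like with the standard JAXP DOM API. It is a sketch with invented names, not the code behind the estimate above. The extra cost comes from building a node object for every element, attribute and text run before the application can look at anything, where a streaming reader only hands over the current event.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical DOM counterpart to a streaming scan.
final class DomParseSketch {
    /** Builds the full DOM tree, then counts every element in it. */
    static int countElements(byte[] xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml));
            // "*" matches every element name, including the document element.
            return doc.getElementsByTagName("*").getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not build DOM", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<order id=\"1\"><item>pen</item><item>ink</item></order>".getBytes();
        System.out.println("elements: " + countElements(xml));
    }
}
```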

-+ Tatu +-
