RE: XML Performance in a Transacation
--- Michael Champion <michael.champion@h...> wrote:

> > Date: Wed, 22 Mar 2006 16:19:56 -0500
> > From: d_a_carver@y...
> > To: xml-dev@l...
> > Subject: XML Performance in a Transacation
> >
> > I've been requested to provide some numbers to show that actual XML
> > validation results and parsing are a small portion of the overall
> > transaction process, when dealing with XML in a B2B process. Any
> > information that can be provided would be appreciated.
>
> See
> http://lists.w3.org/Archives/Public/www-ws/2004Oct/att-0032/MNicola_CIKM_2003_1_.pdf
> "XML Parsing - A Threat to Database Performance."
> Be forewarned that the conclusion may be unpalatable:
>
> "We reported real-world experiences of using XML with databases
> where XML parsing was the main performance bottleneck. This
> motivated an analysis of the cost of SAX parsing and DTD &
> XML schema validation. We find that parsing even small XML
> documents without validation can increase the CPU cost of a relational
> database transaction by 2 to 3 times or more. Parsing with
> schema validation and without grammar caching can increase
> transaction cost by 10 times or more. This is a serious problem for
> high performance transaction oriented database applications which
> intend to use XML"

As everyone else has said, it all depends on your usage. But I personally think the above comment is not closely tied to current reality.
I did a quick test on my development system (see below for details), and _raw_ parsing speed (a Java streaming parser that scans through the whole document, just counting stats for lengths etc.) was as follows:

* 43 MBps for big XML export/import files (1 MB, no namespaces, but namespace-aware parser) [== file parsed 1093 times during 30 seconds, from disk ~= 30 milliseconds per parse]
* 37 MBps for a big StarOffice XML content file (500 kB, fully namespaced, lots of attributes) [2182 reads over 30 seconds ~= 15 milliseconds per parse]
* 8 MBps for a small SOAP request (718 bytes) [322,000 times over 30 seconds ~= 0.1 milliseconds per parse]; the lower throughput is probably due to the constant overhead of instantiating the parser instance.

XML content was read from a file, although in practice Linux caches repeated disk access, so it is equivalent to parsing from memory (meaning I/O should not matter a lot). The system is a plain old 3 GHz single-CPU Intel Linux workstation with a reasonably fast SCSI disk. The test was single-threaded, with a 10 second warm-up period for the parser (parsing the same file as during the test).

For comparison, simply scanning the file from Java is less than 50% faster than XML parsing.

For my purposes, at least, XML parsing itself is not the most significant performance overhead: it is all the XML processing above and beyond parsing. But I just parse XML content and use it, with no validation. (DTD validation seems to add 50% overhead for me [-> 35% lower throughput] if DTD caching is enabled... just as one data point.)

Your mileage may vary. Specifically, if you have to use an in-memory document model (DOM etc.), expect throughput to drop by about half an order of magnitude compared to the simple streaming use case.

-+ Tatu +-
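The kind of measurement described above (warm up, then count complete passes over a fixed time window and convert to throughput) could be sketched roughly as follows. This is an illustrative reconstruction, not the code behind the numbers in this message: it assumes the standard StAX API (javax.xml.stream) with whatever implementation the JDK provides, and the class name ParseBench, its method names, the shortened timing windows and the inline sample document are all hypothetical. It also assumes 1 MB = 10^6 bytes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ParseBench {
    /** Counters gathered during a scan, so the parser has to deliver real content. */
    static final class Stats {
        long elements;
        long attributes;
        long textChars;
    }

    // The factory is created once; a new reader is still created for every document.
    private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    /** Streams through the whole document once, counting elements, attributes and text. */
    static Stats scan(byte[] doc) throws XMLStreamException {
        Stats s = new Stats();
        XMLStreamReader r = FACTORY.createXMLStreamReader(new ByteArrayInputStream(doc));
        try {
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        s.elements++;
                        s.attributes += r.getAttributeCount();
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        s.textChars += r.getTextLength();
                        break;
                    default:
                        break;
                }
            }
        } finally {
            r.close();
        }
        return s;
    }

    /** Parses the same document repeatedly for the given time; returns completed passes. */
    static long timedPasses(byte[] doc, long millis) throws XMLStreamException {
        long start = System.nanoTime();
        long budget = millis * 1_000_000L;
        long passes = 0;
        while (System.nanoTime() - start < budget) {
            scan(doc);
            passes++;
        }
        return passes;
    }

    /** Throughput in MB/s, taking 1 MB as 10^6 bytes (an assumption). */
    static double megabytesPerSecond(long passes, int docBytes, double seconds) {
        return passes * (double) docBytes / seconds / 1_000_000.0;
    }

    public static void main(String[] args) throws Exception {
        // Reads the file named on the command line, or falls back to a tiny sample.
        byte[] doc = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "<root a=\"1\"><item>text</item></root>".getBytes(StandardCharsets.UTF_8);
        long warmupMs = 1_000;   // the message above used a 10 second warm-up
        long measureMs = 3_000;  // ... and a 30 second measurement window
        timedPasses(doc, warmupMs);
        long passes = timedPasses(doc, measureMs);
        System.out.printf("%d passes, %.1f MB/s%n",
                passes, megabytesPerSecond(passes, doc.length, measureMs / 1000.0));
    }
}
```

Holding the document in a byte array sidesteps the disk-cache question entirely. Applied to the SOAP figures above, megabytesPerSecond(322000, 718, 30.0) comes out at about 7.7, which matches the rounded 8 MBps.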