RE: Fast validating XML parser
I agree with Mike's intuition. Beyond that, you'd have to do more than say "fast". If you said something like: on a 2 GHz Xeon we need to parse and validate 1000 documents per second, of average size 10K bytes each, with moderately dense markup, and throwing SAX events as the API, then it's possible that someone would have an intuition as to whether off-the-shelf parsers such as Xerces can do it. Of course, your mileage will vary according to the details, but saying "I need a fast parser" is a bit like saying "I need a fast car". What you mean by fast may depend on whether you're driving NASCAR, Formula 1, or just trying to make good time on a vacation.

For what it's worth, my group published a paper on some experimental work we did on high-performance validation a few years ago. The parser we described was a prototype, and it remains difficult (as far as I know) to find off-the-shelf parsers that give quite the speed we reported. Nonetheless, the paper includes some benchmarks for then-current versions of Xerces doing validation. Those are not official Apache or IBM benchmarks, but they were run with some care, and I expect that Xerces has probably improved a bit in speed since then.

So, you might want to check out the paper. It also explains in great detail some of the factors that we found to be issues when trying to parse and validate at high speed. Copies are available online at [1]. Unless you have a strong preference for HTML, I suggest that you read the PDF version; the formatting is much better.

Noah

[1] http://www2006.org/programme/item.php?id=5011

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

"Michael Kay" <mike@s...>
10/22/2007 03:42 PM
To: "'Llacuna, Phillip V'" <phillip.v.llacuna@l...>, <xml-dev@l...>
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: RE: Fast validating XML parser

I suspect that an off-the-shelf parser like Xerces is quite fast enough if your application invokes it intelligently. You might find parsers that are 20% faster than that, but I think the order-of-magnitude improvement will come by changing your application architecture: in particular, change the driving code from Javascript to Java.

Xerces has a fairly high start-up cost so it's worth reusing the parser for multiple documents. However, that's more of a factor when your files are 200 bytes rather than 50K bytes.

Michael Kay
http://www.saxonica.com/

From: Llacuna, Phillip V [mailto:phillip.v.llacuna@l...]
Sent: 22 October 2007 19:32
To: xml-dev@l...
Subject: Fast validating XML parser

Hi:

We need a very fast validating XML parser and were wondering if anyone has any suggestions. Our project involves one main XML file with about 1200 supporting XML files (each about 50KB or less). Our current environment calls on a java script to validate each file against the DTD, but it is painfully slow to process the complete project. We suspect that the overhead in creating the java environment each time the script is called is slowing down the process. I have searched (and am still searching) the web for a good alternative. Any suggestions?

Phillip Llacuna
Multi-media Design Engineer
Lockheed Martin
Ph: (651) 456-7152
Fax: (651) 456-2643
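
For readers who want to try Michael Kay's suggestion from the thread above (drive the validation from Java and reuse one parser for many documents), here is a minimal sketch. It assumes the files carry their own DOCTYPE declarations and uses the standard JAXP/SAX API, which in most JDKs is backed by a Xerces-derived parser. The class name `BatchDtdValidator` and the method `validateAll` are hypothetical names chosen for illustration; nothing here comes from the original posters' code.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical helper: validates many files against their DTDs in one JVM,
// reusing a single XMLReader instead of starting Java once per file.
public class BatchDtdValidator {

    // Records every validation problem, tagged with the file being parsed.
    static final class CollectingHandler implements ErrorHandler {
        final List<String> problems = new ArrayList<>();
        String currentFile = "";

        private void record(SAXParseException e) {
            problems.add(currentFile + ":" + e.getLineNumber() + ": " + e.getMessage());
        }

        @Override public void warning(SAXParseException e) { /* warnings are ignored */ }

        @Override public void error(SAXParseException e) { record(e); }

        @Override public void fatalError(SAXParseException e) throws SAXException {
            record(e);
            throw e; // stop parsing this file; the loop moves on to the next one
        }
    }

    public static List<String> validateAll(List<File> files)
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // DTD validation

        // Created once and reused, so the start-up cost is paid a single time.
        XMLReader reader = factory.newSAXParser().getXMLReader();
        CollectingHandler handler = new CollectingHandler();
        reader.setErrorHandler(handler);

        for (File f : files) {
            handler.currentFile = f.getName();
            try {
                reader.parse(f.toURI().toString());
            } catch (SAXException e) {
                // Fatal errors were already recorded by the handler.
            } catch (IOException e) {
                handler.problems.add(f.getName() + ": " + e);
            }
        }
        return handler.problems;
    }

    public static void main(String[] args) throws Exception {
        List<File> files = new ArrayList<>();
        for (String a : args) {
            files.add(new File(a));
        }
        for (String p : validateAll(files)) {
            System.out.println(p);
        }
    }
}
```

Run it once with all the file names as arguments (for example `java BatchDtdValidator main.xml part*.xml`), so that the JVM and the parser are started a single time for the whole project. Whether this alone is fast enough depends on the details Noah lists above.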






