
RE: Fast validating XML parser

  • From: noah_mendelsohn@u...
  • To: "Michael Kay" <mike@s...>
  • Date: Mon, 22 Oct 2007 17:18:23 -0400

I agree with Mike's intuition.  Beyond that, you'd have to do more than 
say "fast".  If you said something like:  on a 2 GHz Xeon we need to parse 
and validate 1000 documents per second, of average size 10K bytes 
each, with moderately dense markup, and throwing SAX events as the API, 
then it's possible that someone would have an intuition as to whether off 
the shelf parsers such as Xerces can do it.  Of course, your mileage will 
vary according to the details, but saying I need a fast parser is a bit 
like saying I need a fast car.  What you mean by fast may depend on 
whether you're driving NASCAR, Formula 1, or just trying to make good time 
on a vacation.
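
To make the arithmetic behind a requirement like that concrete, here is a rough sketch in Java.  It assumes "10K bytes" means 10,000 bytes and that a single 2 GHz core does all the work; the class and method names are purely illustrative.

```java
// Back-of-envelope budget for the example requirement above.
final class ThroughputBudget {

    /** Bytes that must be parsed and validated each second. */
    static long bytesPerSecond(long docsPerSecond, long avgDocBytes) {
        return docsPerSecond * avgDocBytes;
    }

    /** CPU cycles available per input byte on one core. */
    static double cyclesPerByte(double clockHz, long bytesPerSecond) {
        return clockHz / bytesPerSecond;
    }

    public static void main(String[] args) {
        long rate = bytesPerSecond(1000, 10_000);   // 1000 docs/s at 10,000 bytes each
        double budget = cyclesPerByte(2.0e9, rate); // 2 GHz clock, single core assumed
        System.out.println(rate + " bytes/s, about " + budget + " cycles per byte");
    }
}
```

Under those assumptions the requirement works out to roughly 10 MB per second, or about 200 cycles per input byte for parsing, validation, and SAX event delivery combined.  A number of that kind is something people can compare against what they have seen a parser do.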

For what it's worth, my group published a paper on some experimental work 
we did on high performance validation a few years ago.  The parser we 
described was a prototype, and it remains difficult (as far as I know) to 
find off the shelf parsers that give quite the speed we reported. 
Nonetheless, the paper includes some benchmarks for then-current versions 
of Xerces doing validation.  Those are not official Apache or IBM 
benchmarks, but they were run with some care, and I expect that Xerces has 
probably improved a bit in speed since then.  So, you might want to check 
out the paper.  It also explains in great detail some of the factors that 
we found to be issues when trying to parse and validate at high speed. 
Copies are available online at [1].  Unless you have a strong preference 
for HTML, I suggest you read the PDF version; the formatting is much 
better.

Noah

[1]  http://www2006.org/programme/item.php?id=5011

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------








"Michael Kay" <mike@s...>
10/22/2007 03:42 PM
 
        To:     "'Llacuna, Phillip V'" <phillip.v.llacuna@l...>, 
<xml-dev@l...>
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        RE:  Fast validating XML parser


I suspect that an off-the-shelf parser like Xerces is quite fast enough if 
your application invokes it intelligently. You might find parsers that are 
20% faster than that, but I think the order-of-magnitude improvement will 
come by changing your application architecture: in particular, change the 
driving code from Javascript to Java.
 
Xerces has a fairly high start-up cost so it's worth reusing the parser 
for multiple documents. However, that's more of a factor when your files 
are 200 bytes rather than 50K bytes.
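
A minimal sketch of that kind of reuse, using the standard JAXP API (which Xerces implements): one JVM launch and one SAXParser for the whole batch of files.  The class name BatchDtdValidator and the report format are illustrative assumptions, not anything from the original project, and the sketch assumes each file names its DTD in a DOCTYPE declaration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical driver: validates many files against their DTDs in one JVM.
class BatchDtdValidator {

    /** Records validity errors, which DefaultHandler would otherwise ignore. */
    static final class Collector extends DefaultHandler {
        final List<String> problems = new ArrayList<>();
        String currentFile = "";

        @Override
        public void error(SAXParseException e) {
            problems.add(currentFile + ":" + e.getLineNumber() + ": " + e.getMessage());
        }
    }

    /** Validates every file with a single parser instance; returns all problems found. */
    static List<String> validateAll(List<Path> files)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);               // turn on DTD validation
        SAXParser parser = factory.newSAXParser(); // start-up cost is paid once here

        Collector collector = new Collector();
        for (Path file : files) {
            collector.currentFile = file.toString();
            try {
                parser.parse(file.toFile(), collector);
            } catch (SAXException e) {
                // Well-formedness (fatal) errors end up here; keep going with the next file.
                collector.problems.add(file + ": fatal: " + e.getMessage());
            }
            parser.reset(); // return the parser to its initial state before reuse
        }
        return collector.problems;
    }

    public static void main(String[] args) throws Exception {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Paths.get(arg));
        }
        List<String> problems = validateAll(files);
        problems.forEach(System.err::println);
        System.exit(problems.isEmpty() ? 0 : 1);
    }
}
```

Invoked once with all the file names as arguments, this pays the JVM and parser start-up cost a single time instead of once per file.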
 
Michael Kay
http://www.saxonica.com/

From: Llacuna, Phillip V [mailto:phillip.v.llacuna@l...] 
Sent: 22 October 2007 19:32
To: xml-dev@l...
Subject:  Fast validating XML parser

Hi:
 
We need a very fast validating XML parser and were wondering if anyone has 
any suggestions? Our project involves one main XML file with about 1200 
supporting XML files (each about 50KB or less). Our current environment 
calls on a java script to validate each file against the DTD, but it is 
painfully slow to process the complete project. We suspect that the 
overhead in creating the java environment each time the script is called 
is slowing down the process. I have searched (and am still searching) the 
web for a good alternative. Any suggestions?
 
Phillip Llacuna
Multi-media Design Engineer
Lockheed Martin
Ph:   (651) 456-7152
Fax: (651) 456-2643
 


