Re: XML processing experiments
> I also tried LT XML (which is written in C). I didn't find a program that
> did nothing but parsing. The fastest one I found was the sgcount
> program (which counts the number of each element type); it took about 11
> seconds. That's much slower than I expected; I suspect there may be
> some Windows-specific performance problems.

It's true that we do our development under unix, and I don't have any benchmarks for MS Windows. I just ran "sgcount <ot.xml" on an AMD K5 PR-100 (supposedly equivalent to a 100MHz Pentium) under FreeBSD, and it took 6.8 seconds. This suggests that we run about twice as fast under unix as MS Windows, which is something we will have to look into.

But in any case, the currently-released version of LT-XML (0.9.5) is far too slow on all platforms. The next version, which we hope to release by the end of the year, has a completely new parser and is roughly three times as fast.

Why is the old version so slow?

- It's written in yacc and lex. I didn't expect this to be slow, but profiling shows that it's spending most of its time in the yacc and lex internals, which we can't do much about. The new version is written in plain C, and I actually think it's much clearer. Yacc is not well-suited to the sort of context-dependent tokenising that is required in DTDs. We had to abandon lex anyway to handle 16-bit characters.

- It does a malloc() and free() for every start tag, end tag, attribute name, attribute value, and pcdata. The new version only does that for attribute values and pcdata.

Another reason that both versions are slower than the desperate C hacker's programs is that they maintain a stack of input sources to implement entity expansion. This adds an overhead even when entities are not being expanded.

The figures above are all for 8-bit-character systems. The next release will have a compile-time option to support 16-bit characters. I expect the 16-bit version to be about 30% slower than the 8-bit version (for the same 8-bit data).
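The allocation point above (a malloc() and free() per token versus allocating only where needed) can be sketched in C as follows. This is only an illustration under assumptions: the names token_copy, token_reuse, struct token_buf and token_buf_free are hypothetical and are not taken from LT XML, whose actual implementation is not shown in this message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not LT XML's actual code. */

/* Strategy 1: allocate a fresh copy of every token.
 * The caller must free() the result, so each token costs
 * one malloc() and one free(). */
char *token_copy(const char *start, size_t len)
{
    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, start, len);
    s[len] = '\0';
    return s;
}

/* Strategy 2: one growable buffer that is reused for each token.
 * Allocation happens only when a token is longer than any seen so far. */
struct token_buf {
    char  *data;
    size_t cap;
};

/* Returns a pointer that stays valid only until the next call,
 * or NULL if memory could not be obtained. */
const char *token_reuse(struct token_buf *b, const char *start, size_t len)
{
    if (len + 1 > b->cap) {
        size_t ncap = b->cap ? b->cap : 16;
        while (ncap < len + 1)
            ncap *= 2;
        char *p = realloc(b->data, ncap);   /* realloc(NULL, n) acts as malloc */
        if (p == NULL)
            return NULL;
        b->data = p;
        b->cap = ncap;
    }
    assert(b->cap >= len + 1);
    memcpy(b->data, start, len);
    b->data[len] = '\0';
    return b->data;
}

/* Release the reusable buffer once parsing is finished. */
void token_buf_free(struct token_buf *b)
{
    free(b->data);
    b->data = NULL;
    b->cap = 0;
}
```

A reused buffer suits strings that only need to survive until the next token is read, which is plausibly the case for tag and attribute names; strings handed on to the application, such as attribute values and pcdata, would still need their own copy, for example via token_copy.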
We also plan to release the parser itself separately from the rest of the LT-XML/LT-NSL toolkit, for use in programs that just need an XML parser. I expect it to be about 25% faster than the LT-XML version, just because a layer is removed.

> >I was quite surprised that there was such a big performance difference
> >between real, conforming XML processing that does well-formedness
> >checking, and quick and dirty XML processing that does the minimum
> >necessary to get the correct result.

> This doesn't seem right to me...

It isn't, and we're hoping to reduce it.

-- Richard

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)