
Re: XML processing experiments

  • From: Richard Tobin <richard@c...>
  • To: xml-dev@i...
  • Date: Tue, 4 Nov 1997 15:46:00 GMT

>I also tried LT XML (which is written in C). I didn't find a program that
>did nothing but parsing.  The fastest one I found was the sgcount
>program (which counts the number of each element type); it took about 11
>seconds.  That's much slower than I expected; I suspect there may be
>some Windows-specific performance problems.

It's true that we do our development under unix, and I don't have any
benchmarks for MS Windows.  I just ran "sgcount <ot.xml" on an AMD K5
PR-100 (supposedly equivalent to a 100MHz Pentium) under FreeBSD, and
it took 6.8 seconds.  This suggests that we run about twice as fast
under unix as under MS Windows, which is something we will have to
look into.

But in any case, the currently-released version of LT-XML (0.9.5) is
far too slow on all platforms.  The next version, which we hope to
release by the end of the year, has a completely new parser and is
roughly three times as fast.

Why is the old version so slow?

- It's written in yacc and lex.  I didn't expect this to be slow, but
profiling shows that it's spending most of its time in the yacc and lex
internals, which we can't do much about.  The new version is written in
plain C, and I actually think it's much clearer.  Yacc is not well-suited
to the sort of context-dependent tokenising that is required in DTDs.  We
had to abandon lex anyway to handle 16-bit characters.

- It does a malloc() and free() for every start tag, end tag, attribute
name, attribute value, and pcdata.  The new version only does that for
attribute values and pcdata.
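
To illustrate the allocation point, here is a minimal sketch (not the
LT XML code) of returning names as pointer-and-length slices into the
input buffer, so that no malloc() or free() is needed per tag.  The
names Slice, next_start_tag, slice_eq and count_start_tags are
hypothetical, and the scanner is deliberately naive: it does no
well-formedness checking and does not understand comments or CDATA
sections that contain '<'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A token that borrows from the input buffer instead of owning a copy. */
typedef struct {
    const char *start;  /* points into the caller's buffer */
    size_t len;         /* number of bytes in the name */
} Slice;

/* Find the next start-tag name at or after *pos.
   Returns 1 and fills *name if one is found, 0 at end of input.
   End tags, comments/declarations and processing instructions are skipped. */
static int next_start_tag(const char *buf, size_t n, size_t *pos, Slice *name)
{
    size_t i;
    assert(buf != NULL && pos != NULL && name != NULL);
    for (i = *pos; i < n; i++) {
        if (buf[i] == '<' && i + 1 < n &&
            buf[i + 1] != '/' && buf[i + 1] != '!' && buf[i + 1] != '?') {
            size_t s = i + 1;
            size_t e = s;
            while (e < n && buf[e] != '>' && buf[e] != '/' &&
                   buf[e] != ' ' && buf[e] != '\t' &&
                   buf[e] != '\n' && buf[e] != '\r')
                e++;
            name->start = buf + s;   /* no copy: just remember where it is */
            name->len = e - s;
            *pos = e;
            return 1;
        }
    }
    *pos = n;
    return 0;
}

/* Compare a slice with a NUL-terminated literal. */
static int slice_eq(Slice s, const char *lit)
{
    size_t m = strlen(lit);
    return s.len == m && memcmp(s.start, lit, m) == 0;
}

/* Count start tags with a given name without any heap allocation. */
static size_t count_start_tags(const char *buf, size_t n, const char *wanted)
{
    size_t pos = 0, count = 0;
    Slice name;
    while (next_start_tag(buf, n, &pos, &name))
        if (slice_eq(name, wanted))
            count++;
    return count;
}
```

The trade-off is that a slice is only valid while the input buffer is,
which is presumably why values that must outlive the buffer still get
copied.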

Another reason that both versions are slower than the desperate C
hacker's programs is that they maintain a stack of input sources to
implement entity expansion.  This adds an overhead even when entities
are not being expanded.
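
A stack of input sources can be sketched roughly as follows.  This is
an illustration of the general technique only, not the LT XML
implementation; Source, Input, input_init, input_push, input_getc and
MAX_DEPTH are hypothetical names.  Note that every character fetch
goes through the stack lookup, even when the document itself is the
only source, which is the kind of overhead described above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 16  /* arbitrary nesting limit for this sketch */

typedef struct {
    const char *text;  /* NUL-terminated source text */
    size_t pos;        /* next character to read */
} Source;

typedef struct {
    Source stack[MAX_DEPTH];
    int depth;         /* number of active sources */
} Input;

/* Start reading from the document itself. */
static void input_init(Input *in, const char *document)
{
    assert(in != NULL && document != NULL);
    in->stack[0].text = document;
    in->stack[0].pos = 0;
    in->depth = 1;
}

/* Push an entity's replacement text so that it is read next.
   Returns 1 on success, 0 if the stack is full. */
static int input_push(Input *in, const char *replacement)
{
    if (in->depth >= MAX_DEPTH)
        return 0;
    in->stack[in->depth].text = replacement;
    in->stack[in->depth].pos = 0;
    in->depth++;
    return 1;
}

/* Return the next character, or -1 once every source is exhausted. */
static int input_getc(Input *in)
{
    while (in->depth > 0) {
        Source *top = &in->stack[in->depth - 1];
        char c = top->text[top->pos];
        if (c != '\0') {
            top->pos++;
            return (unsigned char)c;
        }
        in->depth--;  /* this source is finished: resume the one below */
    }
    return -1;
}
```

A parser built on this would call input_push() when it recognises an
entity reference and otherwise just call input_getc() in its main loop.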

The figures above are all for 8-bit-character systems.  The next
release will have a compile-time option to support 16-bit characters.
I expect the 16-bit version to be about 30% slower than the 8-bit
version (for the same 8-bit data).
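
A compile-time character-width option is commonly done with a typedef
selected by the preprocessor.  The sketch below assumes a hypothetical
macro CHAR16 and hypothetical names Char, widen and char_strlen; it is
not taken from LT XML.  With 16-bit characters the same 8-bit data
occupies twice the memory once widened, which may be one contributor to
a slowdown of the sort estimated above.

```c
#include <assert.h>
#include <stddef.h>

/* CHAR16 is a hypothetical macro: build with -DCHAR16 for wide characters. */
#ifdef CHAR16
typedef unsigned short Char;  /* at least 16 bits in any conforming C compiler */
#else
typedef unsigned char Char;   /* 8-bit characters */
#endif

/* Copy n bytes of 8-bit input into a Char buffer of capacity cap
   (counted in Chars), adding a terminating 0.  Returns the number of
   characters copied, or 0 if the buffer is too small. */
static size_t widen(const unsigned char *in, size_t n, Char *out, size_t cap)
{
    size_t i;
    assert(in != NULL && out != NULL);
    if (cap < n + 1)
        return 0;
    for (i = 0; i < n; i++)
        out[i] = (Char)in[i];
    out[n] = 0;
    return n;
}

/* Length of a 0-terminated Char string, whichever width Char has. */
static size_t char_strlen(const Char *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

The rest of the parser is then written in terms of Char, so the same
source builds in either mode.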

We also plan to release the parser itself separately from the rest of
the LT-XML/LT-NSL toolkit, for use in programs that just need an XML
parser.  I expect it to be about 25% faster than the LT-XML version, just
because a layer is removed.

> >I was quite surprised that there was such a big performance difference
> >between real, conforming XML processing that does well-formedness
> >checking, and quick and dirty XML processing that does the minimum
> >necessary to get the correct result.  This doesn't seem right to me...

It isn't, and we're hoping to reduce it.

-- Richard


