
Re: JITTs and DOM


Hi Patrick,

> Thought there might be some interest in a very short note I have
> posted to the JITTs page on the impact of the JITTs paradigm on DOM
> performance.
>
> Jump from the JITTs page, http://www.sbl-site2.org/Extreme2002 or go
> there directly at:
> http://www.sbl-site2.org/Extreme2002/JITTs_and_DOM.html.
>
> Running on a laptop, gains of 26 to 39 times were observed. (As file
> sizes increase, so do the performance gains.) Note I do not
> characterize these as "tests," merely observations. Further work is
> needed.

(Very glad to see the well-formed XML example files :)

I'd be *very* careful about drawing any conclusions about speed up
from these observations. What you've done for these observations is
replace markup-significant characters (e.g. '<') with
markup-insignificant characters (i.e. '@'), effectively turning whole
regions of the document into plain text.
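To make that concrete, here is a small sketch using Python's standard xml.dom.minidom. The sample document and element names are made up for illustration. It shows what the '@' substitution does: the masked <b> tags never become element nodes and simply end up inside a text node.

```python
from xml.dom import minidom

# Hypothetical sample: a <note> containing two inline <b> elements.
ORIGINAL = "<doc><note>A <b>bold</b> and <b>bolder</b> word</note></doc>"

# Mimic the observation: inside <note>, turn every '<' into '@' so the
# inline tags become ordinary character data ('>' is legal in text).
start = ORIGINAL.index("<note>") + len("<note>")
end = ORIGINAL.index("</note>")
MASKED = ORIGINAL[:start] + ORIGINAL[start:end].replace("<", "@") + ORIGINAL[end:]


def count_elements(xml_text):
    """Parse with minidom and count the element nodes in the tree."""
    doc = minidom.parseString(xml_text)
    return len(doc.getElementsByTagName("*"))
```

The parser still reads every character of the masked region; it just never has to recognise a tag there.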

I'm certainly no parser writer, but having hacked around the internals
of Aelfred in order to create a parser for LMNL [1], I can see why
this would cause a speed-up -- all a parser has to do to process these
sections is scan through the text until it comes to a '<'; that
scanning process is very fast.
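A toy illustration of why that scan is cheap. This is not Aelfred's code, just a hypothetical minimal tokenizer that ignores comments, CDATA sections, processing instructions and '>' inside attribute values. A run of text containing no '<' is consumed by a single str.find call, however much masked "markup" it contains.

```python
def split_text_and_tags(xml_text):
    """Toy tokenizer returning a list of ("text", s) and ("tag", s) pieces.

    Each text run is located with one str.find for the next '<', so a
    region without any '<' costs a single fast scan.
    """
    pieces = []
    pos = 0
    while pos < len(xml_text):
        lt = xml_text.find("<", pos)
        if lt == -1:
            # No more markup: the rest of the input is one text run.
            pieces.append(("text", xml_text[pos:]))
            break
        if lt > pos:
            pieces.append(("text", xml_text[pos:lt]))
        gt = xml_text.find(">", lt)
        if gt == -1:
            raise ValueError("unterminated tag at offset %d" % lt)
        pieces.append(("tag", xml_text[lt:gt + 1]))
        pos = gt + 1
    return pieces
```

With '<b>' masked as '@b>', the whole sentence comes back as one text piece instead of five pieces.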

My understanding of JITTs is that you'd need a lot more sophistication
in the parser. It wouldn't be enough to just ignore all the tags that
the parser came across (which is what you've done in effect). Instead,
the parser would have to read the tag, look at the name of the tag,
check that against a list (from a DTD or schema) in order to work out
what to do, and then either generate a "start/endElement" event or
generate a "characters" event (to report the tag as a string)
depending on the tag's status. If anything, I imagine that this will
*add* time to the parsing of the document.
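Here's a rough sketch of that kind of filtering as a SAX filter in Python. The class names JITTsFilter and Recorder, and the hard-coded set of ignored names, are assumptions for illustration; a real implementation would derive the list from a DTD or schema. Note that the parser still has to tokenize every tag in full before the filter decides to turn it back into text, which is where the extra cost comes from.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import quoteattr


class JITTsFilter(ContentHandler):
    """Hypothetical filter: forwards SAX events to `downstream`, but
    reports tags whose names are in `ignored` as character data."""

    def __init__(self, downstream, ignored):
        super().__init__()
        self.downstream = downstream
        self.ignored = set(ignored)  # assumed to come from a DTD/schema

    def startDocument(self):
        self.downstream.startDocument()

    def endDocument(self):
        self.downstream.endDocument()

    def startElement(self, name, attrs):
        # The parser has already read the whole tag; now look up its name.
        if name in self.ignored:
            # Re-serialize the start tag and report it as text. Attribute
            # order, quoting and whitespace may differ from the source.
            attr_text = "".join(
                " %s=%s" % (key, quoteattr(value)) for key, value in attrs.items()
            )
            self.downstream.characters("<%s%s>" % (name, attr_text))
        else:
            self.downstream.startElement(name, attrs)

    def endElement(self, name):
        if name in self.ignored:
            self.downstream.characters("</%s>" % name)
        else:
            self.downstream.endElement(name)

    def characters(self, content):
        self.downstream.characters(content)


class Recorder(ContentHandler):
    """Downstream handler that just records the events it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("chars", content))


recorder = Recorder()
xml.sax.parseString(b"<doc>A <b>bold</b> word</doc>", JITTsFilter(recorder, {"b"}))
```

The downstream handler only ever sees a start/end event for <doc>; the <b> tags arrive as strings, interleaved with the ordinary character events.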

[I guess an alternative approach would be to run the character stream
through a pre-processor that changed particular '<' characters into
'@' characters. But unless you could be sure that *all* elements
called "foo" could safely be ignored (i.e. that there were no local
element declarations under which some "foo" elements should be ignored
and others kept), it's probably better to take the approach above.]
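The pre-processor could be as simple as the following hypothetical function, which rewrites the '<' of start and end tags with a given name. Being purely lexical, it shows the limitation described: it treats every "foo" the same regardless of context, and it would also rewrite matches inside comments or CDATA sections.

```python
import re


def mask_elements(xml_text, names):
    """Hypothetical pre-processor: turn the '<' of every start/end tag
    whose name is in `names` into '@', so a parser sees plain text."""
    if not names:
        return xml_text
    alternatives = "|".join(re.escape(n) for n in names)
    # '<' or '</', one of the names, then whitespace, '/' or '>' so that
    # <foo> is matched but <foobar> is not.
    pattern = re.compile(r"<(/?)(%s)(?=[\s/>])" % alternatives)
    return pattern.sub(r"@\1\2", xml_text)
```

For example, mask_elements("<doc><foo>x</foo><foobar/></doc>", ["foo"]) leaves <doc> and <foobar/> alone and yields a document that still parses, with "@foo>x@/foo>" as text.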

On the positive side, filtering a document will lead to fewer events
being generated and fewer objects being created in the DOM, which
should make the DOM smaller and should make the process *after*
parsing that much quicker.
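A quick way to see the size difference is to count DOM nodes for the same (made-up) content with and without the inline <i> elements filtered out. The filtering is simulated here by plain string replacement, purely for illustration.

```python
from xml.dom import minidom


def count_nodes(node):
    """Count `node` and all of its descendants in a minidom tree."""
    return 1 + sum(count_nodes(child) for child in node.childNodes)


# Hypothetical document with several small inline elements.
FULL = "<doc>" + "<p>a <i>b</i> c <i>d</i> e</p>" * 3 + "</doc>"
# Same content with the <i> tags masked, so they stay as text.
FILTERED = FULL.replace("<i>", "@i>").replace("</i>", "@/i>")

full_nodes = count_nodes(minidom.parseString(FULL).documentElement)
filtered_nodes = count_nodes(minidom.parseString(FILTERED).documentElement)
```

In the filtered tree each <p> holds a single text node instead of five children, so everything that walks the DOM afterwards has fewer objects to visit.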

In other words, it's certainly the case that JITTs has promise in
terms of reducing DOM size and speeding up processing, but I don't
think that these observations are really an accurate representation of
what the effect will be. I don't know what you're planning for your
next step, but a good demonstration, in my opinion, would be to show
how the same XSLT transformation, say, is speeded up by working on a
JITTs DOM compared to a normal DOM.
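A skeleton for such a comparison might look like the following. Python's standard library has no XSLT processor, so a simple text-collecting traversal stands in for "the same transformation" here; that substitution, the function names and the sample data are all assumptions. The parse happens outside the timed region, so only the post-parse processing is measured, which is the part where a smaller DOM should pay off.

```python
import time
from xml.dom import minidom


def all_text(node):
    """Stand-in for 'the same transformation': concatenate all text."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return "".join(all_text(child) for child in node.childNodes)


def time_processing(xml_text, repeat=5):
    """Parse once, then time only the processing step.

    Returns (best_seconds, result) over `repeat` runs.
    """
    dom = minidom.parseString(xml_text)
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = all_text(dom.documentElement)
        best = min(best, time.perf_counter() - start)
    return best, result


# Hypothetical inputs: a normal document and its filtered counterpart.
full = "<doc>" + "<p>a <i>b</i> c</p>" * 1000 + "</doc>"
filtered = full.replace("<i>", "@i>").replace("</i>", "@/i>")
full_time, _ = time_processing(full)
filtered_time, _ = time_processing(filtered)
```

Taking the best of several runs reduces noise from the machine; a real demonstration would of course run the actual stylesheet over both DOMs.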

Cheers,

Jeni

[1] http://www.lmnl.org/projects/LMNOP

---
Jeni Tennison
http://www.jenitennison.com/

