Re: JITTs and DOM
Hi Patrick,

> Thought there might be some interest in a very short note I have
> posted to the JITTs page on the impact of the JITTs paradigm on DOM
> performance.
>
> Jump from the JITTs page, http://www.sbl-site2.org/Extreme2002 or go
> there directly at:
> http://www.sbl-site2.org/Extreme2002/JITTs_and_DOM.html.
>
> Running on a laptop, gains of 26 to 39 times were observed. (As file
> sizes increase, so do the performance increases.) Note I do not
> characterize these as "tests," merely observations. Further work is
> needed.

(Very glad to see the well-formed XML example files :)

I'd be *very* careful about drawing any conclusions about speed-up from these observations. What you've done for these observations is replace markup-significant characters (e.g. '<') with markup-insignificant characters (i.e. '@'), effectively turning whole regions of the document into plain text. I'm certainly no parser writer, but having hacked around the internals of Aelfred in order to create a parser for LMNL [1], I can see why this would cause a speed-up: all a parser has to do to process these sections is scan through the text until it comes to a '<', and that scanning process is very fast.

My understanding of JITTs is that you'd need a lot more sophistication in the parser. It wouldn't be enough to just ignore all the tags that the parser came across (which is what you've done in effect). Instead, the parser would have to read the tag, look at the name of the tag, check that against a list (from a DTD or schema) in order to work out what to do, and then either generate a "start/endElement" event or generate a "characters" event (to report the tag as a string) depending on the tag's status. If anything, I imagine that this will *add* time to the parsing of the document.
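
To make the per-tag check described above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how Aelfred or any JITTs implementation works: the function name `jitts_events`, the `recognized` set (standing in for the list derived from a DTD or schema), and the deliberately naive tag regex are all hypothetical. The point it shows is that every tag still has to be read and its name looked up before the parser can decide which kind of event to report.

```python
import re

# Matches a simple start or end tag such as <foo>, </foo> or <foo a="b">.
# Deliberately naive: comments, CDATA sections, processing instructions
# and empty-element tags are not handled.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)([^<>]*)>")


def jitts_events(text, recognized):
    """Yield SAX-like events, reporting unrecognized tags as characters.

    `recognized` is a hypothetical stand-in for the list of element
    names taken from a DTD or schema.
    """
    pos = 0
    pending = []  # character data waiting to be flushed as one event
    for match in TAG.finditer(text):
        pending.append(text[pos:match.start()])
        pos = match.end()
        closing, name = match.group(1), match.group(2)
        if name in recognized:
            # Recognized tag: flush buffered text, then report markup.
            chars = "".join(pending)
            if chars:
                yield ("characters", chars)
            pending = []
            yield ("endElement" if closing else "startElement", name)
        else:
            # Unrecognized tag: keep the raw tag as part of the text.
            pending.append(match.group(0))
    pending.append(text[pos:])
    chars = "".join(pending)
    if chars:
        yield ("characters", chars)
```

Called as `list(jitts_events("<doc><p>Hello <b>world</b></p></doc>", {"doc", "p"}))`, this reports `doc` and `p` as elements and hands back `Hello <b>world</b>` as a single run of characters. Note that the name lookup happens for every tag, recognized or not, which is the extra work that simply swapping '<' for '@' in the input avoids.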

[I guess an alternative approach would be to run the character stream through a pre-processor that changed particular '<' characters into '@' characters, though unless you could be sure that *all* elements called "foo" could be effectively ignored (i.e. that you didn't have any local declarations of elements, whereby some should be ignored and some taken into account), it's probably worthwhile taking the approach above.]

On the positive side, filtering a document will lead to fewer events being generated and fewer objects being created in the DOM, which should make the DOM smaller and should make the process *after* parsing that much quicker.

In other words, it's certainly the case that JITTs has promise in terms of reducing DOM size and speeding up processing, but I don't think that these observations are really an accurate representation of what the effect will be.

I don't know what you're planning for your next step, but a good demonstration, in my opinion, would be to show how the same XSLT transformation, say, is sped up by working on a JITTs DOM compared to a normal DOM.

Cheers,

Jeni

[1] http://www.lmnl.org/projects/LMNOP

---
Jeni Tennison
http://www.jenitennison.com/