
Re: JITTs and DOM


Jeni,

Jeni Tennison wrote:

>Hi Patrick,
>
<snip>

>
>
>I'd be *very* careful about drawing any conclusions about speed up
>from these observations. What you've done for these observations is
>replace markup-significant characters (e.g. '<') with
>markup-insignificant characters (i.e. '@'), effectively turning whole
>regions of the document into plain text.
>
I said in my post that these were observations that suggest further
investigation. The replacement was noted on the webpage as simulating
the result of a JITTs parser. Yes, the operation of a JITTs parser would
be to treat regions of the document as plain text. Sorry if that was
not explicit in our earlier treatments of JITTs parsing.
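
As a rough illustration of that kind of replacement (this is not the
script used for the webpage, only a sketch), the snippet below swaps the
'<' of any tag whose name is not in a recognized set for '@', so that an
ordinary parser sees those tags as plain text. The function name
simulate_jitts and the recognized set are assumptions made up for the
example.

```python
import re

# Matches the opening '<' of a start or end tag plus the tag name.
# Comments, CDATA sections and processing instructions are not handled.
TAG_OPEN = re.compile(r"<(/?[A-Za-z_][\w.-]*)")

def simulate_jitts(text, recognized):
    """Replace '<' with '@' on every tag whose name is not in `recognized`."""
    def repl(match):
        name = match.group(1).lstrip("/")
        if name in recognized:
            return match.group(0)        # keep recognized markup untouched
        return "@" + match.group(1)      # neutralize unrecognized markup
    return TAG_OPEN.sub(repl, text)

sample = "<p>Hello <b>bold</b></p>"
print(simulate_jitts(sample, {"p"}))
```

With only "p" recognized, the "b" tags survive as ordinary characters
inside the "p" element.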

>
>I'm certainly no parser writer, but having hacked around the internals
>of Aelfred in order to create a parser for LMNL [1], I can see why
>this would cause a speed-up -- all a parser has to do to process these
>sections is scan through the text until it comes to a '<'; that
>scanning process is very fast.
>
>My understanding of JITTs is that you'd need a lot more sophistication
>in the parser. 
>
No.

>It wouldn't be enough to just ignore all the tags that
>the parser came across (which is what you've done in effect). Instead,
>the parser would have to read the tag, look at the name of the tag,
>check that against a list (from a DTD or schema) in order to work out
>what to do, and then either generate a "start/endElement" event or
>generate a "characters" event (to report the tag as a string)
>depending on the tag's status. If anything, I imagine that this will
>*add* time to the parsing of the document.
>
Parsers already build a tree from the DTD or schema in order to
"recognize" the markup they encounter in the document. All JITTs would
require is a change in the lookup step, where a parser now looks for the
token in the tree: upon failure, the parser simply starts reading input
again. (That assumes you are using the suggested ignore option; with
delete, it would drop the token from the input string and continue
reading input.)
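
A minimal sketch of that lookup step, under simplifying assumptions: a
plain set of names stands in for the tree built from the DTD or schema,
a regular expression stands in for the tokenizer, and attributes,
empty-element tags, comments and CDATA are not handled. The names
jitts_events, recognized and on_failure are hypothetical and do not come
from any existing parser.

```python
import re

# Crude stand-in for a tokenizer: a start or end tag with a simple name.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*>")

def jitts_events(text, recognized, on_failure="ignore"):
    """Return SAX-like events; tags that fail the lookup are not elements.

    on_failure="ignore": the tag is left in the character data.
    on_failure="delete": the tag is dropped from the input.
    """
    events, buf, pos = [], [], 0

    def flush():
        chars = "".join(buf)
        buf.clear()
        if chars:
            events.append(("characters", chars))

    for m in TAG.finditer(text):
        buf.append(text[pos:m.start()])
        pos = m.end()
        closing, name = m.group(1), m.group(2)
        if name in recognized:                 # lookup succeeded
            flush()
            events.append(("endElement" if closing else "startElement", name))
        elif on_failure == "ignore":           # lookup failed: keep reading
            buf.append(m.group(0))
        # on_failure == "delete": nothing is appended, the token is dropped
    buf.append(text[pos:])
    flush()
    return events

doc = "<p>Hello <b>bold</b> world</p>"
for event in jitts_events(doc, {"p"}):
    print(event)
```

The only extra work relative to an ordinary lookup is the branch taken
on failure; whether that costs or saves time in a real parser is the
open question in this thread, and the sketch does not settle it.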

<snip>

>
>In other words, it's certainly the case that JITTs has promise in
>terms of reducing DOM size and speeding up processing, but I don't
>think that these observations are really an accurate representation of
>what the effect will be. I don't know what you're planning for your
>next step, but a good demonstration, in my opinion, would be to show
>how the same XSLT transformation, say, is speeded up by working on a
>JITTs DOM compared to a normal DOM.
>
There are a number of demonstrations in the planning and XSLT will 
probably be one of the earlier ones.

Patrick

-- 
Patrick Durusau
Director of Research and Development
Society of Biblical Literature
pdurusau@e...



