> -----Original Message-----
> From: Jeff Lowery [mailto:jlowery@s...]
> Sent: Friday, January 25, 2002 1:32 PM
> To: 'Sterin, Ilya'
> Cc: 'Dare Obasanjo'; 'xml-dev@l...'
> Subject: RE: Push and Pull?
>
> > > pull- kXML: the parser says where it is, the DH tells the parser
> > > which way to go. In other words, the document handler has the
> > > parser pull only the data it's interested in; branches get
> > > skipped. The document does not have to be in-memory.
> >
> > I see where you are going with this, but do you mean that only the
> > branches that are marked as relevant are kept in memory, or neither?
> > Because if nothing is cached, then this goes back to the pull idea.
>
> Hmmm. The pull parser sends events, same as the push parser. The data
> for the current node (name, plus maybe attributes if the node is an
> element) is stored in the event. It may not be much data, but it is
> "in-memory" at that point in time.

I actually meant in memory after the parsing process has ended. The
thing that I guess is confusing is that DOM processors are in a way
neither push nor pull, but are rather, as Joe pointed out, a "slurp"
model (in his words :-). The application has absolutely no control
over the parsing process, other than possibly setting parameters
before parsing; the parser presents the structure afterwards.

> The document handler may take that event and store it in the
> application's internal cache. Or it may just call the event's
> toString() method that spits the event information out to, say,
> stdout. In that case, the "in-memory" component is very transitory.
>
> The main difference between push and pull is that the push parser
> iterates depth-first through every node in the document. The pull
> parser can be directed to skip branches, so you don't get the
> subevents generated for nodes on those branches.
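For concreteness, here is a minimal sketch of the pull style using
Python's stdlib xml.dom.pulldom as a stand-in for kXML (the document
and element names below are made up for illustration):

```python
# Pull model: the application drives the event stream and decides
# which subtrees to materialize; everything else is ignored.
from xml.dom.pulldom import parseString, START_ELEMENT

XML = ('<lib><book id="1"><title>A</title></book>'
       '<extras><promo/></extras>'
       '<book id="2"><title>B</title></book></lib>')

stream = parseString(XML)
titles = []
for event, node in stream:
    if event == START_ELEMENT and node.tagName == "book":
        stream.expandNode(node)  # pull just this subtree into memory
        titles.append(node.getElementsByTagName("title")[0].firstChild.data)

# The <extras> branch still produces events, but we never expand it,
# so nothing from it is kept after parsing ends.
print(titles)  # ['A', 'B']
```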
But in practice the pull parser still iterates through each branch in
the parser, but might only return certain parts. I guess this goes
back to Joe's explanation again:

pull - the application notifies the parser.
push - the parser notifies the application.

Ilya

> If, say, you skip processing of an element, the pull parser just zips
> through the document content looking for the skipped element's end
> tag, and then generates its next event from the tag that follows.
> This can be more efficient for grabbing document data that's sparsely
> distributed. Fewer events generated = less overhead.
>
> So I don't think the two are in any way equivalent, unless you plan
> on visiting every node in the document anyway, in which case they
> function pretty much the same.
>
> > > batch (or ??)- DOM: A single method loads the document in-memory.
> > > The app then navigates at its leisure.
> >
> > Ok, but I would consider this pull as well. Why do you not think
> > that the pull concept applies here? Instead of telling the parser
> > what to load, as in your pull explanation above, you are in a way
> > telling it to load the whole document and make all nodes relevant.
> > Then I think we are in the pull concept area again?
>
> Right, but with a pull parser, you can choose what to cache; you can
> grab parts of a document. With a DOM parse method, you get the whole
> document. It's the difference between being served a seven-course
> meal (batch) and choosing from a buffet (pull).
>
> > > fully directed- (XQuery??)- the document handler builds
> > > instructions that the parser uses to navigate and return data
> > > from the document.
> >
> > That's a new way to look at it for me. Have to give it a thought. I
> > would think XQuery would fit into the DOM/pull (batch) category,
> > since it just uses a different access syntax, but the document is
> > accessed the same way.
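Jeff's push model corresponds to SAX: the parser pushes every event at
the document handler, depth-first, and the handler decides what to
cache. A minimal sketch (stdlib Python, hypothetical document):

```python
import xml.sax

class TitleHandler(xml.sax.ContentHandler):
    """Receives every event the parser pushes; keeps only <title> text."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.titles.append("")

    def endElement(self, name):
        if name == "title":
            self.in_title = False

    def characters(self, content):
        # characters() may fire more than once per text node
        if self.in_title:
            self.titles[-1] += content

handler = TitleHandler()
xml.sax.parseString(
    b"<lib><book><title>A</title></book><book><title>B</title></book></lib>",
    handler,
)
print(handler.titles)  # ['A', 'B']
```

Note that the handler is still shown every node; caching only some of
them doesn't stop the parser from visiting the rest.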
> Similar to pull, but here you're telling the parser up front what it
> is you're looking for, rather than directing it in real time. The
> problem with pull parsing is that you may skip over nodes that you're
> later interested in if you're not careful (sloppy coding); with the
> fully-directed approach, you get everything you intend to get because
> you specified it up front. The directed parser can then order your
> "queries" so that they are optimized for a single pass-through
> (unless there are navigation dependencies on the document content
> which preclude a single pass-through). This assumes, in both cases,
> that you know in advance what you're looking for.
>
> Now, I've used three of these approaches in my own work. I haven't
> used a fully-directed parser (maybe the term pre-directed is better),
> although I envisage it as being something similar to SQL queries on a
> database. That's why I mention XQuery. The key difference is that a
> document in a file is sequential access, not random access as in a
> database, so I'm not fully cognizant of what optimizations are
> possible in a sequential-access situation.
>
> Take care,
>
> Jeff
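A rough feel for the "specify up front" style can be had with XPath
over Python's ElementTree. This is an in-memory (batch) tree rather
than a streaming XQuery engine, so it only illustrates declaring the
whole query before any navigation happens:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<lib><book id="1"><title>A</title></book>'
    '<book id="2"><title>B</title></book></lib>'
)
# The whole "query" is stated up front; the library does the navigating.
titles = [t.text for t in root.findall("./book/title")]
print(titles)  # ['A', 'B']
```

A true directed parser, as Jeff describes it, would take such a query
and plan a single sequential pass over the document instead of first
building the tree.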