
Re: XML too hard for programmers?

  • To: rog@v...
  • Subject: Re: XML too hard for programmers?
  • From: Aleksander Slominski <aslom@c...>
  • Date: Mon, 24 Mar 2003 16:51:36 -0500
  • Cc: xml-dev@l...
  • In-reply-to: <907083ff912ee41c723462180ef360ad@v...>
  • References: <907083ff912ee41c723462180ef360ad@v...>
  • User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2) Gecko/20030210

rog@v... wrote:

>>if you want minimal memory overhead (and not just create DOM and
>>navigate it) you can record XML context of one position in file (that
>>would include i-scope namespace declarations, stack of start tags,
>>attributes etc.)  and use it to move back parser and then restart
>>parsing from this position though i have not seen a parser that can do
>>this ...
>>    
>>
>
>My parser does that.  For example, when I parse an ebook, I lay it out
>a page at a time, and mark the position in the XML of the content that
>starts each page.  I then write an index file containing all the marks
>for each page.
>
hi,

that is what i was suspecting :-)

>Once a document has been indexed, it's very quick to, say, open the
>document and jump to the 200th page, or to jump back quickly page by
>page, without storing all the XML for each page.
>
>The drawbacks are that: a) if the document changes, you have to
>reindex everything and b) if any of the display attributes (e.g.  text
>size, line spacing, etc) changes, you have to reindex everything.
>
you are getting close to creating an ultra-lightweight XML database: you
index interesting information and need to retrieve it later, so you
could have a database abstraction layer that re-indexes automatically
when any part of the document changes (this layer could also
transparently slice the XML input into fragments that are stored
and modified independently, to minimize the need for re-indexing).
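
a rough sketch of the smallest piece of such a layer, assuming the index remembers which document version and which display settings it was built for. the class name PageIndex, its methods and the fingerprint strings are all made up for illustration; they do not come from any existing library or from the parser discussed here:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical page index that remembers what it was built from. */
final class PageIndex {
    // Assumption: some cheap identity of the XML file, e.g. a hash or size+timestamp.
    private final String documentFingerprint;
    // Assumption: a string describing display attributes, e.g. "textSize=12;lineSpacing=1.5".
    private final String layoutFingerprint;
    // File offset at which each page starts, in page order.
    private final List<Long> pageOffsets = new ArrayList<>();

    PageIndex(String documentFingerprint, String layoutFingerprint) {
        this.documentFingerprint = Objects.requireNonNull(documentFingerprint);
        this.layoutFingerprint = Objects.requireNonNull(layoutFingerprint);
    }

    /** Records the start offset of the next page. */
    void addPage(long offset) {
        pageOffsets.add(offset);
    }

    /** Returns the start offset of a 1-based page number, for jumping straight to it. */
    long offsetOfPage(int pageNumber) {
        return pageOffsets.get(pageNumber - 1);
    }

    /** True if the document or the display attributes changed since indexing. */
    boolean isStale(String currentDocument, String currentLayout) {
        return !documentFingerprint.equals(currentDocument)
            || !layoutFingerprint.equals(currentLayout);
    }
}
```

the layer would check isStale() before using the index and rebuild only when it returns true; keeping one such index per stored fragment is one way the re-indexing could stay local to the fragment that changed.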

>All I record for a mark is the offset in the file, the read depth and
>the tags of each level of nesting.  I don't know anything about
>i-scope namespace declarations (I said I was hopelessly naive!)
>
it was supposed to be "in-scope" namespace declarations - if you do not 
use namespaces then you do not need to record them.
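
to make the content of a mark concrete, here is a minimal sketch of what one could hold. ParseMark and its field names are hypothetical (not part of XmlPull or of the parser described above); the namespace map is the optional part and simply stays empty when namespaces are not used:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical restart mark; names are illustrative, not from any parser API. */
final class ParseMark {
    /** Offset in the file at which parsing can be resumed. */
    final long offset;
    /** Start tags still open at that offset, outermost first. */
    final List<String> openTags;
    /** In-scope namespace declarations (prefix -> URI); empty if namespaces are unused. */
    final Map<String, String> inScopeNamespaces;

    ParseMark(long offset, List<String> openTags, Map<String, String> inScopeNamespaces) {
        this.offset = offset;
        // Defensive copies so a stored mark cannot change after it is written to the index.
        this.openTags = Collections.unmodifiableList(new ArrayList<>(openTags));
        this.inScopeNamespaces =
            Collections.unmodifiableMap(new LinkedHashMap<>(inScopeNamespaces));
    }

    /** The read depth is implied by the number of open tags. */
    int depth() {
        return openTags.size();
    }
}
```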

>>and here is how it could be done in XmlPull (for details see: 
>>http://www.extreme.indiana.edu/~aslom/xmlpull/patterns.html#ANY_ORDER)
>>    
>>
>[...]
>  
>
>>			 wrapper.skipSubTree();
>>    
>>
>
>I think the advantage of having the nesting level explicit in the
>parsing is that the parser is in a position to deal reasonably
>robustly with malformed XML, without aborting.
>
>I started off aborting with an error on any mismatched tag, but I
>found that in practise, files I was finding on the net had a plethora
>of minor errors, and fixing them is much easier if the parser gives
>warnings for many errors in the same document (sometimes there are
>hundreds of errors) rather than aborting at the first one...
>
>Of course, skipSubTree could do something like that, but it has not
>got the option of ascending further up in the tree than the level at
>which it was called, which is sometimes the best thing to do
>(depending on your recovery heuristics, obviously).
>
in this particular example skipSubTree() will skip the current subtree,
but you can easily move one level up (and that could be wrapped into a
function moveOneLevelUp()):

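while (parser.nextTag() == XmlPullParser.START_TAG) {
    wrapper.skipSubTree();  // skip each remaining sibling subtree
}
parser.require(XmlPullParser.END_TAG, ...);  // must now be on the parent's end tag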

and after that the parser is positioned one level up.
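
for anyone who wants to try the idea without XmlPull on the classpath, here is a self-contained sketch of the same pattern using StAX (javax.xml.stream, which ships with the JDK) instead of XmlPull. the class PullNavigation and both helper methods are written for this example and are not part of either API:

```java
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Illustrative helpers on top of StAX; not part of StAX or XmlPull. */
final class PullNavigation {

    /** Parser must be on a start tag; on return it is on the matching end tag. */
    static void skipSubTree(XMLStreamReader parser) throws XMLStreamException {
        parser.require(XMLStreamConstants.START_ELEMENT, null, null);
        int level = 1;
        while (level > 0) {
            int event = parser.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                level++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                level--;
            }
        }
    }

    /**
     * Parser must be on the end tag of a child; skips all remaining siblings
     * and stops on the end tag of the parent element.
     */
    static void moveOneLevelUp(XMLStreamReader parser) throws XMLStreamException {
        while (parser.nextTag() == XMLStreamConstants.START_ELEMENT) {
            skipSubTree(parser);
        }
        parser.require(XMLStreamConstants.END_ELEMENT, null, null);
    }
}
```

note that nextTag() throws if it meets non-whitespace text between siblings, so this version is only suitable for element-only content at the level being skipped.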

alek

-- 
"Mr. Pauli, we in the audience are all agreed that your theory is crazy. 
What divides us is whether it is crazy enough to be true." Niels H. D. Bohr


