
Re: half-baked parsers vs binary XML

  • From: David Megginson <david@m...>
  • To: XML List <xml-dev@i...>
  • Date: Sat, 27 Mar 1999 21:54:58 -0500 (EST)

Gabe Beged-Dov writes:

[on a validating parser]

 > There would be a little speed difference from not having to check
 > for defaulted attributes.

Not a measurable one -- the parser just needs to set a boolean flag
when there are no default values available, then it doesn't have to
check each time.
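To make the boolean-flag point concrete, here is a minimal sketch in Java. AttributeDefaults and its methods are hypothetical names for illustration, not AElfred's or any other parser's API. The flag is set when the DTD declares its first default, so a document with no defaults pays for a single boolean test per start tag.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical attribute-default table for a non-validating parser.
class AttributeDefaults {
    // element name -> (attribute name -> default value)
    private final Map<String, Map<String, String>> defaults =
            new HashMap<String, Map<String, String>>();

    // Set once, when the DTD declares its first default value.
    private boolean anyDefaults = false;

    // Called while parsing an ATTLIST declaration that supplies a default.
    void declare(String element, String attribute, String value) {
        Map<String, String> forElement = defaults.get(element);
        if (forElement == null) {
            forElement = new HashMap<String, String>();
            defaults.put(element, forElement);
        }
        // XML 1.0: the first declaration of an attribute is binding.
        if (!forElement.containsKey(attribute)) {
            forElement.put(attribute, value);
        }
        anyDefaults = true;
    }

    // Adds defaults for any attributes missing from a start tag.
    void applyDefaults(String element, Map<String, String> specified) {
        if (!anyDefaults) {
            return; // no defaults anywhere: one boolean test per tag
        }
        Map<String, String> forElement = defaults.get(element);
        if (forElement == null) {
            return;
        }
        for (Map.Entry<String, String> e : forElement.entrySet()) {
            if (!specified.containsKey(e.getKey())) {
                specified.put(e.getKey(), e.getValue());
            }
        }
    }
}
```

Attributes that appear in the start tag always win over the declared default.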

 > The half-baked parser might also be able to directly point to the
 > xml input without having to copy it, i.e. use start-length pointers
 > for the tags and attrs.  This would be more cumbersome if there was
 > less of a one to one correspondence between the raw xml and what
 > you got after expansion and defaulting.

I think that James Clark does something like that with Expat, which
does read the prolog properly, though it doesn't expand external
entities by default.  At least, Expat can always return the exact
string where an event originated.

Most efficient XML parsers play pretty clever tricks with their input
buffers, even with entity expansion.
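A minimal sketch of the start-length idea, assuming a parser that holds the raw document in a char[] buffer (TextSlice and SliceScanner are hypothetical names). It follows the same (char[] ch, int start, int length) convention as the SAX characters() callback: nothing is copied until the application actually asks for a String. Once entity expansion is involved, a slice may have to point into a different buffer than the raw input, which is where the clever tricks come in.

```java
// Hypothetical: a piece of text identified by (start, length) within the
// parser's input buffer, instead of a freshly copied String.
class TextSlice {
    final char[] buffer;
    final int start;
    final int length;

    TextSlice(char[] buffer, int start, int length) {
        this.buffer = buffer;
        this.start = start;
        this.length = length;
    }

    // Copies the characters only when a String is really needed.
    @Override
    public String toString() {
        return new String(buffer, start, length);
    }
}

class SliceScanner {
    // Returns the element name of a start tag beginning at buffer[pos],
    // as a slice of the same buffer. Handles simple start tags only.
    static TextSlice tagName(char[] buffer, int pos) {
        if (buffer[pos] != '<') {
            throw new IllegalArgumentException("not at a tag");
        }
        int start = pos + 1;
        int end = start;
        while (end < buffer.length) {
            char c = buffer[end];
            if (c == '>' || c == '/' || c == ' '
                    || c == '\t' || c == '\r' || c == '\n') {
                break;
            }
            end++;
        }
        return new TextSlice(buffer, start, end - start);
    }
}
```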

 > > There will be a small size difference, but it will be less
 > > exciting than you think -- the code to detect the prologue and
 > > load the module will make up much of the difference.
 > 
 > Detecting the prologue and loading an alternate module takes a few
 > lines of Java code.  

Well, a little more than that, because you'll have to pass the current
state on to the new module.
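A rough sketch of that hand-off, assuming the XML declaration has already been consumed and ignoring comments and processing instructions in the prolog for brevity. ParserState, PrologModule, and PrologDispatcher are hypothetical names. The point is that the buffer position and line/column counters have to travel with the call, so the separately loaded module can report errors accurately and return control at the right place.

```java
// Hypothetical state shared between a small core parser and a
// separately loaded module that handles the DOCTYPE declaration.
class ParserState {
    final char[] buffer;
    int pos = 0;
    int line = 1;
    int column = 1;

    ParserState(char[] buffer) {
        this.buffer = buffer;
    }
}

// Hypothetical interface for the optional prolog/DTD module.
interface PrologModule {
    // Parses <!DOCTYPE ...> starting at state.pos and leaves
    // state.pos just past the declaration.
    void parseDoctype(ParserState state);
}

class PrologDispatcher {
    // True if the buffer contains the literal at the current position.
    static boolean lookingAt(ParserState s, String literal) {
        if (s.pos + literal.length() > s.buffer.length) {
            return false;
        }
        for (int i = 0; i < literal.length(); i++) {
            if (s.buffer[s.pos + i] != literal.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // Skips whitespace (tracking line/column), then hands off to the
    // module if a DOCTYPE follows. Returns true if it did so.
    static boolean dispatch(ParserState s, PrologModule module) {
        while (s.pos < s.buffer.length) {
            char c = s.buffer[s.pos];
            if (c == '\n') {
                s.line++;
                s.column = 1;
            } else if (c == ' ' || c == '\t' || c == '\r') {
                s.column++;
            } else {
                break;
            }
            s.pos++;
        }
        if (lookingAt(s, "<!DOCTYPE")) {
            module.parseDoctype(s); // the module works on the same state
            return true;
        }
        return false;
    }
}
```

In a real split design the PrologModule implementation might be loaded on demand (for example with Class.forName), but it is the shared state, not the loading, that accounts for the extra lines.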

 > Prologue processing, entity expansion and attribute defaulting take
 > up a little more than that in the parsers that I've looked at.

The version of AElfred that I wrote was around 27K (uncompressed)
including full parsing of element, attribute, and entity declarations,
and expansion of external entities (including the external DTD
subset); even then, AElfred would have been about 7K smaller if I
hadn't written my own hashing, interning, buffer-handling etc. for
speed's sake.

I still believe that a 10K XML non-validating parser class in Java is
not out of reach, *including* parsing the prolog, if people are
willing to use the standard Java classes.
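As an illustration of leaning on the standard classes, here is a hypothetical name table built on java.util.HashMap rather than a hand-written hash table (NameTable is an assumed name; String.intern() would be another standard option). It stores each distinct element or attribute name once, so later comparisons can use == instead of equals(). The trade-off is that it allocates a String on every lookup, which is exactly the kind of cost a custom table avoids for speed's sake.

```java
import java.util.HashMap;

// Hypothetical name table using the standard library for hashing.
class NameTable {
    private final HashMap<String, String> names = new HashMap<String, String>();

    // Returns the canonical String for the name at buffer[start..start+length).
    String intern(char[] buffer, int start, int length) {
        String candidate = new String(buffer, start, length); // one allocation per call
        String existing = names.get(candidate);
        if (existing != null) {
            return existing; // same object every time for the same name
        }
        names.put(candidate, candidate);
        return candidate;
    }
}
```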

 > > doing the well-formedness checks for legal characters can take up
 > > a lot of code, but you're supposed to do that anyway (I cheated
 > > with AElfred).
 > 
 > I'm not sure I understand. Could you elaborate on how you cheated :-?

At least when I was maintaining it, AElfred didn't perform all of the
required well-formedness checks for different ranges of Unicode
characters allowed and not allowed in names, attribute values,
character data, etc.  I tried adding it, but it bloated the code by
about 7-8K (much more than parsing the prolog and DTD).
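For a sense of what these checks involve, here is a sketch of the character tests in Java. It uses the Char, NameStartChar, and NameChar productions from the XML 1.0 Fifth Edition (2008), which are far more compact than the Unicode character-class tables in Appendix B of the 1998 recommendation that AElfred would have had to implement. XmlChars is a hypothetical class name.

```java
// Character checks per the XML 1.0 Fifth Edition productions.
// Code points are ints so supplementary characters are handled.
class XmlChars {
    // Char: characters allowed anywhere in a document.
    static boolean isChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)
            || (c >= 0x10000 && c <= 0x10FFFF);
    }

    // NameStartChar: characters allowed at the start of a name.
    static boolean isNameStartChar(int c) {
        return c == ':' || c == '_'
            || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
            || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
            || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
            || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
            || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    // NameChar: characters allowed after the first one.
    static boolean isNameChar(int c) {
        return isNameStartChar(c) || c == '-' || c == '.'
            || (c >= '0' && c <= '9') || c == 0xB7
            || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
    }

    // Checks a whole name, walking by code point rather than by char.
    static boolean isName(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int c = s.codePointAt(i);
            if (!isNameChar(c)) {
                return false;
            }
            i += Character.charCount(c);
        }
        return true;
    }
}
```

Character data and attribute values need only the isChar() test (plus the rules for < and &); names need the stricter start and continuation tests.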


All the best,


David

-- 
David Megginson                 david@m...
           http://www.megginson.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

