Re: half-baked parsers vs binary XML
Gabe Beged-Dov writes:

[on a validating parser]

> There would be a little speed difference from not having to check
> for defaulted attributes.

Not a measurable one -- the parser just needs to set a boolean flag
when there are no default values available, then it doesn't have to
check each time.

> The half-baked parser might also be able to directly point to the
> xml input without having to copy it, i.e. use start-length pointers
> for the tags and attrs. This would be more cumbersome if there was
> less of a one to one correspondence between the raw xml and what
> you got after expansion and defaulting.

I think that James Clark does something like that with Expat, which
does read the prolog properly, though it doesn't expand external
entities by default. At least, Expat can always return the exact
string where an event originated. Most efficient XML parsers play
pretty clever tricks with their input buffers, even with entity
expansion.

> > There will be a small size difference, but it will be less
> > exciting than you think -- the code to detect the prologue and
> > load the module will make up much of the difference.
>
> Detecting the prologue and loading an alternate module takes a few
> lines of Java code.

Well, a little more than that, because you'll have to pass the
current state on to the new module.

> Prologue processing, entity expansion and attribute defaulting take
> up a little more than that in the parsers that I've looked at.

The version of AElfred that I wrote was around 27K (uncompressed)
including full parsing of element, attribute, and entity declarations,
and expansion of external entities (including the external DTD
subset); even then, AElfred would have been about 7K smaller if I
hadn't written my own hashing, interning, buffer-handling etc. for
speed's sake. I still believe that a 10K XML non-validating parser
class in Java is not out of reach, *including* parsing the prolog, if
people are willing to use the standard Java classes.
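[Editorial illustration, not part of the original message.] The
"start-length pointers" idea quoted above can be sketched roughly as
follows. This is a minimal sketch under stated assumptions: the names
SpanSketch, Span, and startTagNames are hypothetical, the scanner is
deliberately naive (no comments, CDATA, processing instructions, or
entity expansion), and it is not how Expat or AElfred are implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: record where each start-tag name sits in the input. */
class SpanSketch {

    /** A start-length pointer into the original buffer; no characters are copied. */
    public static final class Span {
        public final int start;
        public final int length;

        public Span(int start, int length) {
            this.start = start;
            this.length = length;
        }

        /** Materialize the text only when a caller actually asks for it. */
        public String text(char[] buf) {
            return new String(buf, start, length);
        }
    }

    /** Collects the name of every start tag as a span over buf. */
    public static List<Span> startTagNames(char[] buf) {
        List<Span> names = new ArrayList<>();
        int i = 0;
        while (i < buf.length) {
            boolean startTag = buf[i] == '<'
                    && i + 1 < buf.length
                    && buf[i + 1] != '/'   // end tag
                    && buf[i + 1] != '!'   // comment, CDATA, or declaration
                    && buf[i + 1] != '?';  // processing instruction
            if (startTag) {
                int start = i + 1;
                int end = start;
                // The name runs until whitespace, '>' or '/'.
                while (end < buf.length
                        && !Character.isWhitespace(buf[end])
                        && buf[end] != '>'
                        && buf[end] != '/') {
                    end++;
                }
                names.add(new Span(start, end - start));
                i = end;
            } else {
                i++;
            }
        }
        return names;
    }

    public static void main(String[] args) {
        char[] input = "<doc><item id=\"1\"/></doc>".toCharArray();
        for (Span s : startTagNames(input)) {
            System.out.println(s.start + "+" + s.length + " " + s.text(input));
        }
    }
}
```

The spans stay valid only while the buffer is unchanged, which is the
cumbersome part mentioned in the quoted text: once entity expansion or
attribute defaulting produces text that is not in the raw input, a
plain start-length pair can no longer describe it.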

> > doing the well-formedness checks for legal characters can take up
> > a lot of code, but you're supposed to do that anyway (I cheated
> > with AElfred).
>
> I'm not sure I understand. Could you elaborate on how you cheated :-?

At least when I was maintaining it, AElfred didn't perform all of the
required well-formedness checks for different ranges of Unicode
characters allowed and not allowed in names, attribute values,
character data, etc. I tried adding it, but it bloated the code by
about 7-8K (much more than parsing the prolog and DTD).


All the best,


David

-- 
David Megginson                 david@m...
           http://www.megginson.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
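
[Editorial illustration, not part of the original message.] The
character-range checks discussed in the message can be sketched as a
table-driven lookup. This is only a sketch under stated assumptions:
the class and method names are hypothetical, it is not AElfred's code,
and the name-start table follows the NameStartChar production of XML
1.0 Fifth Edition, which is much shorter than the character-class
tables in the edition of XML 1.0 current when this message was
written.

```java
/** Hypothetical sketch of table-driven XML character checks. */
class XmlCharSketch {

    // Char production of XML 1.0, as inclusive [low, high] pairs.
    private static final int[] CHAR_RANGES = {
        0x9, 0xA, 0xD, 0xD, 0x20, 0xD7FF, 0xE000, 0xFFFD, 0x10000, 0x10FFFF
    };

    // NameStartChar production of XML 1.0 Fifth Edition (assumption: the
    // original discussion concerned the older, longer tables).
    private static final int[] NAME_START_RANGES = {
        ':', ':', 'A', 'Z', '_', '_', 'a', 'z',
        0xC0, 0xD6, 0xD8, 0xF6, 0xF8, 0x2FF, 0x370, 0x37D, 0x37F, 0x1FFF,
        0x200C, 0x200D, 0x2070, 0x218F, 0x2C00, 0x2FEF, 0x3001, 0xD7FF,
        0xF900, 0xFDCF, 0xFDF0, 0xFFFD, 0x10000, 0xEFFFF
    };

    /** Linear scan over inclusive range pairs. */
    private static boolean inRanges(int[] ranges, int codePoint) {
        for (int i = 0; i < ranges.length; i += 2) {
            if (codePoint >= ranges[i] && codePoint <= ranges[i + 1]) {
                return true;
            }
        }
        return false;
    }

    static boolean isXmlChar(int codePoint) {
        return inRanges(CHAR_RANGES, codePoint);
    }

    static boolean isNameStartChar(int codePoint) {
        return inRanges(NAME_START_RANGES, codePoint);
    }

    /** Returns the index of the first illegal character, or -1 if all are legal. */
    static int firstIllegalChar(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);  // an unpaired surrogate comes back as itself
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

Even in this compact form, a parser needs separate tables for names,
name-start characters, and general character data, and has to apply
them in every context the message lists, which gives some sense of why
adding the checks cost more code than parsing the prolog.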