
Re: SAX2/Java: Towards a final form

  • From: Tyler Baker <tyler@i...>
  • To: Lars Marius Garshol <larsga@g...>
  • Date: Tue, 11 Jan 2000 15:44:40 -0500

Lars Marius Garshol wrote:

> * Stefan Haustein
> |
> | Is "no namespace" reported with a null or empty String (for interned
> | Strings, the equals problem does not exist)?
>
> * David Megginson
> |
> | Empty string sounds like a reasonable suggestion when Namespace
> | processing is being performed; null when it is not (so that bugs
> | in code will show up sooner).
>
> There is a problem with this: SAX filters should be able to compare
> names without knowing whether namespace processing is on or not.
> Allowing parts of names to be null makes this much more complicated,
> since this is a comparison of two three-string tuples. So from a
> filter point of view it would be much better if no part of a name
> could ever be null. (I'm a bit unsure what to do with the raw name
> when there is no original raw name.)
>
> | That's a good question -- should SAX2 require that all names and
> | Namespace URIs be interned (i.e. == to the results of
> | java.lang.String.intern)?
>
> This sounds like it could cause a huge performance gap between
> implementations. I think the MSXML driver and the SAX1 adapter will
> have to intern every name-part string that is passed to them, which I
> assume would be very costly. (The alternative would be breaking
> applications, unless there is a cheaper way.)
>
> Also, many parsers already do their own interning and support for
> SAX2, and these would then require either the solution above or a
> (non-costly) change to the parser itself. This definitely sounds like
> something that is easily forgotten, thus causing incompatibilities.

Very true, but parsers can keep interned strings (the results of java.lang.String.intern())
mapped in their own parser string table. So whenever you come across a new element or
attribute name, for instance, your readName() method (assuming you have one) would:

- Check to see if the character sequence comprises a legal XML Name.
- Generate a hashcode of the XML Name characters.
- Look in your string map using the generated hashcode and character sequence to see if there
is an already stored interned String in it.
- If there is an interned string in the string map that is equal to the XML Name character
sequence, return the interned string.
- If there is no matching interned string, create a new String object using the read
characters, and call intern on it to retrieve an interned string. Store the interned string in
the string map, and return it as well.
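The steps above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not code from any particular parser: the class NameTable, its intern(char[], int, int) method, and the simplified name check are all hypothetical, and the check only approximates the XML 1.0 Name production.

```java
/**
 * Hypothetical parser name table (illustrative names, not a real parser API).
 * Maps character sequences read from the document to interned Strings.
 */
final class NameTable {
    private static final class Entry {
        final int hash;
        final String value; // result of String.intern()
        final Entry next;

        Entry(int hash, String value, Entry next) {
            this.hash = hash;
            this.value = value;
            this.next = next;
        }
    }

    private Entry[] buckets = new Entry[64]; // length is always a power of two
    private int size;

    /** Returns the interned String for the len characters of buf starting at start. */
    String intern(char[] buf, int start, int len) {
        // Step 1: reject character sequences that are not (approximately) XML Names.
        if (!isName(buf, start, len)) {
            throw new IllegalArgumentException("not an XML Name: " + new String(buf, start, len));
        }
        // Step 2: hash the characters (same formula as String.hashCode()).
        int h = 0;
        for (int i = start; i < start + len; i++) {
            h = 31 * h + buf[i];
        }
        // Steps 3-4: look for an already stored interned String with the same characters.
        int idx = h & (buckets.length - 1);
        for (Entry e = buckets[idx]; e != null; e = e.next) {
            if (e.hash == h && sameChars(e.value, buf, start, len)) {
                return e.value;
            }
        }
        // Step 5: first occurrence, so create the String, intern it, store it, and return it.
        String s = new String(buf, start, len).intern();
        buckets[idx] = new Entry(h, s, buckets[idx]);
        if (++size > buckets.length * 3 / 4) {
            resize();
        }
        return s;
    }

    private static boolean sameChars(String s, char[] buf, int start, int len) {
        if (s.length() != len) return false;
        for (int i = 0; i < len; i++) {
            if (s.charAt(i) != buf[start + i]) return false;
        }
        return true;
    }

    // Simplified check: the real XML 1.0 Name production uses explicit Unicode ranges.
    private static boolean isName(char[] buf, int start, int len) {
        if (len == 0) return false;
        char first = buf[start];
        if (!(Character.isLetter(first) || first == '_' || first == ':')) return false;
        for (int i = start + 1; i < start + len; i++) {
            char c = buf[i];
            if (!(Character.isLetterOrDigit(c) || c == '.' || c == '-' || c == '_' || c == ':')) {
                return false;
            }
        }
        return true;
    }

    // Doubles the bucket array and rehashes every entry using its stored hash.
    private void resize() {
        Entry[] old = buckets;
        buckets = new Entry[old.length * 2];
        for (Entry head : old) {
            for (Entry e = head; e != null; e = e.next) {
                int idx = e.hash & (buckets.length - 1);
                buckets[idx] = new Entry(e.hash, e.value, buckets[idx]);
            }
        }
    }
}
```

With this layout, a name that has been seen before costs one hash and one character comparison: no new String is allocated and intern() is not called again. Only the first occurrence of each name pays for String.intern().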

I have found this to be a significant performance enhancement at the application level, since
your case statements on XML names can safely test for identity rather than equality, which is
much more expensive, especially in a large case statement.
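To illustrate the identity test, here is a hypothetical handler. The class ElementCounter and its simplified startElement(String) method are illustrative only, not the SAX ContentHandler interface. It assumes every name it receives has already been interned by the parser; Java string literals are always interned, so == against a literal is a valid test for such names.

```java
/**
 * Hypothetical element dispatch that relies on names being interned.
 * Each branch is a single reference comparison instead of String.equals().
 */
final class ElementCounter {
    int titles;
    int paragraphs;
    int other;

    // 'name' is assumed to come from a parser that interns every element name.
    void startElement(String name) {
        if (name == "title") {
            titles++;
        } else if (name == "para") {
            paragraphs++;
        } else {
            other++;
        }
    }
}
```

The catch is that a name which was not interned silently falls through to the default branch: passing new String("para") increments other, not paragraphs. An identity test is only safe when interning is guaranteed for every name the handler sees.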

For SAX drivers, interning every string could cause big performance problems, but most parsers
support SAX natively now, so worrying about drivers for XML parsers that have lackluster
performance in the first place should not be a big concern here anyway.

Tyler


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Please note: New list subscriptions now closed in preparation for transfer to OASIS.


