
Re: String interning (WAS: SAX2/Java: Towards a final form)

  • From: Tyler Baker <tyler@i...>
  • To: Miles Sabin <msabin@c...>
  • Date: Mon, 17 Jan 2000 17:21:50 -0500

Miles Sabin wrote:

> Tim Bray wrote,
> > Miles Sabin wrote:
> > > Anyhow, maybe the waters are getting a bit muddied. I'm
> > > assuming that all parsers will do interning of one sort or
> > > another internally. The issue for me is how much of that
> > > gets exposed via the SAX API. I don't want java-interning
> > > exposed, because that means my parser has no option but to
> > > use String.intern().
> >
> > Yes.   Given that *every* credible parser does this,
>
> No argument here (assuming that my 'all parsers' => your 'every
> credible parser').
>
> > ... it's a major convenience for programmers using the
> > API to be able to compare strings with ==, there is at some
> > level an argument that we ought to expose this fact.
> >
> > I'd go further; based on having written a parser, it seems to
> > me that the only sane tactic is for the parser to use
> > java.intern(), but only once for each unique name, with some
> > sort of internal char[] or equivalent table.  If this is
> > true, it's an even stronger argument for just saying "element
> > types and attribute names coming out of the parser are
> > intern()ed, period".
>
> OK, this is David M's position.
>
> Sure, there's a case for this. But there's a case against too.
> There are at least two scenarios in which this would be a
> burden.
>
> One is where SAX isn't sitting on top of a parser (this is
> Arkin's worry). Instead it's generating SAX events from a DOM
> tree, java reflection, or some other data structure, a JDBC
> query perhaps.
>
> Unlike a parser, these event sources deliver Strings directly,
> so if there were no requirement to String.intern() they could
> simply pass Strings straight through the ContentHandler API. A
> requirement that SAX return String.intern()'d Strings rules
> that out tho', because none of DOM, reflection, or JDBC make
> any guarantees that the Strings they return are interned. The
> cost of interning (whether via a direct call on String.intern()
> or via a David M style lookup against a table of interned
> Strings) would be a significant additional overhead.
>
> You could argue that these aren't legitimate or central uses
> of the SAX API. But if you want to do that you should make it
> explicit, because it's likely to be quite a controversial
> line.

In a DOM package I wrote, I faced exactly this problem whenever a user generated a DOM
document tree programmatically. If the tree was built from a file, all names were already
interned, because the parser presented the DOM document with only interned names.

Getting around this problem is somewhat complex, but it is doable. You keep a String table
internal to the document. Whenever someone invokes:

Document.createElement(String name);

You replace the argument String with the interned copy from that table. The alternative,
which can be a little more expensive in some cases (such as multi-threaded situations), is
to call String.intern() every time the user invokes:

Document.createElement(String name);

I have not looked closely at the popular DOM packages lately, but I am sure they have found
a similar workaround.
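
To make the String-table idea concrete, here is a minimal sketch in present-day Java. It is
not code from any real DOM package: NameTable, SketchDocument and createElementName are
hypothetical names made up for illustration, and the element is reduced to its name to keep
the example short. The table calls String.intern() only the first time it sees a given
name; later calls are answered from the table.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-document name table (illustration only).
final class NameTable {
    private final ConcurrentHashMap<String, String> names = new ConcurrentHashMap<>();

    // Returns the interned copy of the name. String.intern() runs only
    // when the name is not yet in this table.
    String canonical(String name) {
        return names.computeIfAbsent(name, String::intern);
    }
}

// Hypothetical stand-in for a DOM Document implementation.
final class SketchDocument {
    private final NameTable table = new NameTable();

    // Mirrors Document.createElement(String name): the caller's String is
    // swapped for the interned copy before it is stored anywhere.
    String createElementName(String name) {
        return table.canonical(name);
    }
}
```

With this in place, two calls that pass equal but distinct String objects get back the same
instance, so names created programmatically can be compared with == just like names that
came from the parser.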

> There seem to be two main points to your argument for String.
> intern()ing.
>
> 1. Reducing the amount of String object creation in parsers.
>
>    I don't think _anybody_ thinks that this isn't important.
>    The only issue is how best to do it. String.intern() is
>    one way. An internal parser data structure is another.

Most parsers do both. You don't need to Java-intern your Strings to reduce String object
allocation, and Java-interning them has nothing to do with decreasing object allocation
anyway.
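
As a rough illustration of "both", here is a sketch of a parser-side table keyed on slices
of the input char[] buffer. It is an assumption about how such a table might look, not the
code of any particular parser; SymbolTable, lookup and the stringsCreated counter are
hypothetical names. The table is what avoids allocating a String per occurrence; the single
intern() call per unique name only adds the == guarantee.

```java
// Hypothetical parser-side symbol table (illustration only).
final class SymbolTable {
    private static final class Entry {
        final char[] chars;   // private copy of the name's characters
        final String symbol;  // the interned String handed out for this name
        final Entry next;     // next entry in the same bucket
        Entry(char[] chars, String symbol, Entry next) {
            this.chars = chars;
            this.symbol = symbol;
            this.next = next;
        }
    }

    // Fixed bucket count for brevity; a real table would resize.
    private final Entry[] buckets = new Entry[256];

    // Instrumentation for this example only: counts Strings allocated.
    int stringsCreated = 0;

    // Looks up buf[offset .. offset+length) without creating a String
    // unless this is the first time the name has been seen.
    String lookup(char[] buf, int offset, int length) {
        int h = 0;
        for (int i = 0; i < length; i++) {
            h = 31 * h + buf[offset + i];
        }
        int index = (h & 0x7fffffff) % buckets.length;
        for (Entry e = buckets[index]; e != null; e = e.next) {
            if (matches(e.chars, buf, offset, length)) {
                return e.symbol;
            }
        }
        // First occurrence: allocate once and intern once.
        String symbol = new String(buf, offset, length).intern();
        stringsCreated++;
        buckets[index] = new Entry(symbol.toCharArray(), symbol, buckets[index]);
        return symbol;
    }

    private static boolean matches(char[] stored, char[] buf, int offset, int length) {
        if (stored.length != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (stored[i] != buf[offset + i]) {
                return false;
            }
        }
        return true;
    }
}
```

Dropping the intern() call would leave the allocation behaviour unchanged; the returned
names would still be unique per table, but they would no longer be == to Strings interned
elsewhere in the JVM.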

Tyler


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Please note: New list subscriptions now closed in preparation for transfer to OASIS.
