
String interning (WAS: SAX2/Java: Towards a final form)

  • From: Miles Sabin <msabin@c...>
  • To: XMLDev list <xml-dev-digest@i...>
  • Date: Wed, 12 Jan 2000 18:04:04 -0000

I think we need to clarify a couple of ambiguities here.
There are two sorts of interning being talked about wrt SAX2:
the standard java interning performed by String.intern(); and
other, parser specific, mechanisms for ensuring that Strings
which are String.equals() are also ==. I'll call the former
java-interning and the latter app-interning.
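
To make the distinction concrete, here is a minimal sketch of what
java-interning buys you. The class and method names (InternDemo,
sameAfterIntern) are made up for illustration; only String.equals()
and String.intern() are real API.

```java
// Illustrative only: contrasts equals() with == and shows the effect of
// String.intern(). InternDemo and sameAfterIntern are hypothetical names.
final class InternDemo {
    // Two Strings with the same characters map to one object once interned.
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String a = new String("element");   // distinct object, same characters
        String b = new String("element");
        System.out.println(a.equals(b));            // prints "true"
        System.out.println(a == b);                 // prints "false"
        System.out.println(sameAfterIntern(a, b));  // prints "true"
    }
}
```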

I have a strong objection to SAX requiring that Strings
returned from its methods be java-interned. I'm not bothered 
about requiring app-interning so long as the guarantees are 
weakened a little.

First, the problem with java-interning. The way this is
implemented (in all the JVMs I've seen the sources of) is
via a hash-lookup of the pre-interned String in a JVM-internal
table. Because this table is shared by all threads in a JVM
this lookup has to be synchronized. The upshot is that there
is a huge potential for lock-contention where many threads
are interning simultaneously. This is bad enough on a single
processor machine, but could seriously clobber performance on
a multi-processor box. I, for one, want to use multiple SAX
parser instances driven from multiple threads on SMP machines,
and I'd be a tad distressed if java-interning were a SAX
requirement.

David Megginson has mentioned a way of reducing the overhead of
java-interning: here we have a parser-internal map from 
character sequences onto java-interned Strings ... if when you 
lookup on the char sequence you get non-null String back then 
that's the java-interned result; otherwise you convert the char 
sequence to a String, java-intern it and enter it in the table.
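
A rough sketch of that scheme, as I understand it, might look like the
following. This is not code from any actual parser: NameCache, lookup
and the internCalls counter are hypothetical names, and the simple
linear-probing table is just one way to key a map on a char range
without building a String first.

```java
// Hypothetical parser-internal cache: maps a char range onto a
// java-interned String, calling String.intern() only on a miss.
final class NameCache {
    private String[] table = new String[64]; // length is always a power of two
    private int size;
    int internCalls; // counts String.intern() calls, for illustration only

    String lookup(char[] buf, int start, int len) {
        int mask = table.length - 1;
        int i = hash(buf, start, len) & mask;
        while (table[i] != null) {
            if (matches(table[i], buf, start, len)) {
                return table[i];             // hit: already java-interned
            }
            i = (i + 1) & mask;              // linear probing
        }
        // Miss: build the String, java-intern it once, remember the result.
        String s = new String(buf, start, len).intern();
        internCalls++;
        table[i] = s;
        if (++size * 2 > table.length) {
            grow();
        }
        return s;
    }

    // Same recurrence as String.hashCode(), so grow() can reuse hashCode().
    private static int hash(char[] buf, int start, int len) {
        int h = 0;
        for (int k = 0; k < len; k++) {
            h = 31 * h + buf[start + k];
        }
        return h;
    }

    private static boolean matches(String s, char[] buf, int start, int len) {
        if (s.length() != len) {
            return false;
        }
        for (int k = 0; k < len; k++) {
            if (s.charAt(k) != buf[start + k]) {
                return false;
            }
        }
        return true;
    }

    private void grow() {
        String[] old = table;
        table = new String[old.length * 2];
        int mask = table.length - 1;
        for (String s : old) {
            if (s == null) {
                continue;
            }
            int i = s.hashCode() & mask;
            while (table[i] != null) {
                i = (i + 1) & mask;
            }
            table[i] = s;
        }
    }
}
```

Note that an instance like this is unsynchronized, so it only makes
sense when confined to a single parser or thread.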

Whilst this might improve things a bit, it's still a 
performance hit: if the parser internal map is shared between 
parsers then we have the same contention problem back again 
(tho' this time in application code rather than the JVM); if it 
isn't (and hence is parser-/thread-local), then it has to be 
repopulated at least for each new parser instance, probably for 
each new document. Even tho' this only requires one java-intern 
for each distinct name it still provides plenty of 
opportunities for synchronization collisions.

App-interning could be fine tho' ... so long as it's defined in 
such a way that it can be implemented in a completely thread-
local way. Doing that means we'd have to,

1. Weaken the guarantees on the equivalence of String.equals() 
   and ==.

   To avoid synchronization issues we'd have to say that
   app-interning is done relative to a given parser call,

   i.e. where foo and bar are both obtained via callbacks from 
   the same call on XMLReader.parse()

     foo.equals(bar) iff foo == bar

   but if foo and bar are not both obtained via callbacks from 
   the same call on XMLReader.parse()

     foo.equals(bar) does not imply foo == bar

2. Adopt something like Lars' proposal of a StringInterner.

   We'd need this to allow a SAX client to app-intern any
   literal Strings it wants to == test against in its

This should get us what most people want: fast equality 
comparisons and shared representation within the implementation 
of a ContentHandler, but without any need for synchronization.
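
As a sketch of how the two points above could fit together: the
LocalInterner below is a hypothetical stand-in, not the StringInterner
from Lars' proposal (whose API isn't reproduced here). It assumes one
instance per parse() call, used by a single thread, and shared between
the parser and the client's handler.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-parse interner: no synchronization, because it is
// assumed to be confined to one parse() call on one thread.
final class LocalInterner {
    private final Map<String, String> canon = new HashMap<>();

    // Returns the canonical instance for s within this interner only.
    String intern(String s) {
        String prior = canon.putIfAbsent(s, s);
        return prior != null ? prior : s;
    }
}
```

A client would app-intern its literals through the same instance the
parser uses for that parse, e.g. String title = interner.intern("title"),
and could then test name == title inside its callbacks. Strings from two
different LocalInterner instances may be equals() without being ==, which
is exactly the weakened guarantee in point 1.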

One point to bear in mind: none of the foregoing would 
_prevent_ a SAX implementor from using java-interning if they 
wanted to.



Miles Sabin                       Cromwell Media
Internet Systems Architect        5/6 Glenthorne Mews
+44 (0)20 8817 4030               London, W6 0LJ, England
msabin@c...          http://www.cromwellmedia.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Please note: New list subscriptions now closed in preparation for transfer to OASIS.

