String interning (WAS: SAX2/Java: Towards a final form)
I think we need to clarify a couple of ambiguities here.

There are two sorts of interning being talked about wrt SAX2: the standard Java interning performed by String.intern(); and other, parser-specific, mechanisms for ensuring that Strings which are String.equals() are also ==. I'll call the former java-interning and the latter app-interning.

I have a strong objection to SAX requiring that Strings returned from its methods be java-interned. I'm not bothered about requiring app-interning so long as the guarantees are weakened a little.

First, the problem with java-interning. The way this is implemented (in all the JVMs I've seen the sources of) is via a hash-lookup of the pre-interned String in a JVM-internal table. Because this table is shared by all threads in a JVM, this lookup has to be synchronized. The upshot is that there is a huge potential for lock-contention where many threads are interning simultaneously. This is bad enough on a single-processor machine, but could seriously clobber performance on a multi-processor box. I, for one, want to use multiple SAX parser instances driven from multiple threads on SMP machines, and I'd be a tad distressed if java-interning were a SAX requirement.

David Megginson has mentioned a way of reducing the overhead of java-interning: here we have a parser-internal map from character sequences onto java-interned Strings ... if, when you look up the char sequence, you get a non-null String back, then that's the java-interned result; otherwise you convert the char sequence to a String, java-intern it and enter it in the table.

Whilst this might improve things a bit, it's still a performance hit: if the parser-internal map is shared between parsers then we have the same contention problem back again (tho' this time in application code rather than the JVM); if it isn't (and hence is parser-/thread-local), then it has to be repopulated at least for each new parser instance, probably for each new document.
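As a rough illustration of the parser-internal map described above, here is a minimal Java sketch. The class name LocalNameCache, its lookup method and the use of CharBuffer as the map key are all assumptions made for this example; nothing here is taken from an actual parser. The point it shows is that String.intern() is only reached on a cache miss, i.e. once per distinct name per cache instance.

```java
import java.nio.CharBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical parser-local cache; not part of SAX or of any real parser. */
final class LocalNameCache {
    // Unsynchronized on purpose: the cache is meant to be owned by one parser/thread.
    private final Map<CharBuffer, String> cache = new HashMap<>();
    private int internCalls = 0;

    /** Returns the java-interned String for buf[offset .. offset+length). */
    String lookup(char[] buf, int offset, int length) {
        // Wrap the parser's buffer without copying; CharBuffer hashes and
        // compares by content, so this works as a lookup key.
        CharBuffer probe = CharBuffer.wrap(buf, offset, length);
        String hit = cache.get(probe);
        if (hit != null) {
            return hit; // already java-interned, no JVM-wide lock touched
        }
        // Miss: build the String and java-intern it (the only contended step).
        String interned = new String(buf, offset, length).intern();
        internCalls++;
        // Store a private copy of the chars, since the parser will reuse buf.
        char[] keyCopy = Arrays.copyOfRange(buf, offset, offset + length);
        cache.put(CharBuffer.wrap(keyCopy), interned);
        return interned;
    }

    /** Number of times String.intern() was actually called. */
    int internCalls() {
        return internCalls;
    }
}
```

Because the cache is per-instance, a fresh parser (or a fresh document, if the cache is cleared between documents) starts empty and pays the String.intern() cost again for every distinct name.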
Even tho' this only requires one java-intern for each distinct name, it still provides plenty of opportunities for synchronization collisions.

App-interning could be fine tho' ... so long as it's defined in such a way that it can be implemented in a completely thread-local way. Doing that means we'd have to,

1. Weaken the guarantees on the equivalence of String.equals() and ==. To avoid synchronization issues we'd have to say that app-interning is done relative to a given parser call, ie. where foo and bar are both obtained via callbacks from the same call on XMLReader.parse()

     foo.equals(bar) iff foo == bar

   but if foo and bar are not both obtained via callbacks from the same call on XMLReader.parse()

     foo.equals(bar) does not imply foo == bar

2. Adopt something like Lars' proposal of a StringInterner interface. We'd need this to allow a SAX client to app-intern any literal Strings it wants to == test against in its handlers.

This should get us what most people want: fast equality comparisons and shared representation within the implementation of a ContentHandler, but without any need for synchronization.

One point to bear in mind: none of the foregoing would _prevent_ a SAX implementor from using java-interning if they wanted to.

Cheers,

Miles

-- 
Miles Sabin                       Cromwell Media
Internet Systems Architect        5/6 Glenthorne Mews
+44 (0)20 8817 4030               London, W6 0LJ, England
msabin@c...                       http://www.cromwellmedia.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Please note: New list subscriptions now closed in preparation for transfer to OASIS.
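For illustration only, the weakened app-interning guarantee in points 1 and 2 of the message above might look something like the following in Java. The StringInterner interface is named in the message, but its single intern(String) method is an assumption, as are the ParseLocalInterner and InternerDemo classes; this is a sketch under those assumptions, not the proposed API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical shape of the interface; the method signature is assumed. */
interface StringInterner {
    String intern(String s);
}

/**
 * Unsynchronized interner meant to live for a single XMLReader.parse() call
 * and to be used only from the thread making that call.
 */
final class ParseLocalInterner implements StringInterner {
    private final Map<String, String> table = new HashMap<>();

    @Override
    public String intern(String s) {
        // putIfAbsent returns the existing canonical instance, or null if s is new.
        String canonical = table.putIfAbsent(s, s);
        return canonical != null ? canonical : s;
    }
}

final class InternerDemo {
    /** Parser and client share one interner: equals() and == agree. */
    static boolean sameParse() {
        StringInterner interner = new ParseLocalInterner();
        String fromParser = interner.intern(new String("item")); // name the parser reports
        String wanted = interner.intern("item");                 // literal the handler tests against
        return fromParser == wanted;
    }

    /** Separate parse() calls have separate interners: equals() holds, == need not. */
    static boolean differentParses() {
        String first = new ParseLocalInterner().intern(new String("item"));
        String second = new ParseLocalInterner().intern(new String("item"));
        return first.equals(second) && first == second;
    }
}
```

Here sameParse() returns true and differentParses() returns false, which is exactly the weakened guarantee: identity is only promised among Strings obtained within one parse call, so no table ever needs to be shared between threads.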