String interning (WAS: SAX2/Java: Towards a final form)

I think we need to clarify a couple of ambiguities here. There are two sorts of interning being talked about wrt SAX2: the standard Java interning performed by String.intern(); and other, parser-specific, mechanisms for ensuring that Strings which are String.equals() are also ==. I'll call the former java-interning and the latter app-interning.

I have a strong objection to SAX requiring that Strings returned from its methods be java-interned. I'm not bothered about requiring app-interning so long as the guarantees are weakened a little.

First, the problem with java-interning. The way this is implemented (in all the JVMs I've seen the sources of) is via a hash-lookup of the pre-interned String in a JVM-internal table. Because this table is shared by all threads in a JVM, this lookup has to be synchronized. The upshot is that there is a huge potential for lock-contention where many threads are interning simultaneously. This is bad enough on a single-processor machine, but could seriously clobber performance on a multi-processor box. I, for one, want to use multiple SAX parser instances driven from multiple threads on SMP machines, and I'd be a tad distressed if java-interning were a SAX requirement.

David Megginson has mentioned a way of reducing the overhead of java-interning: here we have a parser-internal map from character sequences onto java-interned Strings ... if, when you look up the char sequence, you get a non-null String back then that's the java-interned result; otherwise you convert the char sequence to a String, java-intern it and enter it in the table. Whilst this might improve things a bit, it's still a performance hit: if the parser-internal map is shared between parsers then we have the same contention problem back again (tho' this time in application code rather than the JVM); if it isn't (and hence is parser-/thread-local), then it has to be repopulated at least for each new parser instance, probably for each new document.
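
To make the parser-internal map concrete, here is a minimal sketch of the parser-/thread-local variant, under stated assumptions: the class name NameCache, the fixed 256-bucket table and the internCalls counter are all hypothetical and are not taken from any real parser or from SAX itself.

```java
/**
 * Hypothetical per-parser name cache (not part of SAX). It maps a run of
 * characters in the parser's buffer to a java-interned String, so that
 * String.intern() is called at most once per distinct name per cache.
 * Instances are assumed to be confined to one parser/thread, so the
 * table itself needs no synchronization.
 */
final class NameCache {
    private static final class Entry {
        final String name;   // already java-interned
        final Entry next;    // next entry in the same bucket
        Entry(String name, Entry next) { this.name = name; this.next = next; }
    }

    private final Entry[] buckets = new Entry[256]; // size is a power of two
    private int internCalls = 0;

    /** Returns the java-interned String for buf[offset .. offset+length). */
    String lookup(char[] buf, int offset, int length) {
        int h = 0;
        for (int i = 0; i < length; i++) {
            h = 31 * h + buf[offset + i];
        }
        int slot = h & (buckets.length - 1);
        for (Entry e = buckets[slot]; e != null; e = e.next) {
            if (matches(e.name, buf, offset, length)) {
                return e.name;            // hit: no String created, no intern()
            }
        }
        // Miss: this is the only place the shared JVM intern table is touched.
        String interned = new String(buf, offset, length).intern();
        internCalls++;
        buckets[slot] = new Entry(interned, buckets[slot]);
        return interned;
    }

    private static boolean matches(String s, char[] buf, int offset, int length) {
        if (s.length() != length) return false;
        for (int i = 0; i < length; i++) {
            if (s.charAt(i) != buf[offset + i]) return false;
        }
        return true;
    }

    int internCalls() { return internCalls; }
}

final class NameCacheDemo {
    public static void main(String[] args) {
        NameCache cache = new NameCache();
        char[] buf = "<item><item>".toCharArray();
        String a = cache.lookup(buf, 1, 4);
        String b = cache.lookup(buf, 7, 4);
        System.out.println(a == b);              // true
        System.out.println(cache.internCalls()); // 1
    }
}
```

Note that a fresh NameCache starts empty, so every distinct name costs one String.intern() call again for each new cache.
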
Even tho' this only requires one java-intern for each distinct name, it still provides plenty of opportunities for synchronization collisions.

App-interning could be fine tho' ... so long as it's defined in such a way that it can be implemented in a completely thread-local way. Doing that means we'd have to,

1. Weaken the guarantees on the equivalence of String.equals() and ==. To avoid synchronization issues we'd have to say that app-interning is done relative to a given parser call, ie. where foo and bar are both obtained via callbacks from the same call on XMLReader.parse()

     foo.equals(bar) iff foo == bar

   but if foo and bar are not both obtained via callbacks from the same call on XMLReader.parse()

     foo.equals(bar) does not imply foo == bar

2. Adopt something like Lars's proposal of a StringInterner interface. We'd need this to allow a SAX client to app-intern any literal Strings it wants to == test against in its handlers.

This should get us what most people want: fast equality comparisons and shared representation within the implementation of a ContentHandler, but without any need for synchronization.

One point to bear in mind: none of the foregoing would _prevent_ a SAX implementor from using java-interning if they wanted to.

Cheers,

Miles

-- 
Miles Sabin                       Cromwell Media
Internet Systems Architect        5/6 Glenthorne Mews
+44 (0)20 8817 4030               London, W6 0LJ, England
msabin@c...                       http://www.cromwellmedia.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Please note: New list subscriptions now closed in preparation for transfer to OASIS.
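
For illustration, the thread-local app-interning described in points 1 and 2 of the message might be sketched as below. This is only a sketch under assumptions: the single-method shape of StringInterner is a guess (the actual signature of the proposed interface is not given in the message), and ParseLocalInterner and AppInternDemo are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical shape for a StringInterner interface. The real proposal's
 * signature is not shown in the message, so this one method is an assumption.
 */
interface StringInterner {
    /** Returns the canonical instance that is equals() to s. */
    String intern(String s);
}

/**
 * Assumed usage: one instance per call to XMLReader.parse(), used by a
 * single thread only. Nothing here is synchronized, and the JVM-wide
 * intern table is never touched.
 */
final class ParseLocalInterner implements StringInterner {
    private final Map<String, String> table = new HashMap<>();

    @Override
    public String intern(String s) {
        // The first String seen with a given value becomes the canonical one.
        String previous = table.putIfAbsent(s, s);
        return previous != null ? previous : s;
    }
}

final class AppInternDemo {
    public static void main(String[] args) {
        StringInterner parseOne = new ParseLocalInterner();
        StringInterner parseTwo = new ParseLocalInterner();

        // A handler app-interns the literal it wants to compare against.
        String wanted = parseOne.intern("item");
        // The parser hands back a name it built from its input buffer.
        String fromParser = parseOne.intern(new String("item"));
        // Same parse: equals() and == agree.
        System.out.println(wanted == fromParser);

        // Different parse: equals() holds, but == is not guaranteed.
        String other = parseTwo.intern(new String("item"));
        System.out.println(wanted.equals(other));
    }
}
```

Because each interner is confined to one parse, the weakened guarantee in point 1 falls out directly: identity is only promised between Strings that went through the same interner.
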