Re: Slowness of JDK 1.1.x String.intern() [was Re: SAX, Java, and Names
David Brownell wrote:
> Tim Bray wrote:
> >
> > At 10:12 AM 2/5/99 -0800, Jeff Greif wrote:
> > >JDK 1.1.7 intern is native, but is slow because it first converts the
> > >characters in the string [to a canonical form]
>
> No comment ... that's not my code ... ;-)
>
> > Actually, the real reason that most XML parsers will *never* use
> > built-in intern is because they probably have the name available in a
> > character array, and can go look things up in the handcrafted
> > table without String-i-fying it - thus skipping several steps
> > of work that a built-in intern is going to have to do. E.g. Lark's
> > symbol table is a double array, storing both the character-array
> > and String version of each name - you lookup based on the
> > character array and return the string if it's already there. The
> > point is that you call new String() only once per unique name.
>
> This gives "per-parse" uniqueness, which is valuable to a fair
> degree beyond the performance win of avoiding allocating a new
> string.
>
> However, Sun's package currently goes one step further and actually
> interns that string. It's such a small cost (on top of the cost
> to check that array-to-string cache in the first place) that it's
> barely measurable. (Anyone try "java -Xrunhprof:cpu=samples ..." on
> JDK 1.2/SPARC?)

This is what I do in an XML parser as well. The cost would only be relatively high if each element type occurred just once in the document. In the real world this will never happen; instead you have lots of repeated element and attribute Names, which can be cached and interned the first time they are seen.

> That provides "per-VM" uniqueness which has turned out to be handy
> for things like stylesheet processing -- comparing strings in the
> stylesheet and source document is quite fast, and that does add
> up to a performance difference in template matching.

This is very true. Some DOM implementations, such as Docuverse's, also do this for the DOM tree.
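
For illustration, the array-keyed lookup Tim describes, plus the extra intern() step David mentions, might look roughly like the sketch below. This is not Lark's or Sun's actual code: the class NameTable, its lookup() and allocations() methods, the bucket count and the hash function are all made up for the example. The idea is only that the parser hashes and compares the characters sitting in its buffer directly, and calls new String() (and intern()) once per unique name.

```java
// Hypothetical sketch of a parser name table keyed on character arrays.
// Names, sizes and the hash function are assumptions, not any parser's real code.
final class NameTable {
    private static final class Entry {
        final char[] chars;   // character-array form of the name
        final String name;    // String form, created once and interned
        final Entry next;     // next entry in the same bucket

        Entry(char[] chars, String name, Entry next) {
            this.chars = chars;
            this.name = name;
            this.next = next;
        }
    }

    private final Entry[] buckets = new Entry[256];
    private int allocations;  // how many times new String() was called

    /** Returns the unique String for buf[off .. off+len-1]. */
    String lookup(char[] buf, int off, int len) {
        int h = 0;
        for (int i = 0; i < len; i++) {
            h = 31 * h + buf[off + i];
        }
        int idx = (h & 0x7fffffff) % buckets.length;

        // Compare against the stored character arrays; no String is built here.
        for (Entry e = buckets[idx]; e != null; e = e.next) {
            if (matches(e.chars, buf, off, len)) {
                return e.name;
            }
        }

        // First occurrence: build the String once and intern it ("per-VM" uniqueness).
        String name = new String(buf, off, len).intern();
        allocations++;
        char[] copy = new char[len];
        System.arraycopy(buf, off, copy, 0, len);
        buckets[idx] = new Entry(copy, name, buckets[idx]);
        return name;
    }

    int allocations() {
        return allocations;
    }

    private static boolean matches(char[] stored, char[] buf, int off, int len) {
        if (stored.length != len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (stored[i] != buf[off + i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        NameTable table = new NameTable();
        char[] doc = "<item><item>".toCharArray();
        String first = table.lookup(doc, 1, 4);
        String second = table.lookup(doc, 7, 4);
        System.out.println(first == second);      // same object both times
        System.out.println(table.allocations());  // one String built for two lookups
    }
}
```

Because every name comes back interned, a caller such as a stylesheet processor could compare it by reference against any other interned name, for example a string literal, and fall back to equals() only when it cannot be sure both sides are interned.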

You pay a relatively low performance cost for interning Names in a document, but you could get huge benefits when doing node iteration. As of JDK 1.1.7 the String.equals() method is something of the form:

    public boolean equals(Object o) {
        if (o == this) return true;
        String s = (String)o;
        if (s.length() != length()) return false;
        // Do character matching
    }

so two references to the same interned string compare equal on the first test, without touching the characters.

Actually, I think just about all DOM implementations in Java that I am aware of intern Names, so a call to Node.getNodeName() will always return an interned string. It would be nice for applications if SAX stated that all Names are presented to the DocumentHandler interface as interned strings, as Names are nothing more than symbols anyway and should be treated as such, with of course the exception of the weirdness of namespace declaration names appearing as attribute names (e.g. "xmlns:" + some prefix name).

Tyler

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)






