
Re: Slowness of JDK 1.1.x String.intern() [was Re: SAX, Java, and Names

  • From: Tyler Baker <tyler@i...>
  • To: David Brownell <db@E...>
  • Date: Fri, 12 Feb 1999 03:44:25 -0500

David Brownell wrote:

> Tim Bray wrote:
> >
> > At 10:12 AM 2/5/99 -0800, Jeff Greif wrote:
> > >JDK 1.1.7 intern is native, but is slow because it first converts the
> > >characters in the string [to a canonical form]
>
> No comment ... that's not my code ... ;-)
>
> > Actually, the real reason that most XML parsers will *never* use
> > built-in intern is because they probably have the name available in a
> > character array, and can go look things up in the handcrafted
> > table without String-i-fying it - thus skipping several steps
> > of work that a built-in intern is going to have to do.  E.g. Lark's
> > symbol table is a double array, storing both the character-array
> > and String version of each name - you lookup based on the
> > character array and return the string if it's already there.  The
> > point is that you call new String() only once per unique name.
>
> This gives "per-parse" uniqueness, which is valuable to a fair
> degree beyond the performance win of avoiding allocating a new
> string.
>
> However, Sun's package currently goes one step further and actually
> interns that string.  It's such a small cost (on top of the cost
> to check that array-to-string cache in the first place) that it's
> barely measurable.  (Anyone try "java -Xrunhprof:cpu=samples ..." on
> JDK 1.2/SPARC?)

This is what I do in an XML parser as well.  The cost would only be
relatively high if every element in the document had its own element type,
used exactly once.  In the real world this will never happen; instead you
will have lots of repeated element and attribute Names, which can be
cached and interned the first time they are seen.
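
A minimal sketch of that kind of cache, in the spirit of the character-array
lookup described above. The class name NameTable, the fixed bucket count, and
the stringsCreated counter are all hypothetical, chosen for illustration; this
is not code from Lark, Sun's parser, or any other parser mentioned here.

```java
/** Hypothetical per-parse name cache keyed by a slice of the parser's char buffer. */
final class NameTable {
    /** One cached name: its characters plus the interned String built from them. */
    private static final class Entry {
        final char[] chars;
        final String name;
        final Entry next;

        Entry(char[] chars, String name, Entry next) {
            this.chars = chars;
            this.name = name;
            this.next = next;
        }
    }

    // Fixed-size chained hash table; the size is a power of two so masking works.
    private final Entry[] buckets = new Entry[256];

    /** Number of String objects this table has created (one per unique name). */
    int stringsCreated = 0;

    /** Returns the cached String for buf[start .. start+len), creating it on first sight. */
    String lookup(char[] buf, int start, int len) {
        int h = 0;
        for (int i = 0; i < len; i++) {
            h = 31 * h + buf[start + i];
        }
        int idx = h & (buckets.length - 1);

        // Compare against the stored char arrays; no String is allocated on a hit.
        for (Entry e = buckets[idx]; e != null; e = e.next) {
            if (matches(e.chars, buf, start, len)) {
                return e.name;
            }
        }

        // First occurrence: build the String once, intern it for per-VM uniqueness.
        String name = new String(buf, start, len).intern();
        stringsCreated++;
        buckets[idx] = new Entry(name.toCharArray(), name, buckets[idx]);
        return name;
    }

    private static boolean matches(char[] stored, char[] buf, int start, int len) {
        if (stored.length != len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (stored[i] != buf[start + i]) {
                return false;
            }
        }
        return true;
    }
}
```

With a table like this, intern() runs once per unique name, so its cost is
spread over every later occurrence of that name in the document.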

> That provides "per-VM" uniqueness which has turned out to be handy
> for things like stylesheet processing -- comparing strings in the
> stylesheet and source document is quite fast, and that does add
> up to a performance difference in template matching.

This is very true.  Some DOM implementations such as Docuverse's also do
this for the DOM tree.  You have a relatively low performance cost for
interning Names in a document, but you could possibly get huge benefits when
doing node iteration.  As of JDK 1.1.7 the String.equals() method is now
something of the form:

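public boolean equals(Object o) {
  if (o == this) return true;   // same object, e.g. two interned Names
  if (!(o instanceof String)) return false;

  String s = (String)o;
  if (s.length() != length()) return false;

  // Do character matching
  for (int i = 0; i < length(); i++)
    if (s.charAt(i) != charAt(i)) return false;
  return true;
}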

Actually, I think just about all DOM implementations in Java that I am aware
of intern Names so a call to Node.getNodeName() will always return an
interned string.
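
As a rough illustration of why this matters for node iteration: when every
name is known to be interned, a name test can be a plain reference comparison.
The class and method below are hypothetical and not taken from any DOM
implementation; the == test is only valid under the stated assumption that
both sides are interned.

```java
/** Hypothetical helper showing identity comparison of interned names. */
final class InternedNames {
    /**
     * Counts the entries in nodeNames that are the same object as target.
     * Assumes every name and the target are interned; a non-interned
     * String with equal characters will NOT match.
     */
    static int countMatches(String[] nodeNames, String target) {
        int matches = 0;
        for (String name : nodeNames) {
            if (name == target) {   // reference comparison, no character scan
                matches++;
            }
        }
        return matches;
    }
}
```

String literals are interned by the language, so a literal such as "para" can
serve as the target. Code that cannot guarantee interning should fall back to
equals(), which still takes the identity fast path shown above.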

It would be nice for applications if SAX stated that all Names are presented
to the DocumentHandler interface as interned strings, since Names are nothing
more than symbols anyway and should be treated as such.  The exception, of
course, is the weirdness of namespace declaration names appearing as
attribute names (e.g. "xmlns:" + some prefix name).

Tyler


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

