
RE: String interning (WAS: SAX2/Java: Towards a final form)

  • From: Miles Sabin <msabin@c...>
  • To: David Megginson <david@m...>
  • Date: Fri, 14 Jan 2000 19:33:10 -0000

David Megginson wrote,
> I was very concerned about this use case at first, but my 
> concerns lessened a bit once I started to consider 
> implementation details.
> If I'm writing a filter, where do the strings for the names 
> I'm passing on come from?

I'd put filters in a slightly different box. Layers of SAX
handlers feeding into XMLReaders feeding into handlers ...
will come out fine, because, as you say, everything is
String.intern()'d from source to sink.
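
The guarantee this relies on is that String.intern() returns one canonical instance per distinct character sequence, so interned names can be compared with ==. A minimal illustration (the class name is made up for this example):

```java
// Illustrative class, not part of any SAX API.
public class InternDemo {
    public static void main(String[] args) {
        // A name built at runtime, e.g. from a parser's character buffer,
        // is a distinct object from the compile-time literal.
        String fromParser = new String(new char[] {'a', 'b', 'o', 'u', 't'});
        String literal = "about"; // string literals are interned automatically

        System.out.println(fromParser == literal);          // false: different objects
        System.out.println(fromParser.equals(literal));     // true: same characters
        System.out.println(fromParser.intern() == literal); // true: canonical instance
    }
}
```

A filter that simply passes on the names it was handed never breaks this identity; the trouble starts only when some stage builds names itself without interning them.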

> Iterating over a DOM, on the other hand, is a legitimate 
> problem. Every DOM implementation worth its salt will have 
> interned all element and attribute names (a DOM tree is big 
> enough already), but there's no way to be sure of that in the 
> general case, or to be sure that the names are == the results 
> of java.lang.String.intern().  

Not just a DOM. Quite a few people are sitting SAX on top of
all sorts of data-structures which don't necessarily make any
interning guarantees. And don't forget database queries.
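
Whatever the source, interning at the point where names enter the SAX pipeline restores the guarantee. A minimal sketch for the DOM case, using the standard JAXP DocumentBuilder and org.w3c.dom.Node APIs; the class name, the urn:example namespace and the choice to collect names into a list (rather than fire ContentHandler events) are illustrative assumptions:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative helper, not part of any SAX or DOM API.
public class DomNameInterner {
    // Walks a DOM subtree in document order and collects element local
    // names, interning each one because the DOM makes no such guarantee.
    static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String local = node.getLocalName(); // may or may not be interned
            out.add(local == null ? null : local.intern());
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    static List<String> parseAndCollect(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // otherwise getLocalName() returns null
        DocumentBuilder b = f.newDocumentBuilder();
        Document doc = b.parse(new InputSource(new StringReader(xml)));
        List<String> names = new ArrayList<>();
        collect(doc.getDocumentElement(), names);
        return names;
    }

    public static void main(String[] args) throws Exception {
        List<String> names = parseAndCollect(
            "<r:RDF xmlns:r='urn:example'><r:Description/></r:RDF>");
        System.out.println(names.get(0) == "RDF");         // true after interning
        System.out.println(names.get(1) == "Description"); // true after interning
    }
}
```

Namespace URIs would need the same treatment (node.getNamespaceURI()) before being handed to a ContentHandler.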

> Too bad the DOM level one Java binding didn't require that.

Hmm ... similar issues. Some people layer DOM implementations
on top of non-DOM data structures, Java reflection and databases.

> > The other scenario is mine (multiple parsers running over
> > arbitrary documents in multiple threads) where the global
> > String.intern() map is a point of contention. I won't bore
> > everyone with the details again.
> I'm much more skeptical about this one, because there are so 
> many preconditions:
> [snip: 4 conditions]
> If all of these conditions arise at the same time (and I 
> question #3 and #4), then perhaps over-all XML parsing might 
> slow down by 1-2%; if the actual XML parsing represents even 
> as much as 30% of the processing time (the rest is taken by 
> whatever the ContentHandler callbacks do with the 
> information), that's a 0.6% slowdown under these
> circumstances.

Those four conditions cover my situation pretty accurately.
You'll just have to take my word for (3), but (4) comes down
to whether the box has a single processor or several.

> Granted, the potential speedup for other apps probably isn't 
> much greater, but since the vast majority of SAX apps will 
> not meet the above criteria, and since the penalty when one 
> does meet these criteria is so small, it makes sense not to 
> penalize everyone else.

OK, there's not much I can say to that. If I really am doing
something very far out then it'd be unreasonable for you to
twist the API to suit me.

I'm not convinced, tho'. This isn't quite my app, but I can
imagine people wanting to serve HTML generated via XSL from
heterogeneous XML on a heavily loaded, multi-threaded HTTP server.
It'd be a shame if lock contention issues made it harder for
them to scale up to more users by sticking a couple more
processors in the box.
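
One way a parser could sidestep the shared String.intern() table is to give each thread its own symbol table, so lookups never contend for a lock. A minimal sketch under that assumption; ThreadLocalInterner is a hypothetical name, not an existing API. The catch is that strings interned in one thread are not == to those interned in another, so a handler would have to intern its constants through the same thread's table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-thread interner, not part of SAX or the JDK.
public final class ThreadLocalInterner {
    // Each thread gets its own unsynchronized table: no lock to contend for.
    private static final ThreadLocal<Map<String, String>> TABLE =
        ThreadLocal.withInitial(HashMap::new);

    private ThreadLocalInterner() {}

    // Returns the canonical instance of s within the calling thread.
    public static String intern(String s) {
        Map<String, String> table = TABLE.get();
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) throws InterruptedException {
        String a = intern(new String("href"));
        String b = intern(new String("href"));
        System.out.println(a == b); // true: same thread, same table

        String[] other = new String[1];
        Thread t = new Thread(() -> other[0] = intern(new String("href")));
        t.start();
        t.join();
        System.out.println(a == other[0]); // false: the other thread has its own table
    }
}
```

With pooled server threads these tables live as long as the threads do, so a real implementation would probably want to bound or reset them.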

> If there's any real concern, I think, it's the DOM scenario.

Arkin? Comments?

> > [snip big case statement example]
> > To be honest, tho', I don't see any particular reason why 
> > the SAX API should be expected to support this sort of 
> > code.
> How about running in a tight loop?

I doubt that the difference between String.equals() and == would 
be critical even here if the code under the conditionals does
much work. But even if it _is_, adding an interning method
to XMLReader,

  String intern(String toBeInterned);

would do the trick,

  // In startDocument() or outside the ContentHandler
  // altogether. r is the XMLReader and intern() is the
  // method proposed above; the constants are handler fields.

  RDF = r.intern("http://www.w3.org/1999/02/22-rdf-syntax-ns#");
  ABOUT = r.intern("about");
  ID = r.intern("ID");
  ABOUT_EACH = r.intern("aboutEach");

  XHTML = r.intern("http://www.w3.org/1999/xhtml");
  HREF = r.intern("href");
  CLASS = r.intern("class");
  NAME = r.intern("name");

  // In startElement(String uri, String localName,
  //                 String qName, Attributes atts)

  int len = atts.getLength();
  for (int i = 0; i < len; i++) {
    String attUri = atts.getURI(i);
    String name = atts.getLocalName(i);
    if (attUri == RDF) {
      if (name == ABOUT) {
        // handle rdf:about
      } else if (name == ID) {
        // handle rdf:ID
      } else if (name == ABOUT_EACH) {
        // handle rdf:aboutEach
      }
    } else if (attUri == XHTML) {
      if (name == HREF) {
        // handle href
      } else if (name == CLASS) {
        // handle class
      } else if (name == NAME) {
        // handle name
      }
    }
  }
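
On the reader side, honoring a method like that would mean routing every name the parser reports through the same table it hands out to handlers. A minimal sketch, assuming a made-up SymbolTableReader class standing in for an XMLReader (the SAX2 interface has no such method) and a toy reportName() that mimics a parser turning a slice of its character buffer into a name:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an XMLReader with the proposed intern() method.
public class SymbolTableReader {
    // One table per reader instance; no global lock is involved.
    private final Map<String, String> symbols = new HashMap<>();

    // The proposed intern(): canonical instance for this reader only.
    public String intern(String s) {
        String existing = symbols.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Toy version of the point where a parser builds a name from its
    // buffer before passing it to startElement() or an Attributes object.
    public String reportName(char[] buf, int start, int len) {
        return intern(new String(buf, start, len));
    }

    public static void main(String[] args) {
        SymbolTableReader r = new SymbolTableReader();
        String ABOUT = r.intern("about"); // what the handler caches up front

        char[] buf = "rdf:about".toCharArray();
        String name = r.reportName(buf, 4, 5); // "about", freshly built

        System.out.println(name == ABOUT); // true: same reader, same table
        SymbolTableReader other = new SymbolTableReader();
        System.out.println(other.reportName(buf, 4, 5) == ABOUT); // false: different table
    }
}
```

Because the handler gets its constants from the reader, the reader is free to keep a private table like this one or simply delegate to String.intern(), and the handler's == tests work either way.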



Miles Sabin                       Cromwell Media
Internet Systems Architect        5/6 Glenthorne Mews
+44 (0)20 8817 4030               London, W6 0LJ, England
msabin@c...          http://www.cromwellmedia.com/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Please note: New list subscriptions now closed in preparation for transfer to OASIS.

