Re: String interning (WAS: SAX2/Java: Towards a final form)
On Thu, 13 Jan 2000, David Brownell wrote:
> "Clark C. Evans" wrote:
>> There are going to be lots of server side filter
>> architectures using the SAX interface which may
>> not do this. Indeed, I'd say that the "parser"
>> interface is mis-named. It's really an "emitter".
>> And I'd go so far to say that in a few years,
>> 99% of the "emitters" out there won't be parsers!
>
> I sort of hope so. XML data models shouldn't be forced to stop right
> above parsing; it's not always appropriate. It should certainly be
> possible to assemble pipelines of components which may optionally
> be sourced by a parser, but don't need to be.

I've been doing a lot of work building a filter architecture, and
I've found the SAX 1.0 interface lacking. Below is a suggested
modification for SAX2. Your comments would be cool.

Here is the relevant post to the SML-DEV list (edited here for clarity).
The concepts are identical; just add attributes and all of the other
XML stuff.

---------- Forwarded message ----------
Date: Thu, 13 Jan 2000 15:04:41 -0500 (EST)
From: Clark C. Evans <clark.evans@m...>
Reply-To: sml-dev@e...
To: sml-dev@e...
Subject: [sml-dev] Character Tugging

Consider the following interface for "push" elements and "pull" characters.

    public interface Handler {
        public void begin(String name);
        public void characters(CharTug value);
        public void end(String name);
    }

    public interface CharTug {
        public Reader toReader();     // read the characters directly
        public String toString();     // the characters as a String
        public boolean hasObject();   // has an application object been built?
        public Object getObject();    // that application object
    }

Thus, a handler would be pushed the "begin" event for every SML start
tag, and an "end" event for every SML end tag. This much is very similar
to the SAX API. Where it differs is "characters". For the characters
event, most SAX implementations that I have read make a temporary copy
of the relevant parts of the input buffer and pass it to the handler in
zero or more events.

The "characters" event for the SAX interface has several problems:

1.
The handler may receive two or more sequential characters() event
calls when an element's content crosses a buffer boundary. Thus state
must be maintained, and the end of a sequence of characters can only be
detected from one of two other events, the begin or the end. Hardly obvious.

2. Most of the time, the character array is converted into a string,
so the temporary memory is allocated and then immediately de-allocated.
This is not optimal. Alternatively, the characters passed can be direct
pointers into the parser's character buffer -- but the handler may store
the value, and this could cause unexpected problems.

3. If SAX events are put into a processor pipeline, then an application
specific object, let's say "Currency", must be converted to characters
and back for each stage of the processing. This is, to say the very
least, inefficient.

4. In the common case of building a string, the handler must put in
special code.

By passing a CharTug instead, most of these problems are solved.

1. If the application would rather 'read' the information directly, it
can ask for a reader with toReader(). The parser is then like a
FilterReader, scanning for begin/end tags and properly terminating the
character sequence.

2. In the toReader() case, no additional intermediate storage is needed.

3. For the pipeline case, hasObject() can be called to see if an
application specific object, like Currency or Integer, has already been
built. If so, the handler can ask for it with getObject() -- rather than
having the currency broken down into characters and then rebuilt at the
other end.

4. And, for the common case, toString() is a helper function which will
return the characters as a String object. For the parser case, the
parser would build the string directly from its input buffer. For the
pipeline case, the previous stage could use the toString() method of its
application specific object.
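
The pipeline case in points 3 and 4 can be sketched roughly as follows. This is only an illustration under assumptions, not part of the proposal: StringCharTug, ObjectCharTug, SummingHandler and CharTugDemo are hypothetical names, and BigDecimal stands in for an application specific object such as Currency. The Handler and CharTug interfaces are the ones proposed above.

```java
import java.io.Reader;
import java.io.StringReader;
import java.math.BigDecimal;

// The two interfaces from the proposal.
interface Handler {
    void begin(String name);
    void characters(CharTug value);
    void end(String name);
}

interface CharTug {
    Reader toReader();
    String toString();
    boolean hasObject();
    Object getObject();
}

// Hypothetical: a CharTug backed by plain text (the parser case).
final class StringCharTug implements CharTug {
    private final String text;
    StringCharTug(String text) { this.text = text; }
    public Reader toReader() { return new StringReader(text); }
    @Override public String toString() { return text; }
    public boolean hasObject() { return false; }
    public Object getObject() { return null; }
}

// Hypothetical: a CharTug carrying an already-built object (the pipeline case).
final class ObjectCharTug implements CharTug {
    private final Object value;
    ObjectCharTug(Object value) { this.value = value; }
    public Reader toReader() { return new StringReader(toString()); }
    @Override public String toString() { return value.toString(); }
    public boolean hasObject() { return true; }
    public Object getObject() { return value; }
}

// Hypothetical pipeline stage: sums amounts, reusing a BigDecimal when the
// previous stage offers one and parsing the text only when it does not.
final class SummingHandler implements Handler {
    BigDecimal total = BigDecimal.ZERO;

    public void begin(String name) { }

    public void characters(CharTug value) {
        BigDecimal amount =
            (value.hasObject() && value.getObject() instanceof BigDecimal)
                ? (BigDecimal) value.getObject()
                : new BigDecimal(value.toString().trim());
        total = total.add(amount);
    }

    public void end(String name) { }
}

class CharTugDemo {
    public static void main(String[] args) {
        SummingHandler handler = new SummingHandler();
        handler.begin("price");
        handler.characters(new StringCharTug(" 19.99 "));               // from text
        handler.end("price");
        handler.begin("price");
        handler.characters(new ObjectCharTug(new BigDecimal("5.01")));  // from an object
        handler.end("price");
        System.out.println(handler.total);
    }
}
```

Each element's content arrives in exactly one characters() call here, so the handler keeps no state between calls, and the second amount is never turned into characters at all.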
If a reader is requested, the stage can either build a custom reader or
return a StringReader built from the toString() result. The latter can
be provided as a helper class.

Note: If the handler wants to disregard the character content, then in
the worst case a tiny CharTug object (does it need any member
variables?) will have been created and destroyed, with far less memory
usage than the corresponding char[] in a characters() call.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ or CD-ROM/ISBN 981-02-3594-1
Please note: New list subscriptions now closed in preparation for transfer to OASIS.