Re: SAX: New Idea for Entity Resolution
David Megginson wrote:

> James Clark writes:
>
> > You could just have a class that encapsulates a structure with three
> > members:
> >
> > - a CharacterStream
> > - a ByteStream
> > - a String
> >
> > At least one of the CharacterStream and ByteStream must be non-null. If
> > the ByteStream is non-null the String can specify the encoding.
>
> [Read on to the bottom for a large-ish design change.]
>
> This implies, then, the following three interfaces:
>
>   public interface ByteStream {
>     public abstract int read ()
>       throws SAXException;
>     public abstract int read (byte b[], int start, int count)
>       throws SAXException;
>   }
>
>   public interface CharacterStream {
>     public abstract int read ()
>       throws SAXException;
>     public abstract int read (char ch[], int start, int count)
>       throws SAXException;
>   }
>
>   public class InputSource {
>     // For each variable, imagine a get/set pair instead...
>     public ByteStream byteStream;
>     public CharacterStream characterStream;
>     public String encoding;
>   }
>
> The nice thing here is that all of these can live on separate systems
> in a distributed environment: the InputSource can be a C-program on a
> VAX, the CharacterStream can come from a Python program running under
> alpha Linux, and the parser can be running in Java on a Windows box.
> There is no dependency on language- or system-specific features (except
> for java.lang.String, which should be able to map predictably to other
> languages).
>
> Now, why not take this a step further?
>
>   public class InputSource {
>     // For each variable, imagine a get/set pair instead...
>     public String publicId;
>     public String systemId;
>     public ByteStream byteStream;
>     public CharacterStream characterStream;
>     public String encoding;
>   }
>
> We'd have to define rules of precedence:
>
> 1) if there is a character stream, use it;
>
> 2) if there is no character stream but there is a byte stream, use the
>    byte stream;
>
> 3) if there is neither a character stream nor a byte stream but there
>    is a system identifier, open a connection to the system identifier;
>
> 4) if there is no character stream, byte stream, or system identifier,
>    throw an exception (or invoke the ErrorHandler).
>
> Now, we can get away with only one parse() method in
> org.xml.sax.Parser:
>
>   public abstract void parse (InputSource source)
>     throws Exception;
>
> It might still be useful to keep two separate methods in
> EntityResolver, though:
>
>   public interface EntityResolver
>   {
>     public String resolveSystemId (String publicId, String systemId)
>       throws SAXException;
>     public InputSource openEntity (String systemId)
>       throws Exception;
>   }
>
> Comments?
>
> All the best,
>
> David

This sounds like a great idea; however, I think that InputSource should
be immutable in general. Instead of:

  public class InputSource {
    // For each variable, imagine a get/set pair instead...
    public String publicId;
    public String systemId;
    public ByteStream byteStream;
    public CharacterStream characterStream;
    public String encoding;
  }

I would suggest:

  public interface InputSource {
    String getPublicId();
    String getSystemId();
    ByteStream getByteStream();
    CharacterStream getCharacterStream();
    String getEncoding();
  }

In general, an input source should probably be immutable, since it is the
application that fills in the blanks as to how the input source should
be retrieved. In this sense, the system ID may not help the parser at all
if the URL points to a location that the parser cannot read on its own
(for example, the underlying stream may be encrypted).
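As a rough illustration of the read-only InputSource suggested above, here is a minimal Java sketch. The ByteStream, CharacterStream, and InputSource shapes follow the thread; FixedInputSource and ArrayByteStream are hypothetical names invented for this example, and both the constructor check and the -1 end-of-stream convention (borrowed from java.io.InputStream) are assumptions rather than anything the proposal specifies.

```java
import org.xml.sax.SAXException;

// Stream interfaces as proposed earlier in the thread.
interface ByteStream {
    int read() throws SAXException;
    int read(byte[] b, int start, int count) throws SAXException;
}

interface CharacterStream {
    int read() throws SAXException;
    int read(char[] ch, int start, int count) throws SAXException;
}

// Read-only InputSource: getters only, no setters.
interface InputSource {
    String getPublicId();
    String getSystemId();
    ByteStream getByteStream();
    CharacterStream getCharacterStream();
    String getEncoding();
}

// Hypothetical implementation: every field is final and assigned once.
final class FixedInputSource implements InputSource {
    private final String publicId;
    private final String systemId;
    private final ByteStream byteStream;
    private final CharacterStream characterStream;
    private final String encoding;

    FixedInputSource(String publicId, String systemId, ByteStream byteStream,
                     CharacterStream characterStream, String encoding) {
        // Assumption: a source without any stream is rejected up front.
        if (byteStream == null && characterStream == null) {
            throw new IllegalArgumentException(
                "a byte stream or a character stream is required");
        }
        this.publicId = publicId;
        this.systemId = systemId;
        this.byteStream = byteStream;
        this.characterStream = characterStream;
        this.encoding = encoding;
    }

    @Override public String getPublicId() { return publicId; }
    @Override public String getSystemId() { return systemId; }
    @Override public ByteStream getByteStream() { return byteStream; }
    @Override public CharacterStream getCharacterStream() { return characterStream; }
    @Override public String getEncoding() { return encoding; }
}

// Hypothetical in-memory ByteStream, standing in for whatever the
// application really reads from. Returns -1 at end of data (assumption).
final class ArrayByteStream implements ByteStream {
    private final byte[] data;
    private int pos;

    ArrayByteStream(byte[] data) { this.data = data.clone(); }

    @Override public int read() {
        return pos < data.length ? (data[pos++] & 0xFF) : -1;
    }

    @Override public int read(byte[] b, int start, int count) {
        if (pos >= data.length) return -1;
        int n = Math.min(count, data.length - pos);
        System.arraycopy(data, pos, b, start, n);
        pos += n;
        return n;
    }
}
```

With something like this, the application builds the source once, streams included, and the parser only ever reads from it through the getters.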
In that case, your rules of precedence:

  We'd have to define rules of precedence:

  1) if there is a character stream, use it;

  2) if there is no character stream but there is a byte stream, use the
     byte stream;

  3) if there is neither a character stream nor a byte stream but there
     is a system identifier, open a connection to the system identifier;

  4) if there is no character stream, byte stream, or system identifier,
     throw an exception (or invoke the ErrorHandler).

should be changed to something like:

  We'd have to define rules of precedence:

  1) if there is no character stream but there is a byte stream, use the
     byte stream;

  2) if there is no byte stream but there is a character stream, use the
     character stream;

  3) if there is both a character stream and a byte stream available,
     the parser may use the byte stream or the character stream, but not
     both at the same time (whichever suits the parser best);

  4) if there is neither a character stream nor a byte stream, throw an
     exception.

I don't believe the parser should attempt to open a connection using the
system identifier, since the system identifier alone does not say what
steps are needed to retrieve the data as a stream, let alone how to
secure authorization to it in the first place. In Java you have URLs and
URL stream handlers, where the URL prefix (the protocol) is used to look
up the corresponding handler. Although it is programmatically convenient
to just call URL.openStream(), there is no good way to control how a
specific URL's content is actually retrieved, other than setting the
system properties that the standard URL handlers use for things like
proxies, or creating your own URLStreamHandlerFactory. That content may
need to be piped through a variety of filters before it is available
again in its raw form. I think it would be a mistake for SAX to inherit
this flaw, which assumes that the parser has access to the specified
system identifier in any environment.
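The revised precedence rules above could be sketched in Java roughly as follows. The interfaces mirror the ones discussed in the thread; StreamSelector, its Choice enum, the preferBytes flag, and SimpleInputSource are hypothetical names made up for this illustration, and using SAXException for rule 4 is an assumption, since the rule only says "throw an exception".

```java
import org.xml.sax.SAXException;

// Interfaces as discussed in the thread.
interface ByteStream {
    int read() throws SAXException;
    int read(byte[] b, int start, int count) throws SAXException;
}

interface CharacterStream {
    int read() throws SAXException;
    int read(char[] ch, int start, int count) throws SAXException;
}

interface InputSource {
    String getPublicId();
    String getSystemId();
    ByteStream getByteStream();
    CharacterStream getCharacterStream();
    String getEncoding();
}

// Hypothetical parser-side helper applying the four revised rules.
final class StreamSelector {
    enum Choice { BYTE_STREAM, CHARACTER_STREAM }

    // preferBytes is a made-up knob for rule 3: which stream this
    // particular parser would rather use when both are supplied.
    static Choice choose(InputSource source, boolean preferBytes)
            throws SAXException {
        boolean hasBytes = source.getByteStream() != null;
        boolean hasChars = source.getCharacterStream() != null;

        if (hasBytes && !hasChars) return Choice.BYTE_STREAM;       // rule 1
        if (hasChars && !hasBytes) return Choice.CHARACTER_STREAM;  // rule 2
        if (hasBytes && hasChars) {                                 // rule 3
            return preferBytes ? Choice.BYTE_STREAM : Choice.CHARACTER_STREAM;
        }
        // Rule 4: the system identifier is never opened by the parser.
        throw new SAXException(
            "InputSource supplies neither a byte stream nor a character stream");
    }
}

// Hypothetical minimal InputSource used only to exercise the selector.
final class SimpleInputSource implements InputSource {
    private final String systemId;
    private final ByteStream byteStream;
    private final CharacterStream characterStream;
    private final String encoding;

    SimpleInputSource(String systemId, ByteStream byteStream,
                      CharacterStream characterStream, String encoding) {
        this.systemId = systemId;
        this.byteStream = byteStream;
        this.characterStream = characterStream;
        this.encoding = encoding;
    }

    @Override public String getPublicId() { return null; }
    @Override public String getSystemId() { return systemId; }
    @Override public ByteStream getByteStream() { return byteStream; }
    @Override public CharacterStream getCharacterStream() { return characterStream; }
    @Override public String getEncoding() { return encoding; }
}
```

Note that a source carrying only a system identifier falls through to rule 4 here. That is the point of the change: the identifier stays available as a label, but the parser never tries to dereference it.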
Force the application to provide a suitable ByteStream and/or
CharacterStream for each InputSource provided.

Tyler

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)