Re: SAX: New Idea for Entity Resolution
David Megginson wrote:
>
> James Clark writes:
>
> > You could just have a class that encapsulates a structure with three
> > members:
> >
> > - a CharacterStream
> > - a ByteStream
> > - a String
> >
> > At least one of the CharacterStream and ByteStream must be non-null. If
> > the ByteStream is non-null the String can specify the encoding.
>
> [Read on to the bottom for a large-ish design change.]
>
> This implies, then, the following three interfaces:
>
>   public interface ByteStream {
>     public abstract int read ()
>       throws SAXException;
>     public abstract int read (byte b[], int start, int count)
>       throws SAXException;
>   }
>
>   public interface CharacterStream {
>     public abstract int read ()
>       throws SAXException;
>     public abstract int read (char ch[], int start, int count)
>       throws SAXException;
>   }

Why are the single-character read calls there? They unnecessarily
complicate the interface.

>   public class InputSource {
>     // For each variable, imagine a get/set pair instead...
>     public ByteStream byteStream;
>     public CharacterStream characterStream;
>     public String encoding;
>   }
>
> The nice thing here is that all of these can live on separate systems
> in a distributed environment: the InputSource can be a C-program on a
> VAX, the CharacterStream can come from a Python program running under
> alpha Linux, and the parser can be running in Java on a Windows box.
> There is no dependency on language- or system-specific features (except
> for java.lang.String, which should be able to map predictably to other
> languages).
>
> Now, why not take this a step further?
>
>   public class InputSource {
>     // For each variable, imagine a get/set pair instead...
>     public String publicId;
>     public String systemId;
>     public ByteStream byteStream;
>     public CharacterStream characterStream;
>     public String encoding;
>   }
>
> We'd have to define rules of precedence:
>
> 1) if there is a character stream, use it;
>
> 2) if there is no character stream but there is a byte stream, use the
>    byte stream;
>
> 3) if there is neither a character stream nor a byte stream but there
>    is a system identifier, open a connection to the system identifier;
>
> 4) if there is no character stream, byte stream, or system identifier,
>    throw an exception (or invoke the ErrorHandler).
>
> Now, we can get away with only one parse() method in
> org.xml.sax.Parser:
>
>   public abstract void parse (InputSource source)
>     throws Exception;

I don't think this is a good idea: it makes SAX harder to use in the
simple case of reading from a URL.

James

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
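For readers following the thread, the four precedence rules quoted in the message above could be sketched roughly as follows. This is only an illustration and not code from either poster: the names PrecedenceSketch, Choice, and choose are hypothetical, java.io.InputStream and java.io.Reader stand in for the proposed ByteStream and CharacterStream interfaces, and IllegalArgumentException stands in for "throw an exception (or invoke the ErrorHandler)".

```java
import java.io.InputStream;
import java.io.Reader;

// Hypothetical sketch of the proposed precedence rules; none of these
// names come from a published SAX API.
class PrecedenceSketch {

    /** Which input the parser would end up reading from. */
    enum Choice { CHARACTER_STREAM, BYTE_STREAM, SYSTEM_ID }

    /**
     * Stand-in for the proposed InputSource. Assumption: java.io types
     * replace the proposed ByteStream and CharacterStream interfaces.
     */
    static class InputSource {
        String publicId;
        String systemId;
        InputStream byteStream;
        Reader characterStream;
        String encoding; // only meaningful together with byteStream
    }

    /** Applies rules 1 to 4 in order and reports which source wins. */
    static Choice choose(InputSource source) {
        if (source.characterStream != null) {
            return Choice.CHARACTER_STREAM;   // rule 1
        }
        if (source.byteStream != null) {
            return Choice.BYTE_STREAM;        // rule 2
        }
        if (source.systemId != null) {
            return Choice.SYSTEM_ID;          // rule 3: parser would open this
        }
        // rule 4: nothing usable was supplied
        throw new IllegalArgumentException(
            "InputSource has no character stream, byte stream, or system identifier");
    }

    public static void main(String[] args) {
        InputSource source = new InputSource();
        source.systemId = "http://www.example.com/doc.xml"; // placeholder URL
        System.out.println(choose(source));
    }
}
```

Under this reading, the simple read-from-a-URL case that James mentions means building an InputSource and setting only its systemId before calling parse(), which is the extra step his objection is about.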