Re: SAX: New Idea for Entity Resolution
James Clark writes:

[my example omitted]

> This is fine except that it should use byte streams not character
> streams.  What you get if you are reading from the net or from an
> archive or a database or whatever is bytes not characters and it is
> part of the function of an XML processor to manage the conversion
> into characters using the encoding declaration and the XML specified
> mechanisms for encoding auto-detection.  You could provide both,
> but the fundamental one is for a stream of bytes.  Also the
> EntityResolver needs to be able to indicate an externally specified
> encoding (as with the additional argument for parse with a
> SAXByteStream).  In other words SAXEntityResolver needs to return
> an object with two members: a SAXByteStream and a (possibly null)
> String.

I hope that people will at least admire my wisdom if I admit that I am
not smart enough to figure this one out myself.  I suspect that this
will be the Last Great Issue with SAX before we can finalise it, so
help will be appreciated.
Here are what seem to me to be the costs and benefits of supporting
character streams, byte streams, or both:

* Character streams only

  Pro:
  - the application writer has specialised knowledge about the
    information source that the parser writer lacks; as a result, the
    application writer can better optimise the conversion, if
    necessary
  - information from dialogue boxes, internal buffers, and
    (eventually, with internationalisation) databases will all be
    characters rather than bytes
  - most programming languages are moving towards characters and away
    from processing raw bytes
  - many programming languages (such as Java) already have standard
    methods for converting byte streams to character streams, and
    application writers can use these if needed or desired

  Con:
  - the application may have to convert from bytes to characters
    itself if an input source is not available
  - the parser may have its own, internal, efficient mechanism for
    byte-stream conversion

* Byte streams only

  Pro:
  - supports the minimum common denominator: all platforms have some
    concept of a byte stream
  - allows parsers to use their own, efficient, internal methods for
    byte-stream conversion

  Con:
  - adds serious inefficiencies, since characters (say, from a dialog
    box, an internal buffer, or a database with I18N support) will
    have to be decomposed back into bytes to be passed to the parser,
    then reassembled back into characters by the parser
  - requires a new SAX class encapsulating a ByteStream and its
    recommended encoding

* Both Byte and Character streams

  Pro:
  - keeps everyone happy

  Con:
  - requires more interfaces
  - requires another method in the Parser interface
  - requires a new SAX class encapsulating a ByteStream and its
    recommended encoding (or perhaps the ByteStream interface will
    have a getEncoding() method)
  - will greatly complicate the EntityResolver mechanism (the
    application will need to be able to return a byte stream _or_ a
    character stream -- how could I handle this?)
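As one possible answer to the last question above (how a resolver could return a byte stream _or_ a character stream), here is a rough Java sketch.  It is only an illustration under stated assumptions, not a proposal for the actual SAX interfaces: the names EntitySource, ResolverSketch, ofCharacters, ofBytes and open are all hypothetical, and the UTF-8 fallback is a placeholder for the encoding auto-detection that a real XML processor would perform.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical holder: carries either a character stream or a byte stream. */
final class EntitySource {
    private final Reader characters;  // set when the application already has characters
    private final InputStream bytes;  // set when the application only has raw bytes
    private final String encoding;    // externally specified encoding, possibly null

    private EntitySource(Reader characters, InputStream bytes, String encoding) {
        this.characters = characters;
        this.bytes = bytes;
        this.encoding = encoding;
    }

    static EntitySource ofCharacters(Reader characters) {
        return new EntitySource(characters, null, null);
    }

    static EntitySource ofBytes(InputStream bytes, String encoding) {
        return new EntitySource(null, bytes, encoding);
    }

    /** Parser-side helper: always yields characters, decoding bytes only if needed. */
    Reader open() throws IOException {
        if (characters != null) {
            return characters;
        }
        // Assumption: fall back to UTF-8 when no external encoding is given.
        // A real parser would inspect the stream (encoding declaration etc.) instead.
        String effective = (encoding != null) ? encoding : "UTF-8";
        return new InputStreamReader(bytes, effective);
    }
}

/** Hypothetical resolver shape: one method, one return type, two kinds of source. */
interface ResolverSketch {
    EntitySource resolveEntity(String systemId) throws IOException;
}

class EntitySourceDemo {
    public static void main(String[] args) throws IOException {
        // An application that already holds characters (say, from a dialogue box).
        ResolverSketch fromText = id -> EntitySource.ofCharacters(new StringReader("<doc/>"));
        // An application that only has bytes plus an externally known encoding.
        byte[] raw = "<doc/>".getBytes("ISO-8859-1");
        ResolverSketch fromBytes =
            id -> EntitySource.ofBytes(new ByteArrayInputStream(raw), "ISO-8859-1");

        System.out.println((char) fromText.resolveEntity("a.xml").open().read());
        System.out.println((char) fromBytes.resolveEntity("b.xml").open().read());
    }
}
```

With a holder like this, the resolver keeps a single method and a single return type, and the choice between bytes and characters is made by whoever constructs the holder.  The cost is the extra class already listed among the cons.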
Thanks, and all the best,

David

-- 
David Megginson                 ak117@f...
Microstar Software Ltd.         dmeggins@m...
      http://home.sprynet.com/sprynet/dmeggins/