
Re: SAX: New Idea for Entity Resolution

  • From: David Megginson <ak117@f...>
  • To: James Clark <jjc@j...>
  • Date: Fri, 17 Apr 1998 07:43:55 -0400

James Clark writes:

 [my example omitted]

 > This is fine except that it should use byte streams not character
 > streams.  What you get if you are reading from the net or from an
 > archive or a database or whatever is bytes not characters and it is
 > part of the function of an XML processor to manage the conversion
 > into characters using the encoding declaration and the XML-specified
 > mechanisms for encoding auto-detection.  You could provide both,
 > but the fundamental one is for a stream of bytes.  Also the
 > EntityResolver needs to be able to indicate an externally specified
 > encoding (as with the additional argument for parse with a
 > SAXByteStream).  In other words SAXEntityResolver needs to return
 > an object with two members: a SAXByteStream and a (possibly null)
 > String.
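A minimal Java sketch of the arrangement described in the quoted message. All of the names here (ResolvedEntity, ByteEntityResolver, openReader) are hypothetical and do not come from the SAX draft. The resolver returns a byte stream plus an optional encoding string. The parser uses that encoding when one is given and otherwise falls back to a simple byte-order-mark check. That check covers only a small part of the auto-detection rules in the XML 1.0 recommendation, and it assumes UTF-8 when no mark is found.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResolvedEntityDemo {

    // Hypothetical resolver return value: raw bytes plus an externally
    // specified encoding, which may be null.
    public static final class ResolvedEntity {
        final InputStream bytes;
        final String encoding;

        public ResolvedEntity(InputStream bytes, String encoding) {
            this.bytes = bytes;
            this.encoding = encoding;
        }
    }

    // Hypothetical resolver interface built around that return value.
    public interface ByteEntityResolver {
        ResolvedEntity resolve(String publicId, String systemId) throws IOException;
    }

    // Parser side: honour the external encoding if present; otherwise look
    // for a byte-order mark. Only a subset of the XML 1.0 auto-detection
    // rules is covered; without a BOM this sketch assumes UTF-8.
    public static Reader openReader(ResolvedEntity entity) throws IOException {
        BufferedInputStream in = new BufferedInputStream(entity.bytes);
        if (entity.encoding != null) {
            return new InputStreamReader(in, entity.encoding);
        }
        in.mark(3);
        int b0 = in.read(), b1 = in.read(), b2 = in.read();
        in.reset();
        Charset cs = StandardCharsets.UTF_8;
        int bomLength = 0;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            bomLength = 3;
        } else if (b0 == 0xFE && b1 == 0xFF) {
            cs = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (b0 == 0xFF && b1 == 0xFE) {
            cs = StandardCharsets.UTF_16LE;
            bomLength = 2;
        }
        for (int i = 0; i < bomLength; i++) {
            in.read(); // discard the byte-order mark
        }
        return new InputStreamReader(in, cs);
    }

    public static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // UTF-8 entity with a BOM and no external encoding.
        byte[] body = "<a/>".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[body.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(body, 0, withBom, 3, body.length);
        ByteEntityResolver resolver =
            (publicId, systemId) -> new ResolvedEntity(new ByteArrayInputStream(withBom), null);
        System.out.println(readAll(openReader(resolver.resolve(null, "doc.xml")))); // <a/>

        // Latin-1 bytes with an externally specified encoding (as might come
        // from a transport header): the external value is used as given.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        ResolvedEntity external = new ResolvedEntity(new ByteArrayInputStream(latin1), "ISO-8859-1");
        System.out.println(readAll(openReader(external)).length()); // 4
    }
}
```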

I hope that people will at least admire my wisdom if I admit that I am
not smart enough to figure this one out myself.  I suspect that this
will be the Last Great Issue with SAX before we can finalise it, so
help will be appreciated.

Here are what seem to me to be the costs and benefits of supporting
character streams, byte streams, or both:


* Character streams only

  Pro: - the application writer has specialised knowledge about the
         information source that the parser writer lacks; as a
         result, the application writer can better optimise the
         conversion, if necessary
       - information from dialogue boxes, internal buffers, and
         (eventually, with internationalisation) databases will all be
         characters rather than bytes
       - most programming languages are moving towards characters and
         away from processing raw bytes 
       - many programming languages (such as Java) already have
         standard methods for converting byte streams to character
         streams, and application writers can use these if needed or
         desired

  Con: - the application may have to convert from bytes to characters
         itself if the information is available only as a byte stream
       - the parser may have its own, internal, efficient mechanism
         for byte-stream conversion
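Here is a rough Java sketch of the character-streams-only option. A hypothetical CharEntityResolver returns a java.io.Reader. The sketch assumes the application knows its archive stores ISO-8859-1, so the application does the conversion itself with the standard InputStreamReader. Text already held in memory is handed over through a StringReader with no conversion at all.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

public class CharStreamResolverDemo {

    // Hypothetical resolver that deals only in characters.
    public interface CharEntityResolver {
        Reader resolve(String publicId, String systemId) throws IOException;
    }

    // The application "knows" (an assumption of this sketch) that its
    // archive holds ISO-8859-1 bytes, so it decodes them itself.
    public static CharEntityResolver latin1Archive(byte[] stored) {
        return (publicId, systemId) ->
            new InputStreamReader(new ByteArrayInputStream(stored), StandardCharsets.ISO_8859_1);
    }

    // Text already in memory (e.g. from a dialogue box) needs no conversion.
    public static CharEntityResolver inMemory(String text) {
        return (publicId, systemId) -> new StringReader(text);
    }

    public static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] stored = "<p>caf\u00e9</p>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readAll(latin1Archive(stored).resolve(null, "entry.xml")));
        System.out.println(readAll(inMemory("<p>typed by user</p>").resolve(null, null)));
    }
}
```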


* Byte streams only

  Pro: - supports the lowest common denominator: all platforms have
         some concept of a byte stream
       - allows parsers to use their own, efficient, internal methods
         for byte-stream conversion

  Con: - adds serious inefficiencies, since characters (say, from a
         dialog box, an internal buffer, or a database with I18N
         support) will have to be decomposed back into bytes to be
         passed to the parser, then reassembled back into characters
         by the parser
       - requires a new SAX class encapsulating a ByteStream and its
         recommended encoding
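A small Java sketch shows the extra work in the byte-streams-only option. parseBytes is a hypothetical stand-in for a byte-only parser entry point. The application encodes characters it already holds into UTF-8 bytes, and the parser immediately decodes them back.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ByteOnlyRoundTrip {

    // Hypothetical byte-only parser entry point: decodes the bytes with the
    // given encoding and returns the characters it would go on to parse.
    public static String parseBytes(InputStream bytes, String encoding) throws IOException {
        Reader r = new InputStreamReader(bytes, encoding);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String fromDialog = "<note>\u00dcbersicht</note>"; // already characters
        // Application: decompose the characters into bytes for the parser.
        byte[] encoded = fromDialog.getBytes(StandardCharsets.UTF_8);
        // Parser: reassemble the same characters from those bytes.
        String decoded = parseBytes(new ByteArrayInputStream(encoded), "UTF-8");
        System.out.println(decoded.equals(fromDialog)); // true
        System.out.println(fromDialog.length() + " chars, " + encoded.length + " bytes");
    }
}
```

Nothing is lost in the round trip, but every character is converted twice, and the application has to choose an encoding the parser will accept.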


* Both Byte and Character streams

  Pro: - keeps everyone happy

  Con: - requires more interfaces
       - requires another method in the Parser interface
       - requires a new SAX class encapsulating a ByteStream and its
         recommended encoding (or perhaps the ByteStream interface
         will have a getEncoding() method)
       - will greatly complicate the EntityResolver mechanism (the
         application will need to be able to return a byte stream _or_
         a character stream -- how could I handle this?)
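This Java sketch shows one possible way to let a resolver return either kind of stream. It assumes hypothetical names (EntitySource, SourceResolver). A single class carries either a Reader or an InputStream with an optional encoding, and the parser checks which one it received. The mem: prefix in the example is an invented convention.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

public class EitherStreamDemo {

    // Hypothetical single return type: a character stream, or a byte
    // stream with an optional encoding.
    public static final class EntitySource {
        private Reader characters;
        private InputStream bytes;
        private String encoding;

        private EntitySource() {}

        public static EntitySource ofCharacters(Reader characters) {
            EntitySource s = new EntitySource();
            s.characters = characters;
            return s;
        }

        public static EntitySource ofBytes(InputStream bytes, String encoding) {
            EntitySource s = new EntitySource();
            s.bytes = bytes;
            s.encoding = encoding;
            return s;
        }

        // Parser side: prefer characters if supplied; otherwise decode the
        // bytes. Defaulting to UTF-8 is a simplification; a real XML
        // processor would apply the specification's auto-detection rules.
        public Reader toReader() throws IOException {
            if (characters != null) {
                return characters;
            }
            if (bytes == null) {
                throw new IOException("EntitySource holds neither characters nor bytes");
            }
            return new InputStreamReader(bytes, encoding != null ? encoding : "UTF-8");
        }
    }

    // Hypothetical resolver: one method, one return type, either stream.
    public interface SourceResolver {
        EntitySource resolve(String publicId, String systemId) throws IOException;
    }

    public static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        SourceResolver resolver = (publicId, systemId) ->
            systemId.startsWith("mem:")
                ? EntitySource.ofCharacters(new StringReader("<from-buffer/>"))
                : EntitySource.ofBytes(
                      new ByteArrayInputStream("<from-net/>".getBytes(StandardCharsets.US_ASCII)),
                      "US-ASCII");
        System.out.println(readAll(resolver.resolve(null, "mem:buffer").toReader()));
        System.out.println(readAll(resolver.resolve(null, "http://example.com/e.xml").toReader()));
    }
}
```

For comparison, SAX 1.0 as released settled on a similar shape. Its org.xml.sax.InputSource can carry a byte stream, a character stream, an encoding, and public and system identifiers, and EntityResolver.resolveEntity returns an InputSource.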


Thanks, and all the best,


David

-- 
David Megginson                 ak117@f...
Microstar Software Ltd.         dmeggins@m...
      http://home.sprynet.com/sprynet/dmeggins/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

