
Re: Character streams vs. Byte Streams

  • From: Tyler Baker <tyler@i...>
  • To: Michael Kay <M.H.Kay@e...>
  • Date: Fri, 17 Apr 1998 05:34:07 -0400

Michael Kay wrote:

> James Clark:
> >This is fine except that it should use byte streams not character
> >streams.  What you get if you are reading from the net or from an
> >archive or a database or whatever is bytes not characters...
>
> I have enormous respect for James's arguments as always but on this one I
> beg to disagree. The reason I have asked for support for character streams
> is so that the parser can process not only stuff stored on disc but *the
> output of another program*. For example, I have an application where the
> XML document is constructed as the result of an SQL query that pulls
> together fragments of XML stored in different places in a database. The SQL
> query, like most other programs I use and write, prefers to output
> characters rather than bytes. That, after all, is the reason XML was
> designed to be human-readable.
>
> And I have to say that in my experience so far, the parsers are so lightning
> fast compared with the application that generates the XML or consumes it,
> that an argument based on saving microseconds will not sway me much.

This matches what I found personally, and it is why I decided to write my own
parser for my application.  When using SAX I found that my DocumentHandler
implementation was taking up about 75% of the processing time while the parser
was taking only 25%.  The main reason, I found, is that String.equals() is quite
expensive, and it is really the only good way in SAX to recognize elements.  When
I switched to the object framework I designed, the parsing times for my actual
parser implementation were lower, but more importantly, the time spent in the
application handling the XML content dropped below the time spent in the parser,
which was a big surprise.
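For comparison, here is a minimal sketch of one common way to avoid a long chain of String.equals() tests in a SAX handler: look the element name up once in a HashMap that maps names to actions. It uses the current SAX2 DefaultHandler rather than the SAX1 DocumentHandler discussed in this thread, and it is not the object framework mentioned above. The class names, element names (customer, order) and counters are hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ElementDispatch {

    // Hypothetical per-element action; a real application would do real work here.
    interface ElementAction {
        void start(Attributes atts);
    }

    static final class DispatchingHandler extends DefaultHandler {
        private final Map<String, ElementAction> actions = new HashMap<>();
        int customers = 0;
        int orders = 0;

        DispatchingHandler() {
            // One hash lookup per start tag replaces an if/else chain of
            // qName.equals("customer") ... else if qName.equals("order") ...
            actions.put("customer", atts -> customers++);
            actions.put("order", atts -> orders++);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            ElementAction action = actions.get(qName);
            if (action != null) {
                action.start(atts);
            }
        }
    }

    // Parses an XML string with the JDK's default (non-namespace-aware) SAX parser.
    static DispatchingHandler parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DispatchingHandler handler = new DispatchingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        DispatchingHandler h = parse(
            "<db><customer id='1'><order/><order/></customer><customer id='2'/></db>");
        System.out.println(h.customers + " customers, " + h.orders + " orders");
    }
}
```

The map lookup still calls equals() once on the matching key, but String caches its hash code, so the cost per start tag no longer grows with the number of element types the handler recognizes.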

> I don't think there is a real problem with the XML spec. This defines the
> syntax of XML in terms of characters. It requires the parser to accept
> certain encodings of the character stream as a byte stream, but it permits
> the parser to accept other encodings and therefore by implication to
> delegate the decoding of the byte stream to another object in the system.
> In fact it explicitly recognises that an "external transport protocol" might
> have a say in the matter, and that is a term we could interpret very widely.
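As a sketch of the delegation described in the quoted paragraph, the present-day Java SAX API lets a parser be fed either a byte stream (the parser detects the encoding itself) or a character stream (decoding has already happened elsewhere) through org.xml.sax.InputSource. The class and method names below (StreamSources, fromBytes, fromChars) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StreamSources {

    // Collects the document's text content so the two inputs can be compared.
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    static String parse(InputSource source) throws Exception {
        TextCollector collector = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(source, collector);
        return collector.text.toString();
    }

    // Byte stream: the parser works out the encoding (here from the UTF-16 BOM).
    static String fromBytes(byte[] bytes) throws Exception {
        InputStream in = new ByteArrayInputStream(bytes);
        return parse(new InputSource(in));
    }

    // Character stream: decoding was done by whoever produced the characters
    // (e.g. another program), so the parser only ever sees characters.
    static String fromChars(String xml) throws Exception {
        Reader reader = new StringReader(xml);
        return parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<name>\u015Ctefan \u00DCnal</name>";
        // StandardCharsets.UTF_16 writes a byte order mark followed by big-endian data.
        byte[] utf16 = xml.getBytes(StandardCharsets.UTF_16);
        System.out.println(fromBytes(utf16));
        System.out.println(fromChars(xml));
    }
}
```

Both calls should yield the same text; the only difference is which component performs the byte-to-character decoding.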

Delegating the decoding of the byte stream is another reason why I feel a
CharacterStreamFactory is a good idea.  It separates the low-level character
encoding concerns from the rest of the parser, which I feel should really only
need to deal with one encoding in the first place.  If there were a default
CharacterStreamFactory implementation, I feel the following issues would be
important:

- The default implementation should support all of the character encodings
defined in the XML 1.0 spec.
- The default implementation should have a way to add support for custom
character encodings (as with databases).
- The default implementation should have a mechanism for replacing the stream
implementation for a given encoding, if the parser writer chooses to do so for
optimization or for some other reason.
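The three points above can be sketched as a small registry that maps encoding names to decoders. This is a minimal sketch assuming Java: the method names (register, createReader), the Decoder interface and the X-LEGACY-DB encoding name in main are all hypothetical. Only UTF-8 and UTF-16, the two encodings XML 1.0 requires every processor to accept, are registered by default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CharacterStreamFactory {

    // Turns a raw byte stream into a character stream for one encoding.
    @FunctionalInterface
    public interface Decoder {
        Reader open(InputStream in) throws IOException;
    }

    private final Map<String, Decoder> decoders = new ConcurrentHashMap<>();

    // Default registrations: the two encodings every XML 1.0 processor must accept.
    public CharacterStreamFactory() {
        register("UTF-8", in -> new InputStreamReader(in, StandardCharsets.UTF_8));
        register("UTF-16", in -> new InputStreamReader(in, StandardCharsets.UTF_16));
    }

    // Adds a custom encoding or replaces an existing implementation
    // (for example a faster hand-written UTF-8 decoder). Names are case-insensitive.
    public void register(String encoding, Decoder decoder) {
        decoders.put(encoding.toUpperCase(Locale.ROOT), decoder);
    }

    // Returns a Reader for the given encoding, or fails if none is registered.
    public Reader createReader(InputStream in, String encoding) throws IOException {
        Decoder d = decoders.get(encoding.toUpperCase(Locale.ROOT));
        if (d == null) {
            throw new UnsupportedEncodingException(encoding);
        }
        return d.open(in);
    }

    // Reads a Reader to the end; used only for the demonstration below.
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        CharacterStreamFactory factory = new CharacterStreamFactory();
        // Hypothetical custom encoding: a database that hands back ISO-8859-1 bytes.
        factory.register("X-LEGACY-DB", in -> new InputStreamReader(in, StandardCharsets.ISO_8859_1));
        byte[] bytes = "<doc>caf\u00e9</doc>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readAll(factory.createReader(new ByteArrayInputStream(bytes), "x-legacy-db")));
    }
}
```

Because register() simply overwrites any existing entry, the same call covers both adding a custom encoding and swapping out a built-in one, and the parser itself only ever deals with the Reader it gets back.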

The alternative, I feel, is never-ending code bloat, as with today's major word
processors, which all carry endless amounts of kludgy code for reading each
other's proprietary document formats and in the end significantly bloat the
application's resource consumption.

Tyler


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

