Re: SAX: Character Stream vs. Byte Stream proposal...

From: Tyler Baker <tyler@i...>
To: David Megginson <ak117@f...>
Date: Fri, 17 Apr 1998 05:00:57 -0400

David Megginson wrote:

> Tyler Baker writes:
>
>  > Why not simply have a standard factory that takes any type of
>  > InputStream (UTF-16, UTF-8, etc) similar to how the parse method
>  > works and it returns a type (say CharacterStream) which can then be
>  > passed to either the parser or the application.  In this case the
>  > implementations for doing all of this low level character reading
>  > from bytes could be standardized for each platform.
>
> The problem is that SAX is an API, not an architecture -- that is, it
> attempts to impose the fewest possible constraints on implementations.
> There are several good reasons for this approach:
>
> 1. SAX is one of (possibly) many APIs that an XML parser will
>    implement, and other APIs may make conflicting demands.

In this case it usually makes sense to have a separate parser for each API set
rather than having code like this:

if (parserAPICode == SAX) {
    // Do SAX parsing
} else if (parserAPICode == Foo) {
    // Do Foo parsing
}

Conditionals like this will greatly degrade the speed of your code if every
method is littered with them.  Better to just write a new parser for every new
API.  Nevertheless, having a standard way for each parser to get at the low-level
stuff makes sense from a code-reuse as well as a consistency standpoint.

> 2. XML parsers need to compete on speed, memory usage, etc., and to do
>    so, they need to be free to take different approaches.

I was suggesting that you would still have an interface, but I feel that a
default implementation for byte-to-character decoding in the SAX package is
perfectly reasonable.  I may get flamed for this, but I think most parsers will
compete on how they solve an application's XML handling problems (the design), not
on whether one parser is 1% faster than another.  In this case, a solid default
implementation for character encoding would allow parser writers to concentrate
on coming up with new and interesting ways to allow applications to model XML
content, instead of having to worry about bit shifting all over the place.

Typically, I feel low-level stuff such as this should be implemented once and
then reused over and over again.  There are only so many ways to write character
encoders / decoders, and I would wager that most parsers out there have very
similar implementations for reading from byte streams.  XML's beauty is not in
the fact that the spec defines support for about 6 or so different character
encoding formats; it is in everything else.  If another character encoding format
comes out, then every SAX parser may need a rewrite.  If people could agree upon
one good, efficient, dependable implementation, then no one (other than the
people doing the 600 or so lines of character encoding implementation code) will
have to do a thing.  Of course, people could plug in their own character encoder /
decoder implementations if they so choose, but at least they would have the
choice.
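
A minimal sketch of what such a pluggable default might look like, assuming a
modern Java platform.  The names CharacterStreamFactory,
DefaultCharacterStreamFactory and SketchParser are hypothetical and are not part
of SAX; the default here simply delegates the decoding to the platform's
InputStreamReader rather than doing any bit shifting itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical interface (not part of SAX): turns bytes into characters.
interface CharacterStreamFactory {
    Reader createCharacterStream(InputStream bytes, Charset encoding);
}

// Shared default: lets the Java platform do the byte-to-character decoding.
class DefaultCharacterStreamFactory implements CharacterStreamFactory {
    @Override
    public Reader createCharacterStream(InputStream bytes, Charset encoding) {
        return new InputStreamReader(bytes, encoding);
    }
}

// Hypothetical parser skeleton: it only ever sees characters, and the
// factory can be replaced by a parser writer who wants a custom decoder.
class SketchParser {
    private CharacterStreamFactory factory = new DefaultCharacterStreamFactory();

    void setCharacterStreamFactory(CharacterStreamFactory factory) {
        this.factory = factory;
    }

    // Reads the whole document as text; a real parser would tokenize here.
    String readAll(InputStream bytes, Charset encoding) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        try (Reader in = factory.createCharacterStream(bytes, encoding)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

A parser that is happy with the default never touches the decoding code, while
one that wants its own decoder calls setCharacterStreamFactory with a different
implementation of the same interface.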

I really think it would have made a hell of a lot more sense for XML to have one
standard encoding format, say UTF-16 or UTF-8, instead of defining the legal
encoding formats in the spec itself.  It would make much more sense, I feel, to
just convert everything to UTF-8 or UTF-16 if documents were in a different
format, rather than to force parser writers to handle just about every major
character encoding format known to man.  One example would be databases which
may store XML content in a proprietary character format.  An XML parser for the
database will need to translate from the native character format to something
defined in the XML spec anyway (unless you want to deviate from it).
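
As a rough illustration of the "convert everything first" step, here is a small
Java sketch.  Transcoder and toUtf8 are hypothetical names, and the sketch
assumes the source encoding is already known and supported by the platform; it
does not touch any encoding declaration inside the document itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: converts a document from a known source encoding
// to UTF-8 before it is handed to a parser.
class Transcoder {
    static byte[] toUtf8(byte[] input, Charset source) {
        // Decode the bytes to characters, then re-encode them as UTF-8.
        String text = new String(input, source);
        return text.getBytes(StandardCharsets.UTF_8);
    }
}
```

With something like this in front of it, a parser would only need to understand
the one target encoding.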

Anyways, just some suggestions...

Tyler

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

Follow-Ups:
- Re: SAX: Character Stream vs. Byte Stream proposal...
  - From: David Megginson <ak117@f...>

References:
- Character Stream vs. Byte Stream proposal...
  - From: Tyler Baker <tyler@i...>
- SAX: Character Stream vs. Byte Stream proposal...
  - From: David Megginson <ak117@f...>
