Re: SAX: Character Stream vs. Byte Stream proposal...
David Megginson wrote:

> Tyler Baker writes:
>
> > Why not simply have a standard factory that takes any type of
> > InputStream (UTF-16, UTF-8, etc) similar to how the parse method
> > works and it returns a type (say CharacterStream) which can then be
> > passed to either the parser or the application. In this case the
> > implementations for doing all of this low level character reading
> > from bytes could be standardized for each platform.
>
> The problem is that SAX is an API, not an architecture -- that is, it
> attempts to impose the fewest possible constraints on implementations.
> There are several good reasons for this approach:
>
> 1. SAX is one of (possibly) many APIs that an XML parser will
>    implement, and other APIs may make conflicting demands.

In this case it usually makes sense to have a separate parser for each API set rather than having code like this:

    if (parserAPICode == SAX) {
      // Do SAX parsing
    }
    else if (parserAPICode == Foo) {
      // Do foo parsing
    }

Conditionals like this will greatly degrade the speed of your code if every method is littered with them. Better to just write a new parser for every new API. Nevertheless, having a standard way for each parser to get at the low-level stuff makes sense from a code-reuse as well as a consistency standpoint.

> 2. XML parsers need to compete on speed, memory usage, etc., and to do
>    so, they need to be free to take different approaches.

I was suggesting that you would still have an interface, but I feel that a default implementation of byte-to-character conversion in the SAX package is perfectly reasonable. I may get flames for this, but I think most parsers will compete on how they solve an application's XML handling problems (the design), not on whether one parser is 1% faster than another.
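To make the suggestion concrete, here is a minimal sketch of an interface plus a default implementation. None of this is part of SAX: the names `CharacterStreamFactory`, `DefaultCharacterStreamFactory`, and `readAll` are hypothetical, and the sketch assumes the caller already knows the encoding name rather than detecting it from the document. The default implementation simply delegates the byte-to-character work to the platform's own charset support, and a parser writer could supply a different implementation of the same interface.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharacterStreamSketch {

    /** Hypothetical interface (not in SAX): turns a byte stream into a character stream. */
    public interface CharacterStreamFactory {
        Reader open(InputStream bytes, String encodingName);
    }

    /** Hypothetical default implementation that delegates decoding to the JDK. */
    public static final class DefaultCharacterStreamFactory implements CharacterStreamFactory {
        @Override
        public Reader open(InputStream bytes, String encodingName) {
            // Charset.forName throws an unchecked exception for an unknown name.
            Charset charset = Charset.forName(encodingName);
            return new InputStreamReader(bytes, charset);
        }
    }

    /** Demo helper: drains a Reader into a String and closes it. */
    public static String readAll(Reader reader) {
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[1024];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        CharacterStreamFactory factory = new DefaultCharacterStreamFactory();
        String xml = "<doc>caf\u00e9</doc>";

        // The same document stored in two encodings should decode to the same characters.
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = xml.getBytes(StandardCharsets.UTF_16BE);

        System.out.println(readAll(factory.open(new ByteArrayInputStream(utf8), "UTF-8")).equals(xml));
        System.out.println(readAll(factory.open(new ByteArrayInputStream(utf16), "UTF-16BE")).equals(xml));
    }
}
```

The parser or application only ever sees a `Reader`, so swapping in a hand-tuned decoder would not change any code downstream of the factory.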
In this case, a solid default implementation for character encoding would allow parser writers to concentrate on coming up with new and interesting ways to allow applications to model XML content, instead of having to worry about bit shifting all over the place. Typically, I feel low-level stuff such as this should be implemented once and then reused over and over again. There are only so many ways to write character encoders / decoders, and I would wager that most parsers out there have very similar implementations for reading from byte streams.

XML's beauty is not in the fact that the spec defines support for about 6 or so different character encoding formats; it is in everything else. If another character encoding format comes out, then every SAX parser may have to do a rewrite. If people could agree upon one good, efficient, dependable implementation, then no one (other than the people doing the 600 or so lines of character encoding implementation code) would have to do a thing. Of course, people could plug in their own character encoder / decoder implementations if they so choose, but at least they would have the choice.

I really think it would have made a hell of a lot more sense for XML to have one standard encoding format, say UTF-16 or UTF-8, instead of defining the legal encoding formats in the spec. I feel it would make much more sense to just convert documents in other formats to UTF-8 or UTF-16, rather than to force parser writers to handle just about every major character encoding format known to man. One example would be databases which may store XML content in a proprietary character format. An XML parser for the database will need to translate from the native character format to something defined in the XML spec anyway (unless you want to deviate from it).

Anyway, just some suggestions...

Tyler

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)