Re: Character streams vs. Byte Streams
Michael Kay wrote:
> James Clark:
> >This is fine except that it should use byte streams not character
> >streams. What you get if you are reading from the net or from an
> >archive or a database or whatever is bytes not characters...
>
> I have enormous respect for James's arguments as always but on this one I
> beg to disagree. The reason I have asked for support for character streams
> is so that the parser can process not only stuff stored on disc but *the
> output of another program*. For example, I have an application where the
> XML document is constructed as the result of an SQL query that pulls
> together fragments of XML stored in different places in a database. The SQL
> query, like most other programs I use and write, prefers to output
> characters rather than bytes. That, after all, is the reason XML was
> designed to be human-readable.
>
> And I have to say that in my experience so far, the parsers are so lightning
> fast compared with the application that generates the XML or consumes it,
> that an argument based on saving microseconds will not sway me much.

This matches what I found myself, and it is why I decided to write my own parser for my application. When using SAX, I found that my DocumentHandler implementation took about 75% of the processing time while the parser took only 25%. The main reason, I found, is that String.equals() is quite expensive, yet it is really the only practical way in SAX to recognize elements. When I switched to the Object framework I designed, parsing times for my parser implementation were lower. More importantly, and to my surprise, the time spent in the application handling the XML content dropped below the time spent in the parser.

> I don't think there is a real problem with the XML spec. This defines the
> syntax of XML in terms of characters.
> It requires the parser to accept
> certain encodings of the character stream as a byte stream, but it permits
> the parser to accept other encodings and therefore by implication to
> delegate the decoding of the byte stream to another object in the system.
> In fact it explicitly recognises that an "external transport protocol" might
> have a say in the matter, and that is a term we could interpret very widely.

This is another reason I feel a CharacterStreamFactory is a good idea. It separates the low-level encoding of characters from the rest of the parser, which I feel should only need to handle one encoding format in the first place. If there were a default CharacterStreamFactory implementation, I feel the following issues are important:

- The default implementation should support all of the character encoding formats defined in the XML 1.0 spec.
- The default implementation should have a way to add support for custom character encoding formats (as with databases).
- The default implementation should have a mechanism for replacing the implementations of individual encoding streams, should the parser writer choose to do so for optimization or some other reason.

The alternative, I feel, is never-ending code bloat. Current major word processors are an example: they all carry endless amounts of kludgy code for reading each other's proprietary document formats, and in the end this significantly bloats the application's resource consumption.

Tyler

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
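The CharacterStreamFactory in the post above is only a proposal. What follows is a minimal Java sketch of how it might look. The class name comes from the post, but `withDefaults`, `register`, `createReader` and the demo method `countElements` are hypothetical names chosen for illustration.

The sketch works as follows:

- It pre-registers decoders for UTF-8 and UTF-16, the two encodings XML 1.0 requires every processor to read.
- It lets callers add or replace decoders.
- It falls back to the JDK's own `java.nio.charset` support for any other encoding name.

The resulting `Reader` is handed to a standard JAXP SAX parser through `org.xml.sax.InputSource`, which accepts either a byte stream or a character stream. The parser itself therefore only ever sees characters, which is also the kind of input Michael Kay asks for.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of the proposed CharacterStreamFactory: it maps an encoding
// name to a decoder that turns a byte stream into a character stream.
public class CharacterStreamFactory {
    private final Map<String, Function<InputStream, Reader>> decoders = new ConcurrentHashMap<>();

    // Default instance: UTF-8 and UTF-16, the encodings every XML 1.0 processor must accept.
    public static CharacterStreamFactory withDefaults() {
        CharacterStreamFactory f = new CharacterStreamFactory();
        f.register("UTF-8", in -> new InputStreamReader(in, StandardCharsets.UTF_8));
        f.register("UTF-16", in -> new InputStreamReader(in, StandardCharsets.UTF_16));
        return f;
    }

    // Adds a custom decoder, or replaces an existing one (e.g. a tuned UTF-8 reader).
    public void register(String encoding, Function<InputStream, Reader> decoder) {
        decoders.put(encoding.toUpperCase(Locale.ROOT), decoder);
    }

    // Returns a Reader for the named encoding; unregistered names fall back to
    // whatever charsets the JDK supports, anything else is rejected.
    public Reader createReader(InputStream in, String encoding) throws UnsupportedEncodingException {
        Function<InputStream, Reader> decoder = decoders.get(encoding.toUpperCase(Locale.ROOT));
        if (decoder != null) {
            return decoder.apply(in);
        }
        if (Charset.isSupported(encoding)) {
            return new InputStreamReader(in, Charset.forName(encoding));
        }
        throw new UnsupportedEncodingException(encoding);
    }

    // Demo (hypothetical): decode bytes with the factory, then give the SAX parser
    // a character stream and count the start tags it reports.
    public static int countElements(byte[] bytes, String encoding) throws Exception {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        try (Reader reader = withDefaults().createReader(new ByteArrayInputStream(bytes), encoding)) {
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(reader), handler);
        }
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = "<doc><p>caf\u00e9</p><p/></doc>".getBytes(StandardCharsets.UTF_8);
        System.out.println(countElements(bytes, "utf-8"));  // prints 3
    }
}
```

Calling `register` again for "UTF-8" would swap in a different decoder without touching the parser. That corresponds to the replacement mechanism in the third bullet; a database vendor could register a custom name in the same way.

One simplification is important. The sketch takes the encoding name from the caller, whereas a real XML parser working from bytes must first detect the encoding from the byte order mark or the encoding declaration before it can choose a decoder.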