Character Stream vs. Byte Stream proposal...
Why not simply have a standard factory that takes any type of InputStream (UTF-16, UTF-8, etc.), similar to how the parse method works, and returns a type (say CharacterStream) which can then be passed to either the parser or the application? The low-level reading of characters from bytes could then be standardized for each platform. That way you could have many different parsers without redundant character conversion code in each one, which from what I have seen accounts for almost 50% of the code in some instances.
Yes, this would mean that a concrete implementation for all of these stream types in a CharacterStream factory would have to be agreed upon for each language, but I feel this is absolutely essential to SAX. It makes writing parsers a ton easier, since you don't have to worry about very low-level encoding formats that can take years to learn. Java would not be successful at all if all the low-level stuff were simply defined as interfaces and not as concrete implementations. If XML adds or removes encoding formats or other low-level specifications in the future, many parser writers may not have the time or expertise to redo everything each time. The closest analogy I can think of is if everyone had to write their own Java version of System.arraycopy(). Having about 5 billion different byte-to-character translation implementations would be akin to having 5 billion java.util.Vector implementations.
Nevertheless, the standard factory could be represented as an interface, so that parsers which absolutely need to do their own byte-to-character translation could do so. The closest analogy I can think of to this is the pluggable sockets framework in JDK 1.1 and beyond. Any ideas? I don't want to see SAX turn into an interface explosion, nor do I feel all parsers should do the most redundant activities possible at the I/O level.
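To make the proposal concrete, here is a minimal sketch of what such a factory might look like in Java. Everything named here is an assumption for illustration: `CharacterStreamFactory`, `DefaultCharacterStreamFactory`, and `CharacterStreamDemo` are hypothetical and not part of SAX, and `java.io.Reader` stands in for the "CharacterStream" type. The default implementation only recognizes a UTF-16 byte order mark and otherwise assumes UTF-8; real XML encoding detection involves more cases. It needs Java 10 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical interface (not part of SAX): turns raw bytes into characters.
// A parser that needs its own decoding can supply its own implementation.
interface CharacterStreamFactory {
    Reader newCharacterStream(InputStream bytes) throws IOException;
}

// Hypothetical shared default, so parsers do not each reimplement decoding.
class DefaultCharacterStreamFactory implements CharacterStreamFactory {
    @Override
    public Reader newCharacterStream(InputStream bytes) throws IOException {
        // Peek at the first two bytes, then push them back for the decoder.
        PushbackInputStream in = new PushbackInputStream(bytes, 2);
        byte[] head = new byte[2];
        int n = in.readNBytes(head, 0, 2);
        if (n > 0) {
            in.unread(head, 0, n);
        }
        // Simplifying assumption: UTF-16 only when a byte order mark is present.
        boolean utf16 = n == 2 && isUtf16Bom(head[0] & 0xFF, head[1] & 0xFF);
        Charset charset = utf16 ? StandardCharsets.UTF_16 : StandardCharsets.UTF_8;
        return new InputStreamReader(in, charset);
    }

    private static boolean isUtf16Bom(int b0, int b1) {
        return (b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE);
    }
}

class CharacterStreamDemo {
    // Reads the whole document through the factory; a parser would instead
    // consume the Reader incrementally.
    static String decode(CharacterStreamFactory factory, byte[] document) throws IOException {
        try (Reader chars = factory.newCharacterStream(new ByteArrayInputStream(document))) {
            StringWriter out = new StringWriter();
            chars.transferTo(out);
            return out.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        CharacterStreamFactory factory = new DefaultCharacterStreamFactory();
        // The same document in two encodings decodes to the same characters.
        System.out.println(decode(factory, "<doc/>".getBytes(StandardCharsets.UTF_16)));
        System.out.println(decode(factory, "<doc/>".getBytes(StandardCharsets.UTF_8)));
    }
}
```

The parser or application only ever sees the `Reader`, so swapping in a different `CharacterStreamFactory` changes the decoding without touching the parsing code.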
Last but not least, some parsers (such as the one I have) could of course benefit immensely from a concrete default implementation of these character streams, since for people like me low-level byte-to-character I/O is not a personal forte.
The parser I have written uses its own proprietary XML Object framework, which I feel is in some respects more efficient for modeling data in Java than an event-based parser like SAX. It is non-validating at the moment (unfinished), and to my surprise it parses my documents about twice as fast as Aelfred: 220 milliseconds vs. Aelfred's 459 milliseconds after several tests. Spitting out XML data in a tree-like form consistently took under 20 milliseconds. Please take these numbers with a grain of salt, as the parser is currently non-validating and the XML documents were not, I feel, large enough for a true comparison.
The main goal of the framework was to eliminate the common if-then-else handling in an event-based parser, which may account for part of the speed increase. Simply having a fast parser is, I feel, not useful if the way it hands data to applications requires significant overhead to handle. I feel this approach has significant advantages over event-based parsing, but it also has significant drawbacks that are hard to elaborate on unless I go into depth about how the parser works.
For my actual application, modeling data in an event-based way has maintenance problems, and the Element factory concept of parsers like MSXML is, I feel, very resource hungry, since such parsers essentially construct a symbolic tree at runtime (at least that is my understanding). I would have preferred not to write an XML parser at all, but I felt that both event-based parsing and parsers that represent elements as an XML tree were flawed in design for the needs of my particular application.
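For readers unfamiliar with the if-then-else handling mentioned above, the sketch below shows the pattern as it typically appears in an event-based handler. It is only an illustration of the problem, not of the XML Object framework, whose design is not described here. It uses the present-day SAX2 `DefaultHandler` API from the JDK; the `OrderHandler` class and the `order`, `item`, and `note` element names are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: every element arrives through the same callback,
// so the application has to branch on the element name itself.
class OrderHandler extends DefaultHandler {
    final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // This chain grows with every element type the application understands.
        if (qName.equals("order")) {
            seen.add("order " + atts.getValue("id"));
        } else if (qName.equals("item")) {
            seen.add("item " + atts.getValue("sku"));
        } else if (qName.equals("note")) {
            seen.add("note");
        }
    }

    // Parses a document held in a string and returns what the handler recorded.
    static List<String> handle(String xml) throws Exception {
        OrderHandler handler = new OrderHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.seen;
    }
}
```

Calling `OrderHandler.handle("<order id='7'><item sku='A1'/><note/></order>")` walks the chain once per start tag, which is the per-event overhead and maintenance burden the paragraph above refers to.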
I would make the XML Object Framework free, since its design is totally removed from the application itself and we will never actually try to make money off it, but the startup I am with is in the process of incorporating, and until that happens I cannot hand out anything for free for legal reasons other than under my personal name (-: For those interested, it handles both input and output of XML data in a way very similar to how Object serialization works in Java. In fact, the application I am developing needs to represent its content in both formats for various technical and political reasons. Oh well, enough of the self-aggrandizement...
In summary, I think this would immensely help all parser writers, not just the ones with event-based parsers. It would significantly reduce code bloat for SAX parsers (and therefore for applications), and it would allow all parsers to use an efficient default byte-to-character factory rather than muddy themselves with bit shifting of octets. If there is even a dream of dynamically loading various parsers at runtime, I think it should be a priority to eliminate as many redundancies between parsers as possible, not just for the parser writers' sake but for the people who actually use XML in their applications. Byte-to-character conversion via a default factory interface (with a default implementation that comes with SAX) would, I think, be a good start.
Tyler
P.S. - My comments about my parser in comparison to Aelfred are in no way meant as a challenge to Aelfred, as I have the greatest respect for David. In fact, my parser with validation and such may in the end be much less efficient than Aelfred or any other major parser. I guess when I finally finish it, I will be able to see what the true results are. Nevertheless, I think my approach will significantly improve the application's performance in handling XML documents even if the parser itself is inefficient.
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i... Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@i... the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@i... the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@i...)