Re: Fast text output from SAX?
John Cowan wrote:
> Robin Berjon scripsit:
>> I think what Dennis is looking for is for something to fairly compare
>> the output from XBIS et al. with that of XML properly written at the end
>> of a SAX stream. Properly written may or may not involve (depending on
>> how paranoid you want to be -- I'd go for maximal because broken XML
>> isn't XML anymore): transcoding, checking that Names are Names, blowing
>> up if they contain characters that can't be transcoded to the target
>> encoding, checking that comments and PI data don't contain -- or ?>,
>> checking that text contains no forbidden character, that namespaces are
>> properly used, that you're using the proper repertoires for the version
>> of XML you said you were using, etc.
>
> Most of these checks are representation-independent: I can barely imagine
> that anyone would bother to develop an optimized representation that
> depended on whether Names were Names, for example. (Yeah, you could
> save 1 bit by relying on the fact that there are exactly 35122
> valid Name characters in XML 1.0, but really!)
>
> In practice, an XML writer and an ORX (newly coined generic acronym
> for "optimized representation of XML") writer would be suitable for
> comparison purposes if they did the same set of checks.

If you go read what I said, you'll notice that I wasn't comparing XML with an ORX (I like the name :), simply listing a few things that I thought Dennis -- and certainly I -- would look for in a quality XML serialiser. Just dumping bytes "by hand" works when you know the kind of data you'll be dumping -- just as using regexen on XML is fine if you really know what your input will look like -- but it's not acceptable as a general use approach.

Since you bring the topic up however, I agree that you are right for some ORX but not all, and the serialisation method is a large part of determining the trade-offs you may or may not wish to make.
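To make the list of checks concrete, here is a minimal Python sketch of what a paranoid text serialiser might verify before emitting each construct. Everything in it is illustrative rather than taken from any real library: the helper names (`check_name`, `escape_text`, `write_comment`, `write_pi`, `encode_markup`) and the `SerialisationError` class are hypothetical, and the Name pattern is a deliberately simplified ASCII-only approximation of the XML 1.0 Name production. Namespace checks and version-specific repertoires are left out.

```python
import re


class SerialisationError(ValueError):
    """Raised when the data cannot be written as well-formed XML."""


# ASSUMPTION: ASCII-only approximation of an XML 1.0 Name. The real
# production admits a far larger set of Unicode characters.
_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")

# Anything outside the XML 1.0 Char production is forbidden everywhere.
_BAD_CHAR = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def check_name(name):
    """Check that an element, attribute, or PI target name is a Name."""
    if not _NAME.fullmatch(name):
        raise SerialisationError(f"not a Name: {name!r}")
    return name


def check_chars(text):
    """Check that text contains no character forbidden in XML 1.0."""
    match = _BAD_CHAR.search(text)
    if match:
        raise SerialisationError(
            f"forbidden character U+{ord(match.group()):04X}"
        )
    return text


def escape_text(text):
    """Escape character data after checking it for forbidden characters."""
    check_chars(text)
    return (
        text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    )


def write_comment(data):
    """Comments must not contain '--' and must not end with '-'."""
    check_chars(data)
    if "--" in data or data.endswith("-"):
        raise SerialisationError("comment contains '--' or ends with '-'")
    return f"<!--{data}-->"


def write_pi(target, data):
    """PI targets must be Names other than 'xml'; data must not contain '?>'."""
    check_name(target)
    if target.lower() == "xml":
        raise SerialisationError("PI target 'xml' is reserved")
    check_chars(data)
    if "?>" in data:
        raise SerialisationError("PI data contains '?>'")
    return f"<?{target} {data}?>"


def encode_markup(markup, encoding):
    """Blow up if the markup cannot be transcoded to the target encoding."""
    try:
        return markup.encode(encoding)
    except UnicodeEncodeError as exc:
        raise SerialisationError(f"cannot encode as {encoding}") from exc
```

A writer sitting at the end of a SAX stream would call these from the corresponding event callbacks. A gentler policy could emit character references for unencodable characters in text content, but that escape hatch is not available inside names, which is why the sketch simply fails.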
Many ORX would use a single text encoding, for instance, so the transcoding checks would not be needed at all. Schema-based ones would only need to check names when reading the schema, not when serialising. If you encode {ns,ln} pairs instead of QNames you also skip a few checks. I'm not making assumptions as to which choices are the best, or even whether they are worth making (though empirical data would seem to suggest they are), simply showing that there are potential targets for optimisation worth exploring.

--
Robin Berjon