
Re: Fast text output from SAX?


John Cowan wrote:
> Robin Berjon scripsit:
>>I think what Dennis is looking for is for something to fairly compare 
>>the output from XBIS et al. with that of XML properly written at the end 
>>of a SAX stream. Properly written may or may not involve (depending on 
>>how paranoid you want to be -- I'd go for maximal because broken XML 
>>isn't XML anymore): transcoding, checking that Names are Names, blowing 
>>up if they contain characters that can't be transcoded to the target 
>>encoding, checking that comments and PI data don't contain -- or ?>, 
>>checking that text contains no forbidden character, that namespaces are 
>>properly used, that you're using the proper repertoires for the version 
>>of XML you said you were using, etc.
> 
> Most of these checks are representation-independent: I can barely imagine
> that anyone would bother to develop an optimized representation that
> depended on whether Names were Names, for example.  (Yeah, you could
> save 1 bit by relying on the fact that there are exactly 35122
> valid Name characters in XML 1.0, but really!)
> 
> In practice, an XML writer and an ORX (newly coined generic acronym
> for "optimized representation of XML") writer would be suitable for
> comparison purposes if they did the same set of checks.

If you go read what I said, you'll notice that I wasn't comparing XML 
with an ORX (I like the name :), simply listing a few things that I 
thought Dennis -- and certainly I -- would look for in a quality XML 
serialiser. Just dumping bytes "by hand" works when you know the kind of 
data you'll be dumping -- just as using regexen on XML is fine if you 
really know what your input will look like -- but it's not acceptable as 
a general-purpose approach.
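
To make a few of those checks concrete, here is a minimal sketch in Python. It is not taken from any existing serialiser: the function names are hypothetical, the Name check is a deliberately simplified ASCII-only approximation of the real production, and namespace checks are left out entirely.

```python
import re

# XML 1.0 "Char" production: the characters allowed anywhere in a document.
_CHARS = re.compile("[\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*")
# Assumption: ASCII-only approximation of Name; the real production is wider.
_ASCII_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def _check_chars(data):
    """Blow up on characters that XML 1.0 forbids outright."""
    if not _CHARS.fullmatch(data):
        raise ValueError("character not allowed in XML 1.0")


def write_text(data, encoding="utf-8"):
    """Escape character data; unencodable characters become char references."""
    _check_chars(data)
    escaped = data.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return escaped.encode(encoding, "xmlcharrefreplace")


def write_comment(data, encoding="utf-8"):
    """Comments may not contain '--' or end with '-'."""
    _check_chars(data)
    if "--" in data or data.endswith("-"):
        raise ValueError("comment data contains '--' or ends with '-'")
    # Strict encoding: comments cannot use character references, so an
    # untranscodable character raises UnicodeEncodeError (a ValueError).
    return ("<!--" + data + "-->").encode(encoding)


def write_pi(target, data, encoding="utf-8"):
    """PI targets must be Names other than 'xml'; data may not contain '?>'."""
    _check_chars(data)
    if not _ASCII_NAME.fullmatch(target) or target.lower() == "xml":
        raise ValueError("invalid PI target")
    if "?>" in data:
        raise ValueError("PI data contains '?>'")
    return ("<?" + target + " " + data + "?>").encode(encoding)
```

Every one of these costs a scan over the data, which is exactly why a fair speed comparison has to say which of them were switched on.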

Since you bring the topic up however, I agree that you are right for 
some ORX but not all, and the serialisation method is a large part of 
determining the trade-offs you may or may not wish to make. Many ORX 
would use a single text encoding for instance, not requiring one to 
check a few things in that area. Schema-based ones would only need to 
check names when reading the schema, not when serialising. If you encode 
{ns,ln} pairs instead of QNames you also skip a few checks.
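
As a rough illustration of the {ns,ln} point (a hypothetical sketch, not how any particular ORX works; `NameTable` and `intern` are names I made up): if names are written as indices into a table of (namespace URI, local name) pairs, the writer never has a prefix to resolve or a QName to split.

```python
class NameTable:
    """Assigns a small integer to each distinct (namespace URI, local name) pair."""

    def __init__(self):
        self._ids = {}   # (ns, local) -> index
        self.names = []  # index -> (ns, local), in first-seen order

    def intern(self, ns, local):
        """Return the index for this pair, adding it on first use."""
        key = (ns, local)
        ident = self._ids.get(key)
        if ident is None:
            ident = len(self.names)
            self._ids[key] = ident
            self.names.append(key)
        return ident
```

The output stream then carries the table once and integers thereafter, so any check on the names themselves is paid once per distinct name rather than once per occurrence.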

I'm not making assumptions as to which choices are the best, or even 
whether they are worth making (though empirical data would seem to 
suggest they are), simply showing that there are potential targets for 
optimisation worth exploring.

-- 
Robin Berjon
