Re: MSXML DOM Special Chars Less Than 32
On Sat, 2002-03-23 at 12:17, Tim Bray wrote:
> Amelia A Lewis wrote:
>
> > In short, the C0 characters have no universal interpretation;
> > interpretation depends upon the application. It seems reasonable, then,
> > that the application can encode the bloody things too. Can't use XML
> > mechanisms. Base64, the usual suggestion, incurs an immense overhead.
>
> I agree with the leading sentences. As for the last, Base64 encodes
> 3 bytes as 4, thus incurring exactly 33% overhead. Whether that
> is considered "immense" depends on your application scenario. -Tim

A little more than that, actually, in a correct base64 implementation.
Each 57 bytes become 76 bytes. Add two more for CRLF. Plus the final
padding, which is generally but not always negligible.

Lessee ... I'd do the math, but I'm not working today, so it's lazy
time: original + 1/3 + 1/57. For decoding, 1 + 1 + 1/3 + 1/57, most
likely, as you prolly can't discard as you decode.

If "immense" is overwrought, could we agree on "significant"? Tricks
like quoted-printable and encoded-word (and XML Unicode numeric
entities) are attractive largely because the characters they encode are
*rarely* encountered, meaning that the cost is significantly less than
base64.

Amy!
(who's spent the last two weeks writing MIME-related code, and is
probably being hideously pedantic)
--
Amelia A. Lewis amyzing@t... alicorn@m...
Yankees are compelled by some mysterious force to imitate Southern
accents and they're so damn dumb they don't know the difference between
a Tennessee drawl and a Charleston clip.
        -- Rita Mae Brown, "Rubyfruit Jungle"
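
For anyone who wants to check the overhead arithmetic in the message above rather than trust lazy-day math, here is a minimal Python sketch. It assumes MIME-style output: 76-character lines, each terminated by a two-byte CRLF. The helper names `mime_base64` and `overhead` are made up for illustration and are not from any library. Under those assumptions the per-line term works out to 2/57 rather than 1/57, since the CRLF costs two bytes for every 57 bytes of input.

```python
import base64


def mime_base64(data: bytes, line_len: int = 76) -> bytes:
    """Base64-encode data and break it into CRLF-terminated lines.

    Hypothetical helper: mimics MIME line wrapping (76 chars + CRLF).
    """
    encoded = base64.b64encode(data)
    lines = [encoded[i:i + line_len] for i in range(0, len(encoded), line_len)]
    return b"".join(line + b"\r\n" for line in lines)


def overhead(n: int) -> float:
    """Ratio of encoded size to original size for n bytes of input."""
    return len(mime_base64(bytes(n))) / n


if __name__ == "__main__":
    # 57 input bytes fill exactly one 76-character line, plus 2 for CRLF.
    print(len(mime_base64(bytes(57))))
    # Compare the measured ratio with 1 + 1/3 + 2/57 for a few sizes.
    for n in (57, 5700, 1_000_000):
        print(n, round(overhead(n), 4), round(1 + 1 / 3 + 2 / 57, 4))
```

For inputs that are a multiple of 57 bytes the ratio is exactly 78/57, about 37% overhead; for other sizes the final padding and the last partial line shift it slightly, which is the "generally but not always negligible" part.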