
Re: Fast text output from SAX?


At 2:25 PM -0400 4/15/04, Stephen D. Williams wrote:


>You know, like Jpeg, Tiff/Group4, Word documents (!), PowerPoint, 
>zip files, tar/cpio, jar files, gziped HTML, etc.

I routinely deal with broken JPEGs and Word documents. In fact, I was 
thinking about Word when I wrote the bit about the fragility of 
binary formats. A bad Word document can crash a system. It's been a 
while since I've encountered a bad zip or tar file, but I have seen 
them. I'm not sure what changed to make these less common. Maybe the 
software got better over time?

>When you get a corrupted XML document, you can always magically 
>recover just the right missing tags and information?  Wow, where is 
>that method in the spec?

It's a hell of a lot easier to find the information that is there 
than it is to find it in a broken Word document or zip archive. Of 
course, you can't recover what's actually missing, but text files are 
simply more accessible.

>We're realistically talking about bugs or deficiencies in code, 
>configuration, mismatch between applications, etc., not 'fragile 
>things that break' from any perspective but schema co-evolution, 
>configuration management, and programmer error, isn't that right?

No, it isn't. As well as outright bugs, you can have data corrupted 
or partially transmitted across the network, disks that develop bad 
sectors, and deliberate creation of bad data as a component of a 
denial of service attack. Do you want your system to crash because 
some hacker flipped a couple of bytes in the right place?

>You can add forward error correction, b64 or quoted text encoding, 
>and other methods to prevent corruption, but the only cure for 
>user/programmer/operator error is early error detection and clear 
>warning.  When these have already been taken care of, through 
>earlier testing in one sense or another, or other methods, it is 
>not an issue.

There are multiple layers of corruption possible. Using checksums to 
verify the data helps at one layer, but does not protect against the 
same things well-formedness checking does. Well-formedness checking 
does not prevent attacks at the semantic layer, though some validity 
checks might. Validity cannot prevent most social engineering 
attacks. Attacks take place at different points in the stack. Error 
correction (which is mostly handled by TCP anyway) is only one 
shield against one kind of attack.
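
To make the layering concrete, here is a minimal sketch in Python. The 
helper names (checksum_ok, well_formed), the sample order documents, 
and the choice of SHA-256 are all assumptions made for illustration; 
none of them come from this thread. It only tries to show that a 
checksum and a well-formedness check answer different questions.

```python
import hashlib
import xml.etree.ElementTree as ET


def checksum_ok(payload: bytes, expected_sha256: str) -> bool:
    """Transport layer: do the bytes match what the sender hashed?"""
    return hashlib.sha256(payload).hexdigest() == expected_sha256


def well_formed(payload: bytes) -> bool:
    """Syntax layer: does the payload parse as well-formed XML?"""
    try:
        ET.fromstring(payload)
        return True
    except ET.ParseError:
        return False


# Case 1: a buggy sender emits malformed XML and hashes it faithfully.
# The checksum passes; only the well-formedness check catches it.
buggy = b"<order><item>widget</order>"
buggy_digest = hashlib.sha256(buggy).hexdigest()

# Case 2: a good document is damaged after it was hashed.
# Here the checksum is the check that fails first.
good = b"<order><item>widget</item></order>"
good_digest = hashlib.sha256(good).hexdigest()
damaged = good.replace(b"</item>", b"</itam>")

# Case 3: a well-formed document with nonsensical content.
# Neither check above objects; a validity or application-level
# check would be needed to reject the negative quantity.
hostile = b"<order><item quantity='-5'>widget</item></order>"
hostile_digest = hashlib.sha256(hostile).hexdigest()

if __name__ == "__main__":
    for name, data, digest in [
        ("buggy", buggy, buggy_digest),
        ("damaged", damaged, good_digest),
        ("hostile", hostile, hostile_digest),
    ]:
        print(name, checksum_ok(data, digest), well_formed(data))
```

In this sketch each case slips past a different check, which is the 
reason for keeping more than one layer rather than relying on error 
correction alone.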
-- 

   Elliotte Rusty Harold
   elharo@m...
   Effective XML (Addison-Wesley, 2003)
   http://www.cafeconleche.org/books/effectivexml
   http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA
