Re: XML Torture Test: Parsers Fail
At 3:51 PM -0700 4/6/99, David Brownell wrote:
>Chris, these aren't errors ... unless there are references
>to those entities (&baseball; and &season;) in the document,
>which is not currently done.
>
>If IE5 is treating those as errors, it shouldn't.
>
>- Dave
>
>
>Chris Lovett wrote:
>>
>> The problem appears to be in braves.dtd. You have the following:
>>
>> <!ENTITY baseball SYSTEM "braves/baseball.dtd">
>> <!ENTITY season SYSTEM "braves/season.dtd">
>>
>> and these DTD's exist - so you have general parsed entities pointing to DTD
>> information which is not right.
>>
>> Once these two lines are removed from braves.dtd everything loads fine in
>> IE5.
>>

That does seem to be the problem. Once I fixed that, IE 5.0 could load the document from my local hard drive, but it still failed to load it from the Web site. I don't yet know why.

I think what this whole mess is showing, given the widely varying problems with so many parsers, is that validation is not nearly as simple as it seems, especially when validators are asked to handle large files.

A couple of decades ago, many bugs were exposed in compilers for various languages when the output of program generators like lex and yacc was thrown at them. While these compilers could handle anything a human programmer was likely to write, they failed when faced with automatically generated code. The compilers made too many assumptions about what code looked like that weren't part of the language specs. I suspect we're seeing something like that here. These files, and the DTDs containing the entity references, were all created by a program that pulled data out of a database. Only the basic structure of the document was designed by hand. Pouring a database into a custom-designed XML vocabulary is not unusual, but programmatically creating the entity references does seem to be unusual.
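Dave's point, that declaring an external general entity is harmless until something references it, can be checked with a small Python sketch. It assumes Python's standard-library expat binding (`xml.parsers.expat`); the document text and the `parse_and_track` helper are hypothetical stand-ins, not the actual braves files, and the handler only records which entities the parser would fetch rather than loading them.

```python
import xml.parsers.expat

# Hypothetical stand-in for the braves document. The internal subset
# declares two external general entities, as braves.dtd does, but the
# document body never references them.
UNREFERENCED = """<?xml version="1.0"?>
<!DOCTYPE braves [
  <!ENTITY baseball SYSTEM "braves/baseball.dtd">
  <!ENTITY season SYSTEM "braves/season.dtd">
]>
<braves>No entity references here.</braves>
"""

# The same hypothetical document with one actual reference in its content.
REFERENCED = UNREFERENCED.replace("No entity references here.", "&baseball;")


def parse_and_track(text):
    """Parse text with expat and return the system IDs of the external
    general entities the parser asked to load, in document order."""
    requested = []
    parser = xml.parsers.expat.ParserCreate()

    def on_external_ref(context, base, system_id, public_id):
        # expat calls this only when the content references an external
        # general entity. Record the request instead of fetching the file.
        requested.append(system_id)
        return 1  # nonzero tells expat to continue parsing

    parser.ExternalEntityRefHandler = on_external_ref
    parser.Parse(text, True)  # raises ExpatError if not well-formed
    return requested
```

Here `parse_and_track(UNREFERENCED)` returns `[]`: both declarations are accepted and nothing is requested. `parse_and_track(REFERENCED)` returns `['braves/baseball.dtd']`, because only an actual reference makes the parser go looking for the entity. Because the sketch never loads the file, it does not reproduce the other half of Chris's point: if braves/baseball.dtd were loaded as a general parsed entity, its markup declarations would be parsed as element content, and the document would not be well-formed.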
I worry about what's going to happen when we start writing programs that generate not only the data and entity references but also the vocabulary. We're likely to uncover even more bugs and underlying assumptions about what XML files look like. This one document uncovered verifiable, repeatable problems in four separate, independently developed parsers. What's interesting is that these were four completely different problems.

We may be able to learn something from the more formal, verifiable approach to compiler design that has taken hold over the last 20 years. We need to think about a more formal specification of XML, and perhaps provably correct parsers. At the very least, there needs to be a closer connection between the spec and validating parsers. The BNF grammar is straightforward (though at least one parser doesn't seem to rely on it), but the validity constraints are a mess.

The various schema proposals may present an opportunity to fix this. We should consider very carefully whether a given schema grammar can be easily (preferably automatically) translated into a parser for schemas based on that grammar and for documents based on particular schemas.

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@m...            | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          XML: Extensible Markup Language (IDG Books 1998)          |
|   http://www.amazon.com/exec/obidos/ISBN=0764531999/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News: http://sunsite.unc.edu/javafaq/  |
|  Read Cafe con Leche for XML News: http://sunsite.unc.edu/xml/     |
+----------------------------------+---------------------------------+

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)