Re: Reality Check (was: Why the Infoset?)
Sean McGrath wrote:
>
> All,
>
> Of the 6 parsers listed as of today on
> http://xmlconf.sourceforge.net none
> of them fully conform to XML 1.0.
>
> Our debates on this list so often
> pre-suppose XML compliant
> tools. What does it matter what we decide to
> put in/leave out of an infoset when there are
> no tools capable of generating it anyway:-(

I think this may be a little misleading for some readers. It may give the idea that the bugs make it impossible to build reliable XML systems.

Looking quickly through the test results, it seems to me that the data says people are very, very well-served if they send standalone WF XML that

  * is well-formed with no encoding errors
  * is conservative with whitespace and name characters
  * avoids complex uses of parameter entities

Developers writing systems that receive XML should pay attention to whitespace (newline substitution, stripping leading or trailing whitespace in attribute values, whitespace in mixed content next to elements).

At this point, 95% of XML developers can probably say "Oh, I'm OK" and leave!

Other perspectives on interpreting the data are welcome. I would much prefer the test report to be categorized into

  1) error on good document
  2) no error on bad document

Let's roughly categorise the WF test results into 4 groups:

  * instance misparsing
  * prolog/subsets/entity errors
  * whitespace errors
  * incorrect or missing diagnostic errors

If we look at things in those terms, we find for the WF tests (I apologise for any mistakes, this is a quick count):

Parser     succeed  instance  prolog  whitespace  diagnostics***
----------------------------------------------------------------
Sun           1066         0       0           0  6 (xml:lang)
Aelfred       1062         0       7           0  3
XP            1057         0       9           5  0
Xerces/C      1043         3      26           0  0
Xerces/J      1020         0       0          46  6 (xml:lang)
MS2000*        963        60      30          21  5
IE5* **        943  similar but more ZZZZ

* The high numbers here do not indicate a hugely greater number of bugs than the other WF parsers. The same class of errors is being caught repeatedly.
The whitespace errors mainly concern normalization of attribute values. Most of the instance errors are related to handling bad characters, or to handling (or not handling) non-ASCII name characters.

** IE5 seemed to have pretty much the same bugs as MS2000. Like MS2000, many of the bugs relate to the parser being too generous in what it accepts. It seems their dialect is a little friendlier to HTML-ish mistakes (but they should attend to it, or provide a dual-mode parser: "xml" and "html" or, best, "xml" and "sgml"). However, both IE5 and MS2000 do fail sometimes when they should not: I didn't look at the tests to find out why, but it is probably important.

*** Some of the tests represent trivial problems: for example, that xml:lang="123" is not treated as an error. This is a sanity check of the creator of the document rather than of the parser! Of course, the test suite is correct to test it, but when reading the numbers one should realize that not all errors need to be weighted equally.

I note that the tests of the validating parsers seem to have mostly the same errors. One cannot say that supporting validation introduces a significant set of bugs.

Sun's parser is clearly the parser of choice for conformance.

Rick Jelliffe
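
To illustrate the whitespace advice for systems that receive XML, here is a minimal Python sketch of receiver-side normalization applied after parsing, so that results depend less on which parser produced the values. The function names normalize_newlines and normalize_token_value are hypothetical and not part of any parser's API. Collapsing whitespace is assumed to be appropriate only for attribute values that the application treats as tokens; that is an assumption about the application, not something the test results prescribe.

```python
import re

# XML 1.0 white space characters: space, tab, carriage return, line feed.
_XML_WS_RUN = re.compile(r"[ \t\r\n]+")


def normalize_newlines(text: str) -> str:
    """Map CRLF pairs and lone CRs to a single LF (newline substitution)."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def normalize_token_value(value: str) -> str:
    """Collapse runs of XML white space to one space and strip both ends.

    Hypothetical helper: intended only for attribute values that the
    receiving application treats as tokens (IDs, names, enumerated values).
    """
    return _XML_WS_RUN.sub(" ", value).strip(" ")


if __name__ == "__main__":
    print(repr(normalize_newlines("line one\r\nline two\rline three\n")))
    print(repr(normalize_token_value("  en \n\t US  ")))
```

Applying these after parsing is a defensive choice: the receiver gets the same values whether or not a given parser normalized the whitespace itself.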