
XML Torture Test: Parsers Fail

  • From: Elliotte Rusty Harold <elharo@m...>
  • To: xml-dev@i...
  • Date: Mon, 5 Apr 1999 10:06:39 -0500

Without intending to do so, I have devised an XML document that exposes
many problems in almost all XML validating parsers and non-validating
parsers that resolve external entity references.  You will find this
torture test at

http://metalab.unc.edu/examples/players/index.xml

It has broken every parser I've thrown at it in one way or another,
including the one in IE5, with the single exception of RXP.  However, RXP
reports some warnings that do not appear to be errors, and in an earlier
version of the document it missed some problems involving missing encoding
declarations in the text declarations, which xml4j 2.0.4 (but not 1.1.14)
picked up. Those problems in the document have now been fixed.

As best I can tell this document is both well-formed and valid. It's hard
to say for sure when many different parsers all fail to process it, mostly
after either giving up completely or generating incorrect error messages.
Until I'm more confident the document is correct, I'm simply defining a
broken parser as one that

1. describes a valid document as invalid (Microsoft?, xml4j?)
2. describes an invalid document as valid (RXP)
3. describes an invalid document as invalid but gives the wrong reason.
(Microsoft?, xml4j?)

Once I've conclusively determined whether my document is valid, I should be
able to determine whether Microsoft and xml4j fit into category 1 or 3 or
both.

What's torturous about this example is that it declares over 1000 separate
external general entities in several dozen different DTDs.
Currently only one of those entities is actually used in the main document,
but I plan to expand it to use all 1000+ entities.  Thus it's likely to
become even more difficult to parse properly.  Leaving aside the question
of whether this is the proper design for this document, it's nonetheless
the case that parsers should be able to handle it.  Parser authors may wish
to investigate further. I would also appreciate help from anyone who can
spot, by eye, mistakes of mine that the parsers may be reporting
incorrectly.
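
For readers who want to see the mechanism in miniature, here is a rough
sketch of a parser following a DOCTYPE to an external DTD and then
resolving one external general entity declared there. It is not the
torture test itself. The file names (team.dtd, p1.xml), the element names,
and the in-memory FILES table are all invented for illustration, and
Python's expat binding is non-validating, so this exercises only
well-formedness and entity resolution, not validity.

```python
import xml.parsers.expat as expat

# Hypothetical stand-ins for files on a server; none of these names
# come from the real torture test.
FILES = {
    "team.dtd": (
        b"<!ELEMENT team (player)*>\n"
        b"<!ELEMENT player (#PCDATA)>\n"
        b'<!ENTITY p1 SYSTEM "p1.xml">\n'
    ),
    "p1.xml": b"<player>Example Player</player>",
}

MAIN = (
    b'<?xml version="1.0"?>\n'
    b'<!DOCTYPE team SYSTEM "team.dtd">\n'
    b"<team>&p1;</team>"
)


def parse_with_entities(main, files):
    """Parse `main`, loading external entities from the `files` dict.

    Returns the system identifiers in the order they were resolved,
    plus the concatenated character data of the document.
    """
    resolved = []
    text = []
    parser = expat.ParserCreate()
    # Ask expat to read the external DTD subset; by default it skips it.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def on_external(context, base, system_id, public_id):
        # Called once for the external DTD subset (context is None)
        # and once for each external general entity reference.
        resolved.append(system_id)
        # Simplification: the sub-parser is always created from the top
        # parser, so entities nested inside other external entities are
        # not handled by this sketch.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(files[system_id], True)
        return 1  # non-zero tells expat the entity was handled

    parser.ExternalEntityRefHandler = on_external
    parser.CharacterDataHandler = text.append
    parser.Parse(main, True)
    return resolved, "".join(text)


if __name__ == "__main__":
    print(parse_with_entities(MAIN, FILES))
```

The real document multiplies this pattern across several dozen DTDs and
over 1000 entity declarations, which is presumably where the parsers start
to disagree.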



+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@m... | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|        XML: Extensible Markup Language (IDG Books 1998)            |
|   http://www.amazon.com/exec/obidos/ISBN=0764531999/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://sunsite.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://sunsite.unc.edu/xml/     |
+----------------------------------+---------------------------------+



xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

