Options in XML 1.0
Since I'm once again writing a chapter on interoperability issues among XML parsers, I'm pondering the division between validating and non-validating parsers and how that differs from the division between valid and well-formed documents. (The gist of the chapter is in slides at: http://www.simonstl.com/articles/interop/)

It seems like the option for non-validating parsers to ignore external DTD subsets and entities came into the spec pretty late, so at some point it might have made sense for all parsers, validating or non-validating, to be able to understand the contents of the DOCTYPE declaration. Before that option appeared, document creators could count on both validating and non-validating parsers to return the same information from a document. This reasonably justified the requirement that non-validating parsers should 'speak' DTD, even the tricky parts.

Once non-validating parsers were given that option, validating and non-validating parsers could return different results from the same document, and even that division isn't consistent. Some non-validating parsers do read the external subset, etc. Developers are forced to look to finer and more obscure criteria than the main divide between validating and non-validating parsers, and users confronted with missing information in applications are bound to be confused. (The standalone declaration can only be used to identify documents that don't require external resources, not documents that do require external resources, and it is widely underused in any event. There's no trigger in XML for warning document-consuming applications that they'd better have a parser which retrieves external resources.)

At this point, I have a hard time accepting the line drawn between validating and non-validating parsers, or the justification for making all non-validating parsers understand and process whatever DTDs they happen to encounter.
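To make the divergence concrete, here is a minimal sketch using Python's bundled expat module, a non-validating parser that skips the external DTD subset unless asked to read it. The DTD name `memo.dtd`, its contents, and the `root_attributes` helper are hypothetical; the external subset is served from an in-memory dictionary rather than fetched, so the example runs without files. The point is only that one document can yield different attribute sets depending on whether the external subset is read.

```python
import xml.parsers.expat

# Hypothetical external DTD subset, held in memory instead of on disk.
EXTERNAL_SUBSET = {"memo.dtd": b'<!ATTLIST memo status CDATA "draft">'}

# The internal subset declares one attribute default; the external subset another.
DOC = b'''<?xml version="1.0"?>
<!DOCTYPE memo SYSTEM "memo.dtd" [
  <!ATTLIST memo lang CDATA "en">
]>
<memo>Hello</memo>'''


def root_attributes(read_external: bool) -> dict:
    """Parse DOC and return the attributes expat reports for the root element."""
    parser = xml.parsers.expat.ParserCreate()
    starts = []

    def start(name, attrs):
        # Defaulted attributes are included because specified_attributes is False.
        starts.append(dict(attrs))

    parser.StartElementHandler = start

    if read_external:
        # Ask expat to process external parameter entities, including the DTD subset.
        parser.SetParamEntityParsing(xml.parsers.expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

        def external(context, base, system_id, public_id):
            # Parse the external subset with a child parser that shares the DTD.
            child = parser.ExternalEntityParserCreate(context)
            child.Parse(EXTERNAL_SUBSET[system_id], True)
            return 1  # non-zero signals success to expat

        parser.ExternalEntityRefHandler = external

    parser.Parse(DOC, True)
    return starts[0]


if __name__ == "__main__":
    print("external subset skipped:", root_attributes(False))
    print("external subset read:   ", root_attributes(True))
```

With the external subset skipped, only the `lang` default from the internal subset appears; when it is read, the `status` default from `memo.dtd` appears as well, so an application sees different content from the same bytes depending on a parser setting.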
It seems it would have been wiser to make non-validating parsers behave consistently, either by always reading all of the DTD content or by ignoring it entirely. I spent a long time preferring the first option, but at this point I'm leaning toward the second. As fond as I have been of DTDs (believe it or not), I think it's well past time to extract them from the initial parsing process and make them a post-processing tool, something like schemas. The document contains whatever it contains, and DTD or schema processing would be treated as an addition to the document, not as content at the same level as the actual document content.

This is tough stuff to deal with, and I don't see it changing any time soon, but I'd like to suggest that we at least consider why the lines are drawn as they are and consider alternatives that might produce more comprehensible results.

Simon St.Laurent
XML Elements of Style / XML: A Primer, 2nd Ed.
XHTML: Migrating Toward XML
http://www.simonstl.com - XML essays and books