Re WF, V, and MSXML
Peter Murray-Rust wrote:

> Note that an internal subset may be present for other reasons than validation (adding attribute values and types, as required for XML-LINK, for example). Therefore I do not think the author's intentions can be deduced from the presence of an internal subset. Presumably a pointer (SYSTEM) to an external DTD is likely to refer to a DTD which can be used for validation, but I'm not sure whether this is explicit.

Yes, I think there is a somewhat different information model in XML than in SGML, and this parser (whether it's doing all the right things or not) is useful for learning and thinking about the differences.

I, too, think that my "palmy" input document is invalid but WF. Thus, if MSXML is parsing to validate, it is (due to a bug or two) doing error recovery (and should be fixed on this point not to do so).

I can also see some gotchas for early adopters, such as that a WF document that makes reference to the wrong DTD is still WF. And the WF-parser will check the WFness of the element declarations (even in the right DTD) even if it isn't going to use them, at least in the internal subset.

Also, the internal subset is part of the XML document, and, as the spec is written, the parser must parse the subset and deliver it as part of the output (as MSXML does), even though the same is not true of an external subset. (Right?)

Doesn't it seem as though the reasons for conveying the internal subset information to the application (such as those you mention) are also reasons for extracting the same information from the external subset and conveying it to the application, too, whether the document is dealt with as WF or not?

IOW, an SGML parser such as nsgmls combines both subsets into a DTD and deals with information following as another unit, the "document instance set" (if I have the terminology right, per 8879 production 2), which is the part of an SGML document entity *following* the prologue.
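The internal-subset points above can be tried out with any non-validating parser. What follows is only a rough sketch under stated assumptions: it uses Python's expat binding (`xml.parsers.expat`) as a stand-in, so it says nothing about MSXML itself, and the helper `parse_wf` and both sample documents are invented for illustration. It shows three things: a document that is invalid against its own internal subset still parses as WF, the declarations in the internal subset are reported to the application, and a malformed declaration in the internal subset is rejected even though nothing is being validated.

```python
import xml.parsers.expat

# Hypothetical sample: <note> is not allowed by the declared content model,
# so the document is invalid, but it is still well-formed.
INVALID_BUT_WF = """<!DOCTYPE doc [
  <!ELEMENT doc (para)>
  <!ELEMENT para (#PCDATA)>
  <!ATTLIST doc kind CDATA "simple">
]>
<doc><note>not declared</note></doc>"""

# Hypothetical sample: the element declaration itself is malformed.
BAD_DECLARATION = """<!DOCTYPE doc [
  <!ELEMENT doc (para,>
]>
<doc/>"""


def parse_wf(text):
    """Parse text with expat (non-validating) and record what it reports."""
    seen = {"declarations": [], "elements": []}
    parser = xml.parsers.expat.ParserCreate()

    def on_element_decl(name, model):
        # Called for each <!ELEMENT ...> in the internal subset.
        seen["declarations"].append(("ELEMENT", name))

    def on_attlist_decl(elname, attname, type_, default, required):
        # Called for each attribute declared in an <!ATTLIST ...>.
        seen["declarations"].append(("ATTLIST", elname, attname, default))

    def on_start_element(name, attrs):
        # attrs includes defaults taken from the internal subset.
        seen["elements"].append((name, dict(attrs)))

    parser.ElementDeclHandler = on_element_decl
    parser.AttlistDeclHandler = on_attlist_decl
    parser.StartElementHandler = on_start_element
    parser.Parse(text, True)
    return seen


if __name__ == "__main__":
    result = parse_wf(INVALID_BUT_WF)  # no error: invalid, but WF
    print(result["declarations"])
    print(result["elements"])
    try:
        parse_wf(BAD_DECLARATION)
    except xml.parsers.expat.ExpatError as err:
        print("rejected:", err)  # the declaration itself is not WF
```

Note that the attribute default declared in the internal subset shows up on the `doc` element even though the parser never checks the content model, which is the kind of non-validation use of the internal subset mentioned in the quoted message.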
For an XML parser, in contrast to an SGML parser such as nsgmls, the boundaries are shifted, because it has to deal with an XML document that *includes* the prologue (XMLlang production 23, where "element" corresponds to the SGML "document instance set", I think). I don't know whether this is a good idea or not; I'm just trying to understand it as an early adopter.

(I also notice now that per productions 23 and 27, white space after the end of the end-tag of the root element is also part of the document, which is okay by me; but this seems not to be dealt with explicitly s.v. 2.8, "White Space Handling." I read that section to mean that such white space must be passed to the application by a WF-parser [the language referring to "processors which ... read the DTD" or not should be changed, because, as we see, a WF parser must read at least the internal subset part of the DTD], whereas a validating parser must not pass such white space to the application.)

Regards,
Terry Allen
Electronic Publishing Consultant
tallen[at]sonic.net
http://www.sonic.net/~tallen/
Davenport and DocBook: http://www.ora.com/davenport/index.html
T.A. at Passage Systems: terry.allen[at]passage.com

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)