Re: SAX and whitespace (was Re: Problems with whitespace and msxml)
[I think this discussion is another good reason why SAX is urgently needed]

At 09:57 01/01/98 -0500, David Megginson wrote:
> > > An XML processor must always pass all characters in a document
> > > that are not markup through to the application. A validating
> > > XML processor must distinguish white space in element content
> > > from other non-markup
>
>What the PR means to say here is that a DTD-driven XML parser has to
>treat whitespace in element content differently than whitespace in
>mixed content -- this, of course, has nothing to do with xml:space.
>If there is no DTD, then all element types are assumed to allow mixed
>content, so a DTD-driven XML parser ("validating XML processor") would
>report all whitespace as significant.

I would agree with this interpretation and prefer the phrase "DTD-driven XML parser ("processor")". I interpret this to mean: "a processor which uses any DTD information given in the document, and which uses it to do as much validation as it and the document are capable of."

However, having read the spec more carefully, I am having great difficulty in deciding *where* it allows whitespace in element content. Take the document:

<!ELEMENT FOO (BAR)>
<!ELEMENT BAR EMPTY>
...
<FOO> <BAR> </BAR> </FOO>

My reading of the spec suggests that this is an *invalid* document. Please show me where I have gone wrong...

FOO has declared element content [3.2.1]:
"... elements of that type must contain only child elements ***(no character data)*** [my asterisks]..."

For BAR [3.2]:
"An element is valid if there is a declaration matching elementdecl where the Name matches the element type and ...
1. the declaration matches EMPTY and the element has ***no content***"

The context of content is

STag content ETag <!-- no S? --->

and its definition is:

(element | CharData | Reference | CDSect | PI | Comment)*

Again there is no place for whitespace.
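To make the distinction Megginson describes concrete, here is a minimal sketch using Python's standard pyexpat binding. Expat is not a validating parser, but it does report element type declarations from the internal subset, which is enough to tell element content from mixed content. The function name classify_whitespace and the rule it applies (all-whitespace text directly inside an element declared with element content is "ignorable", everything else is ordinary characters) are hypothetical, modelled on option (1) in Megginson's list, a distinct event as in Ælfred. Because nothing is validated, the space inside the EMPTY element BAR is simply passed through as characters rather than flagged as an error.

```python
import xml.parsers.expat
from xml.parsers.expat import model

# Content-model types treated as "element content" (child elements only, no #PCDATA).
# Assumption for this sketch: MIXED and ANY allow character data, EMPTY allows nothing.
ELEMENT_CONTENT_TYPES = {model.XML_CTYPE_NAME, model.XML_CTYPE_CHOICE, model.XML_CTYPE_SEQ}


def classify_whitespace(xml_text):
    """Hypothetical helper: return a list of (event, parent_element, data) tuples.

    event is "ignorable" for all-whitespace text directly inside an element whose
    DTD declaration gives it element content, and "characters" otherwise.
    """
    element_content = set()  # element types declared with element content
    stack = []               # names of the currently open elements
    events = []

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # merge contiguous text into one callback where possible

    def on_element_decl(name, content_model):
        # content_model is a tuple: (type, quantifier, name, children)
        if content_model[0] in ELEMENT_CONTENT_TYPES:
            element_content.add(name)

    def on_start(name, attrs):
        stack.append(name)

    def on_end(name):
        stack.pop()

    def on_chars(data):
        parent = stack[-1] if stack else None
        # XML whitespace is space, tab, CR and LF
        if parent in element_content and data.strip(" \t\r\n") == "":
            events.append(("ignorable", parent, data))
        else:
            events.append(("characters", parent, data))

    parser.ElementDeclHandler = on_element_decl
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_chars
    parser.Parse(xml_text, True)
    return events


if __name__ == "__main__":
    doc = """<!DOCTYPE FOO [
<!ELEMENT FOO (BAR)>
<!ELEMENT BAR EMPTY>
]>
<FOO> <BAR> </BAR> </FOO>"""
    for event in classify_whitespace(doc):
        print(event)
```

For the document above, this reports the two spaces directly inside FOO as ignorable and the space inside BAR as ordinary characters. With no DTD at all, every whitespace run comes back as characters, which matches Megginson's point that without a DTD all whitespace is significant.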
Therefore I cannot see where (apart from [2.10], which raises the whitespace question) whitespace can be defined as 'non-significant'. IOW whitespace ***in the content of an element*** is only formally allowed as CharData in mixed content, and in mixed content it must be significant. I am *sure* I've missed something here, as the WG has debated this for ages, but I can't see where.

>
>What should SAX do with ignorable whitespace?

Assuming that ignorable WS is found only in element content...

>
>1) Report it as a distinct event, like Ælfred does?
>2) Treat it as regular character data?
>3) Ignore it (as in regular SGML)?
>
>(1) seems to be what the PR requires. Either (2) or (3) could cause
>strange results.

(3) is forbidden - it has to be passed through. I think it has to be (2) and (1) simultaneously. IOW in an event mode you must report that whitespace (space, 3 tabs, one newline, 10 spaces) occurs "now"; in tree mode you report "I have made you an element/node consisting of PCDATA, all whitespace - it's up to you to keep/destroy it..."

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)