White Space
In message <2.2.32.19970412033137.00f0f4e0@j...> James Clark writes:
[...]
> One might think so, but since C has mixed content and no white-space in
> mixed content is automatically ignored, the white-space following the D and
> E elements will be data and hence constitute pseudo-elements. Thus
> ID(F)PREVIOUS(-2) will actually designate E.

This (helpful) reply pointed out a problem I had overlooked, so I've gone back to the XML-LANG spec (2.8) to clarify my thoughts and failed to do so :-(. Regardless of how desirable the present policy is (and I'm sympathetic to those trying to formulate a policy), I can't put a precise meaning on 2.8. Please forgive my normal blundering through this.

para 2: 'An XML processor which does not read the DTD must always pass all characters that are not markup through to the application'.

The implication is that the processor (== 'parser' at this stage?) must recognise mixed content, so that [without a DTD]:

    <C>
    <D/>
    </C>

is mixed content and contains 3 elements (the first and third being pseudoelements consisting of a newline). [My naive understanding of SGML is that there would only be one element, since start and end newlines are ignored in mixed content. Since all SGML applications require a DTD, SGML and XML give 'different' results here.]

'An XML processor which *does* read the DTD must always pass all characters in mixed content that are not markup through to the application.' [Presumably the newlines are not markup?]

'It may also **choose** to pass white space occurring in element content to the application. If it does so, it must signal to the application that...' [and the rest of the sentence appears to have been truncated in the public drafts; please can we have it back :-)]

Presumably this latter case occurs if something like:

    <!ELEMENT C (D)>

has been included, making it clear that C does not contain mixed content.
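[Editorial illustration, not part of the original message.] The no-DTD case above can be tried with any non-validating parser. The sketch below uses Python's standard `xml.dom.minidom`, which simply stands in for "a processor that does not read the DTD"; the helper names `child_summary` and `drop_whitespace_only` are invented here for illustration. The second helper only mimics what a DTD-aware processor might choose to do if it knew that C had element content; it is not behaviour the spec mandates.

```python
from xml.dom import minidom

# The example from the message: no DTD, newlines around <D/>.
DOC = "<C>\n<D/>\n</C>"


def child_summary(xml_text):
    """List the children of the root element as (kind, value) pairs."""
    root = minidom.parseString(xml_text).documentElement
    summary = []
    for node in root.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            summary.append(("element", node.tagName))
        elif node.nodeType == node.TEXT_NODE:
            summary.append(("text", node.data))
    return summary


def drop_whitespace_only(summary):
    """Hypothetical policy: discard text children that are only white space,
    as a processor might if a DTD declared <!ELEMENT C (D)>."""
    return [
        item for item in summary
        if not (item[0] == "text" and item[1].strip() == "")
    ]


children = child_summary(DOC)
print(children)                        # three children: newline, D, newline
print(drop_whitespace_only(children))  # only the D element remains
```

Without a DTD the application sees three children of C, two of them newline-only text; with the whitespace dropped it sees one. That is the difference in child counts that changes what a relative locator such as PREVIOUS(-2) designates.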
My reading is that the *parser* can decide (choose) what to do with this whitespace, so that different *parsers* can give different results here. The *application* (e.g. browser) has to be prepared for differing inputs from the same document according to the parser used...

The treatment of DEFAULT|PRESERVE is that the parser simply passes this flag to the *application* but takes no special action itself, so that all parsers should behave identically.

Presumably a parser without a DTD has to create pseudoelements when it encounters characters that are not part of markup. (Is the term pseudoelement used in the spec?) So, according to whether the parser finds a DTD or not, it will create different numbers of elements/pseudoelements for the application. It is under no obligation to tell the application how it arrived at what it is passing to it :-) So the occurrence of pseudoelements consisting of newlines does not imply mixed content, since they may have occurred in element content and the parser chose to pass them through.

My current hope would be that this is a problem which we could separate into parser and application, and that parsers could hide some of the intricacies from the application developer (including those writing generic browsers, like me). I'm not clear whether (a) this distinction is clear in the spec, or (b) current parser writers all agree on what should be done.

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)








