Re: Specification Questions
In message <199708050949.KAA07792@a...> "Neil Bradley" writes:
> > > Reply-to: Peter@u... (Peter Murray-Rust)
> > >
> > Some additional - hopefully constructive - thoughts on whitespace.
> >
> > The XML-lang spec does not (and I suspect will not) give detailed guidance
> > on how whitespace will be managed. My impression is that it is up to
> > implementers and/or groups like this to come up with particular solutions.
> > My worry is that these will be inconsistent and not inter-operable.
>
> I agree totally. This was my original concern.
>
> > ***
> > Therefore I propose that those on XML-DEV who care about this problem come
> > up with some guidelines for implementers.
> > ***
>
> I very much hope this happens.
>
[...]
>
> I think all applications should be expected to either or both
> characters in sequence as a line end signal, so that platform
> dependencies can be eliminated. If there is no good reason to omit
> this task from the XML-processor itself, I think it should be done
> there.
>
> [...]
>
> I believe so. In addition, can we not put 'XML-SPACE
> (PRESERVE|IMPLIED) "PRESERVE"' in an attribute declaration for an
            ^^^^^^^
I think you meant DEFAULT - #IMPLIED is when no value is given.

> element which will always have reserved content. It is common
> practice for a DTD to have some kind of pre-formatted element, such
> as HTML's '<pre>'.
>
> > If so, we can propose that the DEFAULT mode for any whitespace processing is
> > something along the lines (similar to HTML?). Within an element with
> > XML-SPACE="DEFAULT"
> >
> > All whitespace sequences are mapped into a single space character.
>
> Agreed.
>
> > All whitespace pseudo-elements are ignored (i.e. whitespace between markup)
>
> Ummm. what about 'the <b>bold</b> <i>italic</i> styles...'?
>
> > All leading and trailing whitespace in #PCDATA is ignored.
>
> I think all applications should remove leading and trailing CR and LF
> characters in a mixed content element. But not SP or HT, as this would
> be undesirable in the following fragment:
>
> A<emph> bold </emph>word.
>
> Although an unusual layout, some people may use it, and it would be
> unfortunate if it resulted in 'Aboldword'.
>

OK - I had overlooked this.

Taking account of other posts on this subject here and elsewhere, there
seems to be a positive view that a set of Guidelines/Best Practice/Generally
Agreed Conventions should be developed, and that XML-DEV is probably the
right place. It's also clear that the more of this that can be done before
the XMLProcessor output gets to the *specific* application - e.g. a browser
or transformer - the better.

We seem to be looking at a filter or layer immediately after/on_top_of
the XMLProcessor. At the ESIS stream level we could have:

Document -> [Parser] -> ESIS -> [XMLWhitespace] -> NewESIS -> [Application]

and at the API level something that either sits on top of the EventStream
or the final TreeFactory (or whatever it's called).

(There is a difficulty in filtering any document, in that XPtrs in XML-LINK
would appear to have to operate on the unfiltered document (although this is
not specifically stated, it's implied). So it might have to be that the
stream or tree contained 'significant' and 'non-significant' whitespace, and
that the application would have to be able to recognise the flag. All XPtr
activity has to take place on *all* whitespace (although I don't think this
is pretty).)

The current switch PRESERVE is clear (everything goes through). It would go
against the spec if it didn't do this. That means (I suppose) that CR+LF is
different from LF - that's the price paid for PRESERVE.

The other option, DEFAULT, cannot map onto a set of actions that we all agree
for all documents. Therefore we have to give DEFAULT some hints at the
*document* level - presumably through PIs.

Can we propose, therefore, a set of PIs that would control whitespace
processing? I would hope that we could keep this to a very small number
(ca. 3-4).
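
[Editorial illustration, not part of the original message.] The trimming rule
discussed in the quoted exchange above can be sketched in a few lines of
Python. This is only one possible reading of the proposal: the function name
normalise_pcdata is hypothetical, and the order of the two steps (trim line
ends first, then collapse runs) is an assumption that the thread does not
settle.

```python
import re

# Runs of the four XML whitespace characters: SP, HT, CR, LF.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def normalise_pcdata(text: str) -> str:
    """Hypothetical DEFAULT-mode handling of one chunk of character data.

    Step 1: remove leading and trailing CR and LF only (SP and HT are kept),
            following Neil Bradley's amendment.
    Step 2: map every remaining whitespace sequence to a single space.
    """
    trimmed = text.strip("\r\n")      # strips any mix of CR and LF at the ends
    return _WS_RUN.sub(" ", trimmed)  # collapse interior runs to one space


if __name__ == "__main__":
    # The fragment from the thread: A<emph> bold </emph>word.
    parts = ["A", " bold ", "word."]
    print("".join(normalise_pcdata(p) for p in parts))
```

Under these assumptions the spaces inside <emph> survive, so the fragment does
not collapse into 'Aboldword', while line ends at the edges of a chunk are
dropped regardless of whether the platform wrote CR, LF, or CR+LF.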

Is it too simple to suggest that there are two types of markup (STRUCTURE
and TEXT) that need to normalise whitespace? The former would deal with
things like:

<PRETTY>
  <PRINT>
  </PRINT>
</PRETTY>

where the author did not intend there to be any whitespace, and the second
would deal with

<P>
This is a long      space in a <B>paragraph</B>.
</P>

where all whitespace would be normalised to a single space as in HTML.
Where a document contained both, the author could use a PI to switch
between them.

If we could come up with a very simple set of options, it might make it
sufficiently simple that a standard filter could be devised, or that the
application programmer had a much simpler strategy.

Is consensus possible?

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)