Re: Specification Questions
Reply-to: Peter@u... (Peter Murray-Rust)

> Some additional - hopefully constructive - thoughts on whitespace.
>
> The XML-lang spec does not (and I suspect will not) give detailed guidance
> on how whitespace will be managed. My impression is that it is up to
> implementers and/or groups like this to come up with particular solutions.
> My worry is that these will be inconsistent and not inter-operable.

I agree totally. This was my original concern.

> ***
> Therefore I propose that those on XML-DEV who care about this problem come
> up with some guidelines for implementers.
> ***

I very much hope this happens.

> XML does NOT treat whitespace like SGML and does NOT behave like HTML
> (although it can be configured to do so). As far as I see them, the rules
> are:
>
> 'All characters that are not markup are passed to the application'. (This
> is independent of any value of XML-SPACE (see below), processing instructions,
> stylesheets, etc.) These characters include HT, CR, LF, SP, and probably
> a number of other Unicode 'whitespace' characters. What the application
> does with them is *undefined* in XML-lang.
>
> Note that this means that CR and LF are passed as separate characters. No
> normalisation takes place. Therefore
>
> Line one\n\rline two
>
> is different from
>
> Line one\nline two
>
> even if they are visually similar on various text editors/displays, etc.
> (My impression was that SGML normalised these two strings to the same
> ESIS output - is that right?).
>
> This means that the author/processor 'contract' has to be aware of this.

I think all applications should be expected to treat either or both characters in sequence as a single line-end signal, so that platform dependencies can be eliminated. If there is no good reason to omit this task from the XML processor itself, I think it should be done there.
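As a rough illustration of the line-end handling suggested above, here is a minimal Python sketch. The function name `normalize_line_ends` is hypothetical, and treating LF followed by CR as one line end is an assumption taken from the quoted "Line one\n\rline two" example; nothing here comes from the XML-lang draft itself.

```python
import re

# Assumption: CR+LF, LF+CR, a lone CR and a lone LF each count as one line end.
# The two-character pairs are listed first so they are not counted twice.
_LINE_END = re.compile(r"\r\n|\n\r|\r|\n")


def normalize_line_ends(text: str) -> str:
    """Replace every line-end sequence with a single LF."""
    return _LINE_END.sub("\n", text)


a = normalize_line_ends("Line one\n\rline two")
b = normalize_line_ends("Line one\nline two")
print(a == b)  # True: both strings now carry the same single line end
```

If this step ran inside the XML processor, every application would see the same characters regardless of the platform on which the document was written.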
> *** In some cases the document author and the application author are both
> aware of this problem and so the whitespace characters inserted by the
> author will be processed in the way that they expect. However, in most cases
> I suspect this will NOT be true and that authors will inadvertently create
> documents that are processed differently ***
>
> XML provides an attribute XML-SPACE (local to an element BUT inherited by
> its children) which can have three values:
> - #IMPLIED (no signals about whitespace handling)
> - PRESERVE (applications preserve all the whitespace)
> - DEFAULT (the *application's* default white-space processing modes
> are acceptable for this element).
>
> PRESERVE seems clear. All whitespace is passed to the application. The
> others seem to be dangerous unless there are some general conventions.
> If possible, we should propose a *general* default mechanism for whitespace
> handling for XML-SPACE="DEFAULT". If everyone adopts this, it will greatly
> reduce this problem. Is this a reasonable strategy?

I believe so. In addition, can we not put 'XML-SPACE (PRESERVE|IMPLIED) "PRESERVE"' in an attribute declaration for an element which will always have preserved content? It is common practice for a DTD to have some kind of pre-formatted element, such as HTML's '<pre>'.

> If so, we can propose that the DEFAULT mode for any whitespace processing is
> something along the lines (similar to HTML?). Within an element with
> XML-SPACE="DEFAULT"
>
> All whitespace sequences are mapped into a single space character.

Agreed.

> All whitespace pseudo-elements are ignored (i.e. whitespace between markup)

Ummm. What about 'the <b>bold</b> <i>italic</i> styles...'?

> All leading and trailing whitespace in #PCDATA is ignored.

I think all applications should remove leading and trailing CR and LF characters in a mixed content element, but not SP or HT, as removing those would be undesirable in the following fragment:

A<emph> bold </emph>word.
Although an unusual layout, some people may use it, and it would be unfortunate if it resulted in 'Aboldword'.

> Example:
> <FOO XML-SPACE="DEFAULT">
> <BAR> this
> <!-- comment -->
> is<!-- comment -->a

DID YOU INTEND A SPACE SOMEWHERE BETWEEN 'is' AND 'a'?

> bar
> </BAR></FOO>
>
> folds to:
> <FOO XML-SPACE="DEFAULT"><BAR>this is a bar</BAR></FOO>
>
> I think it's important to address this, since otherwise I predict we shall
> have considerable confusion, especially when implementors of authoring or
> processing software have not thought this through completely.

Again, I agree, and I think it will be possible to achieve this with a bit more discussion in this forum.

> Peter Murray-Rust, domestic net connection

Neil.

-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@b...
www.bradley.co.uk

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)
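The two variants of the DEFAULT folding rule discussed in this message can be compared with a small Python sketch. It is only an illustration under stated assumptions: `fold_strict`, `fold_keep_spaces` and `flatten` are hypothetical names, the wrapping `<p>` element is added so the fragments parse, and the standard `xml.etree.ElementTree` parser stands in for a real XML processor.

```python
import re
import xml.etree.ElementTree as ET

# HT, CR, LF and SP, the whitespace characters named in the thread.
WS = re.compile(r"[ \t\r\n]+")


def fold_strict(chunk: str) -> str:
    """Quoted proposal: collapse whitespace runs to one space and drop all
    leading and trailing whitespace (whitespace-only chunks vanish)."""
    return WS.sub(" ", chunk).strip(" ")


def fold_keep_spaces(chunk: str) -> str:
    """Variation from the reply: strip only leading and trailing CR/LF,
    then collapse the remaining whitespace runs to one space."""
    return WS.sub(" ", chunk.strip("\r\n"))


def flatten(xml_text: str, fold) -> str:
    """Apply a folding rule to each text chunk and join the results."""
    root = ET.fromstring(xml_text)
    return "".join(fold(chunk) for chunk in root.itertext())


doc = "<p>A<emph> bold </emph>word.</p>"
print(flatten(doc, fold_strict))       # Aboldword.
print(flatten(doc, fold_keep_spaces))  # A bold word.
```

The same comparison applies to the 'the <b>bold</b> <i>italic</i> styles...' fragment: the strict rule discards the single space between the two inline elements, while the variation keeps it.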