Re: Whitespace rules (v2)
<HI/>

In message <199708170743.IAA28970@a...> "Neil Bradley" writes:
>
> Peter Murray-Rust wrote:
>
>I think - along with TimB - that it is unrealistic to come up with a single
>set of rules that will serve every application. There was an enormous amount
>of discussion on the XML group last year and I take it as axiomatic that we
>cannot produce a set of rules which everyone agrees are:
> - simple to state
> - unambiguous
> - intuitive and easy to learn
> - universal (i.e. cover every situation)

Axiomatic? Call me stubborn (you won't be the first), but I, for one,
retain some hope. :-)

>
>I think that XML will include applications beyond 'browsers and typesetting
>systems' although these will be the commonest. MathML and CML will have
>chunks of material which contain whitespace not used primarily as part of
>text. Here's a simple example:
><MOL>
> <ATOMS>
>[HT]C H N Cl[CR][LF]
>[HT]O P Br[CR][LF]
> </ATOMS>
></MOL>
>where the whitespace is used (a) for visual effect and potential ease in
>editing (b) as a delimiter (within ATOMS) [HT]=tab, for example.
>
>What I am after here is a convention that I can state which instructs the
>processor how to treat this whitespace. ***I do not wish to have to devise
>a specific convention for CML***. I want to be able to indicate that
>the W/S after <MOL> is irrelevant, and that the whitespace in the ATOMS content
>is normalisable and used only as a delimiter of tokens.
>
>I expect that many other applications will use a similar approach, so I want
>to share the effort with them. Examples of metadata in XML have often been
>portrayed as prettyprinted and I expect that CML could use the same conventions.
>[BTW I think that there will be more human editing of XML files than is often
>assumed - and metadata is a good example. Prettyprinting is a useful tool
>in those cases.]
>
>I think that we can aim for a set of options that could be used by a post-parser
>processor.
>Different applications (**or document authors**) could choose between
>them. Examples might be:
> - normaliseCRLF (Neil's Rule 1)
> - discardAllWS
> - normaliseToSingleSpace
>
>An author or application could then state which of these it was using.
>
>It might be that in the first instance we can only agree on (say) Rule 1, but
>this would be a useful start.
>
>>
>> > I agree with Liam - I didn't understand 'blockness'. I also think that whatever
>> > is done here has to be independent of stylesheets and DTDs. The average hacker
>> > like me simply won't understand the subtleties.
>>
>> I am merely trying to distinguish in-line elements from other
>> elements. An in-line element implies no line-breaks above or below
>> it. A 'Block' element therefore DOES imply such a break. I do not use
>> the terms element and mixed content here, because it is not quite the
>> same thing. As I have said before, a Para element is a 'block'
>> element, and has mixed content, but an Emph element is an 'in-line'
>> element, yet also has mixed content. All style sheets, including
>> CSS, understand the concept of in-line and block elements. Any
>> whitespace surrounding a block element MUST be irrelevant.
>
>It looks like the context, rather than the content, is the significant
>feature.
>
>>
>> Liam raised the issue of a half-way element type, such as a header
>> which implies a line-break before it, but not after, so that
>> following text will appear on the same line. This one is tricky.
>> Suggestions anybody?
>

<FormattingSpecificDiscussionOfWhitespace>

The idea of a "half-way" element type just highlights the fact that element
nesting does not necessarily map nicely to block/paragraph structure in
formatting applications. I like to say that block formatting _transcends_
element nesting -- there is no direct mapping.

In my experience, a pair of lower-level concepts (eg. "block start" and
"block end") has proven quite useful.
In the current discussion, the "blockness" of the elements might be
described as follows:

            "block start"   "block end"
    -----------------------------------
    Para        Yes             Yes
    Emph        No              No
    Hn          Yes             No

where:
    "block start" - means start a block at the start of the element
    "block end"   - means end a block at the end of the element

</FormattingSpecificDiscussionOfWhitespace>

<GeneralDiscussionOfWhitespace>

A notation for describing whitespace handling must communicate the notion
that whitespace processing is modal, and provide words for each mode and
phrases for the transitions.

Let's consider Peter's tentative rules:

> - normaliseCRLF (Neil's Rule 1)

Please correct me if I am wrong, but this looks like a document-wide
setting whose behaviour/interpretation isn't affected by the application
type. A simple on/off PI setting could be used to set this.

The rest of the rules, though, could be applied on a per-element basis:

> - discardAllWS
> - normaliseToSingleSpace

I would add:

  - keepAllWS

(I haven't read every word of every post in this thread. Has this third
one been discarded as a reasonable option? Even if it has, the rest of my
discussion here isn't affected)

Assuming that the three, mutually-exclusive rules (or _modes_) can be
applied to any element, how can we specify this? Would being able to
specify one of the three modes on a per-element basis be powerful enough?

If we used PIs to do this then some HTML tags, for example, might be listed
as follows (just a hypothetical notation example, _not_ a final suggestion
for notation):

    <?XML-SPACE-DISCARD HTML, HEAD, BODY, ... ?>
    <?XML-SPACE-COLLAPSE TITLE, P, H1, H2, ... ?>
    <?XML-SPACE-KEEP PRE, XMP, LISTING, ... ?>

Notes:

- HTML applications could just imply these rules.
- Any elements that aren't listed would just use the current mode, which
  depends on the context.
- If the desired whitespace mode depends on something other than the current
  element (an attribute, say) then this mechanism won't be powerful enough.
- Specifying the whitespace mode on a per-element basis should make this
  technique well-suited to architectural forms, though.

</GeneralDiscussionOfWhitespace>

- Russ

PS - Should whitespace be blacklisted? ;-)

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)
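
To make the per-element modes in the message above a little more concrete, here is a rough Python sketch of a post-parser pass, offered only as an illustration and not as anything proposed in the thread. Assumptions: the mode names "discard", "collapse" and "keep" stand in for discardAllWS, normaliseToSingleSpace and keepAllWS; the MODES table stands in for the hypothetical XML-SPACE-* PIs; "discard" is read literally as removing every whitespace character; an unlisted root element starts in "keep" mode. The names MODES, apply_mode and process are made up for this sketch. The document-wide normaliseCRLF rule is not covered.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mode table, standing in for the example PIs in the message.
MODES = {
    "HTML": "discard", "HEAD": "discard", "BODY": "discard",
    "TITLE": "collapse", "P": "collapse", "H1": "collapse", "H2": "collapse",
    "PRE": "keep", "XMP": "keep", "LISTING": "keep",
}

# Whitespace as XML defines it: space, tab, carriage return, line feed.
_WS = re.compile(r"[ \t\r\n]+")


def apply_mode(text, mode):
    """Apply one whitespace mode to a piece of character data."""
    if text is None or mode == "keep":
        return text
    if mode == "discard":
        return _WS.sub("", text)    # assumption: drop every whitespace character
    if mode == "collapse":
        return _WS.sub(" ", text)   # each run of whitespace becomes one space
    raise ValueError("unknown whitespace mode: %r" % (mode,))


def process(elem, modes, inherited="keep"):
    """Walk the tree; elements not listed in `modes` use the current mode."""
    mode = modes.get(elem.tag, inherited)
    elem.text = apply_mode(elem.text, mode)
    for child in elem:
        process(child, modes, mode)
        # Text following a child sits inside this element, so the mode of
        # this element (not the child's) governs it.
        child.tail = apply_mode(child.tail, mode)
    return elem
```

For instance, calling process(ET.fromstring(doc), MODES) on a small HTML-like document would strip the indentation inside BODY, collapse runs of whitespace inside P (including inside an unlisted EM child, which inherits the mode of P), and leave PRE content alone. As the notes in the message say, a table keyed only on element names cannot express a mode that depends on an attribute.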