
Re: Whitespace rules (v2)

  • From: Russell Chamberlain <russc@w...>
  • To: xml-dev@i...
  • Date: Mon, 18 Aug 1997 18:17:29 -0400

blockness
<HI/>

In message <199708170743.IAA28970@a...> "Neil Bradley"
writes:
> 
> Peter Murray-Rust wrote:
> 
>I think - along with TimB - that it is unrealistic to come up with a single
>set of rules that will serve every application.  There was an enormous amount
>of discussion on the XML group last year and I take it as axiomatic that we
>cannot produce a set of rules which everyone agrees are:
>	- simple to state
>	- unambiguous
>	- intuitive and easy to learn
>	- universal (i.e. cover every situation)

Axiomatic? Call me stubborn (you won't be the first), but I, for one,
retain some hope. :-)

>
>I think that XML will include applications beyond 'browsers and typesetting 
>systems' although these will be the commonest. MathML and CML will have 
>chunks of material which contains whitespace not used primarily as part of
>text.  Here's a simple example:
><MOL>
>  <ATOMS>
>[HT]C H N    Cl[CR][LF]
>[HT]O P Br[CR][LF]
>  </ATOMS>
></MOL>
>where the whitespace is used (a) for visual effect and potential ease in 
>editing (b) as a delimiter (within ATOMS) [HT]=tab, for example. 
>
>What I am after here is a convention that I can state which instructs the 
>processor how to treat this whitespace.  ***I do not wish to have to devise
>a specific convention for CML***.  I want to be able to indicate that
>the W/S after <MOL> is irrelevant, and that the whitespace in the ATOMS content
>is normalisable and used only as a delimiter of tokens.
>
>I expect that many other applications will use a similar approach, so I want
>to share the effort with them.  Examples of metadata in XML have often been
>portrayed as prettyprinted and I expect that CML could use the same conventions.
>[BTW I think that there will be more human editing of XML files than is often
>assumed - and metadata is a good example. Prettyprinting is a useful tool
>in those cases.]
>
>I think that we can aim for a set of options that could be used by a post-parser
>processor. Different applications (**or document authors**) could choose between
>them. Examples might be:
>	- normaliseCRLF (Neil's Rule 1)
>	- discardAllWS
>	- normaliseToSingleSpace
>
>An author or application could then state which of these it was using. 
>
>It might be that in the first instance we can only agree on (say) Rule 1, but
>this would be a useful start.
>
>>  
>> > I agree with Liam - I didn't understand 'blockness'.  I also think that whatever
>> > is done here has to be independent of stylesheets and DTDs.  The average hacker
>> > like me simply won't understand the subtleties.
>> 
>> I am merely trying to distinguish in-line elements from other 
>> elements. An in-line element implies no line-breaks above or below 
>> it. A 'Block' element therefore DOES imply such a break. I do not use 
>> the terms element and mixed content here, because it is not quite the 
>> same thing. As I have said before, a Para element is a 'block' 
>> element, and has mixed content, but an Emph element is an 'in-line' 
>> element, yet also has mixed content. All style sheets, including 
>> CSS, understand the concept of in-line and block elements. Any 
>> whitespace surrounding a block element MUST be irrelevant.
>
>It looks like the context, rather than the content is the significant
>feature.
>
>> 
>> Liam raised the issue of a half-way element type, such as a header 
>> which implies a line-break before it, but not after, so that 
>> following text will appear on the same line. This one is tricky. 
>> Suggestions anybody?
>

<FormattingSpecificDiscussionOfWhitespace>

The idea of a "half-way" element type just highlights the fact that element
nesting does not necessarily map nicely to block/paragraph structure in
formatting applications. I like to say that block formatting _transcends_
element nesting -- there is no direct mapping.

In my experience, a pair of lower-level concepts (e.g. "block start" and
"block end") has proven quite useful. In the current discussion, the
"blockness" of the elements might be described as follows:

    Element    "block start"   "block end"
    -----------------------------------------
    Para       Yes             Yes
    Emph       No              No
    Hn         Yes             No

where:

  "block start" - means start a block at the start of the element
  "block end"   - means end a block at the end of the element
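
To make the table concrete, here is a minimal Python sketch of how a formatter might consult it. The names `BLOCKNESS` and `render`, and the event-list input, are hypothetical and invented purely for illustration; they are not taken from any existing formatter.

```python
# Hypothetical blockness table: element name -> (block_start, block_end).
BLOCKNESS = {
    "Para": (True, True),
    "Emph": (False, False),
    "Hn": (True, False),
}


def render(events):
    """Turn (kind, value) events into a list of output lines.

    kind is "start" or "end" (value is an element name) or "text"
    (value is character data). A line break is emitted only where the
    table asks for a block start or a block end.
    """
    lines = [""]

    def break_line():
        # Do not stack empty lines when two breaks coincide.
        if lines[-1]:
            lines.append("")

    for kind, value in events:
        if kind == "text":
            lines[-1] += value
            continue
        block_start, block_end = BLOCKNESS.get(value, (False, False))
        if (kind == "start" and block_start) or (kind == "end" and block_end):
            break_line()
    return [line for line in lines if line]
```

With this table, text following an Hn element stays on the heading's line because Hn has no "block end", which is the run-in header behaviour Liam described.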

</FormattingSpecificDiscussionOfWhitespace>

<GeneralDiscussionOfWhitespace>

A notation for describing whitespace handling must communicate the notion
that whitespace processing is modal, and provide words for each mode and
phrases for the transitions. 

Let's consider Peter's tentative rules:

>	- normaliseCRLF (Neil's Rule 1)

Please correct me if I am wrong, but this looks like a document-wide
setting whose behaviour/interpretation isn't affected by the application
type. A simple on/off PI setting could be used to set this.
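
A document-wide switch of this kind might look like the sketch below. It assumes that Rule 1 means folding each CR LF pair, and each lone CR, into a single LF; the rule is not restated in this message, so that reading is an assumption, and `normalise_crlf` is a hypothetical name.

```python
def normalise_crlf(text, enabled=True):
    """Fold CR LF pairs and lone CRs into LF when the setting is on.

    Assumption: this is what "normaliseCRLF" is meant to do.
    """
    if not enabled:
        return text
    # Replace pairs first so a CR LF does not become two line feeds.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```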

The rest of the rules, though, could be applied on a per-element basis:

>	- discardAllWS
>	- normaliseToSingleSpace

I would add:

    - keepAllWS

(I haven't read every word of every post in this thread. Has this third one
been discarded as a reasonable option? Even if it has, the rest of my
discussion here isn't affected)
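
As a rough illustration, the three modes could behave as in the Python sketch below. The function name and the mode strings are hypothetical. It also assumes that "discard" removes every whitespace character and that "collapse" replaces each run of whitespace with one space without trimming; other readings of the rule names are possible.

```python
import re

# XML whitespace characters: space, tab, carriage return, line feed.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def apply_ws_mode(text, mode):
    """Apply one of three mutually exclusive whitespace modes to text."""
    if mode == "keep":        # keepAllWS
        return text
    if mode == "collapse":    # normaliseToSingleSpace
        return _WS_RUN.sub(" ", text)
    if mode == "discard":     # discardAllWS
        return _WS_RUN.sub("", text)
    raise ValueError("unknown whitespace mode: %r" % (mode,))
```

Applied to the content of Peter's ATOMS element, the "collapse" mode leaves a string that an application can split on single spaces to recover the tokens.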

Assuming that the three, mutually-exclusive rules (or _modes_) can be
applied to any element, how can we specify this?

Would being able to specify one of the three modes on a per-element basis
be powerful enough? If we used PIs to do this then some HTML tags, for
example, might be listed as follows (just a hypothetical notation example,
_not_ a final suggestion for notation):

    <?XML-SPACE-DISCARD  HTML, HEAD, BODY, ... ?>
    <?XML-SPACE-COLLAPSE TITLE, P, H1, H2, ... ?>
    <?XML-SPACE-KEEP     PRE, XMP, LISTING, ... ?>

Notes:

- HTML applications could just imply these rules.

- Any elements that aren't listed would just use the current mode, which
depends on the context.

- If the desired whitespace mode depends on something other than the
current element (an attribute, say) then this mechanism won't be powerful
enough.

- Specifying the whitespace mode on a per-element basis should make this
technique well-suited to architectural forms, though.
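
The notes above can be sketched in Python as follows. Everything here is hypothetical: the PI names are only the notation example from this message, the declarations are read from a separate string rather than from the document prolog, and the default mode of "keep" at the root is an assumption. Unlisted elements inherit the mode of their context, as in the second note.

```python
import re
import xml.etree.ElementTree as ET

_PI = re.compile(r"<\?XML-SPACE-(DISCARD|COLLAPSE|KEEP)\s+(.*?)\?>", re.S)
_WS_RUN = re.compile(r"[ \t\r\n]+")


def parse_space_pis(declarations):
    """Build an element-name -> mode table from the hypothetical PIs."""
    table = {}
    for mode, names in _PI.findall(declarations):
        for name in names.split(","):
            if name.strip():
                table[name.strip()] = mode.lower()
    return table


def _apply(text, mode):
    if text is None or mode == "keep":
        return text
    return _WS_RUN.sub(" " if mode == "collapse" else "", text)


def normalise(elem, table, inherited="keep"):
    """Rewrite whitespace in place; unlisted elements use the current mode."""
    mode = table.get(elem.tag, inherited)
    elem.text = _apply(elem.text, mode)
    for child in elem:
        normalise(child, table, mode)
        # Text after a child belongs to this element's content,
        # so it is governed by this element's mode, not the child's.
        child.tail = _apply(child.tail, mode)


def normalise_document(declarations, document):
    root = ET.fromstring(document)
    normalise(root, parse_space_pis(declarations))
    return ET.tostring(root, encoding="unicode")
```

Because the lookup is keyed on the element name alone, this sketch shares the limitation in the third note: it cannot choose a mode from an attribute value.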

</GeneralDiscussionOfWhitespace>

 - Russ

PS - Should whitespace be blacklisted? ;-)


xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)

