
Re: Specification Questions

  • From: "Neil Bradley" <neil@b...>
  • To: xml-dev@i...
  • Date: Mon, 4 Aug 1997 11:29:26 +0000


Reply-to:      Peter@u... (Peter Murray-Rust)

> Some additional - hopefully constructive - thoughts on whitespace.
> 
> The XML-lang spec does not ( and I suspect will not) give detailed guidance
> on how whitespace will be managed.  My impression is that it is up to 
> implementers and/or groups like this to come up with particular solutions.
> My worry is that these will be inconsistent and not inter-operable.

I agree totally. This was my original concern.

> ***
> Therefore I propose that those on XML-DEV who care about this problem come
> up with some guidelines for implementers. 
> ***

I very much hope this happens.

> XML does NOT treat whitespace like SGML and does NOT behave like HTML 
> (although it can be configured to do so).  As far as I see them, the rules
> are:
> 
> 'All characters that are not markup are passed to the application'.  (This
> is independent of any value of XML-SPACE (see below), processing instructions,
> stylesheets, etc.)  These characters include HT, CR, LF, SP, and probably
> a number of other Unicode 'whitespace' characters.  What the application
> does with them is *undefined* in XML-lang.
> 
> Note that this means that CR and LF are passed as separate characters. No
> normalisation takes place.  Therefore
> 
> Line one\n\rline two
> 
> is different from
> 
> Line one\nline two
> 
> even if they are visually similar on various text editors/displays, etc.
> (My impression was that SGML normalised these two strings to the same 
> ESIS output - is that right?).
> 
> This means that the author/processor 'contract' has to be aware of this.

I think all applications should be expected to treat either or both 
characters in sequence as a line end signal, so that platform 
dependencies can be eliminated. If there is no good reason to omit 
this task from the XML-processor itself, I think it should be done 
there.
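
As a rough illustration of that normalisation, here is a minimal Python
sketch. The function name normalize_line_ends is hypothetical, and
treating LF followed by CR as a pair (as in the quoted example) is an
assumption, not something the XML-lang draft specifies.

```python
import re

def normalize_line_ends(text: str) -> str:
    """Map CR LF, LF CR, a lone CR, or a lone LF to a single LF.

    Two-character pairs are listed first so that a pair counts as one
    line end rather than two.
    """
    return re.sub(r"\r\n|\n\r|\r|\n", "\n", text)

# The two strings from the quoted message become identical.
first = normalize_line_ends("Line one\n\rline two")
second = normalize_line_ends("Line one\nline two")
print(first == second)
```

A sequence such as LF CR LF remains ambiguous under this rule (the sketch
reads it as two line ends), which is one reason the convention would need
to be agreed rather than left to each application.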


> *** In some cases the document author and the application author are both
> aware of this problem and so the whitespace characters inserted by the
> author will be processed in the way that they expect.  However, in most cases
> I suspect this will NOT be true and that authors will inadvertently create
> documents that are processed differently ***
> 
> XML provides an attribute XML-SPACE (local to an element BUT inherited by
> its children) which can have three values:
> 	- #IMPLIED (no signals about whitespace handling)
> 	- PRESERVE (applications preserve all the whitespace)
> 	- DEFAULT (the *application's* default white-space processing modes
> 		are acceptable for this element).
> 
> PRESERVE seems clear.  All whitespace is passed to the application.  The 
> others seem to be dangerous unless there are some general conventions. 

> If possible, we should propose a *general* default mechanism for whitespace
> handling for XML-SPACE="DEFAULT".  If everyone adopts this, it will greatly
> reduce this problem.  Is this a reasonable strategy?

I believe so. In addition, can we not put 'XML-SPACE 
(PRESERVE|IMPLIED) "PRESERVE"' in an attribute declaration for an 
element which will always have preserved content? It is common 
practice for a DTD to have some kind of pre-formatted element, such 
as HTML's '<pre>'.


> If so, we can propose that the DEFAULT mode for any whitespace processing is
> something along the lines (similar to HTML?).  Within an element with
> XML-SPACE="DEFAULT"
> 

> All whitespace sequences are mapped into a single space character.
Agreed.

> All whitespace pseudo-elements are ignored (i.e. whitespace between markup)

Ummm, what about 'the <b>bold</b>  <i>italic</i> styles...'?
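
To make the concern concrete, the Python sketch below compares two
readings of the rule, using the standard-library xml.etree.ElementTree
module. The function names and the wrapping <p> element are assumptions
added for illustration only.

```python
import re
import xml.etree.ElementTree as ET

FRAGMENT = "<p>the <b>bold</b>  <i>italic</i> styles...</p>"

def drop_blank_text(fragment: str) -> str:
    """Join character data, discarding pieces that are only whitespace."""
    root = ET.fromstring(fragment)
    return "".join(t for t in root.itertext() if t.strip())

def collapse_blank_text(fragment: str) -> str:
    """Join character data, mapping each whitespace run to one space."""
    root = ET.fromstring(fragment)
    return re.sub(r"[ \t\r\n]+", " ", "".join(root.itertext()))

print(drop_blank_text(FRAGMENT))      # the bolditalic styles...
print(collapse_blank_text(FRAGMENT))  # the bold italic styles...
```

Ignoring the whitespace between </b> and <i> runs the two words together,
whereas collapsing it to one space keeps them apart.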

> All leading and trailing whitespace in #PCDATA is ignored.

I think all applications should remove leading and trailing CR and LF
characters in a mixed content element. But not SP or HT, as this would
be undesirable in the following fragment:

A<emph>  bold  </emph>word.

Although an unusual layout, some people may use it, and it would be
unfortunate if it resulted in 'Aboldword'.
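
A minimal Python sketch of this narrower rule, assuming the element's
character data is already available as a string (trim_line_ends is a
hypothetical name):

```python
def trim_line_ends(pcdata: str) -> str:
    """Remove leading and trailing CR and LF only; keep SP and HT."""
    return pcdata.strip("\r\n")

# Content of <emph>, here with a line end on either side.
emph = "\n  bold  \n"

print("A" + trim_line_ends(emph) + "word.")  # A  bold  word.
print("A" + emph.strip() + "word.")          # Aboldword.
```

The second line shows what stripping all leading and trailing whitespace
would do to the same fragment.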


> Example:
> <FOO XML-SPACE="DEFAULT">
> <BAR> this
> <!-- comment -->
> is<!-- comment -->a 
DID YOU INTEND A SPACE SOMEWHERE BETWEEN 'is' AND 'a'?
> bar
> </BAR></FOO>
> 
> folds to:
> <FOO XML-SPACE="DEFAULT"><BAR>this is a bar</BAR></FOO>
> 
> I think it's important to address this, since otherwise I predict we shall
> have considerable confusion, especially when implementors of authoring or
> processing software have not thought this through completely.

Again, I agree, and I think it will be possible to achieve this with 
a bit more discussion in this forum.

> Peter Murray-Rust, domestic net connection

Neil.

-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@b...
www.bradley.co.uk

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)

