
Re: SAX and whitespace (was Re: Problems with whitespace and msxml)

  • From: Peter Murray-Rust <peter@u...>
  • To: xml-dev@i...
  • Date: Thu, 01 Jan 1998 18:07:39

[I think this discussion is another good reason why SAX is urgently needed]

At 09:57 01/01/98 -0500, David Megginson wrote:
> > >   An XML processor must always pass all characters in a document
> > >   that are not markup through to the application. A validating
> > >   XML processor must distinguish white space in element content
> > >   from other non-markup
>
>What the PR means to say here is that a DTD-driven XML parser has to
>treat whitespace in element content differently than whitespace in
>mixed content -- this, of course, has nothing to do with xml:space.
>If there is no DTD, then all element types are assumed to allow mixed
>content, so a DTD-driven XML parser ("validating XML processor") would
>report all whitespace as significant.

I would agree with this interpretation and prefer the phrase "DTD-driven
XML parser (?processor?)". I interpret this to mean: 
	"a processor which uses any DTD information given in the document, and
which uses it to do as much validation as it and the document are capable of."

However, having read the spec more carefully, I am having great difficulty
in deciding *where* it allows whitespace in element content. Take the
document:
<!ELEMENT FOO (BAR)>
<!ELEMENT BAR EMPTY>
...
<FOO>
  <BAR>
  </BAR>
</FOO>

My reading of the spec suggests that this is an *invalid* document. Please
show me where I have gone wrong...

FOO has declared element content [3.2.1]. "... elements of that type must
contain only child elements ***(no character data)*** [my asterisks]..."

for BAR:
[3.2] An element is valid if there is a declaration matching elementdecl
where the Name matches the element type and ...
	1. the declaration matches EMPTY and the element has ***no content***

the context of content is [39]
	STag content ETag   <!-- no S? --->
and its definition is: [43]
	(element | CharData | Reference | CDSect | PI | Comment)*

Again there is no place for whitespace.

Therefore I cannot see where (apart from [2.10], which raises the whitespace
question) whitespace can be defined as 'non-significant'. IOW whitespace
***in the content of an element*** is only formally allowed as CharData in
mixed content, and in mixed content it must be significant.

I am *sure* I've missed something here as the WG has debated this for ages,
but I can't see where.
>
>What should SAX do with ignorable whitespace?

Assuming that ignorable WS is found only in element content...

>
>1) Report it as a distinct event, like Ælfred does?
>2) Treat it as regular character data?
>3) Ignore it (as in regular SGML)?
>
>(1) seems to be what the PR requires.  Either (2) or (3) could cause
>strange results.

(3) is forbidden - it has to be passed through. I think it has to be (2)
and (1) simultaneously. IOW in event mode you must report that whitespace
(space, 3 tabs, one newline, 10 spaces) occurs "now"; in tree mode you
report "I have made you an element/node consisting of PCDATA, all
whitespace - it's up to you to keep/destroy it..."
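
To make the "(1) and (2) simultaneously" idea concrete, here is a minimal
sketch in Python using the standard xml.sax module. It is an illustration
under stated assumptions, not how any particular parser does it: the
ELEMENT_CONTENT set is a hypothetical stand-in for what a DTD-driven
parser would learn from the declarations (here, that FOO has element
content), and the event names "ignorable" and "chars" are invented for
this example. Whitespace is never dropped; it is passed through either
way, only labelled differently.

```python
import xml.sax

# Assumption: element types known (e.g. from a DTD) to have element content.
# A real DTD-driven parser would derive this from the declarations.
ELEMENT_CONTENT = {"FOO"}

XML_WHITESPACE = " \t\r\n"  # the characters matched by the S production


class WhitespaceReporter(xml.sax.ContentHandler):
    """Records events, labelling whitespace in element content separately."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of the currently open elements
        self.events = []  # (kind, value) pairs in document order

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.events.append(("start", name))

    def endElement(self, name):
        self.stack.pop()
        self.events.append(("end", name))

    def characters(self, content):
        parent = self.stack[-1] if self.stack else None
        all_whitespace = content.strip(XML_WHITESPACE) == ""
        if parent in ELEMENT_CONTENT and all_whitespace:
            # Option (1): a distinct event, but the text is still delivered.
            self.events.append(("ignorable", content))
        else:
            # Option (2): ordinary character data.
            self.events.append(("chars", content))


def parse_events(text):
    """Parse an XML string and return the recorded event list."""
    handler = WhitespaceReporter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.events


if __name__ == "__main__":
    for event in parse_events("<FOO>\n  <BAR/>\n</FOO>"):
        print(event)
```

Note that the parser may deliver a run of character data in more than one
chunk, so an application that cares about the exact whitespace should
concatenate adjacent events of the same kind. A tree builder could consume
the same events and create a whitespace-only text node for each
"ignorable" run, leaving the keep/destroy decision to the application.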

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg


