
Re: XML vs the Dreaded Whitespace

  • From: Peter Murray-Rust <peter@u...>
  • To: xml-dev@i...
  • Date: Thu, 11 Dec 1997 14:37:39

At 06:41 11/12/97 -0500, David Megginson wrote:
>Peter Murray-Rust writes:
>
> > As a corollary: Is anyone testing the ESIS output of the current crop of
> > XML parsers (4 Java + nsgmls, I think)? Regardless of the whitespace model
> > or the value of xml:space they should all produce identical ESIS (right?)
> > If not, then one or more is wrong. And all applications should (IMO) be
> > prepared to work with ESIS which I think is isomorphous with a WF XML
> > document.
>
>There are quite a few more XML parsers out there, including at least
>one in TCL -- see 
>
>  http://www.sil.org/sgml/XML.html#xmlSoftware

Apologies to anyone I missed. I am a great fan of tcl and wrote costwish in
it to sit on top of Joe English's CoST...

>
>As for ESIS, there are some problems that we'd have to overcome first:

Are there? How does a WF document differ from the corresponding ESIS
stream? IOW if I do the transformation:
WF -> ESIS -> WF shouldn't I be able to recover the original?

>
>1) How should empty elements be represented?  Right now, Ælfred generates a
>   startElement event immediately followed by an endElement event.

Yes - and JUMBO is happy with that. As far as JUMBO is concerned
<FOO></FOO> and <FOO/> are processed in the same way and I will need a very
clear argument to convince me that it should do otherwise.
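
To illustrate the point about empty elements, here is a minimal sketch using Python's standard xml.sax module (not JUMBO or Ælfred; the names EventRecorder and events_for are hypothetical and exist only for this example). It records the start/end events an event-based parser reports and compares the two spellings of an empty element.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Collects element and character events in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("chars", content))


def events_for(document):
    """Parse a document string and return the recorded event list."""
    recorder = EventRecorder()
    xml.sax.parseString(document.encode("utf-8"), recorder)
    return recorder.events


# Both spellings yield a start event immediately followed by an end event.
same = events_for("<FOO></FOO>") == events_for("<FOO/>")
print(same)
```

An application that consumes only these events cannot tell which form appeared in the source.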

>
>2) How should the XML declaration be represented?  Should it appear as
>   a processing instruction, or should it be ignored?

JUMBO regards it as a PI. I hang all PIs off the preceding ELEMENT (not
PCDATA). In that way the tree can be processed with these intact. JUMBO
understands namespace PIs, <?JUMBO ...?> PIs and will also store the
others. It's useful to store them in case one wants to compare trees. BTW -
although it is nowhere stated most people seem to create PIs as name-value
pairs and JUMBO expects this.
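
As a rough illustration of treating PI content as name-value pairs, the sketch below pulls name="value" pairs out of the data part of a processing instruction. The function name pi_pairs and the regular expression are assumptions made for this example; nothing in the spec requires PIs to have this shape, and this is not JUMBO's actual code.

```python
import re

# A name, an equals sign, then a single- or double-quoted value.
PAIR = re.compile(r"""([A-Za-z_:][\w.:-]*)\s*=\s*("([^"]*)"|'([^']*)')""")


def pi_pairs(data):
    """Return the name-value pairs found in PI data as a dict."""
    pairs = {}
    for match in PAIR.finditer(data):
        double_quoted, single_quoted = match.group(3), match.group(4)
        value = double_quoted if double_quoted is not None else single_quoted
        pairs[match.group(1)] = value
    return pairs


print(pi_pairs("""href="style.xsl" type='text/xsl'"""))
```

PI data that does not follow the convention simply produces an empty dict, so the raw text should still be stored alongside it.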


>
>3) How should space in element content be handled?  According to the
>   spec, a DTD-aware parser should handle whitespace in element
>   content differently from whitespace in mixed content (Ælfred just
>   ignores whitespace in element content right now).

This is a critical area for the parser writers to agree on. I assume that
for the DTD-aware stuff there has to be a validating parser (i.e. one that
matches contentspec against element content). I am not sure what algorithms
are being used - JUMBO wants a java one for its birthday, please - but I
can imagine that with certain contentspecs they might get different answers.
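
One way to picture the DTD-aware behaviour is the sketch below, again using Python's xml.sax. It assumes the caller already knows which elements are declared with element content (the set element_content_names stands in for information a real parser would read from the DTD), and it drops whitespace-only text inside those elements. The class and function names are hypothetical, and no contentspec matching is attempted.

```python
import xml.sax

XML_WHITESPACE = " \t\r\n"


class WhitespaceAwareHandler(xml.sax.ContentHandler):
    """Keeps text events, except whitespace inside element-content elements."""

    def __init__(self, element_content_names):
        super().__init__()
        self.element_content_names = set(element_content_names)
        self.open_elements = []   # stack of currently open element names
        self.text_events = []     # (parent element name, text) tuples

    def startElement(self, name, attrs):
        self.open_elements.append(name)

    def endElement(self, name):
        self.open_elements.pop()

    def characters(self, content):
        parent = self.open_elements[-1]
        is_blank = content.strip(XML_WHITESPACE) == ""
        if parent in self.element_content_names and is_blank:
            return  # whitespace in element content: treated as ignorable
        self.text_events.append((parent, content))


def significant_text(document, element_content_names):
    """Return the text events that survive the whitespace filter."""
    handler = WhitespaceAwareHandler(element_content_names)
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.text_events


doc = "<list>\n  <item>a b</item>\n</list>"
print(significant_text(doc, {"list"}))   # DTD-aware view
print(significant_text(doc, set()))      # non-DTD-aware view
```

With an empty set the handler behaves like a parser that has no DTD, so the same document yields different text depending on what the parser knows.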

>
>4) DTD-aware and non-DTD-aware parsers will handle whitespace in
>   attribute values differently.  Non-DTD-aware parsers will treat all
>   attributes as CDATA, but DTD-aware parsers will treat tokenised
>   attributes specially, by stripping all leading and trailing
>   whitespace, and normalising internal whitespace to single spaces.


In this case presumably only the TYPE in the ATTLIST is needed.
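
The difference David describes can be sketched as a function of the declared type alone. The function name normalize_attribute is hypothetical, and the CDATA branch returns the value untouched as a simplification for this example; only the tokenised branch follows the behaviour described above.

```python
import re


def normalize_attribute(value, declared_type="CDATA"):
    """Normalise an attribute value according to its declared type.

    A parser that has not read the ATTLIST can only use the default, CDATA.
    """
    if declared_type == "CDATA":
        return value  # simplification: left exactly as written
    # Tokenised types (ID, IDREF, NMTOKEN, NMTOKENS, ...): collapse runs of
    # whitespace to one space, then drop leading and trailing spaces.
    collapsed = re.sub(r"[ \t\r\n]+", " ", value)
    return collapsed.strip(" ")


raw = "  alpha   beta \n gamma "
print(repr(normalize_attribute(raw)))              # non-DTD-aware view
print(repr(normalize_attribute(raw, "NMTOKENS")))  # DTD-aware view
```

Nothing besides the declared type is consulted, which matches the observation that only the TYPE in the ATTLIST is needed.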

	P.


Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

