Re: XML vs the Dreaded Whitespace
At 06:41 11/12/97 -0500, David Megginson wrote:
>Peter Murray-Rust writes:
>
> > As a corollary: Is anyone testing the ESIS output of the current crop of
> > XML parsers (4 Java + nsgmls, I think)? Regardless of the whitespace model
> > or the value of xml:space they should all produce identical ESIS (right?)
> > If not, then one or more is wrong. And all applications should (IMO) be
> > prepared to work with ESIS which I think is isomorphous with a WF XML
> > document.
>
>There are quite a few more XML parsers out there, including at least
>one in TCL -- see
>
> http://www.sil.org/sgml/XML.html#xmlSoftware

Apologies to anyone I missed. I am a great fan of tcl and wrote costwish in
it to sit on top of Joe English's CoST...

>
>As for ESIS, there are some problems that we'd have to overcome first:

Are there? How does a WF document differ from the corresponding ESIS stream?
IOW if I do the transformation:

    WF -> ESIS -> WF

shouldn't I be able to recover the original?

>
>1) How should empty elements be represented? Right now, Ælfred generates a
> startElement event immediately followed by an endElement event.

Yes - and JUMBO is happy with that. As far as JUMBO is concerned
<FOO></FOO> and <FOO/> are processed in the same way, and I will need a very
clear argument to convince me that it should do otherwise.

>
>2) How should the XML declaration be represented? Should it appear as
> a processing instruction, or should it be ignored?

JUMBO regards it as a PI. I hang all PIs off the preceding ELEMENT (not
PCDATA). In that way the tree can be processed with these intact. JUMBO
understands namespace PIs, <?JUMBO ...?> PIs and will also store the others.
It's useful to store them in case one wants to compare trees. BTW - although
it is nowhere stated, most people seem to create PIs as name-value pairs and
JUMBO expects this.

>
>3) How should space in element content be handled? According to the
> spec, a DTD-aware parser should handle whitespace in element
> content differently from whitespace in mixed content (Ælfred just
> ignores whitespace in element content right now).

This is a critical area for the parser writers to agree on. I assume that
for the DTD-aware stuff there has to be a validating parser (i.e. one that
matches contentspec against element content). I am not sure what algorithms
are being used - JUMBO wants a java one for its birthday, please - but I can
imagine that with certain contentspecs they might get different answers.

>
>4) DTD-aware and non-DTD-aware parsers will handle whitespace in
> attribute values differently. Non-DTD-aware parsers will treat all
> attributes as CDATA, but DTD-aware parsers will treat tokenised
> attributes specially, by stripping all leading and trailing
> whitespace, and normalising internal whitespace to single spaces.

In this case presumably only the TYPE in the ATTLIST is needed.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
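
Points 1 and 4 in the message above can be illustrated with a small sketch. It is not taken from Ælfred or JUMBO: it uses Python's standard xml.sax module as a stand-in parser, and the names EventRecorder, element_events and normalize_tokenized are hypothetical helpers made up for this illustration. The first part records start/end events for <FOO></FOO> and <FOO/> so the two streams can be compared; the second part shows the whitespace treatment described for tokenised attributes, assuming the DTD lookup of the attribute type has already happened elsewhere.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Hypothetical helper: records a simplified, ESIS-like event stream."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("chars", content))


def element_events(document: bytes):
    """Parse a well-formed document and return the recorded events."""
    recorder = EventRecorder()
    xml.sax.parseString(document, recorder)
    return recorder.events


def normalize_tokenized(value: str) -> str:
    """Strip leading/trailing whitespace and collapse internal runs of
    whitespace to single spaces, as described for tokenised attributes.
    Assumption: the caller already knows the attribute is not CDATA."""
    return " ".join(value.split())


if __name__ == "__main__":
    # Point 1: both spellings of an empty element give the same events.
    print(element_events(b"<FOO></FOO>") == element_events(b"<FOO/>"))
    # Point 4: a CDATA attribute would keep the value as-is; a tokenised
    # one would be normalised like this.
    print(repr(normalize_tokenized("  a   b  c ")))
```

With this stand-in parser the two empty-element forms are indistinguishable downstream, which matches the startElement/endElement behaviour described for Ælfred.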