Re: XML and Using It With Whitespace
Since no one has responded publicly, Chris, here's my take on your concerns...

At 02:06 05/01/98 -0500, Chris Smith wrote:
>
>Sorry if the subject is confusing, but it's a really concise proposal
>for getting whitespace through - guaranteed.
>
>I earlier posted a query about the behaviour of various parsers
>surrounding whitespace. I guess I'm not as hopeful as I was earlier,
>at least based on the answers I received. Thanks to those who took the
>time to reply.
>
>Essentially, I had hoped that using &#32; to replace a space would
>allow for the creation of a 'magic' difference, the same way that
>the &lt; and < are treated differently. Ideally all spaces could
>become &#32; and we could use the (invalid!) xml:space="none", leaving
>only the &#32; behind.

I am not sure this buys you anything. The &#32; is presumably occurring in content. If it occurs in mixed content or ANY it will be emitted by the parser as " " and would look the same as if you had put ordinary spaces in the document. If it occurs in element content (no character data allowed as children) then, if the parser accepts it as whitespace, it will be treated as if it were a " ". [I still have my concerns as to *where* the spec explicitly allows " " in element content...]

[...]
>
>I think parsers can still correctly read such files. But it points to
>a more general problem. If I read such a file with a parser, how can I
>write it out again exactly (and I mean *exactly*) the way it was read?
>If the parser doesn't indicate clearly where substitutions with
>entities were done, then I can't put them back in the file. The same
>problem occurs with empty elements. Although the XML spec wants to
>imply that <tag></tag> and <tag/> are the same, some might see them as
>the difference between a zero-length content and null content. Either
>way, if the original XML contains <tag></tag>, then that is what
>should go back out. If it later contains <tag/> then both
>references should remain different from each other and unchanged.
There has been discussion on this, and my understanding is that the unequivocal policy is that <TAG></TAG> and <TAG/> result in exactly the same events or grove, and there is NO way of distinguishing which the original document contained. Some people regret this, but the decision is clear.

>
>To wrap up the options, I'll run through the same paragraph using
>three different techniques.
>
>2....Using character entities - still my favourite, since they work in
>attributes as well. Out of all of them, this, to my eyes, looks like
>it could easily have been placed in the XML 1.0 spec without breaking
>anything else that is in the spec, simply by adding the
>xml:space="none". &spc; could be &#32; and &lf; is &#10; so no new
>entities would have to be added.

xml:space="none" is NOT allowed in the XML spec.

>
><p xml:space="none">Finally,&spc;the&spc;other&spc;idea&spc;is&spc;the
>&spc;one&spc;at&spc;the&spc;bottom&spc;-&spc;use&spc;elements&spc;for&lf;
>spaces,&spc;tabs,&spc;and&spc;lineends.&spc;&spc;There&spc;is&spc;a&spc;
>single&spc;attribute&spc;n&spc;to&spc;indicate&lf;repeat&spc;counts.</p>

Assuming that you have something like:

<!ENTITY spc " ">

then the paragraph above will result in the same parser output as if they had been spaces (except that it might report the internal entity events).

>
>3.....With only elements.
>
><p xml:space="none">Finally,<s/>the<s/>other<s/>idea<s/>is<s/>the<s/>
>one<s/>at<s/>the<s/>bottom<s/>-<s/>use<s/>elements<s/>for<l/>
>spaces,<s/>tabs,<s/>and<s/>lineends.<s n="2"/>There<s/>is<s/>a<s/>single
><s/>attribute<s/>n<s/>to<s/>indicate<l/>repeat<s/>counts.</p>

If you really care about every character this is a reasonable way of doing it, but it will generate a large number of events or (in a tree) require a lot of nodes to be created. Both will impact performance. Part of the problem arises from the requirement (which I strongly support) that "XML documents should be human legible and reasonably clear".
In some cases something has to be sacrificed and it looks like you are happy to let this one go...

>
>Clearly, you must have the DTD to make sense of the last one! However,
>I see a rather interesting side-effect, namely that this one could
>likely be added using a namespace. (Tangent: any parsers experimenting
>with namespaces?)

Parsers are NOT allowed to experiment with namespaces :-). Parsers must recognise ":" as a valid name character. That's all. Humans can experiment with namespaces. So can applications. PaulG has pointed out that the latest namespace proposal is confidential, so discussion of that is inappropriate. However, going on the information in the public domain (e.g. the RDF draft) JUMBO has implemented a namespace experiment. For what you are doing, I suspect stylesheets would be more valuable.

>
>In summary, the distinction is, as a reply noted, between "wanted"
>whitespace and "unwanted" whitespace. The XML specification wants to
>leave it to the application because there are far more 'whitespace
>convention sets' than it is desirable to put in the spec. However,
>there are far more applications than there are 'whitespace convention
>sets', and the application designer wants to pick one, not reinvent
>the wheel.

I fully agree with this, and if no one else makes proposals... But we need to concentrate on SAX at the moment.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
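
The points made in the reply (a character reference for a space, an internal entity such as &spc;, and the two spellings of an empty element all look the same once parsed) can be checked with a short sketch. This assumes Python's standard-library xml.sax module, which is not one of the parsers discussed in the thread; the names EventRecorder and parse_events are hypothetical helpers introduced only for this illustration, and other parsers may report extra entity events.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Hypothetical helper: records element and character events as tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # A parser may deliver one run of text in several chunks,
        # so merge adjacent chunks before comparing documents.
        if self.events and self.events[-1][0] == "chars":
            self.events[-1] = ("chars", self.events[-1][1] + content)
        else:
            self.events.append(("chars", content))


def parse_events(document):
    """Parse an XML string and return the list of recorded events."""
    recorder = EventRecorder()
    xml.sax.parseString(document.encode("utf-8"), recorder)
    return recorder.events


if __name__ == "__main__":
    # A character reference for a space versus a literal space.
    assert parse_events("<p>a&#32;b</p>") == parse_events("<p>a b</p>")

    # An internal entity declared as a single space.
    with_entity = '<!DOCTYPE p [<!ENTITY spc " ">]><p>a&spc;b</p>'
    assert parse_events(with_entity) == parse_events("<p>a b</p>")

    # The two spellings of an empty element.
    assert parse_events("<tag></tag>") == parse_events("<tag/>")
```

Because the recorded events are identical in each pair, an application sitting on top of such a parser has nothing to go on when writing the document back out, which is the round-trip problem raised in the quoted message.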