Re: Specification Questions
In message <199708020838.JAA11135@a...> "Neil Bradley" writes:
[...]
[Paul Prescod]
> > The spec makes no special provision for whitespace at the beginning
> > and end of elements. I believe that this is intended to be one of
> > its simplifications over "regular" SGML. This seeming
> > incompatibility is mitigated by an SGML TC which will allow XML
> > to remain compatible with (post-TC) SGML.

The spec is consistent over this, I think, and says that all characters
that are not markup should be passed to the application. This includes
whitespace.

My personal view is that without some central guidance at least, the XML
treatment of whitespace will cause problems and incompatibility for two
groups of people:
 - those who are familiar with SGML
 - those who are not familiar with SGML.

The first group are accustomed to SGML parsers (primarily James Clark's)
carrying out consistent operations on whitespace. This includes:
 - removing line-ends immediately after and before markup
 - translating markup into a small number of platform-independent
   codes (e.g. ' ' and '\n').

The second group will be familiar with HTML, where all whitespace is
normalised according to rules of varying consistency between
user-agents/browsers. Apart from characters within <PRE> and related
markup, all whitespace is normalised to single spaces, and line-ends are
inserted according to the user-agent software, not the document's
content. Treatment of 'special' characters (e.g. &nbsp; and other
escaped characters or entities) is probably inconsistent. However, in
general, whitespace is not a current concern of the second group.

***Both groups are in for a serious problem with XML unless there is
some central guidance. Otherwise we are at the mercy of any software
implementor.***

<QUESTION>
What whitespace characters can be passed to the application? Regardless
of what is done with them, is CR+LF treated in the same way as LF or CR
alone in a document?
</QUESTION>

If not, we appear to be in for variations according to the platform on
which the document is created. It will be no use telling people that
this is what the spec says - I had always assumed that one of the
attractions of SGML was that it removed platform-dependent documents.
But reading XML-lang suggests that CR and CR+LF produce different
results.

The result of parsing, therefore, passes original whitespace to the
application. Thus:

<P>two spaces</P>

and

<P>two  spaces</P>

are different documents. So are:

<P>no line feeds</P>

and

<P>
no line feeds
</P>

The first pair will confuse anyone accustomed to HTML only. The second
pair will also confuse them, and in addition will confuse some current
users of SGML.

> >
> > Paul Prescod
>
> Is it up to the application to decide what to do with any leading line
> ending code in these positions then?
>
> I am pleased to be rid of the 'record' concept (using RS and RE)
> defined for SGML, particularly as I have tended to use Mac and UNIX
> systems which use a single character to end a line (albeit different
> ones!). However, I still think there is too little information on the
> effect of line ending codes in mixed content. Obviously the safe thing
> to do is to make the content of all elements with a mixed content
> model fit on a single line, as in:
>
> <p>This is a <b>long</b> paragraph.........................</p>
>
> But with large text blocks, created using text editors, people will
> continue to use line ending codes to make it readable on-screen.
> Normally, a break between words would be interpreted as a space when
> the block is paginated:
>
> <p>This is a <b>long</b> paragraph that is broken over two
> lines, with an implied space between 'two' and 'lines'.</p>

Yes. Most people will want to work this way. Very long lines are a
menace for many types of software. We must assume (and in many cases
encourage) that people will read and even edit XML documents with
non-XML tools.
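The two pairs of documents above can be compared with a short sketch. It assumes Python's standard xml.etree.ElementTree as a stand-in for "the application" (the message names no parser), and `text_of` and `collapse` are hypothetical helpers introduced only for illustration. `collapse` mimics an HTML-style normalisation; it is one possible application policy, not something the spec prescribes.

```python
import xml.etree.ElementTree as ET


def text_of(doc: str) -> str:
    """Return the root element's character data as the parser reports it."""
    return ET.fromstring(doc).text or ""


def collapse(text: str) -> str:
    """Hypothetical HTML-style policy: trim, and fold whitespace runs to one space."""
    return " ".join(text.split())


# First pair: one space versus two spaces between the words.
one_space = text_of("<P>two spaces</P>")
two_spaces = text_of("<P>two  spaces</P>")

# Second pair: no line feeds versus line feeds just inside the tags.
tight = text_of("<P>no line feeds</P>")
padded = text_of("<P>\nno line feeds\n</P>")

# The parser hands the whitespace through untouched, so each pair differs.
print(one_space == two_spaces)
print(tight == padded)

# Only an application-level choice such as collapse() makes them equal again.
print(collapse(two_spaces) == one_space)
print(collapse(padded) == tight)
```

Whether an application should apply something like `collapse` at all is exactly the decision the spec leaves open, so two applications reading the same document can disagree.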
>
> Yet what happens when a comment or processing instruction
> appears on its own line?
>
> <p>This is a long paragraph that is broken over two
> <!-- comment -->
> lines, with an implied space between 'two' and 'lines'.</p>
>
> Is this interpreted as "two <!-- comment --> lines...", which reduces
> to "two lines"?

No, it reduces (I think) to:

"...two

lines..."

If there is one single 'obvious' issue which will prevent the take-up of
XML by 'ordinary' people (like myself) it is whitespace. The present
position on whitespace is:
 - the rules are clear but not prescriptive
 - the rules are non-intuitive to most people
 - the rules allow many different ways of processing a given document
 - the role of whitespace in a given document will depend on the
   software used to process it

The philosophy of the XML-lang authors is consistently:
 - whitespace is a problem for the application, not the spec.
 - there is no generic way of treating whitespace

[I should make it clear that this issue has been debated at great
length, and that the present position is the considered opinion of many
experts. I accept it, although I think it will be difficult to work with
in practice.]

Without consistent treatment, a document author has to ask 'which
application is going to process my document?' It means, for example,
that the way that whitespace is treated in MathML may be different from
that in CML and FooML and ... Without a generally agreed convention, it
effectively destroys the possibility of (sub)document re-use.

I know that XML-lang authors read this group and may therefore take some
of these points on board.

	P.

>
>
> Neil.
>
>
> -----------------------------------------------
> Neil Bradley - Author of The Concise SGML Companion.
> neil@b...
> www.bradley.co.uk
>
> xml-dev: A list for W3C XML Developers
> Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
> To unsubscribe, send to majordomo@i... the following message;
> unsubscribe xml-dev
> List coordinator, Henry Rzepa (rzepa@i...)
>

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/