Re: XML and whitespace: lets just dump CR and LF!
> From: Eric Baatz <eric.baatz@E...>
> > XML applications should ignore *ALL* CR and LF as a bad joke.
>
> That doesn't seem reasonable from my point of view, although an option to
> do so might be reasonable. For example, my XML application, which reads
> text and speaks it, is likely to be fed existing text that is only lightly
> marked up with XML and that uses CR/LF (or newlines) and whitespace to
> convey important information. My application needs to see that
> information to operate in an acceptable manner. For example, input could
> be narrative paragraphs denoted by adjacent newlines (or CR/LF's), poetry
> (lots of prosodic information is in the breaks and whitespace), or
> columns of text (such as newspapers) and numbers (such as spreadsheets)
> that have not been reduced to a single logical flow of characters.

Under the current proposals, white-space is either preserved or defaulted. (This relates to labelling data for applications, not to how the application presents it.) So there is no way to indicate whether newlines are hard returns or soft returns.

I think this hearkens back to XML last year, when the idea was around that XML without declarations would be used mainly for closed systems, where the receiving end had been built with a specific DTD in mind. Now it seems that this is not a big factor in the WG's mind, as the XML-ATTRIBUTE discussion shows: the WG wants to support systems that work with many DTDs, even if they are not declared. (I, of course, think this is a mistaken change in direction for XML, but I bow to collective wisdom.)

Under a closed-system approach, it made sense to say "default" or "preserve", since "default" and "preserve" could have some determinate meaning. Under the new all-singing-all-dancing direction for XML, I think they make little sense. If XML-SPACE is just "preserve" or "default", then a document instance's newline conventions must be tailored for each application.
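The post predates XML 1.0, which did end up standardizing an `xml:space` attribute with exactly the two values discussed here, `default` and `preserve`. A minimal Python sketch of the limitation being described, assuming a hypothetical application policy in which `default` collapses every whitespace run (CR and LF included) to one space: a preserved newline survives, but nothing in the markup says whether it was a hard return or a soft wrap. The function name `effective_text` and the collapsing rule are illustrative assumptions, not part of any proposal.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree reports the predefined xml: prefix under this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def effective_text(elem: ET.Element, inherited: str = "default") -> str:
    """Return elem's text after applying the nearest xml:space value.

    Assumed (hypothetical) policy: under "default", each whitespace run,
    CR and LF included, collapses to one space; under "preserve", text
    is kept verbatim. xml:space is inherited by descendants.
    """
    mode = elem.get(XML_SPACE, inherited)

    def norm(s: str) -> str:
        # Only this element's own text and its children's tails are
        # normalised here, so a preserved child keeps its newlines.
        return s if mode == "preserve" else re.sub(r"\s+", " ", s)

    out = norm(elem.text or "")
    for child in elem:
        out += effective_text(child, mode)
        out += norm(child.tail or "")
    return out


poem = ET.fromstring('<poem xml:space="preserve">Roses are red,\nViolets are blue</poem>')
para = ET.fromstring("<p>This line was\nwrapped by an editor</p>")
print(repr(effective_text(poem)))  # 'Roses are red,\nViolets are blue'
print(repr(effective_text(para)))  # 'This line was wrapped by an editor'
```

Either way the application is left guessing: the poem's newline is kept but unlabelled, and the paragraph's newline is discarded even if the author meant it as a hard return.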
But what if we are processing against an architectural form? Then every instance must use the newline conventions belonging to the meta-Document Type Definition. And what if you have different AFs active in different parts of the document, or even applicable concurrently on some elements? Then all the meta-DTDs' newline conventions must match, or you must adopt different conventions in different parts of the document.

A hard return should be explicitly marked up: whether it is an attribute, a PI, a <BR/> element or a newline character reference, it should not be stuck outside the element in CSS or DSSSL. It is part of the data, not an artifact of formatting. (I suppose that the Remappers will think it desirable to define a new standard XML attribute that specifies which convention you use (PI, attribute, <BR>, character reference, entity reference) to signify hard returns, and then provide other attributes to let us cope with existing DTDs that have churlishly adopted their own, prior conventions. But I think it is simpler merely to say "The only way to signify hard returns in XML is a newline character reference.")

Once hard returns are out of the way, we next need to distinguish newlines that are soft returns in data from newlines that are in (or "attributable to") markup or element content. For this distinction, XML-SPACE may be good enough, in a brutish way. But I think that the Interleaf option, of making newlines not significant for presentation, is superior, for the reasons given before. I would also add another: it may simplify indexing into character strings. If you decide "CR and LF are not significant for presentation or indexing", then you get rid of the problem of documents needing to tell you which newline conventions they have adopted: you don't care, and users are free to translate between different conventions without affecting indexes into documents (all other things being equal).

Rick Jelliffe

P.S. An OmniMark program to mark up an existing well-formed HTML-in-XML document would merely add these rules to an XML normaliser:

    TRANSLATE "%n" WHEN ANCESTOR IS PRE
       OUTPUT "&#...;%n"
    TRANSLATE "%n"
       OUTPUT "%n "

This does not seem too complex at all.

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)