RE: tabs to indent for pretty-printing (is it correct?)
At 14:48 2000 10 31 +0100, David Valera [and D.Megginson] wrote:
>> > If I read the XML spec correctly, adding tabs and spaces to increase
>> > readability is often done, but it is not intended to be saved as such
>> > (significant whitespace excluded of course).
>>
>> That's application-specific; i.e. the parser passes all of the
>> characters (including whitespace) to the application, then the
>> application decides what is and isn't significant for its purposes.
>
>This means that if I open an XML file with an (XML)editor, add an element
>and save it back again, it is possible that not only the element I inserted
>is added, but also the whitespaces that the application 'decides' to
>include.
>
>I am asking this because I have opened numerous XML files in XMLeditors and
>without changing the content I was surprised to see all kinds of tabs and
>spaces added to the saved document (just to make it easy readable).

This is actually a somewhat interesting and less-than-trivial issue.

As DavidM points out, there are XML processors (which should obey the XML 1.0 spec) and everything else, including XML editors, which are applications and can therefore exhibit application-specific behavior. The issue you raise here is really "what changes to my document does my editing environment consider to be 'insignificant' (i.e., things that it will do 'on its own')?" And that depends on the main goal of the editor.

If you want total control over every bit of your document, use an editor that lets you get down to the bit level of your file. Even "ASCII" editors do some interpretation (e.g., at the character encoding level). Many XML editors provide a higher-level interpretation (e.g., they interpret the markup and provide GUIs to interact with it). It is common for such XML editors to "change" your document in ways other than the specific edits you requested.
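A minimal sketch of DavidM's point, using Python's standard-library SAX parser (my choice of tool for illustration, not one mentioned in this thread): a non-validating parser reports the indentation tabs and newlines to the application as ordinary character data, and it is then up to the application to decide whether they matter. The sample document is invented.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects every chunk of character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def reported_text(xml_bytes):
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    # A parser may split one run of text across several callbacks, so join them.
    return "".join(handler.chunks)


# Hypothetical sample: two items indented with tabs.
doc = b"<list>\n\t<item>one</item>\n\t<item>two</item>\n</list>"
print(repr(reported_text(doc)))  # '\n\tone\n\ttwo\n'
```

Nothing in the output marks the tabs and newlines as "pretty-printing"; they arrive exactly like the element content does.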
An XML editor of this kind builds an internal representation of your document that retains only some of the information in your input, and later re-serializes that information. Distinctions that the editor considers "insignificant" to the information content of your document may not be retained. Among the most common are the specifics of line breaks and white space. [Others include the order of attributes, whether ' or " was used to delimit a literal, whether an apostrophe in text was written as a literal ' character or as a reference such as &apos; (where either would be allowable), and so forth.]

Some white space is clearly insignificant per the XML 1.0 spec: white space within markup, and white space that gets normalized out of attribute values per the spec. An XML editor that adds or deletes such insignificant space is not changing the information content of your document as defined by the XML 1.0 spec. Of course, you might still wish it didn't make such changes. As far as you are concerned, that "insignificant" white space may be information you want to preserve; one person's insignificant stuff is another person's valuable information. But most people would agree that it's fine for an XML editor to add or delete white space that the XML 1.0 spec deems insignificant. (If this doesn't satisfy you, you'll need to edit your documents with another tool.)

Now, many XML editors are optimized for a specific task, such as authoring and editing documents that will eventually be published either as [X]HTML or as composed documents, and these optimizations often determine what additional white space the editor considers "insignificant". Since most composition processes do their own line breaking within runs of character data (except in "preformatted" or "verbatim" regions), the specifics of line breaks in the input file (outside such special regions) are insignificant.
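A small sketch of that round-trip effect, with Python's xml.etree.ElementTree standing in for an editor's internal model (an assumption for illustration; real editors use their own models, and the input below is made up). Parsing and re-serializing drops the extra space inside the start tag, switches the single-quoted literal to double quotes, turns the &#39; reference into a literal apostrophe, and normalizes the tab and newline in the attribute value to spaces. Python 3.8 or later is assumed so that attribute order is preserved.

```python
import xml.etree.ElementTree as ET

# Made-up input: extra white space inside the start tag, a single-quoted
# literal, a tab and a newline inside an attribute value, and a character
# reference for an apostrophe in the text.
original = "<para  id='p1'\n      note=\"a\tb\nc\">it&#39;s here</para>"

root = ET.fromstring(original)

# Attribute-value normalization: each tab/newline became a single space.
print(repr(root.get("note")))  # 'a b c'

# Re-serializing keeps the information content but not the original spelling.
round_tripped = ET.tostring(root, encoding="unicode")
print(round_tripped)  # <para id="p1" note="a b c">it's here</para>
```

None of these changes alter the information content as the XML 1.0 spec defines it, which is exactly why a tool built on such a model feels free to make them.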
Because line breaks in running text are insignificant to such composition processes, it is reasonable for XML editors optimized for those applications to turn input line ends into spaces, and some spaces into line ends, when writing the file back out. (What would not be as reasonable for such editors is to introduce white space where none exists and where compliant XML processors would consider it significant.)

HTML browsers appear to follow a set of "rules" for ignoring leading white space on an input line (and perhaps in other places), even though HTML is defined as an application of SGML in which some of that space would be significant. XML editors optimized to create [X]HTML sometimes treat such space as insignificant. In fact, such editors sometimes add this "indenting" space to "pretty-print" the resulting file, and the result still displays "properly" in HTML browsers.

The problem is that such "indenting" space is significant as far as XML processors are concerned; specifically, it is space that would, and should, be significant to an XML composition system. So if you edit your XML in an editor that "indents" and then run it through a composition system, you may well get unwanted space in your output. Space inserted in mixed content, as in the indented example

<h2 align="left">
    The h2 heading text
</h2>

is significant and will make the h2 text look misaligned, because when the document is properly composed the heading will begin with a significant space.

If you find such "indenting" behavior on the part of an XML editor inappropriate, see whether the product has a switch or mode that disables it. If not, you will need to find another way to edit your documents.

paul
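The h2 problem can be reproduced with a short sketch. The composed_text function is a hypothetical stand-in for a composition system (not any real product): it collapses each run of white space to one space but, as mixed-content processing should, keeps leading and trailing space because it is part of the character data. The last lines use ElementTree's indent() (Python 3.9+) to show that even a cautious pretty-printer, which only rewrites whitespace-only text, inserts a significant newline into mixed content.

```python
import re
import xml.etree.ElementTree as ET


def composed_text(elem):
    """Hypothetical composition step: collapse each run of XML white space
    to a single space, keeping leading/trailing space (it is character data)."""
    return re.sub(r"[ \t\r\n]+", " ", "".join(elem.itertext()))


compact = '<h2 align="left">The h2 heading text</h2>'
indented = '<h2 align="left">\n    The h2 heading text\n</h2>'

print(repr(composed_text(ET.fromstring(compact))))   # 'The h2 heading text'
print(repr(composed_text(ET.fromstring(indented))))  # ' The h2 heading text '

# ET.indent only rewrites whitespace-only text and tails, yet in mixed
# content it still adds a newline after </b> that an XML processor must keep.
para = ET.fromstring("<p>Hello <b>world</b></p>")
ET.indent(para)
print(repr(ET.tostring(para, encoding="unicode")))  # '<p>Hello <b>world</b>\n</p>'
```

The leading space in the second result is the stray space that makes the composed heading look misaligned.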