Re: Another way to present XML data

  • From: "John P. McCaskey" <mailbox@johnmccaskey.com>
  • To: xml-dev@lists.xml.org
  • Date: Wed, 13 Sep 2017 11:52:46 -0400

For background on whitespace and mixed content in text encodings such as TEI, see https://wiki.tei-c.org/index.php/XML_Whitespace.


On 9/13/2017 11:42 AM, Peter Flynn wrote:

On 13 September 2017 11:32:10 yamahito <yamahito@gmail.com> wrote:

> The case I often find processors screwing up is:
> <p><u>underlined</u> <i>italic</i></p>
> Note the significant whitespace between the <u/> and <i/>

This case is extremely common, and one of the places we messed up.
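To make the failure concrete, here is a minimal sketch in Python (standard library only) of what happens to the example above when whitespace-only text nodes are discarded. The helper `strip_whitespace_only` is a hypothetical stand-in for a processor's stripping behaviour, not any particular parser's API.

```python
import xml.etree.ElementTree as ET

def strip_whitespace_only(elem):
    """Hypothetical helper: mimic a processor that drops whitespace-only text nodes."""
    for node in elem.iter():
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None
    return elem

SOURCE = "<p><u>underlined</u> <i>italic</i></p>"

# ElementTree keeps the space between </u> and <i> as the tail of <u>.
intact = "".join(ET.fromstring(SOURCE).itertext())

# After stripping, the two words run together.
stripped = "".join(strip_whitespace_only(ET.fromstring(SOURCE)).itertext())

print(repr(intact))
print(repr(stripped))
```

The first value keeps the space between the two words; the second has lost it, which is the symptom described above.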

I argued long with Sebastian over it: he maintained that because the application must [apparently "must"; I never understood why] always receive the same information from the parser -- regardless of whether the parser has used a DTD/Schema or not -- the rule of removing white-space-only nodes had to be honoured in all cases.

I respectfully disagreed, holding that iff the DTD (in the case we were discussing) made it clear that the context was Mixed Content, then white-space-only nodes were *significant* and *must* be passed intact to the application (ie neither normalized nor annulled).

Sebastian was shocked that I would expect different results to be passed into the application depending on whether a DTD/Schema was used or not; I attempted to persuade him that a FIXED attribute or a REQUIRED attribute with a default value would be a case in point, but we never resolved the matter satisfactorily.

It's easily fixed in the classes of text document with which I usually deal, at the cost of a few cycles: in every XSLT template which matches an element type in Mixed Content, make the first action a call to a named template which checks if the immediately-preceding node is an element node of a type which would normally be spaced in the class of text documents you handle; if so, add a space token to the result tree.
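The same check can be sketched outside XSLT. The following is a rough Python analogue of the idea, not the stylesheet itself: the set `SPACED_INLINE` and the function `restore_inline_spaces` are hypothetical names, and the list of element types is an assumption that would need tuning for a given class of documents.

```python
import xml.etree.ElementTree as ET

# Assumption: inline element types that are normally separated by a space
# in the documents being handled. Adjust for the vocabulary in use.
SPACED_INLINE = {"u", "i", "b", "em", "strong"}

def restore_inline_spaces(root):
    """Insert a space between adjacent inline elements that have no text between them."""
    for parent in root.iter():
        children = list(parent)
        for prev, cur in zip(children, children[1:]):
            # An empty tail on prev means the immediately preceding node
            # of cur is an element node, not a text node.
            if cur.tag in SPACED_INLINE and prev.tag in SPACED_INLINE and not prev.tail:
                prev.tail = " "
    return root

# Input as it might look after whitespace-only nodes have been stripped.
damaged = ET.fromstring("<p><u>underlined</u><i>italic</i></p>")
repaired_text = "".join(restore_inline_spaces(damaged).itertext())
print(repr(repaired_text))
```

Like the named-template approach, this only guesses: it adds a space wherever two listed elements are adjacent, whether or not the source originally had one.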

This needs more refinement if, for example, you deal with TEI documents containing character-level element markup *within* words (eg linguistic or editorial markup) where adding space would be an error. But in the conventional run of textual material (eg XHTML, DocBook, JATS...) I have found this rarely causes a problem.

It would, of course, be much better if we fixed the problem and went back to the rule that space in Mixed Content is significant, and all else is insignificant, when it is possible to identify the context as Mixed Content. But that would cause too much pain at this stage; it's hard enough as it is to persuade text owners to consider XML as things currently stand -- to change punts in mid-stream would not help.

