Re: normalize-space() except ...

Subject: Re: normalize-space() except ...
From: "Wendell Piez wapiez@xxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 11 Mar 2015 14:33:29 -0000
Hi,

On Tue, Mar 10, 2015 at 5:40 PM, Flynn, Peter pflynn@xxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> I do almost exactly this in several applications. I think it's fairly
> common.
>
>>     watch for
>>     <p>The man wore<i> black </i>socks</p>
>>     which is not unlikely in XML made from word processing software.
>
> Slightly more common would be <p>The man wore <i>black </i>socks</p>
> where a double-click highlight in the WP software included the trailing
> space on the word (someone just told me Word has just stopped doing
> this: can anyone confirm?).
>
> More pernicious is the erroneous elision of white-space-only nodes in
> mixed content:
>
> <p>The man wore <b>black socks</b> <i>only</i> on Tuesdays.</p>
>
> resulting in "The man wore black socksonly on Tuesdays." due to a faulty
> xsl:strip-space (white-space-only nodes between subelements in mixed
> content should probably never be removed, which is sometimes hard to
> explain to people unaccustomed to document-class XML).

Indeed.

Usefully, current versions of Saxon offer the option of referring to
a DTD or schema to determine where stripping of whitespace-only text
nodes is safe (i.e., not in mixed content). But this is on the
boundaries of XSLT (which doesn't say much about how inputs may be
pre-processed), and is not standardized AFAIK.
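The failure in Peter's example is easy to reproduce. What follows is a minimal sketch, not Saxon's mechanism or API, written in Python with only the standard-library `xml.etree.ElementTree` so it runs standalone. The `CONTENT_MODEL` table is hypothetical and stands in for what a DTD or schema would declare. Stripping whitespace-only text nodes everywhere glues "socks" to "only". Stripping only inside parents declared element-only leaves mixed content alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for what a DTD or schema would declare.
# Elements not listed are treated conservatively: never stripped.
CONTENT_MODEL = {
    "section": "element-only",
    "p": "mixed",
    "b": "mixed",
    "i": "mixed",
}

def is_ws_only(s):
    # XML whitespace is space, tab, CR and LF only.
    return s is not None and s.strip(" \t\r\n") == ""

def strip_ws_only(elem, schema_aware):
    """Drop whitespace-only text nodes in place and return elem.

    schema_aware=False strips in every element, much like a blanket
    xsl:strip-space elements="*". schema_aware=True strips only where
    CONTENT_MODEL says the parent is element-only.
    """
    may_strip = not schema_aware or CONTENT_MODEL.get(elem.tag) == "element-only"
    if may_strip:
        # In ElementTree, elem.text and each child's .tail are the text
        # nodes whose parent is elem.
        if is_ws_only(elem.text):
            elem.text = None
        for child in elem:
            if is_ws_only(child.tail):
                child.tail = None
    for child in elem:
        strip_ws_only(child, schema_aware)
    return elem

def string_value(elem):
    # Concatenated descendant text, roughly like XPath string(.)
    return "".join(elem.itertext())

if __name__ == "__main__":
    src = '<p>The man wore <b>black socks</b> <i>only</i> on Tuesdays.</p>'
    print(string_value(strip_ws_only(ET.fromstring(src), schema_aware=False)))
    # The man wore black socksonly on Tuesdays.
    print(string_value(strip_ws_only(ET.fromstring(src), schema_aware=True)))
    # The man wore black socks only on Tuesdays.
```

The space between `</b>` and `<i>` is stored as the `tail` of `b`, which belongs to `p`. Only the schema-aware pass knows that `p` is mixed, so only that pass keeps it.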

For many projects, having an XSLT that does nothing but normalize
whitespace can be useful. Such an XSLT needs to distinguish between
three types of elements: those that contain elements only; those that
contain text only or mixed content; and those such as HTML 'pre' where
all whitespace is significant (not only as "white space"), together with
their descendants. (That is, in HTML, pre/b works differently from p/b.)
However, it's a different order of problem to generalize this
transformation across document types: the logic will differ depending
on your authority for these distinctions (whether schema or data set),
as well as on what you actually consider to be "pretty" in the result.
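The three-way distinction can be sketched in a few lines. This is a minimal illustration, not a general solution. It is written in Python rather than XSLT purely so it runs standalone, and it assumes a hypothetical, hard-coded classification of an HTML-like vocabulary (`ELEMENT_ONLY`, `PRESERVE`). A real stylesheet would take these sets from a schema or from the data set.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical classification for an HTML-like vocabulary. In practice it
# would come from a schema or from surveying the data set.
ELEMENT_ONLY = {"html", "body", "ul", "ol", "table", "tr"}
PRESERVE = {"pre"}  # whitespace significant here and in all descendants
WS = re.compile(r"[ \t\r\n]+")  # XML whitespace characters

def collapse(s):
    # Collapse whitespace runs to one space but do not trim the ends:
    # an edge space may be the only word boundary beside an inline element.
    return None if s is None else WS.sub(" ", s)

def normalize(elem, preserve=False):
    """Normalize whitespace in place, by element category; returns elem."""
    preserve = preserve or elem.tag in PRESERVE
    if not preserve:
        if elem.tag in ELEMENT_ONLY:
            # Element-only content: whitespace-only text is just indentation.
            if elem.text is not None and WS.fullmatch(elem.text):
                elem.text = None
            for child in elem:
                if child.tail is not None and WS.fullmatch(child.tail):
                    child.tail = None
        else:
            # Text-only or mixed content: collapse runs, keep edge spaces.
            elem.text = collapse(elem.text)
            for child in elem:
                child.tail = collapse(child.tail)
    # The preserve flag is inherited, so pre/b is treated unlike p/b.
    for child in elem:
        normalize(child, preserve)
    return elem

if __name__ == "__main__":
    src = ("<body>\n  <p>The   man wore\n   <i>black</i>  socks</p>\n"
           "  <pre>  keep\n    <b>this </b>  as is</pre>\n</body>")
    print(ET.tostring(normalize(ET.fromstring(src)), encoding="unicode"))
```

The mixed-content branch collapses but never trims. Trimming is what a naive `normalize-space()` on each text node would do, and it reintroduces the "socksonly" problem. Whether to also trim at the start and end of a block such as `p` is exactly the kind of "pretty" decision left open above.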

Cheers, Wendell

-- 
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^
