Subject: Re: XML apparently cannot be used for general text markup: whitespace gripe
From: "Thomas B. Passin" <tpassin@xxxxxxxxxxxx>
Date: Tue, 19 Mar 2002 10:37:21 -0500
[Chad Jones]
>
>  I've noticed a lot of xml-derived web pages out there have screwed up
> whitespace (words crammed together or an incorrect space before ending
> punctuation).
>
>  My conclusion is that blocks of straight text (such as paragraphs) cannot be
> further marked up with XML without screwing up spacing.
>
>  For example, can anyone get this simple document into HTML without either
> removing required spaces or adding inappropriate spaces?
>
>   <?xml version="1.0"?>
>   <book>
>      <par>
>       Is his name really <first>John</first>      <last>Doe</last>?
>     </par>
>   </book>
>

You have to distinguish between several different cases.

1) What you see in a browser.  Normally (except for text in special elements
like <pre>) a browser collapses each run of whitespace characters down to a
single space, so multiple spaces in the source file display as one.

2) What the xml parser does by default (or by instruction).  This affects
the whitespace that is passed to the stylesheet processor, and specifically
whitespace-only nodes.  If whitespace-only nodes are removed, you could get
the run-together words you have seen.

Microsoft's msxml3 processor (to name one) removes such nodes by default.
If you are using it in such a way that you can't tell it to preserve the
whitespace-only nodes, you can get the same effect by including an
xml:space='preserve' attribute in the root element of the xml file.  Then
your spaces will remain.

3) What the xslt processor does.  This is controlled by xsl:preserve-space
or xsl:strip-space elements, which also operate on whitespace-only nodes.
By default the whitespace-only nodes are preserved.
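
As a rough sketch of how those two instructions might be combined for the
document above (the mapping of par to an HTML p element and the template
layout are my own assumptions, not something from the original question):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Strip whitespace-only text nodes from every element by default. -->
  <xsl:strip-space elements="*"/>

  <!-- Keep them inside par, where they separate words. A named element
       takes precedence over the "*" wildcard above. -->
  <xsl:preserve-space elements="par"/>

  <xsl:template match="book">
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>

  <!-- Assumed mapping: each par becomes an HTML paragraph. -->
  <xsl:template match="par">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- first and last have no template of their own, so the built-in
       rules copy their text through unchanged. -->
</xsl:stylesheet>
```

With this, the whitespace-only node between </first> and <last> survives the
stylesheet's stripping, and the browser then collapses it to a single space.
It can only survive if the parser handed it to the XSLT processor in the
first place; the stylesheet cannot bring back nodes the parser has already
dropped.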

The result is controlled by the default or instructed behavior of the parser
and the presence or absence of the other instructions.  For the Microsoft
parser, the whitespace-only nodes are removed unless you instruct otherwise;
for Saxon they stay.  I have noticed that an xml:space='preserve' attribute in
the source file has priority over xsl:strip-space in the stylesheet
(at least for msxml3 and Saxon), but I don't know if that is specified
somewhere or not (Mike Kay will no doubt give us the definitive answer
here).
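
For illustration, the xml:space workaround applied to the sample document
would look something like this (a sketch only; the rest of the document is
unchanged from the original question):

```xml
<?xml version="1.0"?>
<!-- xml:space="preserve" on the root element applies to all of its
     descendants unless one of them sets xml:space="default". -->
<book xml:space="preserve">
  <par>
    Is his name really <first>John</first>      <last>Doe</last>?
  </par>
</book>
```

The whitespace-only node between </first> and <last> is the one that matters
here.  If it is kept, the browser shows "John Doe"; if it is dropped, you get
"JohnDoe".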

Cheers,

Tom P


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

