
Re: Re: why whitespace counts as a node?

  • From: Mukul Gandhi <gandhi.mukul@gmail.com>
  • To: xml-dev@lists.xml.org
  • Date: Sun, 14 Nov 2010 21:01:54 +0530

I think the treatment of white space in XML documents gets
interesting when the documents are validated against an XML schema.

Here are the cases I can think of where white space matters:

1) If the XML document is parsed by a SAX parser, the callback
method "characters" (which receives notification of character data)
gets all of the character data, including the white space.

When an XML document is parsed by a DOM parser, the text nodes
likewise contain all of the white-space content.

Therefore XML parsing preserves white-space content in the infoset
instance that parsing produces. I think this is desirable in plain
XML parsing, since applications may want to do something with the
white space too.
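
To see case (1) concretely, here is a minimal sketch using the SAX and
DOM parsers from the Python standard library (my choice for
illustration; any conforming parser should behave the same way). The
class name CharCollector and the sample document are made up for this
example.

```python
import xml.sax
from xml.dom import minidom

# Sample document with boundary white space around the value (illustrative).
DOC = "<x>\n   100\n</x>"


class CharCollector(xml.sax.ContentHandler):
    """Collects everything reported through the characters() callback."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # A parser is allowed to deliver the text in several chunks.
        self.chunks.append(content)


# SAX: the handler receives the white space along with the digits.
handler = CharCollector()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_text = "".join(handler.chunks)

# DOM: the text node(s) under <x> hold the same characters.
root = minidom.parseString(DOC).documentElement
dom_text = "".join(node.data for node in root.childNodes)

print(repr(sax_text))
print(repr(dom_text))
```

Both printed values should include the line breaks and the leading
spaces, which is what "the parser preserves white space" means in
practice.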

2) Things get more interesting when XML documents are validated
against, say, an XML schema. Here are a few examples:

a)
<x>
   100
</x>

Here the content of element "x" is numeric, but there is boundary
white space around the numeric value 100.

This validates successfully against the following XML schema fragment:

<xs:element name="x" type="xs:integer" />

b)
<x>
   hello world
</x>

Here there is boundary white space inside element "x".

c)
<x>hello world</x>

Here there is no boundary white space inside "x".

The following XML schema fragment,

<xs:element name="x">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:maxLength value="11" />
        </xs:restriction>
    </xs:simpleType>
</xs:element>

would report XML document (b) as invalid and (c) as valid. This is
because for the schema type xs:string, white space in the XML
document is significant (the whiteSpace facet is "preserve", so it
counts toward maxLength and affects the validity of the character
content), while for numeric types such as xs:integer it is not
significant (the whiteSpace facet is "collapse", so a schema
validator strips the boundary white space before checking the value).
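
The difference between the two types can be sketched in a few lines of
Python. This is a hand-written emulation of the XSD whiteSpace facet
for illustration only, not a schema validator; the function names
normalize, valid_integer and valid_string_max are hypothetical.

```python
import re


def normalize(value, white_space):
    """Emulate the XSD whiteSpace facet: 'preserve', 'replace' or 'collapse'."""
    if white_space == "preserve":
        return value
    # 'replace': every tab, line feed and carriage return becomes a space.
    value = re.sub(r"[\t\n\r]", " ", value)
    if white_space == "collapse":
        # 'collapse': also squeeze runs of spaces and trim both ends.
        value = re.sub(r" +", " ", value).strip(" ")
    return value


def valid_integer(raw):
    # xs:integer has whiteSpace fixed to 'collapse'.
    return re.fullmatch(r"[+-]?[0-9]+", normalize(raw, "collapse")) is not None


def valid_string_max(raw, max_length):
    # xs:string defaults to whiteSpace 'preserve', so every character counts.
    return len(normalize(raw, "preserve")) <= max_length


doc_a = "\n   100\n"          # content of <x> in example (a)
doc_b = "\n   hello world\n"  # content of <x> in example (b)
doc_c = "hello world"         # content of <x> in example (c)

print(valid_integer(doc_a))         # (a) against xs:integer
print(valid_string_max(doc_b, 11))  # (b) against maxLength 11
print(valid_string_max(doc_c, 11))  # (c) against maxLength 11
```

With the boundary white space counted, the content in (b) is 16
characters long, which is why it fails the maxLength of 11 while the
11 characters of (c) pass.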

On Sun, Nov 14, 2010 at 6:41 PM, Michael Kay <mike@saxonica.com> wrote:
>
>> Ok, so it does serve a purpose.  However, even in xhtml, if you want
>> white space in a paragraph of text, then you can put that whitespace
>> between tags.  I'm sure it's my lack of experience, but, for example,
>> when do you need that white space?
>>
> Once you accept the usefulness of inline markup like this:
>
> <p>I just <i>love</i> <place>London</place></p>
>
> then you have to accept that the space between "love" and "London" is just
> as significant as the one between "I" and "just".
>
> Some of the XML specs do try and recognize that whitespace in mixed content
> needs to be treated differently from whitespace in "element-only content"
> (like database dumps). But part of the XML philosophy is that XML instances
> can be used without having a schema or DTD, which means you don't always
> know whether it's mixed content or not. So you have to treat it as
> significant.
>
> This is one of the reasons it's best to avoid "non-standard" uses of mixed
> content like this:
>
> <date-of-birth>
> <source>birth-certificate</source>
>  1920-03-04
> </date-of-birth>
>
> Michael Kay
> Saxonica




-- 
Regards,
Mukul Gandhi

