Re: Re: why whitespace counts as a node?
I think the treatment of whitespace in XML documents gets interesting when the documents are validated against XML schemas. Here are the cases I can think of where whitespace matters:

1) If the XML document is parsed by a SAX parser, the "characters" callback (which is notified of character data) receives all the characters in the character data, including whitespace. When an XML document is parsed by a DOM parser, the text nodes likewise contain all the whitespace. XML parsing therefore preserves whitespace in the infoset instance that it produces. I think this is desirable in plain XML parsing, since applications may want to do something with the whitespace too.

2) Things get a little more interesting when XML documents are validated against, say, XML Schema documents. Here are a few examples:

a) <x> 100 </x>

Here the content of element "x" is numeric, but there is boundary whitespace around the numeric value 100. This is successfully validated by the following XML schema fragment:

   <xs:element name="x" type="xs:integer" />

b) <x> hello world </x>

Here there is boundary whitespace within element "x".

c) <x>hello world</x>

Here there is no boundary whitespace within "x".

The following XML schema fragment,

   <xs:element name="x">
     <xs:simpleType>
       <xs:restriction base="xs:string">
         <xs:maxLength value="11" />
       </xs:restriction>
     </xs:simpleType>
   </xs:element>

would report XML document (b) as invalid and (c) as valid. This is because with the schema type xs:string, whitespace in the XML document is considered significant and so affects the validity of the character content: " hello world " is 13 characters long, which exceeds the maxLength of 11. With numeric types such as xs:integer, whitespace is not considered significant: an XML schema validator collapses it before checking the value.

On Sun, Nov 14, 2010 at 6:41 PM, Michael Kay <mike@saxonica.com> wrote:
>
>> Ok, so it does serve a purpose.  However, even in xhtml, if you want
>> white space in a paragraph of text, then you can put that whitespace
>> between tags.  I'm sure it's my lack of experience, but, for example,
>> when do you need that white space?
>>
> Once you accept the usefulness of inline markup like this:
>
> <p>I just <i>love</i> <place>London</place></p>
>
> then you have to accept that the space between "love" and "London" is just
> as significant as the one between "I" and "just".
>
> Some of the XML specs do try and recognize that whitespace in mixed content
> needs to be treated differently from whitespace in "element-only content"
> (like database dumps). But part of the XML philosophy is that XML instances
> can be used without having a schema or DTD, which means you don't always
> know whether it's mixed content or not. So you have to treat it as
> significant.
>
> This is one of the reasons it's best to avoid "non-standard" uses of mixed
> content like this:
>
> <date-of-birth>
> <source>birth-certificate</source>
>  1920-03-04
> </date-of-birth>
>
> Michael Kay
> Saxonica

--
Regards,
Mukul Gandhi
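
P.S. A minimal Python sketch of cases (1) and (2), under stated assumptions. The first part uses the standard library's xml.dom.minidom to show that a DOM text node keeps its boundary whitespace. The second part does not call a real schema validator: it hand-rolls an approximation of the XSD whiteSpace facet ("preserve" for xs:string, "collapse" for xs:integer). The helper names normalize, valid_string_max_length and valid_integer are hypothetical and exist only for illustration.

```python
import re
from xml.dom import minidom

# Case 1: plain DOM parsing keeps boundary whitespace in the text node.
doc = minidom.parseString("<x> hello world </x>")
text = doc.documentElement.firstChild.data
print(repr(text))  # ' hello world '


def normalize(value, white_space):
    """Approximate the XSD whiteSpace facet (illustrative helper only)."""
    if white_space == "preserve":
        return value
    # "replace": tab, line feed and carriage return each become a space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if white_space == "replace":
        return replaced
    if white_space == "collapse":
        # "collapse": also squeeze runs of spaces and trim both ends.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError("unknown whiteSpace value: " + white_space)


def valid_string_max_length(raw, max_length):
    # xs:string uses whiteSpace="preserve", so every space counts.
    return len(normalize(raw, "preserve")) <= max_length


def valid_integer(raw):
    # xs:integer uses whiteSpace="collapse", applied before the lexical check.
    return re.fullmatch(r"[+-]?[0-9]+", normalize(raw, "collapse")) is not None


print(valid_integer(" 100 "))                          # example (a)
print(valid_string_max_length(" hello world ", 11))    # example (b)
print(valid_string_max_length("hello world", 11))      # example (c)
```

A real validator does considerably more than this; the sketch only isolates the whitespace step that makes (a) and (c) valid and (b) invalid.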