
Re: nbsp fails transformation

Subject: Re: nbsp fails transformation
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Wed, 10 Aug 2011 17:08:16 +0200
Michael Kay wrote:
> If someone sends you a document that isn't well-formed XML, the best strategy is to get the people who produced it to mend their ways.

True. However, finding out that a file containing &nbsp; is suddenly not XML anymore must be among the most frequent unpleasant surprises that new XML programmers run into. I believe it was one of my first questions to this list as well. My first reaction was: that cannot be, everybody knows &nbsp;, so how can it _not_ be XML?


The thing is, XML is a very generic and extensible language, and entities are one of the things you can extend (beyond the five that are always allowed: &lt;, &gt;, &amp;, &apos; and &quot;). You do this by declaring the entities in the document's DOCTYPE, as Patrick suggested, or by linking to an external DTD file that declares them.
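
As a minimal sketch of such a declaration inside the document (the element names doc and p are made up for illustration), the entity is declared once in the DOCTYPE and can then be used in the content:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc [
  <!-- Declared in the internal subset; &#160; is the no-break space (U+00A0). -->
  <!ENTITY nbsp "&#160;">
]>
<doc>
  <p>10&nbsp;km</p>
</doc>
```

Without the <!ENTITY> line, a conforming parser has to reject this document, because &nbsp; is then a reference to an entity that was never declared.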

This happens often when your input comes from XHTML or HTML. The fix is to keep the original doctype declaration and make sure that the DTDs it refers to are available to the parser. That way other entities such as &mdash;, &uml; and &copy; are recognized as well in the majority of cases.
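
For XHTML input, that could look like the sketch below. The page content is made up; the public and system identifiers are those of XHTML 1.0 Transitional. A parser that actually reads the DTD also picks up the entity sets that the DTD includes:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Example</title>
  </head>
  <body>
    <!-- These names resolve only if the parser can load the DTD. -->
    <p>Caf&eacute;&nbsp;&mdash;&nbsp;&copy; 2011</p>
  </body>
</html>
```

Note that the W3C server limits automated requests for its DTDs, so in practice you will probably want the system identifier to resolve to a local copy, for example through an XML catalog, if your parser supports one.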

You can find the declarations of the Latin-1 entities, &nbsp; among them, at http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_Latin-1_characters, which also shows a typical declaration for use in XML. Download the file http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent and refer to your local copy, and you can work with almost all XHTML/HTML input, as long as the rest is well-formed.
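
A sketch of how a local copy can be pulled in, assuming xhtml-lat1.ent was saved in the same directory as the document (the element names doc and p are made up; the parameter entity name HTMLlat1 and the public identifier follow the declaration shown in the W3C file itself):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc [
  <!-- Parameter entity pointing at the local copy of the W3C entity file. -->
  <!ENTITY % HTMLlat1 PUBLIC
     "-//W3C//ENTITIES Latin 1 for XHTML//EN"
     "xhtml-lat1.ent">
  <!-- Referencing it here includes all declarations from that file. -->
  %HTMLlat1;
]>
<doc>
  <p>&copy;&nbsp;2011 Caf&eacute; Na&iuml;ve</p>
</doc>
```

This relies on the parser reading external parameter entities, which a non-validating parser is not obliged to do, so check the settings of the parser behind your XSLT processor. Names outside Latin-1, such as &mdash;, are declared in the companion file xhtml-special.ent, which can be included in the same way.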

Kind regards,
Abel Braaksma



------------------------------------------------------------------------
From: 	Michael Kay <mike@xxxxxxxxxxxx>
Sent: 	Wednesday, August 10, 2011 10:19:17 AM
To: 	xsl-list
Cc: 	
Subject: 	Re:  nbsp fails transformation




> Now since I can't even transform those files, I can't throw those entities out.
>
> How do I handle this?

If someone sends you a document that isn't well-formed XML, the best strategy is to get the people who produced it to mend their ways. Once you start accepting bad XML (or non-XML, as I prefer to call it), all the benefits of using XML for interchange quickly become lost, and you might as well revert to using some proprietary interchange format.

Michael Kay
Saxonica
