  • From: "Andrew Welch" <andrew.j.welch@g...>
  • To: "Karr, David" <david.karr@w...>
  • Date: Mon, 29 Sep 2008 16:40:37 +0100

2008/9/29 Karr, David <david.karr@w...>:
> I pointed out to a client that they're seeing failures parsing XML because
> some of the element content that they're producing contains characters
> illegal in XML content, like "&" (unencoded).  They acknowledged that should
> be fixed, but they also said they could instead enclose all content with
> CDATA blocks.  That seems bizarre to me, but I'm not sure I can immediately
> come up with all the cogent arguments against that.  Can someone summarize
> specifically why you should NOT do that?

You often get this problem when people write XML as a string rather
than using a proper XML Writer...

For example:

xmlStr = "<foo>" + someVal  + "</foo>";
write(xmlStr);

There are several problems with this approach, one being that ampersands
won't be escaped properly.
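
As a rough illustration in Python (the value "Fish & Chips" and the
variable names are made up for the example, not taken from the client's
code), the concatenated string is rejected by a conforming parser:

```python
import xml.etree.ElementTree as ET

# Hypothetical value containing a bare ampersand.
some_val = "Fish & Chips"

# XML built by string concatenation, as in the snippet above.
xml_str = "<foo>" + some_val + "</foo>"

try:
    ET.fromstring(xml_str)
    parsed_ok = True
except ET.ParseError as err:
    # The unescaped "&" makes the document not well-formed.
    parsed_ok = False
    print("parse failed:", err)
```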

The fix they usually go for is to replace all occurrences of & with
&amp;, but then you see double escaping of any character and entity
references that were already there: &amp; becomes &amp;amp;.

Then you get the string &amp;amp; in the result, which appears as
"&amp;" in the browser, so they attach a post-processing step to
convert "&amp;amp;" back to "&amp;".
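
A small sketch of how the double escaping arises, assuming the blanket
replacement behaves like Python's xml.sax.saxutils.escape (the input
value is invented for the example):

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "Fish & Chips"

# First pass: correct, the bare ampersand becomes an entity reference.
once = escape(raw)      # "Fish &amp; Chips"

# Second pass over already-escaped text: the "&" of "&amp;" is
# escaped again, which is the double escaping described above.
twice = escape(once)    # "Fish &amp;amp; Chips"

# A parser undoes only one level, so the reader sees a literal "&amp;".
seen = ET.fromstring("<foo>" + twice + "</foo>").text
print(seen)             # Fish &amp; Chips
```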

...and so on and so on.  (You also see these pre- and post-processing
steps used to get around encoding issues.)

The root cause of all of this is that someone wrote XML as a string
rather than using an XML Writer.  So I would suggest finding out how
they create the XML, and going from there.
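
For comparison, here is a minimal sketch of the writer-based approach,
using Python's xml.etree.ElementTree as a stand-in for whatever XML
writer their platform offers (write_foo and the sample value are
hypothetical names for this example).  The serializer does the escaping,
so the calling code never touches "&amp;" itself:

```python
import xml.etree.ElementTree as ET

def write_foo(some_val):
    """Build <foo> with some_val as text and let the library serialize it."""
    elem = ET.Element("foo")
    elem.text = some_val  # raw text, no manual escaping
    return ET.tostring(elem, encoding="unicode")

out = write_foo("Fish & Chips")
print(out)                       # <foo>Fish &amp; Chips</foo>

# Parsing the output gives back the original value exactly once unescaped.
print(ET.fromstring(out).text)   # Fish & Chips
```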




-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/

