Re: CDATA vs. Escaped characters
Beyer,Nathan wrote:
> Are there any advantages, disadvantages or limitations in using CDATA
> sections as opposed to escaped character sequences? Or Vice Versa?

A couple of things I can think of offhand:

1. CDATA sections cannot contain "]]>" (the sequence that marks the end of the CDATA section). If you can't guarantee that your data won't contain that sequence, the necessary workarounds might be more trouble than they're worth.

2. Consider parsing differences and the effect on memory:

Using <doc>foo<![CDATA[bar]]>baz</doc>, the DOM gives you 3 children of element 'doc' ...or in SAX, 3 separate calls to characters():

    % python
    >>> from xml.dom.minidom import parseString
    >>> d = parseString('<doc>foo<![CDATA[bar]]>baz</doc>')
    >>> d.childNodes[0].childNodes
    [<DOM Text node "foo">, <DOM Text node "bar">, <DOM Text node "baz">]

And using <doc>foo&#98;&#97;&#114;baz</doc>, you get 5 children:

    >>> d = parseString('<doc>foo&#98;&#97;&#114;baz</doc>')
    >>> d.childNodes[0].childNodes
    [<DOM Text node "foo">, <DOM Text node "b">, <DOM Text node "a">, <DOM Text node "r">, <DOM Text node "baz">]

So that's 5 nodes using escaped text, vs 3 when using a CDATA section. Obviously the ratios will depend on your actual data; I'm just giving an example.

On xsl-list we often scold people who try to misuse CDATA sections with disable-output-escaping, telling them that CDATA sections are purely lexical and that evidence of the original document's 'physicality' never makes its way across the parsing divide. This is not entirely true, as you can see, although in the case of XSLT processing, the XPath/XSLT data model swallows up the difference.

3. If human readability/editability is important, there are times when having a few large CDATA sections can be quite helpful.

4. Simple byte counts might be an issue. Using a CDATA section can cut down the space required to store or transmit those portions of a document that would otherwise be riddled with numerous escaped characters.
On the other hand, many small, unnecessary CDATA sections can add needless bulk to the document.

There are no easy answers. Just some food for thought. Personally, I wouldn't mind seeing CDATA sections just go away.

- Mike
____________________________________________________________________________
mike j. brown                   |  xml/xslt: http://skew.org/xml/
denver/boulder, colorado, usa   |  resume: http://skew.org/~mike/resume/
                                           ^ (yes, i am looking for work)
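
For what it's worth, points 1 and 4 above can be tried out with a few lines of Python. This is only a rough sketch: the helper names to_cdata and text_of are made up for illustration, and the usual workaround of splitting "]]>" across two adjacent CDATA sections is assumed. It round-trips one awkward string through both encodings and prints the two document sizes; which one comes out smaller depends entirely on the data.

```python
from xml.dom.minidom import parseString
from xml.sax.saxutils import escape

CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def to_cdata(text):
    """Wrap text in CDATA (hypothetical helper, not a library function)."""
    # Turn each "]]>" into "]]" + close + reopen + ">", so the terminator
    # never occurs inside a single section.
    safe = text.replace(CDATA_CLOSE, "]]" + CDATA_CLOSE + CDATA_OPEN + ">")
    return CDATA_OPEN + safe + CDATA_CLOSE


def text_of(xml):
    """Join the character data of the root element's child nodes."""
    root = parseString(xml).documentElement
    # Text and CDATASection nodes both expose their content as .data,
    # so the join works however the parser chose to split the text.
    return "".join(child.data for child in root.childNodes)


# A string that contains "]]>" as well as "<", ">" and "&".
tricky = "if (a[b[0]]>c && d<e) {}"

as_cdata = "<doc>" + to_cdata(tricky) + "</doc>"
as_escaped = "<doc>" + escape(tricky) + "</doc>"

# Both encodings should parse back to the original string.
print(text_of(as_cdata) == tricky, text_of(as_escaped) == tricky)

# Compare the serialized sizes of the two encodings.
print(len(as_cdata), len(as_escaped))
```

Joining the children's data, rather than counting them, sidesteps the question in point 2 of how many nodes a given parser hands back.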