RE: extracting data in CDATA block of a XML document
Good answer Mike, but probably not that useful. I had the same answer some months ago from someone else...

Sometimes there is valid XML in the CDATA section. That was the case in my situation. The solution in that case was to write an extension function (we're using Xalan-C++) to extract the contents of the specified node within the CDATA section passed as a parameter.

Annoying, roundabout, kludgy, but serviceable.

c

-----Original Message-----
From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Mike Brown
Sent: 23 August 2002 17:10
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: extracting data in CDATA block of a XML document

Srinivas Ch wrote:
> Now I need to extract all the elements between the
> <![CDATA[ and ]]> and write it into a new xml file.

This is a FAQ, but we all like to give long-winded answers rather than point you to www.dpawson.co.uk.

The other answers to your question so far have been trying to tell you:

1. What you want is not possible with XSLT, at least not in a way that is reliable. We aren't going to tell you the unreliable way because you need to approach this problem differently if you don't want to get burned.

2. It was a poor design decision to embed structured markup in the character data content of an XML element. Character data is by definition NOT MARKUP.

3. CDATA sections are a convenience for document authors and are relevant for input only. They just keep you from having to escape "<" and "&" in character data. It means "this looks like markup but it isn't really". The idea is that <foo><![CDATA[<bar/>]]></foo> and <foo>&lt;bar/&gt;</foo> mean exactly the same thing: An element named 'foo' containing the 6 characters '<bar/>'; NOT an element named 'foo' containing an empty element named 'bar'. If you wanted the latter, you'd have written <foo><bar/></foo>.
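The equivalence in point 3 can be checked with any conforming XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the variable names are illustrative only and are not part of the original message.

```python
import xml.etree.ElementTree as ET

# Three spellings of 'foo'. The first two differ only in how the
# character data is written; the third contains real markup.
with_cdata = ET.fromstring("<foo><![CDATA[<bar/>]]></foo>")
escaped = ET.fromstring("<foo>&lt;bar/&gt;</foo>")
real_child = ET.fromstring("<foo><bar/></foo>")

# The parser reports identical character data for the first two,
# and neither of them has any child elements.
print(repr(with_cdata.text))
print(with_cdata.text == escaped.text)
print(len(with_cdata), len(escaped))

# Only the third spelling yields a child element named 'bar'.
print([child.tag for child in real_child])
```

Once parsing is done, nothing in the tree records whether a CDATA section was used, which is why a stylesheet cannot tell the first two documents apart.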
In XPath/XSLT you deal with a node tree that is set up quite similarly:

element 'foo' in no namespace
 |
 |__text '<bar/>'

The text node is going to be what you see there, regardless of whether you used a CDATA section in the original document.

Since you want XML output, your question is how do you produce a result tree that looks like this

element 'bar' in no namespace

And the answer is, that's pretty darn difficult because you would have to mimic the duties of an XML parser, tearing apart the string in the text node in order to build the right nodes in the result tree.

The workaround that some idiot is going to suggest with a "hey it works for me!" but not realizing how unportable it is, is going to involve leaving the text node unchanged but flagging it as an exceptional case for unmodified serialization, so that it will be emitted as a string of what could very well be total garbage in the middle of proper, well-formed XML. And that's assuming you're serializing the result tree, which isn't always a good assumption (in a browser-based processor you're likely to be passing it as a DOM).

 - Mike

____________________________________________________________________________
mike j. brown                   |  xml/xslt: http://skew.org/xml/
denver/boulder, colorado, usa   |  resume: http://skew.org/~mike/resume/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
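When the character data is known to hold well-formed XML, as in the situation described in the reply at the top of this post, the usual way out is a second parsing pass outside the stylesheet: hand the text to a real XML parser instead of tearing the string apart by hand. The sketch below shows the idea in Python with the standard xml.etree.ElementTree module. It is not the Xalan-C++ extension function mentioned above; the helper name reparse_text and the wrapper and extracted element names are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

def reparse_text(element):
    """Parse an element's character data as an XML fragment.

    Hypothetical helper. The text is wrapped in a throwaway root so it
    may contain several top-level elements. ET.ParseError is raised if
    the text is not well-formed, which is the "total garbage" case.
    """
    wrapped = "<wrapper>" + (element.text or "") + "</wrapper>"
    return list(ET.fromstring(wrapped))

# First pass: the CDATA content arrives as plain character data.
source = ET.fromstring("<foo><![CDATA[<bar id='1'/><baz>hi</baz>]]></foo>")

# Second pass: the character data becomes real element nodes.
children = reparse_text(source)

# Build the new document from those nodes and serialize it.
new_doc = ET.Element("extracted")
new_doc.extend(children)
print(ET.tostring(new_doc, encoding="unicode"))
```

This sketch assumes the embedded text has no XML declaration or DOCTYPE of its own and declares any namespace prefixes it uses; otherwise the wrapping trick fails and the text would need to be parsed as a standalone document instead.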