Re: Handling CDATA element
On 2/9/06, Thorsten Scherler <thorsten@xxxxxxxxxx> wrote:
> Hi all,
>
> I have a question regarding the CDATA element.

*cough* not an element *cough*

Look in the archives of this list for CDATA and I'm sure you'll find plenty of people dealing with this problem one way or another.

> My problem is the following. I have a rss feed like:

*shudders* embedded html in RSS is always a pain. Of course, most newsreaders would probably burst into flames if you used namespaces or the like. (Can't remember if that's even allowed. Some sites have multiple feed types, bless them).

> That looses the markup information but result in well-formed markup. I
> prefer well-formed over well-presented, but best would be both. ;-)

In other words, you want to have your cake and eat it too. Well, there's no good way to do this in XSLT as far as I know. XSLT requires its input to be well-formed, and the contents of a CDATA section reach the stylesheet as plain text, not as markup.

You could attempt some odd multi-pass solution. The exact method would depend on the quality of the html. If you can't trust the html to even be correct, you could generate an in-between format that clearly marks the html section. Then create several html files from it, convert the html files using something like tidy, and re-assemble the results into one file. It would be a pain though.

If you think the input is mostly well-formed SGML, you can skip the tidy part: have the first pass generate the in-between format as an SGML file, use a converter to make that XML, then rerun it through an XSLT processor. This will take care of things like <br> and convert them to <br />.

Otherwise you'll need to create an SGML parser in XSLT. Good luck.

Jon Gorman
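
A rough sketch of the multi-pass idea described above, written in Python rather than XSLT. It is only an illustration under several assumptions: the embedded html sits in RSS <description> elements, Python's standard html.parser stands in for tidy, and the names Tidier, tidy and embed_markup are made up for this example. It does not handle everything tidy does (no repair of misnested inline tags, no check that attribute names are legal XML names).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# HTML elements that never take a closing tag (partial list, for illustration).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class Tidier(HTMLParser):
    """Re-serialize tag soup as a well-formed XML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the cleaned-up output
        self.open_tags = []  # stack of elements still waiting for a close tag

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "checked") get their own name as value.
        attr_text = "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append(f"<{tag}{attr_text} />")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Ignore stray close tags and close tags for void elements.
        if tag in VOID or tag not in self.open_tags:
            return
        # Close anything left open inside this element, then the element itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        # Character references are already decoded; re-escape for XML.
        self.out.append(escape(data))


def tidy(html_text):
    """Pass 2: turn an html snippet into a well-formed fragment."""
    parser = Tidier()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    while parser.open_tags:  # close whatever the author forgot to close
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


def embed_markup(rss_text):
    """Passes 1 and 3: pull the text out of each description, tidy it,
    and put it back as real child elements wrapped in a <div>."""
    root = ET.fromstring(rss_text)
    for desc in list(root.iter("description")):
        fragment = ET.fromstring(f"<div>{tidy(desc.text or '')}</div>")
        desc.text = None
        desc.append(fragment)
    return ET.tostring(root, encoding="unicode")
```

With this sketch, tidy("<p>Hello<br>world") should come back as <p>Hello<br />world</p>, and the output of embed_markup can then be fed to an ordinary XSLT processor, since the html is now real markup instead of text inside a CDATA section.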