RE: resolve html entities
I would suggest parsing the HTML using John Cowan's TagSoup parser. This looks to the XSLT processor just like an XML parser, so you can probably integrate it directly - depending on the XSLT processor that you are using.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Maximilian Gärber [mailto:max@xxxxxxxxxx]
> Sent: 31 October 2005 08:40
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: resolve html entities
>
> Hi,
>
> I know this is a common question but I could not find a
> specific answer to this:
>
> I am exporting texts from a database that contains html markup.
> Now I need to transform the html to something usable in a DTP
> application.
>
> The tags are not the problem because I am only allowing a
> subset of html but the html entities (german umlauts, special
> characters) would need to be transformed to plain Unicode (UTF-8)
> characters.
>
> What is the best way to achieve this?
>
> Thanks,
>
> Max Gaerber
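
For readers who want to see the general idea in runnable form, here is a minimal sketch. It is not TagSoup and not an XSLT integration: it swaps in Python's standard-library `html.parser` as a stand-in lenient HTML parser, to show how named and numeric entities can be resolved to plain Unicode while the markup is re-emitted as XML text. The names `HtmlToXml` and `html_to_xml` are hypothetical, and unlike TagSoup this sketch does not repair unbalanced tags, so it assumes the restricted, well-nested HTML subset described in the question.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


class HtmlToXml(HTMLParser):
    """Hypothetical helper: re-serialise HTML as XML text with entities resolved."""

    def __init__(self):
        # convert_charrefs=True makes the parser replace references such as
        # &auml; or &#228; with the actual Unicode characters in text data.
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Attribute values arrive already unescaped; quoteattr re-escapes
        # only what XML requires (&, <, > and the chosen quote character).
        rendered = "".join(f" {name}={quoteattr(value or '')}" for name, value in attrs)
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Umlauts and other characters stay as literal Unicode; only the
        # XML-significant characters &, < and > are escaped again.
        self.out.append(escape(data))


def html_to_xml(fragment):
    """Return the fragment as XML text with HTML entities turned into Unicode."""
    parser = HtmlToXml()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)


if __name__ == "__main__":
    print(html_to_xml("<p>Gr&uuml;&szlig;e &amp; K&#228;se</p>"))
```

Writing the result to a file opened with `encoding="utf-8"` would give UTF-8 input that an XSLT processor can read as ordinary XML, provided the source HTML is well nested.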