
Re: resolve html entities

Subject: Re: resolve html entities
From: Maximilian Gärber <max@xxxxxxxxxx>
Date: Mon, 31 Oct 2005 10:42:44 +0100
Thanks for the suggestion.

I had hoped this would be easier, since these are all "standard" HTML entities. I thought of two
possible approaches:


1.) the ugly one: do a string-replace on each entity

2.) get a suitable DTD/schema that maps these entities to Unicode characters

Would either one be a good starting point?
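
For illustration, here is a rough sketch of both approaches in Python. The thread names no language, so that choice and the helper names `resolve_by_replace`, `build_doctype` and `resolve_with_dtd` are assumptions. Approach 1 leans on the standard library's `html.unescape`, which amounts to a table-driven replace. Approach 2 builds an internal DTD subset from the HTML 4 entity table in `html.entities` and lets the XML parser do the expansion; it assumes the fragment is already well-formed XML with a single root element.

```python
import html
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Entities that XML already predefines; redeclaring them is unnecessary.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def resolve_by_replace(text):
    """Approach 1: replace every entity reference in a plain string."""
    return html.unescape(text)


def build_doctype(root_name):
    """Approach 2, step 1: declare the HTML 4 entities in an internal subset."""
    decls = "".join(
        f'<!ENTITY {name} "&#{codepoint};">'
        for name, codepoint in sorted(name2codepoint.items())
        if name not in XML_PREDEFINED
    )
    return f"<!DOCTYPE {root_name} [{decls}]>"


def resolve_with_dtd(fragment, root_name="p"):
    """Approach 2, step 2: parse the fragment so the parser expands the entities."""
    return ET.fromstring(build_doctype(root_name) + fragment)


if __name__ == "__main__":
    print(resolve_by_replace("Gr&uuml;&szlig;e"))
    element = resolve_with_dtd("<p>Gr&uuml;&szlig;e &amp; mehr</p>")
    print(ET.tostring(element, encoding="unicode"))
```

One caveat with approach 1: `html.unescape` also turns `&amp;` and `&lt;` into literal characters, so applying it to text that still carries markup can leave the result no longer well-formed. The DTD route avoids that, because the parser keeps markup and character data apart.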

Thanks,
Max



Michael Kay wrote:

I would suggest parsing the HTML using John Cowan's TagSoup parser. This
looks to the XSLT processor just like an XML parser, so you can probably
integrate it directly - depending on the XSLT processor that you are using.
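
TagSoup itself is a Java SAX parser, and how it plugs in depends on the XSLT processor. As a loose analogue only, not TagSoup's API, the sketch below uses Python's standard `html.parser`, which is similarly tolerant of real-world HTML and resolves character references while parsing. The class name `EntityResolvingRewriter` and the function `resolve_entities` are hypothetical. Unlike TagSoup, this sketch does not repair unbalanced or misnested tags; it only re-emits what it sees.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


class EntityResolvingRewriter(HTMLParser):
    """Re-emit the markup with entity references replaced by real characters."""

    def __init__(self):
        # convert_charrefs=True makes the parser resolve &uuml; etc. in text.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        rendered = "".join(
            f" {name}={quoteattr(value or '')}" for name, value in attrs
        )
        self.parts.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape only the characters that XML itself reserves (&, <, >).
        self.parts.append(escape(data))


def resolve_entities(markup):
    rewriter = EntityResolvingRewriter()
    rewriter.feed(markup)
    rewriter.close()
    return "".join(rewriter.parts)


if __name__ == "__main__":
    print(resolve_entities("<p>Gr&uuml;&szlig;e &amp; <b>mehr</b></p>"))
```

The output keeps the tags, carries the umlauts as plain Unicode characters, and leaves only `&amp;`, `&lt;` and `&gt;` escaped, so a downstream XML parser can read it without any HTML entity declarations.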

Michael Kay
http://www.saxonica.com/




-----Original Message-----
From: Maximilian Gärber [mailto:max@xxxxxxxxxx]
Sent: 31 October 2005 08:40
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: resolve html entities


Hi,

I know this is a common question, but I could not find a specific answer to it:

I am exporting texts from a database that contains HTML markup. Now I need to transform
the HTML into something usable in a DTP application.


The tags are not the problem, because I only allow a subset of HTML, but the HTML entities
(German umlauts, special characters) need to be converted to plain Unicode (UTF-8)
characters.


What is the best way to achieve this?

Thanks,

Max Gaerber
