
  • From: "Christopher R. Maden" <crism@l...>
  • To: xml-dev@l...
  • Date: Thu, 21 Sep 2000 12:49:20 -0700

At 15:20 21-09-2000 +0100, Dylan Walsh wrote:
>Hi. I am working on preparing examples of HTML, created by professional web
>designers, for use with XSL transformations. Obviously the markup needs to
>be made well-formed for this purpose. I am familiar with Tidy from the W3C,
>however that utility goes beyond what we need, as it makes changes to
>conform to their standards. This unfortunately results in visual anomalies
>in the output.
>
>I understand the arguments behind XHTML etc., however the markup we are
>given is designed to look good with older browsers, on different
>platforms. Is there any software out there that converts HTML to be
>well-formed XML, but does not make changes beyond that, e.g. to obey the
>HTML or XHTML standards?

Is the HTML valid SGML (i.e., complies with one of the HTML DTDs)?  If so, 
you can use James Clark's sx (<URL:http://www.jclark.com/sp/>), which will 
do a straight SGML-to-XML translation with no knowledge of HTML's semantics.

If the HTML is tag soup, then you may be SOL; try Perl with the 
HTML::Parser module, or a similar tokenizing parser, and re-emit the tags 
yourself.
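
For the tag-soup case, here is a rough sketch of the tokenize-and-re-emit 
idea. It uses Python's standard-library html.parser as a stand-in for 
Perl's HTML::Parser; it is not sx and not Tidy. The names SoupToXml, 
soup_to_xml and VOID are made up for illustration, and the VOID list is 
an assumption about which elements never take an end tag. It quotes 
attributes, escapes text, closes whatever was left open, and drops stray 
end tags; it does not move or rename anything. Comments and the DOCTYPE 
are dropped, attribute names are not checked for XML validity, and 
nothing guarantees a single root element, so treat it as a starting point 
only.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Assumption: elements that never take an end tag; emitted as <br/> etc.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class SoupToXml(HTMLParser):
    """Re-emit HTML tokens as well-formed XML without restructuring them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pieces = []  # output fragments, joined at the end
        self.stack = []   # names of elements that are currently open

    def _emit_start(self, tag, attrs, empty):
        seen = {}
        for name, value in attrs:
            # XML forbids duplicate attributes, so keep the first one.
            # A bare attribute (nowrap) gets its own name as the value.
            seen.setdefault(name, name if value is None else value)
        attr_text = "".join(f" {n}={quoteattr(v)}" for n, v in seen.items())
        self.pieces.append(f"<{tag}{attr_text}{'/' if empty else ''}>")

    def handle_starttag(self, tag, attrs):
        empty = tag in VOID
        self._emit_start(tag, attrs, empty)
        if not empty:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, True)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag with no matching start: drop it
        # Close everything opened after the matching start tag, then it.
        while True:
            name = self.stack.pop()
            self.pieces.append(f"</{name}>")
            if name == tag:
                break

    def handle_data(self, data):
        self.pieces.append(escape(data))

    def finish(self):
        self.close()  # flush any text the parser is still buffering
        while self.stack:
            self.pieces.append(f"</{self.stack.pop()}>")
        return "".join(self.pieces)


def soup_to_xml(html):
    parser = SoupToXml()
    parser.feed(html)
    return parser.finish()
```

Calling soup_to_xml('<p align=center>Hi<br><b>bold<i>both</b> & more') 
should give back 
<p align="center">Hi<br/><b>bold<i>both</i></b> &amp; more</p>. 
Overlapping tags are resolved by closing the inner element early, which 
can itself change rendering in some browsers, so check the results 
against the original pages.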

-Chris
--
Christopher R. Maden, Senior XML Analyst, Lexica LLC
222 Kearny St., Ste. 202, San Francisco, CA 94108-4510
+1.415.901.3631 tel./+1.415.477.3619 fax
<URL:http://www.lexica.net/> <URL:http://www.oreilly.com/%7Ecrism/>

