Re: Need a tool that converts HTML into well-formed XML, and nothing more
> >conform to their standards. This unfortunately results in visual anomalies
> >in the output.

I would be interested in what problems you encounter. I have used Tidy quite
exclusively, and have never had a problem. As far as I can see, apart from
adding a namespace declaration and a public identifier, it does convert the
original HTML to 'pure' XML. And you can always write a simple script to strip
the namespace declaration and the identifier.

Frank

----- Original Message -----
From: "Matt Sergeant" <matt@s...>
To: "Christopher R. Maden" <crism@l...>
Cc: <xml-dev@l...>
Sent: Thursday, September 21, 2000 5:56 PM
Subject: Re: Need a tool that converts HTML into well-formed XML, and nothing more.

> On Thu, 21 Sep 2000, Christopher R. Maden wrote:
>
> > At 15:20 21-09-2000 +0100, Dylan Walsh wrote:
> > >Hi. I am working on preparing examples of HTML, created by professional web
> > >designers, for use with XSL transformations. Obviously the markup needs to
> > >be made well-formed for this purpose. I am familiar with Tidy from the W3C,
> > >however that utility goes beyond what we need, as it makes changes to
> > >conform to their standards. This unfortunately results in visual anomalies
> > >in the output.
> > >
> > >I understand the arguments behind XHTML etc., however the markup we are
> > >given is designed to look good with older browsers, on different
> > >platforms. Is there any software out there that converts HTML to be
> > >well-formed XML, but does not make changes beyond that, e.g. to obey the
> > >HTML or XHTML standards?
> >
> > Is the HTML valid SGML (i.e., complies with one of the HTML DTDs)? If so,
> > you can use James Clark's sx (<URL:http://www.jclark.com/sp/>), which will
> > do a straight SGML-to-XML translation with no knowledge of HTML's semantics.
> >
> > If the HTML is tag soup, then you may be SOL; try Perl with the
> > HTML::Parser module, or something.
>
> If HTML::Parser will work on the malformed HTML, you can use XML::PYX,
> which comes with a pyxhtml script, so you can go:
>
>   pyxhtml <file> | pyxw > <newfile>
>
> pyxhtml uses HTML::Parser (actually HTML::TreeBuilder, but ultimately
> refers to HTML::Parser).
>
> I believe the Python PYX tools have a similar facility too.
>
> --
> <Matt/>
>
> Fastnet Software Ltd. High Performance Web Specialists
> Providing mod_perl, XML, Sybase and Oracle solutions
> Email for training and consultancy availability.
> http://sergeant.org | AxKit: http://axkit.org
>
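
For what it's worth, the "simple script" mentioned above for stripping the namespace declaration and the public identifier might look something like the following. This is only a rough sketch in Python, not something taken from Tidy itself: the function name strip_xhtml_wrapping is made up for illustration, and it assumes the DOCTYPE has no internal subset and that the only namespace to remove is the default XHTML one.

```python
import re

# Assumption: the DOCTYPE has no internal subset ("[ ... ]"), so it ends at
# the first ">" after "<!DOCTYPE".
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>\s*", re.IGNORECASE)

# Assumption: the only namespace to drop is the default XHTML namespace.
XMLNS_RE = re.compile(r'\s+xmlns\s*=\s*(["\'])http://www\.w3\.org/1999/xhtml\1')


def strip_xhtml_wrapping(text: str) -> str:
    """Remove the first DOCTYPE and the first default XHTML xmlns attribute."""
    text = DOCTYPE_RE.sub("", text, count=1)
    return XMLNS_RE.sub("", text, count=1)


# Hypothetical sample shaped like XHTML output; not real Tidy output.
sample = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
    '    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    "<head><title>t</title></head><body><p>x</p></body></html>"
)
print(strip_xhtml_wrapping(sample))
```

One caveat: once the DOCTYPE is gone, named entities such as &nbsp; are no longer declared, so a document that uses them will stop being well-formed unless they are written as numeric character references.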