Re: html to xml
> So the conclusion
> is, I guess, "clean up the HTML minimally even before running tidy".
> I was afraid someone would say that. My problem is that the task is to
> convert our existing web pages (6196 documents, at last count) to (TEI DTD

I wasn't sure quite what your context was. Surely grabbing floating PCDATA and sticking it in a paragraph element is something easily done in the post-tidy XSL transformation to TEI.

Grabbing HTML section heads into TEI/DocBook-style section containers is always a pain, but you can do it in XSL with the usual "grouping" techniques. It's made a bit easier if you know that the H? elements all appear in "correct" sequence, not jumping from h1 to h3. If you use the ISO-HTML DTD then the SGML parser (e.g. sx) will add any missing section levels automagically if you set the appropriate parameter entity.

David

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
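For anyone who wants to see the shape of the grouping logic before writing the stylesheet, here is a rough sketch in Python rather than XSL. It is only an illustration under assumptions: the input is taken to be well-formed XML already (i.e. post-tidy), the function names are made up for this example, and the output element names (div, head, p) are just TEI-like placeholders, not a real TEI mapping. It wraps floating text in p elements and nests each heading's content in a div; it does not insert missing levels, so an h3 directly after an h1 simply nests one level down.

```python
import copy
import xml.etree.ElementTree as ET

# Heading element names mapped to their nesting depth.
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


def add_floating_text(container, text):
    """Wrap non-whitespace loose text in a <p> appended to container."""
    if text and text.strip():
        p = ET.SubElement(container, "p")
        p.text = text.strip()


def nest_sections(body):
    """Turn a flat run of h1..h6 and siblings into nested <div> containers.

    Hypothetical helper: 'div' and 'head' are placeholder output names.
    """
    root = ET.Element("body")
    # Stack of (level, container); level 0 is the output root.
    stack = [(0, root)]
    add_floating_text(root, body.text)
    for child in body:
        level = HEADINGS.get(child.tag)
        if level is None:
            # Ordinary content goes into the innermost open section.
            clone = copy.deepcopy(child)
            clone.tail = None
            stack[-1][1].append(clone)
        else:
            # Close every open section at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            div = ET.SubElement(stack[-1][1], "div")
            head = ET.SubElement(div, "head")
            # Simplification: inline markup inside the heading is flattened.
            head.text = "".join(child.itertext()).strip()
            stack.append((level, div))
        # Text following this element is "floating" in the flat source.
        add_floating_text(stack[-1][1], child.tail)
    return root


if __name__ == "__main__":
    flat = ET.fromstring(
        "<body>stray<h1>A</h1>loose text<p>one</p>"
        "<h2>B</h2><p>two</p><h1>C</h1></body>"
    )
    print(ET.tostring(nest_sections(flat), encoding="unicode"))
```

The stack plays the role that the "nearest preceding heading" test plays in an XSL grouping solution: each non-heading node belongs to whichever section was opened most recently and not yet closed.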