Re: html to xml
> So the conclusion
> is, I guess, "clean up the HTML minimally even before running tidy".
> I was afraid someone would say that. My problem is that the task is to
> convert our existing web pages (6196 documents, at last count) to (TEI DTD

I wasn't sure quite what your context was. Surely grabbing floating PCDATA and sticking it in a paragraph element is something easily done in the post-tidy XSL transformation to TEI.

Grabbing HTML section heads into TEI/DocBook-style section containers is always a pain, but you can do it in XSL with the usual "grouping" techniques. It's made a bit easier if you know that the H? elements all appear in "correct" sequence, not jumping from h1 to h3.

If you use the ISO-HTML DTD then the SGML parser (e.g. sx) will add any missing section levels automagically if you set the appropriate parameter entity.

David

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
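To make the "grouping" idea concrete, here is a minimal sketch of the logic in Python rather than XSL, so it is only an illustration of the technique and not the stylesheet itself. It assumes the tidied input is well-formed, that headings are direct children of `body`, and that each section should become a `div` with a `head` child; the function name `nest_sections` is made up for the example. Like the approach described above, it relies on the headings appearing in a sensible sequence: a jump from h1 to h3 is simply nested one level down, and no missing level is invented.

```python
import re
import xml.etree.ElementTree as ET

HEADING = re.compile(r"^h([1-6])$")


def nest_sections(body):
    """Wrap each hN heading and the content that follows it in nested <div> containers.

    Hypothetical helper for illustration; assumes headings are direct
    children of <body> and lower-case (as tidy would normally emit them).
    """
    root = ET.Element("body")
    # Stack of (heading level, container element); level 0 is the new body.
    stack = [(0, root)]
    for child in body:
        match = HEADING.match(child.tag)
        if match:
            level = int(match.group(1))
            # A heading closes every open section at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            div = ET.SubElement(stack[-1][1], "div")
            head = ET.SubElement(div, "head")
            # Keep only the heading's text; inline markup inside it is flattened.
            head.text = "".join(child.itertext())
            stack.append((level, div))
        else:
            # Non-heading content goes into the innermost open section.
            stack[-1][1].append(child)
    return root


if __name__ == "__main__":
    html = (
        "<body><h1>A</h1><p>one</p><h2>B</h2><p>two</p>"
        "<h1>C</h1><p>three</p></body>"
    )
    print(ET.tostring(nest_sections(ET.fromstring(html)), encoding="unicode"))
```

In XSL the same effect is usually obtained by having each heading select the following siblings that belong to it, but the stack above shows which heading "owns" which content.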