Re: html to xml

Subject: Re: html to xml
From: David Carlisle <davidc@xxxxxxxxx>
Date: Fri, 27 Oct 2000 11:02:23 GMT
> So the conclusion
> is, I guess, "clean up the HTML minimally even before running tidy".
> I was afraid someone would say that. My problem is that the task is to
> convert our existing web pages (6196 documents, at last count) to (TEI DTD

I wasn't sure quite what your context was.
Surely grabbing floating PCDATA and sticking it in a paragraph element
is something easily done in the post-tidy XSL transformation to TEI.
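
To make the "floating PCDATA" step concrete, here is a rough sketch of the idea in Python rather than XSL (the language is swapped in purely for illustration; a real pipeline would do this in the stylesheet). The function name wrap_floating_text and the choice of "p" as the wrapper element are assumptions of this sketch, and it only handles text sitting directly inside the parent, not stray inline elements.

```python
import xml.etree.ElementTree as ET


def wrap_floating_text(parent, para_tag="p"):
    """Wrap non-whitespace text found directly inside `parent` in new
    paragraph elements. Hypothetical helper, not part of any library."""
    new_children = []

    # Text before the first child is stored on parent.text.
    if parent.text and parent.text.strip():
        para = ET.Element(para_tag)
        para.text = parent.text.strip()
        new_children.append(para)
    parent.text = None

    # Text after each child is stored on that child's .tail.
    for child in list(parent):
        tail = child.tail
        child.tail = None
        new_children.append(child)
        if tail and tail.strip():
            para = ET.Element(para_tag)
            para.text = tail.strip()
            new_children.append(para)

    # Rebuild the child list in the new order.
    for child in list(parent):
        parent.remove(child)
    parent.extend(new_children)
    return parent


body = ET.fromstring("<body>loose text<h1>Title</h1>more text<p>ok</p></body>")
wrap_floating_text(body)
print(ET.tostring(body, encoding="unicode"))
```

Each loose run of text becomes its own paragraph here; merging adjacent runs and inline elements into one paragraph would need a little more bookkeeping.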

Grabbing html section heads into TEI/docbook style section containers is
always a pain but you can do it in XSL with the usual "grouping"
techniques. It's made a bit easier if you know that the H? elements all
appear in "correct" sequence, not jumping from h1 to h3. If you use
the ISO-HTML DTD then the SGML parser (e.g. sx) will add any missing section
levels automagically if you set the appropriate parameter entity.
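
For what it's worth, the nesting logic behind the grouping can be sketched outside XSL as well. The Python below is only an illustration of the idea, not the XSL grouping idiom itself: it walks the flat children of a body and keeps a stack of open containers. The name nest_sections and the div/head output element names are assumptions of this sketch, and it does not invent missing levels when the input jumps from h1 to h3.

```python
import re
import xml.etree.ElementTree as ET

HEADING = re.compile(r"^h([1-6])$")


def nest_sections(body):
    """Turn a flat run of h1..h6 and other elements into nested
    div/head containers. Hypothetical helper for illustration."""
    root = ET.Element("body")
    stack = [(0, root)]  # (heading level, container element)

    for child in list(body):
        match = HEADING.match(child.tag)
        if match:
            level = int(match.group(1))
            # Close every open section at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            div = ET.SubElement(stack[-1][1], "div")
            head = ET.SubElement(div, "head")
            head.text = child.text
            head.extend(child)  # keep any inline markup in the heading
            stack.append((level, div))
        else:
            # Anything that is not a heading goes into the innermost open section.
            stack[-1][1].append(child)
    return root


flat = ET.fromstring(
    "<body><h1>A</h1><p>a</p><h2>B</h2><p>b</p><h1>C</h1><p>c</p></body>"
)
print(ET.tostring(nest_sections(flat), encoding="unicode"))
```

The stack is what makes the "correct sequence" assumption matter: with well-ordered headings every new section is opened exactly one level below its parent, so the output depth matches the heading numbers.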

David




 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

