Subject: Re: Non-xml source documents
From: David Carlisle <davidc@xxxxxxxxx>
Date: Wed, 5 Jan 2005 15:28:00 GMT
   XSL is designed expressly for transforming XML documents. You won't
   have any luck in using it to transform something that isn't XML. I
   usually find that Perl is a very handy programming language for
   working with text documents and I have often used it to reformat
   non-XML documents into XML for further work.
   -- 
   Charles Knell
   cknell@xxxxxxxxxx - email



The OP said he could use XSLT 2, which means that you can use the
unparsed-text() function to get the input file as a string and then the
fairly extensive Unicode-aware regexp handling of XSLT 2 to transform
this to XML. The text string handling still isn't up to Perl's power,
although offset against that is the ease of integrating the XML
generation of the output that you get from XSLT 2.
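
As a rough illustration of the idea (a minimal sketch, not the stylesheet
linked below), the following assumes a plain-text input whose lines look
like "key = value". The parameter name input-uri, the file name input.txt,
the template name main, and the element names records/record are all
made up for the example. Since there is no XML source document, it is
meant to be run from a named initial template, which XSLT 2 processors
such as Saxon support.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: URI of the plain-text input file.
       A relative URI is resolved against the stylesheet's base URI. -->
  <xsl:param name="input-uri" select="'input.txt'"/>

  <xsl:output method="xml" indent="yes"/>

  <!-- Named entry point, used because there is no XML source document. -->
  <xsl:template name="main">
    <records>
      <!-- unparsed-text() returns the whole file as one string;
           tokenize() then splits that string into lines. -->
      <xsl:for-each
          select="tokenize(unparsed-text($input-uri, 'UTF-8'), '\r?\n')">
        <!-- Assumed line format: key = value.
             Lines that do not match are silently skipped. -->
        <xsl:analyze-string select="."
            regex="^\s*([^=\s]+)\s*=\s*(.*)$">
          <xsl:matching-substring>
            <record name="{regex-group(1)}">
              <xsl:value-of select="regex-group(2)"/>
            </record>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </records>
  </xsl:template>

</xsl:stylesheet>
```

The regexp work is done by xsl:analyze-string and regex-group(), while the
output is built with ordinary literal result elements, which is the
integration advantage mentioned above.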

I use this technique here
http://www.dcarlisle.demon.co.uk/htmlparse.xsl
which reads in an HTML file (as plain text), parses it using regexps,
and produces an XHTML file (after applying some heuristics to fix up the
element hierarchy).

David

