RE: Re: Any Doc to XML converter ?
The tool described at http://msdn.microsoft.com/library/techart/odc_expwordtoxml.htm produces very clean XML for me; in what sense is it "mostly garbage"? You're not thinking of Word's built-in "save as HTML" (or whatever it's called), are you? You can turn on all sorts of options in this tool that add extra "garbage", but with the simple options it faithfully represents the document's structure and does a good job with scenario #1 that you listed below.

> -----Original Message-----
> From: Peter Flynn [mailto:peter@xxxxxxxxxxx]
> Sent: Wednesday, June 20, 2001 3:51 PM
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: Re: Any Doc to XML converter ?
>
> On Tue, 19 Jun 2001, Dmitri wrote:
> > Bob DuCharme wrote:
> >
> > > In his latest 'XML Deviant' column in XML.com
> > > (http://www.xml.com/pub/a/2001/06/13/deviant.html), Leigh Dodds
> > > describes and points to a recent thread on the topic.
> >
> > From a recent MSDN article 'Export a Word Document to XML' by Kevin
> > McDowell
> > (http://msdn.microsoft.com/library/techart/odc_expwordtoxml.htm)
> >
> > 'The XML output by this application is very straightforward and very
> > similar to the HTML output by Word itself, but it fully accounts for
> > all styled text, tables, and lists.'
>
> Which may very well be true, but the output is largely garbage.
> This whole discussion misses the major points:
>
> 1) Iff your Word document is formatted 100% exclusively with
>    named styles, robust conversion to meaningful XML is easily
>    possible with a number of packages, eg Enigma's DynaTag.
>
> 2) If your Word document uses arbitrary manual styling, no
>    amount of footling around with conversions is going to
>    produce anything other than an XML-syntax'd representation
>    of all the styles. You still have to undertake the hardest
>    part, which is interpreting all the styling cruft into some
>    meaningful markup. XSLT could certainly be used at this
>    stage.
>
> This assumes you do want meaningful markup. If all you need is
> the XML representation of the manual styling, then there are
> several solutions already discussed.
>
> It may be instructive that someone last year wrote a short VB
> script to turn any DOC file into XML, extracting all the style
> info into a CSS stylesheet in a single pass...and it was written
> on a laptop in the bus on the way to the airport after a
> meeting. I'm sure it has long been superseded but this is not
> rocket science.
>
> ///Peter
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
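To make the difference between the two scenarios in the quoted message concrete, here is a minimal Python sketch. It is not the MSDN tool, DynaTag, or the VB script mentioned in the thread. It assumes the paragraphs have already been extracted from the Word file as (style name, text) pairs, and the style-to-element mapping and the `unmapped` element name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from named Word styles to meaningful element names.
STYLE_TO_ELEMENT = {
    "Heading 1": "title",
    "Body Text": "para",
    "List Bullet": "item",
}


def paragraphs_to_xml(paragraphs):
    """Build a <doc> element from (style_name, text) pairs.

    Paragraphs with a known named style become semantic elements
    (scenario 1). Anything else is kept as <unmapped style="...">,
    which only records the styling and still has to be interpreted
    by hand or by a later transformation (scenario 2).
    """
    root = ET.Element("doc")
    for style, text in paragraphs:
        name = STYLE_TO_ELEMENT.get(style)
        if name is None:
            element = ET.SubElement(root, "unmapped", {"style": style})
        else:
            element = ET.SubElement(root, name)
        element.text = text
    return root


# Assumed sample input: two named styles and one manually styled paragraph.
sample = [
    ("Heading 1", "Report"),
    ("Body Text", "Hello."),
    ("Normal + Bold 14pt", "Looks like a heading"),
]
xml_text = ET.tostring(paragraphs_to_xml(sample), encoding="unicode")
print(xml_text)
```

The named styles map straight to `title` and `para`, while the manually styled paragraph survives only as a description of its formatting, which is the part that still needs interpreting.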