RE: Re: Any Doc to XML converter ?

Subject: RE: Re: Any Doc to XML converter ?
From: "Joshua Allen" <joshuaa@xxxxxxxxxxxxx>
Date: Wed, 20 Jun 2001 17:41:07 -0700
The Word-doc-to-XML exporter described at
http://msdn.microsoft.com/library/techart/odc_expwordtoxml.htm

produces very clean XML for me; in what sense is it "largely garbage"?
You're not thinking of the built-in "Save as HTML" feature, are you?
You can turn on all sorts of extra options in this tool that add more
"garbage", but the simple options represent the structure faithfully
and do a good job with scenario #1 that you listed below.
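Scenario #1 in Peter's message (a document formatted exclusively with named styles) is the easy case because each paragraph style name can be mapped straight to an element name. The following is a minimal Python sketch of that idea, not the MSDN tool or DynaTag: it assumes the paragraphs have already been read out of the .doc file as hypothetical `(style_name, text)` pairs, and the `STYLE_TO_ELEMENT` table and its element names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word paragraph style names to XML element names.
STYLE_TO_ELEMENT = {
    "Title": "title",
    "Heading 1": "section-title",
    "Body Text": "para",
    "Quote": "quote",
}


def styled_paragraphs_to_xml(paragraphs, root_name="document"):
    """Convert (style_name, text) pairs into an XML string.

    Assumes every paragraph carries a named style (scenario #1).
    Paragraphs whose style is not in the mapping are kept as
    <unmapped style="..."> so nothing is silently dropped.
    """
    root = ET.Element(root_name)
    for style, text in paragraphs:
        tag = STYLE_TO_ELEMENT.get(style)
        if tag is None:
            # Unknown style: record it rather than guess its meaning.
            elem = ET.SubElement(root, "unmapped", {"style": style})
        else:
            elem = ET.SubElement(root, tag)
        elem.text = text
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [
        ("Title", "Export Notes"),
        ("Heading 1", "Introduction"),
        ("Body Text", "Named styles map cleanly to elements."),
        ("Fancy Custom", "This style has no mapping."),
    ]
    print(styled_paragraphs_to_xml(sample))
```

Any `<unmapped>` elements in the output are a quick signal that the document is not, in fact, 100% styled with named styles.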


> -----Original Message-----
> From: Peter Flynn [mailto:peter@xxxxxxxxxxx]
> Sent: Wednesday, June 20, 2001 3:51 PM
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re:  Re: Any Doc to XML converter ?
> 
> On Tue, 19 Jun 2001, Dmitri wrote:
> > Bob DuCharme wrote:
> >
> > > In his latest 'XML Deviant' column in XML.com
> > > (http://www.xml.com/pub/a/2001/06/13/deviant.html), Leigh Dodds
> > > describes and points to a recent thread on the topic.
> >
> > From a recent MSDN article 'Export a Word Document to XML' by Kevin
> > McDowell (http://msdn.microsoft.com/library/techart/odc_expwordtoxml.htm)
> >
> > 'The XML output by this application is very straightforward and very
> > similar to the HTML output by Word itself, but it fully accounts for
> > all styled text, tables, and lists.'
> 
> Which may very well be true, but the output is largely garbage.
> This whole discussion misses the major points:
> 
>   1) Iff your Word document is formatted 100% exclusively with
>      named styles, robust conversion to meaningful XML is easily
>      possible with a number of packages, eg Enigma's DynaTag.
> 
>   2) If your Word document uses arbitrary manual styling, no
>      amount of footling around with conversions is going to
>      produce anything other than an XML-syntax'd representation
>      of all the styles. You still have to undertake the hardest
>      part, which is interpreting all the styling cruft into some
>      meaningful markup. XSLT could certainly be used at this
>      stage.
> 
> This assumes you do want meaningful markup. If all you need is
> the XML representation of the manual styling, then there are
> several solutions already discussed.
> 
> It may be instructive that someone last year wrote a short VB
> script to turn any DOC file into XML, extracting all the style
> info into a CSS stylesheet in a single pass...and it was written
> on a laptop on the bus on the way to the airport after a
> meeting. I'm sure it has long been superseded, but this is not
> rocket science.
> 
> ///Peter
> 
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
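The one-pass approach Peter describes (text out as XML, styling pulled into a CSS stylesheet) can be sketched roughly as follows. This is a minimal Python sketch, not the VB script he mentions: it assumes the manual formatting has already been extracted into hypothetical `(formatting, text)` pairs, where `formatting` is a dict of CSS property/value pairs, and the `s1`, `s2`, ... class names are an illustrative choice.

```python
import xml.etree.ElementTree as ET


def runs_to_xml_and_css(runs):
    """Single pass over (formatting, text) runs.

    `formatting` is a dict of CSS property -> value describing the
    manual styling of a run (hypothetical input; reading it from a
    real .doc file is out of scope here). Each distinct combination
    becomes one CSS class, so the XML records only *which* styling
    was applied, not what the text means.
    """
    root = ET.Element("doc")
    classes = {}     # sorted (property, value) tuple -> class name
    css_rules = []   # CSS rules in first-seen order
    for formatting, text in runs:
        key = tuple(sorted(formatting.items()))
        if key not in classes:
            name = f"s{len(classes) + 1}"
            classes[key] = name
            body = "; ".join(f"{prop}: {val}" for prop, val in key)
            css_rules.append(f".{name} {{ {body} }}")
        span = ET.SubElement(root, "span", {"class": classes[key]})
        span.text = text
    return ET.tostring(root, encoding="unicode"), "\n".join(css_rules)


if __name__ == "__main__":
    xml_out, css_out = runs_to_xml_and_css([
        ({"font-weight": "bold", "font-size": "14pt"}, "Overview"),
        ({"font-size": "10pt"}, "Some body text."),
        ({"font-size": "14pt", "font-weight": "bold"}, "Details"),
    ])
    print(xml_out)
    print(css_out)
```

The output illustrates scenario #2: both bold 14pt runs share one class, but nothing in the XML says they are headings. Deciding that is the interpretive step that still has to be done afterwards, for example in XSLT.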

