Re: Re: Any Doc to XML converter ?
I don't enjoy defending MS, but in all fairness:

> But from my perspective that is grossly misleading. The HTML that
> Word exports is trash full of Microsoft specific extensions that
> most web sites don't want. So if they're XML is similar, that's
> not saying much.

Kevin McDowell's MSDN article admits that. Though I don't see why "Microsoft specific extensions" (= namespaces??) in XML should be any worse than anybody else's duly defined namespaces and/or elements (e.g. those used in OpenOffice XML output). And they do their job: full two-way tripping between Word and HTML without loss of information or formatting. Users wanted that, and they've now got it. It could be done more cleanly (OpenOffice does so) but what couldn't?

The trash in the "save as HTML" output from Word 2K lies in crass goofs like unquoted attributes etc etc. The whole point is that McDowell describes a method of getting much better results by another route. I personally don't really want to go down that route, but I'm glad to see it on offer.

Like Bob, I can't get the example code to work reliably on anything other than the example data, but that's not all that unusual. With a bit more work, the routines in the cited article could probably be made to output XML that was more or less as you wished.

So credit where it's due, I say...

Michael

---------------------------------------------------------
Michael Beddow http://www.mbeddow.net/
XML and the Humanities page: http://xml.lexilog.org.uk/
---------------------------------------------------------

----- Original Message -----
From: <sara.mitchell@xxxxxxxxx>
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Tuesday, June 19, 2001 6:35 PM
Subject: RE: Re: Any Doc to XML converter ?
> Well, I understand why Microsoft thinks this (although I violently
> disagree):
>
> > From a recent MSDN article "Export a Word Document to XML" by
> > Kevin McDowell
> > (http://msdn.microsoft.com/library/techart/odc_expwordtoxml.htm)
> >
> > "The XML output by this application is very straightforward
> > and very similar to the HTML output by Word itself, but it
> > fully accounts for all styled text, tables, and lists. "
> >
> > and
> >
> > "Conclusion
> > This solution provides a starting point to build an XML
> > parser for Word documents. In addition to the XML
> > functionality, it discusses how to build custom objects to
> > handle sequential instances of all styles and graphics and
> > how to loop through tables and lists. Remember, documents
> > shouldn't be converted to XML merely for the sake putting
> > them in XML. The best document to convert to XML is one that
> > makes use of styles and will be reused in other ways."
>
> And it --completely-- ignores something even more fundamental. Which
> is that most people using Word to create documents could care less
> about good structure, consistency, and much of the modelling that
> makes information truly reusable. If you start with trash, guess what
> you end up? So the XML from Word isn't going to get people the benefit
> they think (and that this article implies).
>
> Sara Mitchell
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list