
RE: One texdocument in and several xmldocuments out?

Subject: RE: One texdocument in and several xmldocuments out?
From: "Stuart Celarier" <stuart@xxxxxxxxxxx>
Date: Mon, 6 May 2002 09:33:20 -0700
You can convert a Word document to HTML using File / Save As... and
selecting HTML or Filtered HTML. The difference between the two is that
HTML preserves all of Word's information, such as <span> tags that mark
spelling and grammar issues, whereas Filtered HTML drops the
Word-specific tags. Then follow the advice already provided here (e.g.,
Tidy) to ensure that the HTML is well-formed XML.
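
A quick way to confirm that the tidied file really is well-formed XML is to run it through any XML parser. The following is only a sketch in Python using the standard library; the function name is_well_formed is made up for illustration, and it assumes the markup is already in a string and contains no HTML-only entities such as &nbsp;.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as XML, False otherwise.

    Hypothetical helper: it checks well-formedness only, not validity
    against any DTD or schema.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

Typical Word output with an unclosed <br> would fail this check until Tidy has rewritten it as <br />.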

Cheers,
Stuart

-----Original Message-----
From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
[mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of Robert
Koberg
Sent: Monday, May 06, 2002 06:43
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re:  One texdocument in and several xmldocuments out?

Hi,

Zack Brown wrote:

>On Mon, May 06, 2002 at 01:28:51PM +0200, Tove Nilstun wrote:
>  
>
>>Hi
>>
>>I am a total beginner when it comes to XML, but in order to start working
>>with it, there are two things I need to sort out.
>>
>>I have a user guide (written in MS Word) with both text and pictures. I
>>would like to 1. convert this document to several xml documents, one per
>>headline and 2. create an additional xml file containing an index of the
>>files created in step one.
>>
>>Is this possible?
>>    
>>
>
>Absolutely. Just create one XSLT file for each output file you desire.
>Then run the XML through your parser once for each XSLT file you've
>created
>

You do not need an XSLT file for each page.

First you have to get the MS Word doc into XML. There are a few products
out there that convert Word to DocBook or some other XML. A neat trick
we found when building our MSIE-based editor was that you could paste an
MS Word doc into an element that has contentEditable="true". IE converts
this to HTML. We use JS to convert it to XML on the client, but you
could use Tidy to get well-formed HTML (XML). Then hopefully there are
clean separations to indicate where a new page should start.
Apply templates to (loop over) each page division, and you can use the
extension functions built into Saxon or Xalan to create multiple output
documents from one source.
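
To make the splitting step concrete, here is a rough sketch of the same logic in Python rather than XSLT, so it does not show the Saxon or Xalan extension functions themselves. It assumes the tidied document is namespace-free XHTML with a flat <body> in which each <h1> starts a new section. The function name split_by_heading, the <section>, <index> and <entry> elements, and the sectionN.xml file names are all invented for illustration.

```python
import xml.etree.ElementTree as ET


def split_by_heading(xhtml: str, heading: str = "h1"):
    """Return ({filename: xml_text}, index_xml_text) for a flat XHTML body.

    Assumes no XML namespace on the input and that every section
    begins with a `heading` element that is a direct child of <body>.
    """
    body = ET.fromstring(xhtml).find("body")
    sections = []
    current = None
    for child in body:
        if child.tag == heading:
            # A heading closes the previous section and opens a new one.
            current = ET.Element("section", title=(child.text or "").strip())
            sections.append(current)
        elif current is not None:
            # Content before the first heading is skipped.
            current.append(child)

    files = {}
    index = ET.Element("index")
    for number, section in enumerate(sections, start=1):
        name = f"section{number}.xml"  # hypothetical naming scheme
        files[name] = ET.tostring(section, encoding="unicode")
        ET.SubElement(index, "entry", href=name, title=section.get("title"))
    return files, ET.tostring(index, encoding="unicode")
```

The caller would then write each entry of the returned dictionary to disk, along with the index. In a stylesheet the same pattern is one template per page division that opens a new output document, plus one extra pass that emits the index.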

best,
-Rob


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list



