RE: Word and XML (was: XML standards coherency and so forth)
>> From: "Rick Jelliffe" <ricko@a...>
>> Date: Sun, 24 Jan 1999 16:15:36 +1100
>> Subject: Re: Word and XML (was: XML standards coherency and so forth)
>>
>> From: Biron,Paul V <Paul.V.Biron@k...>
>
> [snip]
> Wow! I've been so busy lately that I haven't been able to keep up with
> XML-DEV and had no idea my "innocent" post on Word and HTML/XML had been so
> long lived!
>
> [snip]
>
> In truth, we've spent a great deal of time writing tools (a big daisy chain
> of FrontPage v1.1 -> hand-rolled perl script 1 -> hand-rolled perl script 2 ->
> etc.) just to HTML output from Word '97. What has made this all the more
> frustrating for us is that the HTML is not really what we want in the end.
> We just want a "clean" HTML version so that the transformation to the XML
> DTD that we're interested in is "easier". The BOLD and ITALIC that our
> authors see actually represent more "semantic" XML elements, e.g., <allergy>
> and <medication>. Such is life.

I don't know how far down this route you've gone, Byron, but can I suggest
using rtf2xml (http://www.sesha.com/omlette/rtf2xml/)? It uses the limited
version of Omnimark (http://www.omnimark.com) as its engine and does a very
good job of RTF -> XML conversion.

It uses Word paragraph and character styles to convert the RTF into
well-formed and valid XML, e.g.

<p stylename="List Bullet" color="1"><pntext>·&tab;</pntext>
<string color="1">Almanack & Administration Information </string>
<string charstyname="URL" fontsize="20" italic="on" color="1">http://nme.ncl.ac.uk/almanack/</string>
<string color="1"> </string></p>

(you can see that additional formatting information from the original Word
document is preserved too).

I then pass this through another Omnimark program (be aware that it's
perfectly possible to create invalid and badly-formed XML at this stage!!)
to get to:

...
<subsubsection>
  <titleinfo class='subsubsection' level='3'>
    <title class='subsubsection'>On-line Resources</title>
    <sg_title>Organisation of Tissues</sg_title>
  </titleinfo>
  <subheading>Student Support and Tutoring (Computer Mediated Communication) Tools:</subheading>
  ...
  <item><text>Almanack & Administration Information </text><a xml:link='simple' href='http://nme.ncl.ac.uk/almanack/'>http://nme.ncl.ac.uk/almanack/</a><text> </text></item>
  ...
</subsubsection>

From this XML, the conversion to another HTML (or RTF etc.) format is
(relatively) easy.

I tried using the 'HTML' that Word 'emits' and had to have a lie down...
This scheme of using RTF and well marked-up original documents seems to be
helping us along in our up-conversion process (whoever chose that term knew
what they were talking about: it's like climbing, or rather inching, up a
vertical cliff face, going backwards, with no ropes... great fun).

hth
tone

------
Dr Tony McDonald, FMCC, Networked Learning Environments Project
The Medical School, Newcastle University  Tel: +44 191 222 5888
Fingerprint: 3450 876D FA41 B926 D3DD F8C3 F2D0 C3B9 8B38 18A2

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
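
For readers who want to see the shape of the second pass described in the message (style-annotated XML in, "semantic" XML out), here is a minimal sketch in Python rather than Omnimark. It is not the author's program: the PARA_STYLES and CHAR_STYLES tables, the upconvert function, and the fallback element names "para" and "text" are all assumptions made for illustration. The sample input is a simplified version of the rtf2xml output quoted above, with the <pntext> bullet and the formatting attributes left out, the ampersand escaped so that a stock XML parser accepts it, and the xml:link attribute omitted from the output.

```python
import xml.etree.ElementTree as ET

# Hypothetical style tables: which Word paragraph/character style names map to
# which target elements depends entirely on the Word template and DTD in use.
PARA_STYLES = {"List Bullet": "item"}
CHAR_STYLES = {"URL": "a"}


def upconvert(p):
    """Turn one style-annotated <p> element into a 'semantic' element."""
    # Unknown paragraph styles fall back to a generic <para> (an assumption).
    out = ET.Element(PARA_STYLES.get(p.get("stylename"), "para"))
    for run in p.findall("string"):
        # Unknown character styles fall back to plain <text>.
        tag = CHAR_STYLES.get(run.get("charstyname"), "text")
        child = ET.SubElement(out, tag)
        child.text = run.text
        if tag == "a":
            # A run styled as a URL becomes a link whose target is its own text.
            child.set("href", (run.text or "").strip())
    return out


# Simplified stand-in for the rtf2xml output shown in the message.
SAMPLE = (
    '<p stylename="List Bullet">'
    "<string>Almanack &amp; Administration Information </string>"
    '<string charstyname="URL">http://nme.ncl.ac.uk/almanack/</string>'
    "</p>"
)

item = upconvert(ET.fromstring(SAMPLE))
print(ET.tostring(item, encoding="unicode"))
```

Keeping the style names in lookup tables means a new Word style only needs a new table entry, which is roughly why well marked-up source documents make the up-conversion tractable.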