RE: Word and XML (was: XML standards coherency and so forth)

  • From: Tony McDonald <tony.mcdonald@n...>
  • To: xml-dev@i...
  • Date: Thu, 11 Feb 1999 08:54:58 +0000

>> From: "Rick Jelliffe" <ricko@a...>
>> Date: Sun, 24 Jan 1999 16:15:36 +1100
>> Subject: Re: Word and XML (was: XML standards coherency and so forth)
>>
>> From: Biron,Paul V <Paul.V.Biron@k...>
>
> [snip]
> Wow!  I've been so busy lately that I haven't been able to keep up with
> XML-DEV and had no idea my "innocent" post on Word and HTML/XML had been so
> long lived!
>
> [snip]
>
> In truth, we've spent a great deal of time writing tools (a big daisy chain
> of FrontPage v1.1 -> hand-rolled perl script 1 -> hand-rolled perl script 2 ->
> etc.) just to clean up the HTML output from Word '97.  What has made this all
> the more frustrating for us is that the HTML is not really what we want in the end.
> We just want a "clean" HTML version so that the transformation to the XML
> DTD that we're interested in is "easier".  The BOLD and ITALIC that our
> authors see actually represent more "semantic" XML elements, e.g., <allergy>
> and <medication>.  Such is life.

I don't know how far down this route you've gone, Byron, but can I 
suggest using rtf2xml (http://www.sesha.com/omlette/rtf2xml/)? It 
uses the limited version of OmniMark (http://www.omnimark.com) as an 
engine and does a very good job of RTF -> XML conversion.

It uses Word paragraph and character styles to convert the RTF into 
well-formed and valid XML, e.g.:

<p stylename="List Bullet" 
color="1"><pntext>&#183;&tab;</pntext><string color="1">Almanack 
&amp; Administration Information </string><string charstyname="URL" 
fontsize="20" italic="on" 
color="1">http://nme.ncl.ac.uk/almanack/</string><string color="1"> 
</string></p>

(You can see that additional formatting information from the 
original Word document is preserved too.)

I then pass this through another OmniMark program (be aware that 
it's perfectly possible to create invalid and ill-formed XML at 
this stage!) to get to:
...
<subsubsection>
<titleinfo class='subsubsection' level='3'>
<title class='subsubsection'>On-line Resources</title>
<sg_title>Organisation of Tissues</sg_title>
</titleinfo>
<subheading>Student Support and Tutoring (Computer Mediated 
Communication) Tools:</subheading>
...
<item><text>Almanack &amp; Administration Information
 </text><a xml:link='simple' 
href='http://nme.ncl.ac.uk/almanack/'>http://nme.ncl.ac.uk/almanack/</a><text>  </text></item>
...
</subsubsection>
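
For anyone without OmniMark to hand, the style-driven second pass can be approximated in a few lines of Python. This is only a rough sketch of the idea, not the actual OmniMark program: the style-to-element table, the fallback element name `para` and the function name `up_convert` are all made up for illustration, and the `&tab;` entity in the sample is swapped for `&#9;` because no DTD is loaded to define it.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the rtf2xml output quoted earlier. The &tab; entity is
# replaced by &#9; because no DTD is loaded here to define it.
RTF2XML_SAMPLE = (
    '<p stylename="List Bullet" color="1">'
    '<pntext>&#183;&#9;</pntext>'
    '<string color="1">Almanack &amp; Administration Information </string>'
    '<string charstyname="URL" fontsize="20" italic="on" color="1">'
    'http://nme.ncl.ac.uk/almanack/</string>'
    '<string color="1"> </string>'
    '</p>'
)

# Hypothetical mapping from Word paragraph style names to target elements.
PARAGRAPH_STYLES = {"List Bullet": "item"}


def up_convert(paragraph_xml: str) -> ET.Element:
    """Map one style-tagged paragraph to a semantic element."""
    source = ET.fromstring(paragraph_xml)
    # Unknown paragraph styles fall back to a generic (assumed) element name.
    target = ET.Element(PARAGRAPH_STYLES.get(source.get("stylename"), "para"))
    # Only <string> runs carry content; the <pntext> bullet is dropped.
    for run in source.findall("string"):
        content = run.text or ""
        if run.get("charstyname") == "URL":
            link = ET.SubElement(
                target, "a", {"xml:link": "simple", "href": content.strip()}
            )
            link.text = content
        else:
            ET.SubElement(target, "text").text = content
    return target


item = up_convert(RTF2XML_SAMPLE)
result = ET.tostring(item, encoding="unicode")
ET.fromstring(result)  # raises ParseError if the result is not well formed
print(result)
```

Building the output as a tree and serialising it at the end sidesteps the well-formedness problem mentioned above, though it does nothing for validity against a DTD.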

From this XML, the conversion to another HTML (or RTF etc.) format is 
(relatively) easy.
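
To give a feel for that last step, here is a small Python sketch that turns one `<item>` from the sample into an HTML list item. The choice of `<li>` as the target and the function name `item_to_html` are assumptions made for the example; the real output format would depend on what you are delivering.

```python
import xml.etree.ElementTree as ET

# One <item> copied from the up-converted sample.
ITEM_XML = (
    "<item><text>Almanack &amp; Administration Information </text>"
    "<a xml:link='simple' href='http://nme.ncl.ac.uk/almanack/'>"
    "http://nme.ncl.ac.uk/almanack/</a><text>  </text></item>"
)


def item_to_html(item_xml: str) -> str:
    """Render an <item> as an HTML <li>, keeping links and flattening text."""
    item = ET.fromstring(item_xml)
    li = ET.Element("li")
    last = None  # most recently added child element of <li>, if any
    for child in item:
        if child.tag == "a":
            last = ET.SubElement(li, "a", {"href": child.get("href", "")})
            last.text = child.text
        elif last is None:
            # Text before the first link becomes the <li> leading text.
            li.text = (li.text or "") + (child.text or "")
        else:
            # Text after a link is attached as that link's tail.
            last.tail = (last.tail or "") + (child.text or "")
    return ET.tostring(li, encoding="unicode", method="html")


html = item_to_html(ITEM_XML)
print(html)
```

The `<text>` wrappers disappear because HTML has no use for them; only the link survives as an element.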

I tried using the 'HTML' that Word 'emits' and had to have a lie 
down...this scheme of using RTF and well marked up original documents 
seems to be helping us along in our up-conversion process (whoever 
chose that term knew what they were talking about - it's like 
climbing, rather inching up, a vertical cliff face going backwards 
with no ropes...great fun)

hth
tone
------
Dr Tony McDonald,  FMCC, Networked Learning Environments Project
The Medical School, Newcastle University Tel: +44 191 222 5888
Fingerprint: 3450 876D FA41 B926 D3DD  F8C3 F2D0 C3B9 8B38 18A2

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

