
RE: XML Convertor

  • From: "DuCharme, Robert" <DuCharmR@m...>
  • To: xml-dev@i...
  • Date: Wed, 22 Dec 1999 09:41:17 -0500

>I am currently looking out for converting Word Perfect, MS Word and ASCII
>files into XML. 
>So Far I was just able to find out only RTF to XML convertor, which uses
>omnimark technology.

Converting something to XML means converting it to a text file in which
start- and end-tags show the beginning and end of structural elements (and
maybe storing certain pieces of information as attributes in the
start-tags). There has to be some way for the converter to identify the
beginning and end of these structural elements. Rick Geimer's Omnimark-based
rtf2xml (see http://www.omnimark.com/develop/contributed/) does this by
looking at RTF codes.
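
As a small illustration of that definition, here is a sketch in Python using
the standard xml.etree.ElementTree module. The element names (memo, title,
para), the author attribute, and the build_memo function are made up for the
example; they are not what rtf2xml produces.

```python
import xml.etree.ElementTree as ET

def build_memo(title, paragraphs, author):
    """Return an XML string for a hypothetical <memo> document."""
    # The attribute in the start-tag stores a piece of information
    # that is not part of the running text.
    memo = ET.Element("memo", {"author": author})
    # Each child element's start- and end-tag mark one structural unit.
    ET.SubElement(memo, "title").text = title
    for text in paragraphs:
        ET.SubElement(memo, "para").text = text
    # encoding="unicode" returns a str; markup characters are escaped.
    return ET.tostring(memo, encoding="unicode")

print(build_memo("Status", ["All is well.", "Costs < budget."], "bob"))
```

The hard part of any real converter is not writing the tags but deciding
where each element begins and ends in the input.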

A program that reads proprietary binary formats (WordPerfect or MS Word) and
does this would be difficult enough that no one I know of has bothered--they
just save as RTF and either write something customized to convert that RTF
to their own DTD or use Rick's program and then convert its output to their
own DTD. WordPerfect and Word 2000 have some XML-related features, so you
might want to look at those. 

To convert an ASCII file to XML, you could put "<myDocument>" at the
beginning and "</myDocument>" at the end, but this wouldn't do you much
good. To put additional tags in places where they would be useful requires a
program that knows what to look for. People often use Perl, Python, awk,
etc. to write scripts that look for patterns in their input that give them
clues as to which tags should go where.
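
A rough sketch of such a script in Python follows. The patterns are
assumptions chosen for illustration only: blank lines separate blocks, and a
short block with no lowercase letters is treated as a heading. The tag names
heading and para and the function text_to_xml are likewise hypothetical; real
input would need its own rules.

```python
import re
from xml.sax.saxutils import escape

def text_to_xml(text, root="myDocument"):
    """Wrap plain text in tags chosen from simple, assumed patterns."""
    out = ["<%s>" % root]
    # Assumption: one or more blank lines separate blocks.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Collapse line breaks inside a block into single spaces.
        content = " ".join(block.split())
        if not content:
            continue
        # Assumption: a short block with letters but no lowercase
        # letters is a heading; everything else is a paragraph.
        is_heading = (
            len(content) < 60
            and content == content.upper()
            and any(c.isalpha() for c in content)
        )
        tag = "heading" if is_heading else "para"
        # escape() turns &, < and > into entity references.
        out.append("  <%s>%s</%s>" % (tag, escape(content), tag))
    out.append("</%s>" % root)
    return "\n".join(out)

print(text_to_xml("INTRODUCTION\n\nSome text\nhere & there."))
```

Each new kind of structure you want tagged (lists, dates, section numbers)
means adding another pattern that fits your particular input.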

>Is there anything generalised which would take care of all (or most) types
>of Binary & ASCII files.

To find and identify the structure of the input, the processing program has
to know its structure intimately, so a generalized program that takes care
of all types of binary and ASCII files is impossible. Having spent too much
time studying RTF, I applaud Rick for studying it even harder so that others
wouldn't have to. It would be difficult to do any better.

Bob DuCharme       www.snee.com/bob       <bob@  
snee.com>  see www.snee.com/bob/xmlann for "XML:
The Annotated Specification" from Prentice Hall.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@i... the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)


