
Re: &anytype; to RTF converter. Need help!

  • From: "Rick Jelliffe" <ricko@a...>
  • To: <xml-dev@i...>
  • Date: Thu, 20 May 1999 23:37:34 +1000


From: Ing. Cesar Bonavides M. <cbona18@c...>
 >I'd like to know if there is a converter from:
>
>TeX, or LaTex, or PostScript or PDF     to   RTF.
>
>I know there's something out there, but I don't have time to find it
>out.

To go from PDF to RTF you can chain three steps:

* Magellan to go from PDF to HTML
* Dave Raggett's Tidy to go from HTML to XHTML
* then write a little OmniMark or Perl or Python script to go from XHTML
to RTF

All these tools are free. Do not ask me for references.

Unfortunately, you may have to locate and learn 3 tools and 4 languages
along the way.
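
The last step of the chain above (XHTML to RTF) could be sketched in
Python roughly as follows. This is only an illustration, not a finished
converter: the function names are made up for this example, only
paragraphs, h1-h3 headings and bold/italic are handled, and the input is
assumed to be well-formed XHTML that uses numeric rather than named
character entities.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
INLINE = {"b": "\\b", "strong": "\\b", "i": "\\i", "em": "\\i"}


def rtf_escape(text):
    """Escape RTF specials and write non-ASCII characters as \\uN? escapes."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # RTF \uN takes signed 16-bit values, one per UTF-16 code unit.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                code = int.from_bytes(units[i:i + 2], "little")
                if code > 32767:
                    code -= 65536
                out.append("\\u%d?" % code)
    return "".join(out)


def squeeze(text):
    """Collapse runs of whitespace to one space, as a browser would."""
    return re.sub(r"\s+", " ", text)


def inline_rtf(elem):
    """Convert the text and inline children of one block element."""
    parts = [rtf_escape(squeeze(elem.text or ""))]
    for child in elem:
        tag = child.tag.replace(XHTML_NS, "")
        inner = inline_rtf(child)
        if tag in INLINE:
            parts.append("{%s %s}" % (INLINE[tag], inner))
        else:
            parts.append(inner)  # unknown inline tag: keep its text only
        parts.append(rtf_escape(squeeze(child.tail or "")))
    return "".join(parts)


def xhtml_to_rtf(xhtml):
    """Return a small RTF document for the paragraphs and headings found."""
    root = ET.fromstring(xhtml)
    body = []
    for elem in root.iter():
        tag = elem.tag.replace(XHTML_NS, "")
        if tag == "p":
            body.append("\\pard %s\\par" % inline_rtf(elem).strip())
        elif tag in ("h1", "h2", "h3"):
            body.append("\\pard {\\b %s}\\par" % inline_rtf(elem).strip())
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}"
    return header + "\n" + "\n".join(body) + "\n}"
```

Tables, lists, images and font sizes are all ignored here, so expect to
extend it for anything beyond plain running text.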

Magellan does not give you text in lines: just words with absolute XY
coordinates. So you can index the words, but you cannot really edit
them. If the pages are really simple then you can try to figure out
lines in some text processing language.  You may need to try different
revisions of the application that generated the code in order to find
the one that puts out the best PDF.
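
For simple single-column pages, figuring out the lines might look like
the Python sketch below. The Word tuple, the function name and the
tolerance value are all assumptions made for illustration; how you get
the words and coordinates out of the converter's HTML is not shown, and
y is assumed to increase down the page.

```python
from collections import namedtuple

# Hypothetical record for one positioned word taken from the HTML output.
Word = namedtuple("Word", "text x y")


def words_to_lines(words, y_tolerance=2.0):
    """Group words whose y values nearly match into lines, ordered by x."""
    lines = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        # Compare against the first word of the current line so that
        # small drifts do not chain separate lines together.
        if lines and abs(word.y - lines[-1][0].y) <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [
        " ".join(w.text for w in sorted(line, key=lambda w: w.x))
        for line in lines
    ]


if __name__ == "__main__":
    sample = [
        Word("world", 60, 100.5),
        Word("Hello", 10, 100),
        Word("line", 70, 120),
        Word("Second", 10, 120.4),
    ]
    for text in words_to_lines(sample):
        print(text)
```

Multi-column layouts, tables and superscripts will defeat something this
simple, which is why it only suits really simple pages.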

More realistically, you could try to find a word-processing package that
accepts HTML and understands absolute positioning attributes: I doubt
that Word does, but perhaps FrameMaker might.

Another possibility is to divide your PostScript into single pages, and
then use Adobe Illustrator (or is it FreeHand that can read in HTML): it
is remotely possible that Illustrator (or is it FreeHand) can save as
RTF.

Yet another possibility that should not be ignored is to keep the
PostScript files, but use Magellan to extract a good word index for each
page (Xerox InXight have a product for this too), and perhaps use a
scanner to get the text into lines if people need text. Then you make
some metadata for each document (Dublin Core). This gives you:
* formatted pages
* indexed data
* unformatted text for people who want to extract parts of the data into
other documents
* metadata for finding aids.

That is not nearly as good as an XML document, but might give people a
lot of what they need.  If you need to put it into RTF, import as text,
get a human to mark it up, and export as RTF.
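
The metadata step mentioned above could be as small as the Python sketch
below. The function name, the "metadata" wrapper element and the sample
values are invented for this example; only the Dublin Core element
namespace is standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields):
    """Build a small XML record from (element name, value) pairs."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        child = ET.SubElement(record, "{%s}%s" % (DC_NS, name))
        child.text = value
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    print(dublin_core_record([
        ("title", "Annual Report"),
        ("format", "application/postscript"),
    ]))
```

One such record per document, kept next to the PostScript file and its
word index, would be enough for a basic finding aid.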


Rick Jelliffe


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

