RE: XML Convertor
>I am currently looking out for converting Word Perfect, MS Word and ASCII
>files into XML.
>So Far I was just able to find out only RTF to XML convertor, which uses
>omnimark technology.

Converting something to XML means converting it to a text file in which start and end tags show the beginning and end of structural elements (and maybe storing certain pieces of information as attributes in the start-tags). There has to be some way for the converter to identify the beginning and end of these structural elements. Rick Geimer's Omnimark-based rtf2xml (see http://www.omnimark.com/develop/contributed/) does this by looking at RTF codes.

A program that reads proprietary binary formats (WordPerfect or MS Word) and does this would be difficult enough that no one I know of has bothered--they just save as RTF and either write something customized to convert that RTF to their own DTD or use Rick's program and then convert its output to their own DTD. WordPerfect and Word 2000 have some XML-related features, so you might want to look at those.

To convert an ASCII file to XML, you could put "<myDocument>" at the beginning and "</myDocument>" at the end, but this wouldn't do you much good. To put additional tags in places where they would be useful requires a program that knows what to look for. People often use perl, python, awk, etc. to write scripts that look for patterns in their input that give them clues as to which tags should go where.

>Is there anything generalised which would take care of all (or most) types
>of Binary & ASCII files.

To find and identify the structure of the input, the processing program has to know its structure intimately, so a generalized program that takes care of all types of binary and ASCII files is impossible. Having spent too much time studying RTF, I applaud Rick for studying it even harder so that others wouldn't have to. It would be difficult to do any better.
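
As a rough illustration of the kind of pattern-hunting script mentioned above for ASCII input, here is a minimal Python sketch. Everything in it is an assumption made for the example: the element names "title" and "para", the function name text_to_xml, and the two clues it relies on (blank lines separate blocks, and a short block in all capitals is a heading). Real input will need its own clues.

```python
import re
from xml.sax.saxutils import escape


def text_to_xml(text, root="myDocument"):
    """Wrap blank-line-separated blocks of plain text in XML tags.

    Hypothetical conventions, chosen only for this sketch:
    a short all-capitals block becomes <title>, anything else <para>.
    """
    out = [f"<{root}>"]
    # Clue 1 (assumed): one or more blank lines separate structural blocks.
    for block in re.split(r"\n\s*\n", text.strip()):
        # Rejoin hard-wrapped lines into a single run of text.
        content = " ".join(line.strip() for line in block.splitlines())
        if not content:
            continue
        # Clue 2 (assumed): a short block in all capitals is a heading.
        tag = "title" if content.isupper() and len(content) < 60 else "para"
        # Escape &, < and > so the output stays well-formed.
        out.append(f"  <{tag}>{escape(content)}</{tag}>")
    out.append(f"</{root}>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = "INTRODUCTION\n\nFish & chips are\ngood.\n\nSecond paragraph."
    print(text_to_xml(sample))
```

The script only knows the two clues it was given, which is the point made above: the more structure you want in the output, the more the program has to know about the input.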
Bob DuCharme www.snee.com/bob <bob@ snee.com>
see www.snee.com/bob/xmlann for "XML: The Annotated Specification" from Prentice Hall.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@i... the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)