Re: Advice on dictionary conversion

Subject: Re: Advice on dictionary conversion
From: Ciarán Ó Duibhín <ciaran@xxxxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 19 Jan 2011 18:01:15 -0000
Grateful thanks to all who responded to my enquiry on this subject. I am encouraged to persevere.

Several people advised doing the conversion as a series of small steps, and I will keep that in mind. However, I am developing my conversion on a non-final version of one part of the dictionary (the letter D), so it must be re-runnable on the other letters, as well as on the final version of D.

I will see how far I can get with a simple series of "template match" operations, before I think about anything more advanced. Whatever can't be done that way may be feasible manually; and in any case I expect to do manual tidying-up of odd cases and data errors which would not be covered by a programmed conversion.

Although there are plenty of webpages about XSLT, it is difficult to find the information you want. For example, I spent most of a day putting this together:
<xsl:template match="my_element[@font-weight='bold' and contains(preceding-sibling::node()[1], '=')]">
But I suppose there are a few things in it that I will not need to learn a second time.
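
For anyone searching the archive for the same thing, here is a minimal sketch of how a match pattern like that might sit in a complete XSLT 1.0 stylesheet. The names my_element and font-weight come from the pattern above; the output element name "marked" is purely hypothetical, and the identity template is an assumption about how the rest of the document should be carried through, not a description of the actual conversion.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copies every node and attribute that no
       more specific template claims. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Matches a bold my_element whose nearest preceding sibling node
       (element, text, comment or processing instruction) has a string
       value containing '='. The wrapper name "marked" is hypothetical. -->
  <xsl:template match="my_element[@font-weight='bold' and contains(preceding-sibling::node()[1], '=')]">
    <marked>
      <xsl:apply-templates select="node()"/>
    </marked>
  </xsl:template>

</xsl:stylesheet>
```

Two points worth noting about the pattern: preceding-sibling::node()[1] selects the nearest preceding sibling, because the position is counted backwards along a reverse axis; and if that sibling happens to be a whitespace-only text node, contains() will test the whitespace rather than the element before it. The template with the predicate takes precedence over the identity template because its default priority (0.5) is higher than that of node() (-0.5).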

There were some interesting comments on the initial doc/rtf to xml conversion. I tried several, and here are some figures, for the letter D.
.doc file                              772 KB
Word save as rtf                       711 KB
Word save as txt                       286 KB
Word 2003 save as xml                 4592 KB
Novosoft rtf-to-fo                    2624 KB  (http://www.rtf-to-xml.com/)
Walter rtf-to-xml                      400 KB  (http://thewalter.net/stef/software/rtfx/)
Walter rtf-to-xml with -p parameter    536 KB
Yawc doc-to-xml                        433 KB  (http://www.yawcpro.com/)
Unfortunately (for me) the Walter and Yawc conversions both discarded the small-caps information. I haven't noticed anything important discarded in the Novosoft conversion, yet its output is still much smaller than the Word 2003 XML.

Incidentally, the original Word files make no use of styles; everything is in "Normal".

Many thanks again,
Ciarán Ó Duibhín