Support For Automatic Thai Word Breaking In XSL-FO
I will eventually have to support the composition of Thai documents. The main challenge I see is automatic word breaking of Thai. As I understand it, Thai is normally written without spaces between words, and the language does not have a well-defined notion of a word. As a result, running Thai text may not contain enough break points for lines to flow properly.

In researching the issue I found some software (written for TeX) that does automatic line breaking, but nothing that had been integrated with an XSLT or XSL-FO processor. As far as I can tell, MS Word is the main non-TeX tool that provides acceptable Thai word breaking.

My question: has anybody integrated a Thai word-breaking algorithm into an XSL processing context?

Looking at the freely available code, it seems it would not be too hard to extend Saxon, for example, to apply the word-breaking algorithm to text nodes in the scope of xml:lang="th". Preprocessing the XML document with the existing code is not enough, because the Thai characters may be represented as numeric character references or character entities, while the existing code expects them as literal characters in a Unicode or Thai code page encoding. Thus, the algorithm would need to be applied after parsing, once those references have been resolved.

Thanks,

Eliot Kimber
ISOGEN International, LLC

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
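A minimal sketch of the post-parse approach described above, written in Java because Saxon is a Java processor. It does not use any Saxon extension API. Instead it parses with the JDK's DOM parser (so numeric character references are already resolved into characters), tracks the inherited xml:lang value, and inserts U+200B ZERO WIDTH SPACE at the word boundaries that java.text.BreakIterator reports for the Thai locale. The class name ThaiBreakInserter is hypothetical. Two points are assumptions to verify: that the JRE in use ships dictionary-based Thai word breaking (ICU4J's BreakIterator is an alternative source of boundaries), and that the target FO processor treats U+200B as a line-break opportunity, as the Unicode line breaking rules do.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.text.BreakIterator;
import java.util.Locale;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

/** Hypothetical helper: marks Thai word boundaries in parsed XML with U+200B. */
public class ThaiBreakInserter {
    // ZERO WIDTH SPACE: invisible, but a line-break opportunity under Unicode line breaking rules.
    static final String ZWSP = "\u200B";

    static boolean isThai(char c) {
        return c >= '\u0E00' && c <= '\u0E7F'; // Unicode Thai block
    }

    static boolean isThaiLang(String lang) {
        // Matches "th" and subtags such as "th-TH", case-insensitively.
        return lang != null && (lang.equalsIgnoreCase("th") || lang.regionMatches(true, 0, "th-", 0, 3));
    }

    /** Inserts ZWSP at each word boundary that falls between two Thai characters. */
    static String markBreaks(String text) {
        // Assumption: the JRE provides dictionary-based Thai word breaking for this locale;
        // if it does not, fewer (or no) boundaries are reported and less ZWSP is inserted.
        BreakIterator words = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        words.setText(text);
        StringBuilder out = new StringBuilder(text.length() + 16);
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            out.append(text, start, end);
            if (end < text.length() && isThai(text.charAt(end - 1)) && isThai(text.charAt(end))) {
                out.append(ZWSP);
            }
        }
        return out.toString();
    }

    /** Recursively rewrites text nodes in the scope of xml:lang="th" (or a "th-" subtag). */
    static void process(Node node, String inheritedLang) {
        String lang = inheritedLang;
        if (node instanceof Element) {
            Element e = (Element) node;
            if (e.hasAttributeNS(XMLConstants.XML_NS_URI, "lang")) {
                lang = e.getAttributeNS(XMLConstants.XML_NS_URI, "lang"); // nearest xml:lang wins
            }
        } else if (node instanceof Text && isThaiLang(lang)) {
            node.setNodeValue(markBreaks(node.getNodeValue()));
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            process(child, lang);
        }
    }

    /** Parses, marks Thai text, and serializes; the parser resolves character references. */
    static String transform(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for xml:lang to be seen in the XML namespace
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            doc.getDocumentElement().normalize(); // merge adjacent text nodes so words are not split
            process(doc.getDocumentElement(), null);
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter result = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(result));
            return result.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not process XML", e);
        }
    }
}
```

For example, passing a paragraph such as `<p xml:lang="th">&#x0E2A;&#x0E27;...</p>` to `transform` yields the same paragraph with literal Thai characters and, where the JRE's Thai dictionary finds boundaries, U+200B between words; paragraphs outside the scope of xml:lang="th" are left alone. In a production pipeline the same logic would more naturally sit in a streaming filter ahead of the FO formatter; the DOM version only keeps the sketch short.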