
Support For Automatic Thai Word Breaking In XSL-FO

Subject: Support For Automatic Thai Word Breaking In XSL-FO
From: "W. Eliot Kimber" <eliot@xxxxxxxxxx>
Date: Tue, 29 Jan 2002 09:36:32 -0600
I will eventually have to support the composition of Thai documents. The
primary challenge I see there is automatic word breaking of Thai.
As I understand it, Thai is normally written without spaces between
words, so the text itself may not contain enough explicit break points
to allow lines to be flowed properly. In my research into the issue I've
found some software (written for TeX) that does automatic line breaking,
but I didn't find anything that had been integrated with any XSLT or
XSL-FO processor. As far as I can tell, MS Word is the main non-TeX tool
that provides acceptable Thai word breaking.

My question: has anybody integrated any Thai word breaking algorithms
into an XSL context?

Looking at the free code that's out there, it doesn't seem too hard to
extend Saxon, for example, to apply the word breaking algorithm to text
nodes when xml:lang="th". Preprocessing the XML document with the
existing code is not enough, because the Thai characters may be
represented as numeric character references or character entities, and
the existing code expects some form of Unicode or Thai code page
encoding. Thus, the algorithms would need to be applied post-parse,
after the parser has resolved those references into characters.
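
A minimal sketch of the post-parse idea, in Python rather than as a Saxon
extension. Everything in it is an assumption made for illustration: the
names WORDS, segment, and add_breaks are hypothetical, the two-word
dictionary is a toy, and greedy longest-match is only a stand-in for a
real Thai word breaking algorithm. It marks break opportunities with
ZERO WIDTH SPACE (U+200B); whether a given XSL-FO formatter honors that
character would need to be checked.

```python
import xml.etree.ElementTree as ET

ZWSP = "\u200b"  # ZERO WIDTH SPACE, used here as an invisible break opportunity
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Toy dictionary (assumption): a real word list would be far larger.
WORDS = {
    "\u0e20\u0e32\u0e29\u0e32",  # "phasa" (language)
    "\u0e44\u0e17\u0e22",        # "thai"
}

def segment(text, words=WORDS):
    """Greedy longest-match split; runs of unmatched characters stay together."""
    max_len = max(len(w) for w in words)
    pieces, unknown, i = [], "", 0
    while i < len(text):
        longest = min(max_len, len(text) - i)
        match = next((text[i:i + n] for n in range(longest, 0, -1)
                      if text[i:i + n] in words), None)
        if match is None:
            unknown += text[i]
            i += 1
            continue
        if unknown:
            pieces.append(unknown)
            unknown = ""
        pieces.append(match)
        i += len(match)
    if unknown:
        pieces.append(unknown)
    return pieces

def add_breaks(elem, lang=None):
    """Insert ZWSP between words in text that is in scope of xml:lang="th"."""
    lang = elem.get(XML_LANG, lang)  # xml:lang is inherited by descendants
    if lang == "th" and elem.text:
        elem.text = ZWSP.join(segment(elem.text))
    for child in elem:
        add_breaks(child, lang)
        # A child's tail text belongs to the parent, so use the parent's language.
        if lang == "th" and child.tail:
            child.tail = ZWSP.join(segment(child.tail))

# The Thai text arrives as numeric character references; the parser has
# already turned them into characters by the time add_breaks sees them.
source = ('<doc><p xml:lang="th">&#3616;&#3634;&#3625;&#3634;'
          '&#3652;&#3607;&#3618;</p><p>plain</p></doc>')
doc = ET.fromstring(source)
add_breaks(doc)
print(ascii(doc[0].text))
```

Because the function walks the parsed tree, it never sees the character
references themselves, which is the reason for working post-parse. The
same walk could in principle be done inside an XSLT processor before the
text reaches the formatter.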

Thanks,

Eliot Kimber
ISOGEN International, LLC

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

