RE: ANN: XQEngine 0.61
I'd looked at BreakIterator way back when it was still at Taligent. I can't recall why I chose not to go with it at the time (efficiency concerns?), but it looks worth revisiting. Thanks for the suggestion.

Howard

-----Original Message-----
From: J.Pietschmann [mailto:j3322ptm@y...]
Sent: Sunday, December 07, 2003 2:00 AM
To: Howard Katz; xml-dev@l...
Subject: Re: ANN: XQEngine 0.61

Howard Katz wrote:
> All my word breaking is delegated to a class called (surprise) WordBreaker,
> which implements a very simple algorithm that uses Java's
> Character.isLetterOrDigit() function to determine where words begin and end.
> This works well for Western languages. If you want to optimize for a
> non-Western language, you can override WordBreaker and implement word
> breaking in whatever way makes sense for your particular language or
> languages of interest. That's the theory at any rate ...

Have a look at java.text.BreakIterator, which helps to implement line and word breaking along the Unicode standards (most notably UTR14).

J.Pietschmann

-----------------------------------------------------------------
The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
initiative of OASIS <http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/

To subscribe or unsubscribe from this list use the subscription
manager: <http://lists.xml.org/ob/adm.pl>
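
For readers comparing the two approaches discussed in this thread, here is a minimal sketch in Java. It is not XQEngine's actual WordBreaker code, which is not shown in the message: the class name WordBreakSketch and both method names are hypothetical, and the first method is only a guess at what a Character.isLetterOrDigit()-based breaker might look like. The second method uses the standard java.text.BreakIterator word instance and keeps only segments that start with a letter or digit, since the iterator also reports whitespace and punctuation as segments.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of XQEngine.
public class WordBreakSketch {

    // Assumed shape of the simple approach: a word is a maximal run
    // of characters for which Character.isLetterOrDigit() is true.
    static List<String> simpleWords(String text) {
        List<String> words = new ArrayList<>();
        int start = -1; // index where the current word began, or -1 if none
        for (int i = 0; i < text.length(); i++) {
            boolean wordChar = Character.isLetterOrDigit(text.charAt(i));
            if (wordChar && start < 0) {
                start = i;
            } else if (!wordChar && start >= 0) {
                words.add(text.substring(start, i));
                start = -1;
            }
        }
        if (start >= 0) {
            words.add(text.substring(start)); // word running to end of text
        }
        return words;
    }

    // Locale-aware approach: let BreakIterator find the boundaries,
    // then drop segments that are only whitespace or punctuation.
    static List<String> breakIteratorWords(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> words = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                words.add(segment);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "Have a look at BreakIterator.";
        System.out.println(simpleWords(text));
        System.out.println(breakIteratorWords(text, Locale.ENGLISH));
    }
}
```

On plain English text like the sample sentence, both methods return the same words. The results can differ for text where a character test alone is not enough to locate word boundaries, which is the case the override hook in the quoted message is meant for.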