Re: Trigram tables
From: "John Cowan" <jcowan@r...>

> Richard Tobin scripsit:
>
> > > lost is to the hishart commusted matimany he usider wit darly he the Richard has never made more sense :-)
>
> It's fairly easy to spoof such tables if you are allowed to construct
> the text and it need not make sense. Constructing well-formed French or
> Indonesian or something that spoofs the tables will not be so simple.

Of course, one use for trigram tables is OCR error detection without the need for a full dictionary. The most common errors in English OCR are confusing 0/O, 1/l, i/j and perhaps 5/S. Trigram tables tend to be able to find those kinds of errors, at a different implementation complexity than a spell check.

Another use is trigram indexing, which allows some kinds of faster jumping within documents. (For Chinese, bigram indexing is regarded as one of the most efficient ways to index a document.) Knowledge of the occurrences of trigrams lets you pick a suitable algorithm.

There is a lot of academic work on character occurrences, but little of it trickles down in a useable form (as code or tables) to the developer community.

Cheers
Rick Jelliffe
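
As a rough illustration of the OCR use mentioned above, here is a minimal Python sketch that builds a trigram table from known-good text and flags words containing trigrams the table has never seen. The function names (`trigrams`, `build_table`, `suspicious`), the toy corpus and the `min_count` threshold are all assumptions made for this example; a real table would be built from a much larger corpus and would probably use frequency thresholds rather than a simple seen/unseen test.

```python
from collections import Counter


def trigrams(word):
    """Return the character trigrams of a word, padded with one space on each side."""
    padded = f" {word.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def build_table(corpus_words):
    """Count how often each trigram occurs in a list of known-good words."""
    table = Counter()
    for word in corpus_words:
        table.update(trigrams(word))
    return table


def suspicious(word, table, min_count=1):
    """Return the trigrams of `word` seen fewer than `min_count` times in the table."""
    return [t for t in trigrams(word) if table[t] < min_count]


# Toy corpus (an assumption for illustration only).
corpus = "the letter fell on the yellow floor below the hollow hill".split()
table = build_table(corpus)

print(suspicious("yellow", table))  # no unseen trigrams
print(suspicious("ye11ow", table))  # the 1/l confusion yields unseen trigrams
```

No dictionary lookup is involved: a word is flagged purely because some three-character window in it is rare or absent in the reference text, which is why digit/letter confusions such as 1/l or 0/O tend to stand out.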