
Re: Trigram tables


From: "John Cowan" <jcowan@r...>

> Richard Tobin scripsit:
> 
> > lost is to the hishart commusted matimany he usider wit darly he the

Richard has never made more sense :-)
 
> It's fairly easy to spoof such tables if you are allowed to construct
> the text and it need not make sense.  Constructing well-formed French or
> Indonesian or something that spoofs the tables will not be so simple.

Of course, one use for trigram tables is OCR error detection
without the need for a full dictionary.  The most common errors
in English OCR are confusions of 0/O, 1/l, i/j and perhaps 5/S.  Trigram tables
tend to be able to find those kinds of errors, with a different implementation
cost from a spell checker.
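
As a rough illustration of the OCR idea, here is a minimal sketch in Python.
It assumes the table is built from a sample of clean text and that a word is
suspect when any of its trigrams is rarer than a threshold.  The names
build_trigram_table, suspicious_words and min_count are made up for this
example, and a real table would need far more training text.

```python
from collections import Counter


def build_trigram_table(text):
    """Count every 3-character window in the lower-cased training text."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def suspicious_words(line, table, min_count=1):
    """Return the words that contain a trigram seen fewer than min_count times.

    Each word is padded with spaces so that word-initial and word-final
    trigrams are checked too (an assumption of this sketch).
    """
    flagged = []
    for word in line.split():
        padded = f" {word.lower()} "
        trigrams = [padded[i:i + 3] for i in range(len(padded) - 2)]
        # Counter returns 0 for trigrams that never occurred in training.
        if any(table[t] < min_count for t in trigrams):
            flagged.append(word)
    return flagged


# Tiny hypothetical training sample; note the leading and trailing space.
table = build_trigram_table(" the old oil lamp is on the hill ")
print(suspicious_words("the o1d oil lamp", table))
```

With this toy table the word "o1d" is flagged because the trigram " o1" never
occurs in the training sample, while "the", "oil" and "lamp" pass.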

Another use is for trigram indexing, which allows some kinds of faster jumping
within documents.  (For Chinese, bigram indexing is regarded as one of the
most efficient ways to index a document.) Knowledge of the occurrences
of trigrams lets you pick a suitable algorithm.
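
A minimal sketch of trigram indexing in Python, under my own assumptions: the
index maps each trigram to the offsets where it occurs, and a search jumps
straight to the offsets of the query's rarest trigram before verifying the
full match.  The names build_trigram_index and find_all are hypothetical.

```python
from collections import defaultdict


def build_trigram_index(text):
    """Map each 3-character window to the list of offsets where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - 2):
        index[text[i:i + 3]].append(i)
    return index


def find_all(text, index, query):
    """Return every offset in text where query occurs."""
    if len(query) < 3:
        # Too short to use the index: fall back to a plain scan.
        return [i for i in range(len(text) - len(query) + 1)
                if text.startswith(query, i)]
    # Use occurrence counts to choose the rarest trigram of the query,
    # so that as few candidate offsets as possible need checking.
    best = min(range(len(query) - 2),
               key=lambda k: len(index.get(query[k:k + 3], ())))
    hits = []
    for pos in index.get(query[best:best + 3], ()):
        start = pos - best
        if start >= 0 and text.startswith(query, start):
            hits.append(start)
    return hits


text = "trigram tables and trigram indexing"
index = build_trigram_index(text)
print(find_all(text, index, "trigram"))
```

Choosing the rarest trigram is one small way that knowing the occurrence
counts can guide the search; other strategies are certainly possible.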

There is a lot of academic work on character occurrences, but little of it
trickles down in a useable form (as code or tables) to the developer community.

Cheers
Rick Jelliffe

