Re: Re: English sentences, was: Re: Announce: XML Sc
From: "Jonathan Borden" <jborden@a...>

> The issue of detection of human language, on the other hand, is one that
> interests me.

François Yergeau presented a paper on this at a conference; Robin Cover's site probably has the reference. He told me it was quite possible, though of course it must depend to some extent on the document size. See http://www.alis.com/castil/silc/?AlisTargetHost=http://www.alis.com:8080 for the commercial product.

I have just been looking for public-domain tables giving the likelihood of various trigrams (groups of three letters) occurring in different languages, because these are useful for detecting OCR errors in text that you might not want to spell-check for various reasons. It seems that none exist: lots of papers reference such tables, but no definitive collection has appeared yet. (One good approach would be to generate them from the aspell spelling tables.)

Cheers
Rick Jelliffe
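The trigram idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Yergeau's or Alis's method: it assumes each language's word list has already been exported to plain text (for example from aspell's dictionaries) and loaded as a list of words. It builds a per-language table of trigram log-probabilities, guesses the language whose table gives the text the highest total log-likelihood, and flags words containing rare or unseen trigrams as possible OCR errors. The names `build_model`, `guess_language` and `suspicious_words`, and the half-count floor for unseen trigrams, are hypothetical illustrative choices.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class TrigramModel:
    logprob: dict[str, float]  # log-probability of each trigram seen in the word list
    unseen: float              # log-probability assigned to trigrams never seen


def trigrams(word: str):
    """Yield letter trigrams, padding with spaces so word boundaries count too."""
    padded = f" {word.lower()} "
    for i in range(len(padded) - 2):
        yield padded[i:i + 3]


def words_in(text: str) -> list[str]:
    """Split text into runs of Unicode letters (digits and underscores excluded)."""
    return re.findall(r"[^\W\d_]+", text)


def build_model(words) -> TrigramModel:
    """Build a trigram table from a word list (hypothetically, one exported from aspell)."""
    counts = Counter(t for w in words for t in trigrams(w))
    total = sum(counts.values())
    logprob = {t: math.log(c / total) for t, c in counts.items()}
    # Crude smoothing assumption: an unseen trigram counts as half an occurrence.
    return TrigramModel(logprob, math.log(0.5 / total))


def score(text: str, model: TrigramModel) -> float:
    """Total log-likelihood of the text's trigrams under one language model."""
    return sum(model.logprob.get(t, model.unseen)
               for w in words_in(text) for t in trigrams(w))


def guess_language(text: str, models: dict[str, TrigramModel]) -> str:
    """Return the language key whose model scores the text highest."""
    return max(models, key=lambda lang: score(text, models[lang]))


def suspicious_words(text: str, model: TrigramModel, threshold: float) -> list[str]:
    """Words containing a trigram whose log-probability is at or below threshold."""
    return [w for w in words_in(text)
            if any(model.logprob.get(t, model.unseen) <= threshold
                   for t in trigrams(w))]
```

Longer documents supply more trigrams and therefore more evidence, which is consistent with the remark above that reliability depends on document size. Setting `threshold` to a model's `unseen` value flags only words containing trigrams never seen in that language's word list.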








