Re: Re: English sentences, was: Re: Announce: XML Sc
John Cowan wrote:
> Jonathan Borden scripsit:
>
> > It all depends on what exactly you want, or intend the validator to do. What
> > you are saying, in essence, is that an "English sentence" is not defined as
> > a sequence of characters which conform to "text-en" and this is most true.
>
> The original point seems to have gotten lost.

Actually this _is_ the original point, isn't it? You are saying that using a specific character set isn't a reliable way to detect a human language (because other characters might be correctly present) and I am agreeing (but because the _problem_ is way more complicated than character sets).

> The publisher's use case was for a datatype representing those letters,
> and only those letters, used in writing the Dutch language. Formally, of
> course, that's easy: it's an xsd:string type with a pattern facet
> consisting of "[ a-zA-Z...]+". The question is, just what are those
> other letters represented by the ellipsis in any given case?
>
> I used the examples of "façade" and "coöperate" and "naïve" to
> illustrate that this problem may or may not have a clear-cut answer. These
> are not foreign words; they are standard spellings (though not the only
> standard spellings) of standard English words.
>
> It's perfectly true that a sentence like "Al-Musa said, '<insert
> Arabic here>'." is also an English sentence even if the Arabic text
> is expressed in the Arabic script. But that isn't my point.

It's another good point, however. What I am saying is that there are lots of good reasons why what was suggested might not be reliable (either false positives or false negatives).

> > Indeed to reliably detect an English sentence the 'recognizer' needs to
> > understand how to form words from characters and sentences from words. This
> > is way outside the capabilities of the XML schema definition languages we
> > have been discussing.
>
> Of course, of course. But even at the level of characters, there is
> a *definitional* (not implementation) problem in saying just what
> the character repertoire of <insert language here> is.
> Many have come up against this rock and crashed against it.

Agreed.

Jonathan
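
To make the false-negative case concrete, here is a minimal sketch in Python (not XML Schema) of the kind of pattern check discussed above. It assumes only the part of the pattern that is actually spelled out, "[ a-zA-Z]+", and leaves the ellipsis unfilled. The names `ASCII_ONLY`, `EXTENDED`, and `conforms` are hypothetical, and the three extra letters in `EXTENDED` are chosen solely so that the example words match; they are not a claim about the character repertoire of English, Dutch, or any other language.

```python
import re

# The part of the pattern facet that the message spells out:
# a space plus the unaccented Latin letters.
ASCII_ONLY = re.compile("[ a-zA-Z]+")

# Hypothetical extension with c-cedilla, o-diaeresis and i-diaeresis,
# added only so the example words below match. This does NOT define
# the repertoire of any real language.
EXTENDED = re.compile("[ a-zA-Z\u00e7\u00f6\u00ef]+")


def conforms(pattern, text):
    """Return True if the whole string matches the pattern.

    XSD pattern facets must match the entire value, so fullmatch
    is used here rather than search.
    """
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    words = ["facade", "fa\u00e7ade", "co\u00f6perate", "na\u00efve"]
    for word in words:
        print(word, conforms(ASCII_ONLY, word), conforms(EXTENDED, word))
```

The ASCII-only pattern accepts "facade" but rejects "façade", "coöperate" and "naïve", all of which are standard English spellings. Widening the character class fixes these three words, but it does not settle which letters belong in the class in general, which is the definitional problem raised above.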