
Re: Re: English sentences, was: Re: Announce: XML Sc


John Cowan wrote:


> Jonathan Borden scripsit:
>
> > It all depends on what exactly you want, or intend the validator to do. What
> > you are saying, in essence, is that an "English sentence" is not defined as
> > a sequence of characters which conform to "text-en" and this is most true.
>
> The original point seems to have gotten lost.

Actually this _is_ the original point, isn't it? You are saying that using a
specific character set isn't a reliable way to detect a human language
(because other characters might be correctly present) and I am agreeing (but
because the _problem_ is way more complicated than character sets).

>
> The publisher's use case was for a datatype representing those letters,
> and only those letters, used in writing the Dutch language.  Formally, of
> course, that's easy: it's an xsd:string type with a pattern facet
> consisting of "[ a-zA-Z...]+".  The question is, just what are those
> other letters represented by the ellipsis in any given case?
>
> I used the examples of "façade" and "coöperate" and "naïve" to
> illustrate that this problem may or may not have a clear-cut answer. These
> are not foreign words; they are standard spellings (though not the only
> standard spellings) of standard English words.
>
> It's perfectly true that a sentence like "Al-Musa said, '<insert
> Arabic here>'." is also an English sentence even if the Arabic text
> is expressed in the Arabic script.  But that isn't my point.

It's another good point, however. What I am saying is that there are lots of
good reasons why what was suggested might not be reliable (either false
positives or false negatives).
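
To make the false positive / false negative point concrete, here is a minimal
sketch in Python, not a real schema validator. It assumes the simplest reading
of the quoted pattern, "[ a-zA-Z]+", with the ellipsis deliberately left
unfilled, since which letters belong there is exactly the open question. The
names ASCII_ONLY and matches_facet are hypothetical, and fullmatch() is used
because an XSD pattern facet must match the whole value.

```python
import re

# Hypothetical stand-in for the quoted pattern facet "[ a-zA-Z...]+".
# The "..." is NOT filled in: only space and ASCII letters are allowed.
ASCII_ONLY = re.compile(r"[ a-zA-Z]+")


def matches_facet(text: str) -> bool:
    """Return True if the whole string matches, as a pattern facet would require."""
    return ASCII_ONLY.fullmatch(text) is not None


# Standard English spellings alongside a string that is not English at all.
samples = ["facade", "façade", "naïve", "coöperate", "xq zzt kkw"]
results = {s: matches_facet(s) for s in samples}

for sample, ok in results.items():
    print(f"{sample!r}: {ok}")
```

Under that assumption the accented spellings are rejected even though they are
standard English (false negatives), while "xq zzt kkw" is accepted even though
it is not English in any sense (a false positive).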

>
> > Indeed to reliably detect an English sentence the 'recognizer' needs to
> > understand how to form words from characters and sentences from words. This
> > is way outside the capabilities of the XML schema definition languages we
> > have been discussing.
>
> Of course, of course.  But even at the level of characters, there is
> a *definitional* (not implementation) problem in saying just what
> the character repertoire of <insert language here> is.
> Many have come up against this rock and crashed against it.

Agreed.

Jonathan

