
RE: Something altogether different?


Salton's approach makes it easy to know when things are 
similar.  Then the human sorts out the noise.  That is 
fine for things that run at human speed.  That is not 
fine for things that run at machine speed and have 
quick system-wide effects.  The power laws of crap 
feeding back to crap have not been suspended.

Take the vector measures and tie them together with 
URIs across multiple notations for the same observations 
and that is an interesting system for machine learning 
as has been shown time and time again.  They aren't 
as useful for targeting munitions; they can be useful 
for fusing multiple systems and giving a human a 
short list, or better, a space of solutions, and that 
is what we see from Google et al.  The web works because 
human smarts take up the slack for computer dumb. 

Google is fine until you try to dispatch an emergency 
system based on its addresses and maps.  Three problems:

1.  Locations can be off by half a mile or more.

2.  Satellite photos are stale (by as much as 18 months) 
and vary in resolution between adjacent areas that are 
less than ten miles apart.

3.  In the investigation that follows, one isn't allowed 
to mix unvetted data with vetted data (by policy, the 
name of the neighbor can't be entered without the neighbor 
having a defined role in the event (e.g., a witness)).

Dumb things done with dumb data are fine until you need 
something smart and accurate fast.  Relaxing reliability 
to get deployment scale does work.  Ask any driver of a 
T-34.  Massed deployment always beats high potential assets 
in smaller numbers if you can sustain high initial casualty 
rates.

len


From: Ken North [mailto:kennorth@s...]

Len Bullard wrote:
2)  Where one can establish a similarity metric, is that good enough, as
Bosworth is claiming for human processes, for machine-processes?
Bosworth is playing fast and loose with the noise problems.

Cohen and Fan discuss the noise issue in the paper about the CF spider, which
uses a variant of the cosine distance measure of textual similarity (used in
WHIRL):
"However, although the data is noisy, it seems reasonable to believe metrics
based on it can be used for comparative purposes. We note also that CF systems
which can learn from this sort of noisy "observational" data (e.g.,
[Liebermann, 1995; Perkowitz & Etzioni, 1997]) are potentially far more
valuable than CF systems that require explicit noise-free ratings."
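
For readers who want to see what a cosine measure of textual similarity looks
like in practice, here is a minimal sketch. It is not the WHIRL or CF spider
implementation: it assumes whitespace tokenization and raw term counts rather
than the weighting those systems use, and the names cosine_similarity and
shortlist are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the raw term-count vectors of two texts.

    Assumption: lowercase whitespace tokens and plain counts (no TF-IDF).
    """
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def shortlist(query: str, documents: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the k documents most similar to the query, best match first."""
    scored = [(cosine_similarity(query, doc), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


if __name__ == "__main__":
    docs = ["fire main street", "cat video", "street fire"]
    for score, doc in shortlist("fire at main street", docs, k=2):
        print(f"{score:.3f}  {doc}")
```

The score only ranks candidates; as the thread argues, someone (or something
with vetted data) still has to decide whether the top match is actually right.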

The solution to the semantic web might be millions of people creating
Atom/RSS, but I'm more optimistic about applying machine learning with enough
hardware. Google has already shown an array of processors can crunch the web's
content. If you embark on creating Google++ using technologies such as WHIRL
and the CF spider, you'll need a large array of hardware. But as Bosworth noted
in the PowerPoint presentation, hardware is cheap.
