
RE: More on Vector Models


Not at all.  There is no implied or explicit profit 
motive to 'move on'.  There are problems that markup 
doesn't solve, but possibly there are also old 
solutions that can improve markup. 
That is why I was querying Steve DeRose.  His is a 
world class mind with lots of experience in this 
and other fields.  From time to time, the idea of 
combining vector techniques with markup comes up. 
Bosworth's presentation is another stimulus.

1.  Vector space models are old.  (See Salton et al.)  
VSM technologies incorporate a set of techniques 
that have been refined over the years to enable such 
things as normalization, increased use of probability, 
relaxed constraints on term independence, use of the 
document vectors to get relevance feedback, etc.

2.  One doesn't move on to the next big thing.  One 
looks at the data environment and builds systems that 
cope with what is, as it is, and then possibly pushes 
it to be otherwise.  
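
A minimal sketch of the techniques named in point 1, in Python with only the standard library. It is an illustration, not any particular product's implementation: the function names, the smoothed IDF formula, and the Rocchio weights (alpha, beta) are all assumptions chosen for the example. It shows term weighting, length normalization, cosine ranking, and relevance feedback driven by document vectors.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # 'Bag o' words': lowercase, split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def normalize(vec):
    # Scale a sparse vector to unit length so long documents do not dominate.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_vectors(docs):
    # Return (idf, vectors): one unit-length TF-IDF vector per document.
    bags = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for bag in bags for term in bag)
    # Smoothed IDF (an assumption; many variants exist).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [normalize({t: tf * idf[t] for t, tf in bag.items()}) for bag in bags]
    return idf, vectors


def cosine(a, b):
    # Dot product of two unit-length sparse vectors.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def rank(query, docs):
    # Score every document against the query, best first.
    idf, vectors = build_vectors(docs)
    counts = Counter(tokenize(query))
    q = normalize({t: tf * idf[t] for t, tf in counts.items() if t in idf})
    return sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)


def rocchio(query_vec, relevant_vecs, alpha=1.0, beta=0.75):
    # Relevance feedback: pull the query toward documents the user marked relevant.
    updated = Counter({t: alpha * w for t, w in query_vec.items()})
    for vec in relevant_vecs:
        for t, w in vec.items():
            updated[t] += beta * w / len(relevant_vecs)
    return normalize(updated)


docs = [
    "vehicle stolen from parking lot",
    "suspect fled on foot",
    "stolen vehicle recovered near river",
]
for score, doc_id in rank("stolen vehicle", docs):
    print(round(score, 3), docs[doc_id])
```

After feedback the query carries terms it never contained (here, terms from whichever document was marked relevant), which is the point of using the document vectors this way.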

Again, in the record systems I see, there is far more
unstructured text data than any other kind.  So means 
to handle that more effectively are worth investigating. 
Innovation on those means is always desirable.  If one 
really wants to improve the user experience, it pays to 
experience it.   So where we are stuffing lots of 
unstructured text into varchars, being able 
to index the contents in a standard way and then mine that 
more effectively is a big improvement over LIKE '%string%' 
statements.
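
To make the contrast concrete, here is a small sketch comparing a substring LIKE query with a token index built over the same column. The table name `report`, the column `narrative`, and the sample rows are hypothetical, and the index is a plain in-memory dictionary, not a real full-text engine.

```python
import re
import sqlite3
from collections import defaultdict

# Hypothetical records: free text stuffed into a varchar column.
rows = [
    (1, "Witness saw a red car leave the scene"),
    (2, "Stolen scarf and carpet recovered"),
    (3, "Car abandoned near the bridge"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, narrative VARCHAR(4000))")
conn.executemany("INSERT INTO report VALUES (?, ?)", rows)


def like_search(term):
    # Substring match: scans every row and matches inside other words too.
    cur = conn.execute(
        "SELECT id FROM report WHERE narrative LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [r[0] for r in cur]


def build_index(connection):
    # Inverted index: token -> set of row ids containing that token.
    index = defaultdict(set)
    for row_id, text in connection.execute("SELECT id, narrative FROM report"):
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(row_id)
    return index


def index_search(index, *terms):
    # Rows containing all the given terms as whole tokens.
    postings = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


index = build_index(conn)
print(like_search("car"))          # also hits 'scarf' and 'carpet'
print(index_search(index, "car"))  # whole-token matches only
```

The same postings are what a vector model weights and ranks, so once the index exists the step from exact lookup to ranked retrieval is short.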

Vector space models are a known effective way to do that. 
Reading indicates that the techniques are now better than 
the last time I looked (e.g., the short-document problem is 
no longer a problem, visualization is really cheap or free, etc.).

XML is now a part of a broader set of technologies. 
Exhausted?  Hardly.  Exclusive?  Not at all.  XML is a step  
above the level of 'bag o' words' which is the level 
where VSM thrives.   The question is, given VSM and 
'bag o' words', when should one move on to markup? 
There are some obvious answers and maybe some not 
so obvious.

len


From: David Lyon [mailto:david.lyon@c...]

but "knowing" and seeing are two different things.

As an example, I know that I should be able to get
an electronic receipt loaded from the service station
into (an accounting system in) my mobile phone when 
I go to pay. But seeing that in practice is something 
that is yet to happen.

To label it just "data transport" removes any form
of personalisation and connection with a personal
experience. I think that is a major shortcoming.

I doubt that we have had all "the possible" personal
experiences with xml that we could ever imagine.

Just as there is coffee and there is coffee. Even the 
customer experience that one can have with a simple 
cup of coffee has evolved somewhat over the 
last 20 years.

So I would say that there is still room for change yet
over the next twenty years - even in coffee drinking
where one would think that the choices are fairly
limited.

> The subtleties are in applications.  There can be lots
> of those and there are lots of semantics, but XML is
> blithely ignorant of those.  A very high percentage of
> the discussions on this and other lists that talk about
> 'doing XML' are really about 'applying' XML.

Exactly. It's a 'customer experience' thing.

> There are overlapping areas though that should get
> our attention.  One of these is indexing and automated
> categorization.  Vector models are pretty good at both.

This is out of my field... I actually have no idea what
this is about. Maybe it's the next big thing...

> If you have the vector indices, do you need the markup?

Sounds like the big guns are moving their focus away
from xml onto more potentially profitable pastures. 

Maybe xml has been milked to the point where there 
are no longer any big and easy profits to be made.

I detect that this is really the question that you are
asking, rather than anything to do with markup itself.

(xml) Markup is an extension of the English language.

It makes sense to use it in applications such as
Accounting systems and other day-to-day systems.

So while it sounds like you might be ready to move
onto bigger and better things, I doubt that the
practical uses of things like xml will be going away
anytime soon.
