Re: More on Vector Models - Chicken and Duck talk
Hi Len,

Interesting set of concepts. Back to the old Enterprise view (Star Trek, that is) of trying to understand brave new worlds: different worlds, cultures and beings. When the world was a much more innocent place (not really, of course).

On Sat, 30 Apr 2005 4:52 am, Bullard, Claude L (Len) wrote:
> If the value of indexing is expressed as the function of the
> density of objects in addressable space so that performance
> is inversely proportional to the space density (actually, the
> address space itself), XML vocabularies increase the
> density of the space as well as introducing ambiguity
> and uncertainty through semantic loading and can actually
> hurt the performance of the system. (yes|no ?)

No. There should be no such thing as a performance bottleneck in an enterprise xml system. This tends to only happen in larger organisations.

A few years ago I contracted to a telco and worked on integration of payphones into their central system. The big surprise to me was that there were actually five databases that held information about payphones in the land, and it simply wasn't possible to do a "select count(*) from payphones where (status="Active")". Just a simple thing, but absolutely not possible. It was possible to know how many 20c pieces were collected nationwide in a week, but not how many phones were in service at any one time.

Anyway, that's just one experience I have of computers in an enterprise at a very large scale. It can be a mess and there is no easy way to sort it out. I think the majority of xml development has been governed by engineers with this sort of experience.

Down at the small business, things are the opposite. Accounting systems usually store everything, and delays in processing are usually physical: the time to run up the stairs to the office to check the computer. Ambiguity is always handled by a human mental process and resolved by either speaking softly or yelling down the phone at the other party. There's also the classic deferral strategy of "the cheque is in the mail".

So the two cultures, small organisation and large organisation, are diametrically opposed. The larger ones are process driven whilst the smaller ones are sales driven.

> That's why Bosworth's presentation has merit. The problem
> however, is that it simply moves the calculation of the similarity
> metric away from the apriori schema declaration into raw
> microparsed vector results.

Hmm... I'll have to feed this one to the computer...

> A schema is the declaration of a
> space where occurrence indicators are a determinant of frequency
> and therefore, similarity given a rule that frequent terms are
> less important than rare terms within a document (term vectors),
> and more important across documents (document vectors).

Probably.

In Chinese, they have this expression called "Chicken and Duck talk", where the chicken speaks in its language and the duck in its own. They are both happy. Whilst I never saw this in Star Trek, I think it would make for an interesting future episode.

Actually, "Chicken and Duck talk" describes what is happening with xml between large and small enterprises. Neither side really gets what the other is saying.

I hope that in the future these different cultures can be bridged and that xml is the path.

Take care...

David

--
Computergrid : The ones with the most connections win.