
Re: What is the general direction you are seeing these daysto

  • From: Peter Hunsberger <peter.hunsberger@gmail.com>
  • To: "Costello, Roger L." <costello@mitre.org>
  • Date: Wed, 11 Mar 2015 13:25:51 -0500

Steve hits on an important part of my reasoning.  For example, you can take something like Hadoop and run variations of an analysis iteratively. So let's say you're doing a (now classic) friends-of-friends analysis, which is known to have polynomial complexity as you increase the relationship depth.  For a given set of users, that depth can vary considerably depending on how far away any given person is from a "super node" or other data patterns.  Set an upper bound on execution time and start running the analysis, continually increasing the depth until you hit that bound.  You're going to pull out far more interesting data: things like there is a 40% chance of knowing somebody who knows somebody who knows Kevin Bacon, and a 70% chance of knowing someone at 4 steps, etc.  If you're dealing in statistical analysis, then the algorithms are already coded up for many common analyses and it's just a case of configuring them for a given use case.  Yes, you are talking about entirely new infrastructure and skill sets for many organizations, but the gain is the ability to perform many orders of magnitude more analysis tasks, perform them many orders of magnitude faster, and perform them over many orders of magnitude more data.
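To make the "increase the depth until you hit a time bound" idea concrete, here is a minimal single-process sketch in Python. It is not Hadoop code, and it is only an illustration: the function name reach_by_depth, the toy_graph data, and the user names in it are all made up for this example, and the relationships are assumed to be symmetric.

```python
import time

def reach_by_depth(graph, target, max_seconds, clock=time.monotonic):
    """Return {depth: fraction of other users within `depth` steps of target}.

    The search goes one relationship level deeper per iteration and stops
    when the time budget is spent or nobody new can be reached.
    """
    deadline = clock() + max_seconds
    seen = {target}          # everyone reached so far, including the target
    frontier = {target}      # users reached at exactly the current depth
    others = len(graph) - 1  # everyone except the target
    fractions = {}
    depth = 0
    while frontier and clock() < deadline:
        depth += 1
        # Expand one more level of relationships.
        frontier = {n for user in frontier for n in graph[user]} - seen
        if not frontier:
            break
        seen |= frontier
        fractions[depth] = (len(seen) - 1) / others
    return fractions

# Hypothetical toy data: adjacency lists for a symmetric "knows" relation.
toy_graph = {
    "bacon": ["a", "b"],
    "a": ["bacon", "c"],
    "b": ["bacon"],
    "c": ["a", "d"],
    "d": ["c"],
}

if __name__ == "__main__":
    for depth, fraction in reach_by_depth(toy_graph, "bacon", 5.0).items():
        print(f"within {depth} step(s): {fraction:.0%}")
```

The time check happens once per depth level, so a single very expensive level can still overrun the budget; a real job would need a finer-grained check or a hard timeout from the scheduler.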

Having said all that, I do have to qualify it: I don't know the business domain, I don't know your organization, and I don't know your organization's technical capabilities.  I'm making these recommendations based purely on two things: you have a huge volume of data, and you tell us you want to feed something into SAS and SPSS.  I'm assuming that this is part of a larger, ongoing set of analyses and that it is worth some considerable investment to build a tool set to get the benefits I describe above....

Peter Hunsberger

On Wed, Mar 11, 2015 at 4:14 AM, Costello, Roger L. <costello@mitre.org> wrote:

Hi Folks,

 

Peter made a very interesting assertion:

 

The analytics should run directly on the data, not on some extract.

 

My plan was to perform XPath and XQuery on the 50 million XML documents and then use the query results as input into SAS and SPSS analytics. So my approach is quite different than what Peter advocates.

 

Peter, why do you assert that the analytics should be run directly on the data? Why is that superior to querying the data and using the query results as input to the analytics? Does everyone agree with Peter that the analytics should be run directly on the data? Anyone disagree with Peter?

 

/Roger

 



