XML-DEV Mailing List Archive

Re: What is the general direction you are seeing these days to store and query lots of large complex XML?

  • From: Peter Hunsberger <peter.hunsberger@gmail.com>
  • To: ihe.onwuka@gmail.com
  • Date: Tue, 10 Mar 2015 18:58:11 -0500

Well, it's not exactly hard to get data out of Cassandra, Neo4j, Titan or any relational database you might choose. In fact I'd argue it's easier to build a generic service-oriented endpoint on top of those than on top of 50 million XML files... Both Titan and Neo4j have JSON as native output, and there is, for example, a SPARQL plugin for Neo4j; similarly, Rexster for Titan will give you a SPARQL endpoint (should that be your flavor of the month).

However, once more I'll emphasize that it seems very likely that the proper way to manage this is as a big data problem, using the tools designed for that, and not to aim at feeding tool sets designed for other problems. The analytics should run directly on the data, not on some extract. Recently I saw a complaint about Neo4j taking 7 minutes to traverse a billion nodes, and the developers wanted to see diagnostics to figure out why things were taking so long. These tools are designed for finding answers hidden within very, very large data sets. As a very rough guess, assuming some degree of normalization is possible, Roger's entire data set might equate to something like 2 to 3 billion nodes and 3 to 5 billion edges, which would be manageable in a small Titan cluster. Titan has been used with graphs of 100 billion edges...

Peter Hunsberger

On Mon, Mar 9, 2015 at 8:14 PM, Ihe Onwuka <ihe.onwuka@gmail.com> wrote:

---------- Forwarded message ----------
From: Ihe Onwuka <ihe.onwuka@gmail.com>
Date: Mon, Mar 9, 2015 at 9:11 PM
Subject: Re: What is the general direction you are seeing these days to store and query lots of large complex XML?
To: Peter Hunsberger <peter.hunsberger@g...>, "xml-dev@l..." <xml-dev@l...>


I'm not in disagreement. I would not do serious analytics in XSLT/XQuery either, but it's a hedge so that when the client decides to switch from sparse matrix multiplication in SQL to something more domain-specific like Mathematica, you are confident you can construct a feed to service that.

 


On Mon, Mar 9, 2015 at 8:43 PM, Peter Hunsberger <peter.hunsberger@g...> wrote:
Umm, no, or rather, most emphatically: NO! XML is a poor man's graph at best. For flat data (which it sounds like this mostly is) XML makes even less sense. But let's consider the more complex case: real graph traversal algorithms come pre-built for things like Neo4j and Titan, and things like Gremlin beat the heck out of XPath, XSLT, XQuery, et al. (and I'm an Apache Cocoon committer, so I do believe in using those for the right problem!). Titan wasn't considered possible when XSLT was first conceived; the state of the art has progressed considerably since then. Graph databases, Hadoop and its related infrastructure aren't the flavor of the month and are not going anywhere. They package up entire generations of computer science best practices into well-thought-out, incredibly powerful, easily deployable systems. If Roger is truly asking about a big data problem, then the fact that his data arrived in the form of XML should not influence his choice of tool chain. Rather, he should be using the tools that are designed to deal with data volumes of the size he mentions and solve the real problem, not just an intermediate step.

BTW, this arrived off list, feel free to put it back on list if you wish...

Peter Hunsberger

On Mon, Mar 9, 2015 at 6:36 PM, Ihe Onwuka <ihe.onwuka@gmail.com> wrote:


On Mon, Mar 9, 2015 at 6:06 PM, Peter Hunsberger <peter.hunsberger@g...> wrote:
Yes, unless there is a need to forward on the XML to some other endpoint I can't really see why it would need to stay as XML?  


Because it's easy to get it out of XML into whatever shape or form your analytics idea of the day/week/month/epoch needs it? 





