
Re: Data science, data analytics using XSLT streaming

Subject: Re: Data science, data analytics using XSLT streaming
From: Ihe Onwuka <ihe.onwuka@xxxxxxxxx>
Date: Tue, 5 Nov 2013 17:29:36 +0000
Re:  Data science
It would be better, then, to start with more accurate descriptions of
what data science entails than the one Roger originally provided. Here
are a couple.

http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

and

A data scientist is someone who can obtain, scrub, explore, model
and interpret data, blending hacking, statistics and machine
learning. Data scientists not only are adept at working with data, but
appreciate data itself as a first-class product.

-- Hilary Mason, chief scientist at bit.ly

There is also the visualization aspect, which neither of the above
definitions mentions but which is catered to by products like Tableau.

While it is definitely fun, data scientists spend a good proportion of
their time obtaining, labelling and scrubbing the data, and that is not
so fun.

On Tue, Nov 5, 2013 at 4:35 PM, Wendell Piez <wapiez@xxxxxxxxxxxxxxx> wrote:
> Hi,
>
> I agree with Andrew. These projects are fun, and XSLT pipelines are
> well suited to them, because they are capable of exposing the semantic
> issues and keeping syntax out of the way. This is even true when the
> first step is a rendering of a non-XML format into an XML
> representation, and the last step is serializing the data in a form
> optimized for something else (such as your query engine of choice).
>
> And no, streaming is not necessary, although it can help.
>
> Plus, the biggest problem isn't scale anyway: it's the semantic
> integrity of the data or (more likely) the lack thereof. This can be
> compounded by non-technical issues such as data owners not seeing the
> information they actually have because they are blinded by their
> expectations of what it is "supposed" to be.
>
> Cheers, Wendell
>
> Wendell Piez | http://www.wendellpiez.com
> XML | XSLT | electronic publishing
> Eat Your Vegetables
> _____oo_________o_o___ooooo____ooooooo_^
>
>
> On Tue, Nov 5, 2013 at 5:26 AM, Andrew Welch <andrew.j.welch@xxxxxxxxx> wrote:
>>> XSLT streaming is all about processing large amounts of (XML-formatted) data.
>>>
>>> So XSLT streaming should fit in the "data science" and "data analytics" categories.
>>>
>>> Broad Question: Would you provide a scenario/example of doing data science/data analytics using XSLT streaming please?
>>
>> Typically the data is held in multiple files rather than 1 big one, so
>> you don't necessarily need streaming, just a set of steps that process
>> directories of xml into various intermediate formats, then into the
>> final presentation view (such as a table with the data grouped,
>> sorted, with counts)
>>
>> I've done this sort of thing a few times now and I always enjoy it.
>>
>>
>> --
>> Andrew Welch
>> http://andrewjwelch.com
