Big Data, Cloud & DevOps

The role of the data curator: Make data scientists more productive

The ability to harness data to solve critical business challenges is an essential skill for every organization today. There are two primary roles responsible for this function—data scientists and data analysts. Unfortunately these

Fake News and the Responsibility of Data Scientists

95% of statistics are made up. Discussions about fact versus truth come up quite a bit these days, especially with the proliferation of "fake news" and the news media's coverage of certain

Getting That Data Science Job

Most people who attempt to get hired as a data scientist fail. This article is to help clarify what is happening and increase the chances

  • Does It Make Sense to Do Big Data with Small Nodes?

    In this age of big data and powerful commodity hardware, there’s an ongoing debate about node size. Does it make sense to use a lot of small nodes to handle big data workloads? Or should we instead use only a handful of very big nodes? If we need to process 200TB of data, for example, is it better to do so with 200 nodes with 4 cores and 1 terabyte each, or to use 20 nodes with 40 cores and 10 terabytes each? One reason we hear is that having all this processing power doesn’t really matter because it’s all about the data. If nodes are limited to a single terabyte, increasing processing power doesn’t really help things and only serves to make bottlenecks worse.
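
The two layouts in the example above can be compared with a little arithmetic. The following is a minimal sketch, not a sizing tool: the `ClusterConfig` class and its property names are hypothetical, and the "share lost per node failure" figure is an illustrative extra metric that the article excerpt does not mention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical description of a homogeneous cluster (illustration only)."""
    nodes: int
    cores_per_node: int
    storage_tb_per_node: int

    @property
    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node

    @property
    def total_storage_tb(self) -> int:
        return self.nodes * self.storage_tb_per_node

    @property
    def tb_per_core(self) -> float:
        # Amount of data each core is responsible for.
        return self.storage_tb_per_node / self.cores_per_node

    @property
    def share_lost_per_node_failure(self) -> float:
        # Assumption: capacity is spread evenly, so one failed node
        # removes 1/nodes of the cluster's cores and storage.
        return 1 / self.nodes


# The two configurations from the 200 TB example in the text.
small_nodes = ClusterConfig(nodes=200, cores_per_node=4, storage_tb_per_node=1)
big_nodes = ClusterConfig(nodes=20, cores_per_node=40, storage_tb_per_node=10)

for name, cfg in (("small nodes", small_nodes), ("big nodes", big_nodes)):
    print(
        f"{name}: {cfg.total_cores} cores, {cfg.total_storage_tb} TB, "
        f"{cfg.tb_per_core} TB per core, "
        f"{cfg.share_lost_per_node_failure:.1%} of capacity lost per node failure"
    )
```

On paper both layouts have the same total cores, total storage, and data per core, so the debate turns on factors outside this arithmetic, such as how much capacity disappears when a single node fails.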

    Data is a stakeholder

    Data science is currently very good at coming up with answers. It’s not very good at coming up with questions. I believe that requires data scientists to pay more attention to building non-technical skills, but I think it also requires us to build more tools that facilitate that part of the process. In fact, building the tools will contribute, in large measure, to building the non-technical skills.

    Trying to Persuade with Data?

    Data professionals have to consider the environment around them when creating a data story. It’s not enough to find an issue and start raising a red flag. Consider your audience and craft your message so that they can hear bad news, check whether others even see the issue as a problem, and then work with them to solve it.

    Blurred Lines: Data Analyst vs Data Science

    In a world of exponential data growth, companies are turning to two roles to solve some of their biggest problems: data analyst and data scientist. However, it is becoming apparent that the business world is unsure how to define the scope of these roles and differentiate between them. The two require nearly identical skills, but one key difference separates them. Businesses need to ensure they do not blur the lines.

    Piketty Revisited: Improving Economics through Data Science

    The data curation step involves discovering, analyzing, cleaning, transforming, combining, and de-duplicating data sources to produce target data sources that meet the requirements for input to the analysis. Every data curation step should be documented as data provenance that is then compared against the controls to determine the extent to which the appropriate data governance was followed and the required data quality was achieved. 

    Doubt and Verify: Data Science Power Tools

    In all fields, new facts and knowledge are constantly being produced from new data, discoveries, experience, and research, far more than a single individual can absorb, let alone put into practice. So how do professionals, or anyone, recognize that they have a bias and understand its nature and limitations? And how do they re-evaluate their knowledge (world view) in light of new facts (“ground truth”) and conclusions?

    What’s the difference between data science, machine learning, and artificial intelligence?

    When I introduce myself as a data scientist, I often get questions like "What's the difference between that and machine learning?" or "Does that mean you work on artificial intelligence?" I've responded enough

    What would be useful for aspiring data scientists to know?

    Now that I have secured a Data Science (DS) job, some people have come to ask me questions about how I made the transition into DS and into industry in general. I hope

    Sixteen useful Advices for Aspiring Data Scientists

    Why is data science sexy? It has something to do with so many new applications and entire new industries coming into being from the judicious use of copious amounts of data. Examples include

