In his latest book, “Thank You for Being Late,” Thomas Friedman highlights 2007 as one of the most pivotal years in the technology space: among other things, 2007 saw the birth of Hadoop, the iPhone and Amazon’s Kindle.
I’m no Tom Friedman. But I’ve been in this industry for what seems like an eternity. I have lived through many tech transformation cycles and heard endless predictions about what was supposed to “happen next.”
In my humble opinion, 2018 will unleash a major disruption in the analytics and data management space. It will upend decades’ worth of accepted practices and introduce new winners and losers. At the center of the storm is the public cloud and the many implications it will have for big data.
Prediction #1: The public cloud becomes the new data lake.
Hadoop was the world’s first implementation of a data lake, with its native “schema on read” architecture. A data lake is essentially a distributed file system with a catalog, giving users the flexibility to store data “as is” today and apply a schema only when the data needs to be accessed.
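To make “schema on read” concrete, here is a minimal PySpark sketch; the path, field names and types are hypothetical, invented purely for illustration. The raw files in the lake stay untouched, and a schema is supplied only at the moment the data is read.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("schema-on-read").getOrCreate()

# The schema is declared at read time, not when the data was stored.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
])

# Raw JSON landed "as is" in the lake; the schema is applied on read.
events = spark.read.schema(schema).json("/data/lake/raw/events/")
events.createOrReplaceTempView("events")
spark.sql("SELECT user_id, SUM(amount) AS total FROM events GROUP BY user_id").show()
```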
Cloud-based distributed file systems like Amazon S3, Microsoft ADLS and Google Cloud Storage have similar capabilities and will serve as alternative data lake platforms in the future. Ovum’s latest global survey showed that 27.5 percent of all big data workloads are already deployed in the cloud. The industry’s latest Big Data Maturity Survey predicted that 72 percent of respondents will be doing big data analytics in the cloud within the next four years. If you’re not convinced of this trend, look again!
Why this matters: Enterprises are looking to the cloud to offload infrastructure management, and they already land their data in their cloud provider’s data store. Now, with the added capabilities of engines like Spark (from Databricks, which recently announced $140M in funding) and Presto to query data in situ, cloud customers can harness the power of the data lake without the overhead and cost of managing a Hadoop cluster. I believe this will become a huge trend, one that upends the notion of a data warehouse and brings schema on read to the masses.
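As a rough sketch of what querying in situ looks like, the PySpark snippet below reads Parquet files straight off S3; the bucket and column names are made up, and it assumes AWS credentials and the hadoop-aws connector are already configured.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("query-in-situ").getOrCreate()

# Read the data where it lives -- no Hadoop cluster to stand up, no copies.
orders = spark.read.parquet("s3a://example-bucket/warehouse/orders/")

# Filter and aggregate directly against the cloud data lake.
(orders.filter("order_date >= '2018-01-01'")
       .groupBy("region")
       .count()
       .show())
```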
Prediction #2: “Insights as a Service” and the Office of the CDO become the norm for the enterprise.
I am seeing more and more enterprises struggle with the unintended consequences of the self-service BI revolution. According to Gartner, the self-service business intelligence market grew by 60 percent in 2015, but growth tapered off shortly thereafter. Why? Because when business users take on more data management tasks, enterprises notice that their employees spend too much time “data wrangling” and not enough time analyzing data to drive revenue and lower costs.
Meanwhile, IT has been struggling to govern, secure and deliver the quality data the business needs. In response, an increasing number of enterprises are establishing “centers of excellence” (CoE) to produce “insights as a service.” According to Forrester, the market for “Insights as a Service” will double in 2018, and 80 percent of firms will rely on such capabilities. The architects of the CoE are the chief data officer (CDO) and the chief analytics officer (CAO). In 2018, expect enterprises to hire more of them and elevate them to the C-suite.
Why this matters: Self-service business intelligence is here to stay, no doubt. However, I predict that we will see self-service BI evolve from a “free for all” into a governed data access model managed by a central data group and the CoE. This means the role of the data engineer will move back to IT, and the business will focus on creating insights instead of data marts. This movement will require a whole new set of tools to facilitate data governance, and the semantic layer will again become king.
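One way such a governed model could look in practice: IT publishes curated views that encode the business logic once, and analysts build insights on top of those views rather than on raw tables. The sketch below uses Spark SQL; the raw.orders and raw.customers tables are hypothetical, and a configured metastore is assumed.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("semantic-layer").getOrCreate()

# IT-owned: a curated view that hides the raw tables and defines the
# business metric once, centrally, for everyone.
spark.sql("""
    CREATE OR REPLACE VIEW sales_by_customer AS
    SELECT c.customer_id,
           c.segment,
           SUM(o.amount) AS lifetime_value
    FROM   raw.orders o
    JOIN   raw.customers c ON o.customer_id = c.customer_id
    GROUP  BY c.customer_id, c.segment
""")

# Business-owned: analysts query the governed view, not the raw data.
spark.sql("""
    SELECT segment, AVG(lifetime_value) AS avg_ltv
    FROM   sales_by_customer
    GROUP  BY segment
""").show()
```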
Prediction #3: Multi-cloud fails as a strategy.
We’ve seen this movie before. The data platform vendors do everything in their power to lock customers into their proprietary ecosystems. So, why would the cloud be any different?
I’ve spoken to a number of large enterprise customers, and they all intend to use the cloud vendors as “dumb pipes.” I can’t tell you how many CIOs have told me that they will invest in more than one cloud vendor. The truth is that this runs counter to the plans of the cloud vendors, who are pushing their proprietary tools by making them tantalizingly easy to use. Once any department chooses to leverage a cloud vendor’s tools to deliver a new capability, it’s game over: you’re locked in.
Why this matters: The goal of a multi-cloud strategy is to minimize switching costs. It’s inevitable that your teams will deploy applications that leverage proprietary cloud technologies. The key is to “firewall,” or insulate, downstream applications and users from those technology choices. That means leveraging cloud-independent interfaces and semantic data layers so that if you choose to switch cloud providers, you can minimize the amount of change required to do so. Educate yourself on the concept of semantic layers and don’t get locked in.
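To illustrate the “firewall” idea, here is a hypothetical Python sketch: downstream code depends on a neutral interface, and the provider-specific calls are confined to one adapter class, so switching clouds means writing one new adapter rather than rewriting every application.

```python
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """Cloud-independent interface that downstream code depends on."""

    @abstractmethod
    def read(self, key: str) -> bytes: ...

    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

class S3Store(ObjectStore):
    """Provider-specific code is confined to this adapter."""

    def __init__(self, bucket: str):
        import boto3
        self._bucket = boto3.resource("s3").Bucket(bucket)

    def read(self, key: str) -> bytes:
        return self._bucket.Object(key).get()["Body"].read()

    def write(self, key: str, data: bytes) -> None:
        self._bucket.Object(key).put(Body=data)

# Downstream applications never import boto3; swapping in, say, a GCSStore
# adapter later leaves this code unchanged.
def load_report(store: ObjectStore) -> bytes:
    return store.read("reports/latest.csv")
```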