In 2011, The McKinsey Global Institute put out a report predicting significant shortfalls in supplies of data scientists and data-savvy managers. The report in general, and those conclusions in particular, attracted a lot of attention at the time, but what’s happened since then?
There are more data scientists
There has been a lot of progress on the first of those two shortages. The number of people using data science titles in LinkedIn profiles has doubled in the past four years, according to a study conducted by RJMetrics and reported in Forbes. Data science job listings have also increased — by 57% between the first quarter of 2014 and the first quarter of 2015 according to Indeed.
Some of those increases are probably attributable to terminology changes, with data science replacing titles for jobs drawing on similar underlying skills that might previously have been labeled things like statistician, analyst, or researcher. But some of the growth is almost certainly real, and there has also been growth in the number of academic degrees in data science and related fields and in the number of students pursuing those degrees.
But there is little evidence of growth in the number of data-savvy managers
But what about the supply of data-savvy managers? That’s difficult to measure definitively in the absence of some sort of data competency test. However, circumstantial evidence, combined with the composition of the workforce, suggests that less progress has been made in improving data competency among managers and the rest of the workforce not engaged directly in data-related functions.
If managers and others in the workforce are engaging more with data-related topics, that increased interest would be reflected in greater Internet search activity for data-related terms. The chart below shows weekly Google Trends results for five such terms (data, analytics, analysis, statistics, Excel) going back to 2004. Google Trends measures the relative frequency of searches for different terms compared to all search activity. It assigns a value of 100 to the highest relative level of search activity (compared to all searches for all topics) within the range of time and topics being considered, and measures all other data points against that.
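The scaling described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Google's actual pipeline: the function name `scale_to_100` and the share values are invented for the example, and it assumes each input number is already a term's share of all searches in a given week.

```python
def scale_to_100(shares_by_term):
    """Rescale search shares so the highest value across all terms and weeks is 100.

    shares_by_term maps a term to a list of weekly shares (searches for the
    term divided by all searches that week). The input format is an
    assumption made for this sketch.
    """
    peak = max(share for series in shares_by_term.values() for share in series)
    if peak <= 0:
        raise ValueError("at least one share must be positive")
    return {
        term: [round(100 * share / peak) for share in series]
        for term, series in shares_by_term.items()
    }


# Hypothetical shares for two terms over two weeks (not real Trends data).
example = {"data": [0.004, 0.002], "excel": [0.001, 0.003]}
scaled = scale_to_100(example)
```

Because every point is measured against a single peak, the scaled values depend on which terms and which time range are compared together.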
As illustrated in the chart, across the world and across categories, searches for data, analysis, and statistics were more common (relative to other things people search for) in 2004 than they are now. The two data-related terms that are now searched for relatively more frequently than they were in 2004 are Excel and analytics. Searches for analytics grew in a relative sense up through 2010, but plateaued after that (compared to searches for everything else). While there has been a modest growth trend in the relative frequency of searches for Excel in the past decade and a lot of data analysis can be done with spreadsheets, Excel isn’t what springs to mind when most people think of data science and most of its capabilities have existed in spreadsheets for decades.
As shown in the chart, there is a lot of week-to-week variation in search frequency. For example, the relative frequency of searches for all of these data-related terms tends to drop over the Christmas / New Year period, when many people are not working. There is also variation across categories and across countries, so let’s take a look at that.
Interest in data appears to be growing more within science than within business
The chart below shows the average relative search frequency for 2015 compared to 2011 by category — for example, whether people search for sports statistics versus health statistics. Once again, this doesn’t refer to the absolute frequency of searches, but rather the number of searches for each combination of category and term compared to the overall total number of searches. A value of 100% on this chart means that searches for a particular term within a particular category did not increase or decrease between 2011 and 2015 as a proportion of all searches. A value over 100% means the term * category combination accounted for a greater proportion of searches in 2015 than in 2011, whereas a value less than 100% means that the term * category combination accounted for a lesser proportion of searches in 2015 than it did in 2011.
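The percentage described above is a ratio of two yearly averages. A minimal sketch follows, assuming the weekly relative-frequency values for one term and category combination are available as plain lists; the function name `relative_change` and the sample numbers are invented for illustration.

```python
from statistics import mean


def relative_change(weekly_2011, weekly_2015):
    """Return the 2015 average relative frequency as a percentage of the 2011 average.

    100 means no change in the share of all searches, above 100 means a
    greater share in 2015, and below 100 means a lesser share.
    """
    baseline = mean(weekly_2011)
    if baseline == 0:
        raise ValueError("the 2011 average must be non-zero")
    return 100 * mean(weekly_2015) / baseline


# Invented weekly values for a single term and category combination.
unchanged = relative_change([40, 40], [40, 40])
grew = relative_change([50, 50], [60, 70])
shrank = relative_change([50, 50], [40, 40])
```

Averaging over the full year before taking the ratio smooths out the week-to-week variation, such as the holiday dips, that would distort a comparison of two individual weeks.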
As can be seen from the chart, the greatest increase in searches for the term data came within the games category, followed by the Internet and telecom, sports, and science categories. Searches for analytics and Excel also increased by 20% within the science category. Searches for the term data did not increase at all within the business and industry category. Within that category, searches for analytics did increase significantly (by 30% between 2011 and 2015), but since searches for analysis and statistics both dropped, that may reflect a linguistic transition rather than growing interest in analytics.
Interest in data has increased dramatically in China
As is so often the case, the situation in China is different from that in other countries. The chart below shows the relative change in search frequency between 2011 and 2015 in a selection of countries. It shows dramatic relative growth in searches for the terms data and analysis in China over that period.
Conversely, the relative frequency of searches for all data-related search terms dropped between 2011 and 2015 in Japan, India, and Russia, so it seems highly unlikely that data competency is growing within the workforces and managerial ranks in those countries.
Data scientists will struggle to be effective without data-savvy managers and colleagues
Data has become ubiquitous and the number and power of tools for working with data seem to grow by the day. There are indications that the number of data scientists available to apply those tools to all of that data is growing. But the McKinsey report discussed at the start of this post estimated that about ten times more data-savvy managers than data scientists would be needed to take advantage of the data revolution, and it doesn’t appear that much progress is being made in growing the data competency of managers (and others whose work could be enhanced through advances in data science).
In many ways, that’s not surprising. Many people in management are well into their careers, having started them at a time when data was relatively scarce. Faced with that scarcity, they had to make decisions based on experience, judgement, and the relatively limited amount of data available. That available data was often easy to store, manipulate and visualize using nothing more than a spreadsheet. Given comfort and prior success with that approach, many managers may not perceive a need to take time away from other priorities to become more data-savvy.
Some may be aware of the growing importance of data, but believe they addressed the issue by hiring data scientists or even whole teams of data scientists. But the real question is how effective those people and teams can be if the data competency of those who manage them and whose work is interdependent with theirs is not that different than it was back when data was small and scarce. In my view, the answer to that question is ‘not very’.
Data scientists already need to know a lot of things that take a long time to master. It’s unrealistic to expect them to also have a deep understanding of all of a business’s operations and of all the decisions colleagues within that business need to make. Confidentiality concerns may also limit data scientists’ visibility into those decisions. And without greater understanding of how data can now drive strategic, tactical, and operational decisions, decision makers won’t be able to ask data scientists the right questions and direct their efforts toward finding the information and insights that can add the most value.
For data and data science to achieve their full potential, people who are not data scientists need to become more data-savvy. Of course, some improvement will happen over time as the curricula of business schools and other educational institutions evolve, but those changes will take a long time to flow through the workforce. In the meantime, there is a real need for training and development within the existing workforce to increase awareness of what can be achieved through smart use of data and to help managers and others learn to work effectively with colleagues who are data specialists.