Data science teams are bringing new business value across the enterprise supply chain, so much so that business units without access to a central Data Center have set up their own home-grown data science teams with internal experts. In their Harvard Business Review article "Data Scientist: The Sexiest Job of the 21st Century," Tom Davenport and DJ Patil describe the data scientist's role as the sexiest job of the 21st century.
Traditional BI versus data science
A few examples show how traditional BI roles have evolved into data scientist roles:
- While the traditional BI analyst simply analyzes and reports quarterly sales for a specific zone, the data scientist may produce sales forecasts based on historical sales data.
- While the traditional BI analyst relies primarily on customer-feedback forms and occasional surveys to gather customer data, the modern data scientist draws on newer sources such as social media and forum posts to discover valuable insights about different customer segments.
- While the traditional BI analyst may limit the analysis to finding the average shipment cost of specific goods, the data scientist will also examine shipment patterns and their relationships with other cost data to identify ways to reduce or optimize shipment costs.
Typical data scientist profile
It is difficult to pin down an ideal data scientist profile because, in current practice, in-house data science teams are formed from experts with a wide variety of academic and professional backgrounds. Booz Allen Hamilton's field guide on data science suggests that proficiency in key disciplines is required to be considered for a role on a data science team. Computer science, domain expertise, and mathematics are in high demand in this profession. As the field guide points out, a computer science background is essential for data processing tasks; domain expertise helps the data scientist understand the problem and how to measure it; and advanced math skills are required to understand how algorithms serve the business goal.
Along with these core skills, data science team members must also be naturally skeptical and curious, technically astute with a solid quantitative orientation, and show an aptitude for collaboration and a communicative approach to problem solving.
Data science team composition
Booz Allen Hamilton's 110-page primer, The Field Guide to Data Science, covers data science team building in depth and summarizes the basic characteristics of an effective team. The underlying assumption is that, collectively, the members of an effective data science team will possess the following qualities:
- Ability to ask questions: The natural curiosity to discover relationships within data and to ask pertinent questions is an essential prerequisite.
- Creative problem solving: The inclination to set aside established methods and devise new problem-solving approaches is a welcome attribute.
- Tenacity: The ability to stay focused for extended periods while designing and testing solutions is absolutely necessary. Apparent solutions often fall through, requiring fresh rounds of design and testing from start to finish.
- Detail orientation: Nothing can be left to guesswork in data science projects, so team members must be highly detail-oriented to minimize the risk of relying on intuitive judgments.
In response to a common complaint that data science team members are often reluctant to explore new approaches to problems, Booz Allen Hamilton's field guide recommends fostering an environment of trust and communication across all levels instead of deference to authority.
To ensure your data science team (DST) provides the greatest value, certain support roles should also be present in the core team. The data scientist is often aided and mentored by supporting team members, a few of whom are described here:
- A self-taught data guru: This member may or may not have formal data science training but may have acquired rare expertise in a specific business area. This individual may also have an inventive streak that drives the exploration process in team projects.
- A data detective: This person ideally has a programming background and is skilled at developing an evidence-based analytics strategy. This individual is often very good at understanding the full context and meaning of data, something algorithms are poor at. The data detective may be involved in the exploratory and post-modeling phases to add the layer of contextual understanding that leads to genuinely useful insights.
- A visualizer: This person bridges the gap between a contextual understanding of the data and a roadmap for a solution, working closely with content developers or graphic designers to create intuitive graphics that go beyond conventional bar charts and line graphs. The visualizer often prototypes solutions and builds the data story so that the business case can be presented from multiple angles without misrepresenting the data.
When enterprises struggle to convince stakeholders of the merits of a data science solution, these supporting roles can help ensure that the presented solutions deliver real value. Depending on the size of the organization, one or two people can often fill all of these roles.
Organizational models: Integrated, distributed, and hybrid
At the top of an organization's data science function there is generally a Chief Data Executive or Chief Data Scientist. If that role does not exist in a decentralized data science environment, then the smaller, divisional data science teams must have strong leaders capable of winning widespread buy-in for the team's objectives.
In an Integrated Data Center, the data science functions are under the centralized control of a Chief Data Scientist (CDS). In this model, the data science team functions as a hub, providing analytics and BI support to the entire organization.
In Distributed Data Science Teams, the team structure is completely decentralized. Each business unit usually has its own data science capabilities, functioning under a team leader.
In a Hybrid model, despite the presence of a centralized Data Center, individual business units may also have their own data science capabilities, which are often spread across different employees.
Problems of outsourcing or crowd-sourcing
Organizations that lack the financial resources or the confidence to build and deploy in-house data science teams can either outsource their data science function to cloud-based service providers or crowd-source their projects on an as-needed basis. The common problems associated with crowd-sourcing are:
- Sharing and transferring domain-specific problems is time-consuming.
- The organization must be prepared for a wide variety of opinions and advice.
- A Plan B is needed in case crowd-sourcing fails.
- Quality of service is outside the organization's control.
Big data consulting marketplaces such as Experfy respond to these challenges by providing a deadline-driven platform on which vetted experts perform the work.
EMC Course on Leading Data Science Teams
This EMC course, titled Data Science and Big Data Analytics for Business Transformation, teaches the business value of data science, covers data science lifecycles and team models, and concludes with innovations in data science.