“Requirements rarely lie on the surface”
The majority of Data Science projects fail. I will not even provide any references in support of this statement — the Internet is full of examples. The reasons for the high failure rate are many and varied. However, as surprising as this may sound, one of the main reasons is the lack of clearly defined project goal(s) and the associated requirements.
Problem understanding and requirements gathering make up an initial phase in pretty much any project management framework, including the widely used “Cross-Industry Standard Process for Data Mining” (CRISP-DM). This implies that the project goals and requirements are already there, waiting to be “gathered”. However, as Andrew Hunt and David Thomas say in their famous book “The Pragmatic Programmer”:
“It doesn’t quite work that way. Requirements rarely lie on the surface. Normally, they’re buried deep beneath layers of assumptions, misconceptions, and politics.”
The advice the authors then give is simple:
“Don’t gather requirements — dig for them.”
Wise words, indeed. But what does this advice mean exactly in the context of Data Science projects? There are several crucial aspects to consider, and I will cover some of them in this article. To make things a bit more concrete, let us assume that a Data Science team is tasked with building a recommender system for products sold by an online shop.
Study your stakeholders
Defining the goals of a Data Science project is objectively hard because of the number of stakeholders involved and the different goals they pursue (Hulten 2018). A marketing team may want to have a recommender system to keep customers engaged and cross-sell as many products to them as possible. However, a user experience team may want to use it to make customer journeys on the website as smooth as possible. Finally, Data Scientists mainly care about the predictive accuracy of the Machine Learning model powering their recommender system.
Although related, these goals differ in their measures of success. Marketers will want to see as many conversions as possible. UX experts will care about the time it takes to complete a purchase. Data Scientists will spend days trying to beef up that second digit in the nDCG metric. Oh, and the customers? They will never come back to the website if they cannot find what they need fast enough or have trouble placing an order.
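For readers who have not met nDCG (normalised discounted cumulative gain) before, here is a minimal sketch of how it can be computed for a single ranked list of recommendations. It assumes the linear-gain variant of the formula and that the relevance labels of all shown items are known; the function names `dcg_at_k` and `ndcg_at_k` are my own and do not come from any particular library.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k items (linear gain)."""
    # Rank 1 is discounted by log2(2) = 1, rank 2 by log2(3), and so on.
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        # No relevant items at all: define the score as 0 by convention.
        return 0.0
    return dcg_at_k(relevances, k) / ideal


# Relevance of the recommended products, in the order they were shown
# (illustrative labels only, e.g. 3 = bought, 0 = ignored).
shown = [3, 0, 2, 1]
score = ndcg_at_k(shown, k=4)
```

A score of 1.0 means the products were shown in the best possible order. Note that nothing in this number says anything about conversions or time to purchase, which is exactly why the other stakeholders measure success differently.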
How does one make sure that all participants of a project are heard? There is only one way, really: get yourself out there and talk to the people you as a Data Scientist work with to understand “what keeps them awake at night”. When done in a structured and empathetic manner, this will help with building up a story around the project that all participants get behind. And do not hesitate to spend as much time on this as you need: your future self will thank you.
Maintain a project glossary
As Data Scientists, we get to work on various problems, often in different industries. Personally, I think this variety is the best thing about Data Science and what makes it so interesting and attractive. However, it also implies that with every new project one has to learn a lot of new concepts and terms. Maintaining a project-specific glossary of terms helps with understanding the problem at hand and makes communication with stakeholders much smoother.
Capture requirements with “user stories”
Software developers use several techniques to capture requirements, and I believe these techniques are directly applicable to Data Science projects as well. In my consulting work, I found “user stories” to be particularly useful.
A user story is an informal way of describing a feature of an application following a simple pre-defined template, e.g.:
"As a <role> I want to <capability>, so that <receive benefit>"
User stories are great for the following reasons:
- they can be written in plain English (on post-it notes or using programs like Jira) by any project participant, expressing her domain-specific needs;
- they are high-level descriptions that allow project participants to focus on discussing the desired functionality rather than the implementation details;
- they are well-suited for project planning as they can be given an estimate of how time- and resource-consuming they will be to develop.
In the context of our running example, user stories might look like this:
"As a marketer, I want our website visitors to see products that they are likely to buy, so that I can increase our overall cross-sell rate."
"As a UX specialist, I want our website visitors to quickly find the products they are interested in, so that they spend minimal time to complete a purchase."
Having user stories written by all project participants helps with defining the goals and success metrics, understanding the scope of work, and prioritising individual deliverables.
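If you prefer to keep user stories next to your code rather than on post-it notes, the template above can be sketched as a small data structure. This is only an illustration: the `UserStory` class, its field names, and the effort estimates are hypothetical placeholders, not part of any tool mentioned in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStory:
    """One user story following the role / capability / benefit template."""
    role: str
    capability: str
    benefit: str
    estimate_days: float  # hypothetical effort estimate used for planning

    def render(self) -> str:
        # Fill in the template: "As a <role>, I want <capability>, so that <benefit>."
        return f"As a {self.role}, I want {self.capability}, so that {self.benefit}."


stories = [
    UserStory(
        role="marketer",
        capability="our website visitors to see products that they are likely to buy",
        benefit="I can increase our overall cross-sell rate",
        estimate_days=10,  # placeholder value
    ),
    UserStory(
        role="UX specialist",
        capability="our website visitors to quickly find the products they are interested in",
        benefit="they spend minimal time to complete a purchase",
        estimate_days=5,  # placeholder value
    ),
]

# One possible (and deliberately naive) prioritisation: cheapest story first.
backlog = sorted(stories, key=lambda story: story.estimate_days)

for story in backlog:
    print(f"[{story.estimate_days} days] {story.render()}")
```

Sorting by estimate alone is just one way to order a backlog. In practice, the team would weigh the expected benefit of each story as well.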
Keep the requirements documented
Once you are done defining project goals and requirements, get them documented. This can be done in many different ways (I like the “project poster” format developed by Atlassian). Irrespective of the format that makes sense for your organisation, try to avoid unnecessary details. Requirements do not describe the design or architecture of a system; they only capture what needs to be accomplished. Never waste your time writing detailed project charters because 1) they become obsolete the moment you save the file and 2) nobody will ever read them anyway due to their bloated size.
Having goals and requirements documented is important not only for kick-starting a project. It is also a mechanism to protect the project against continuous changes in requirements that, if undocumented, result in scope creep. Although it is natural for goals to evolve, an easy-to-grasp requirements document will help all parties stay informed and prevent the project from getting out of control.
Conclusions
Data Science projects are somewhat unique in that they involve many stakeholders, who have their own agendas and definitions of success. This calls for an extra effort from Data Scientists to define and properly document project goals and requirements. Luckily, Data Scientists can borrow project management techniques from software developers, who often operate under similarly complex conditions.