The Four Most Important Success Factors in Any Machine Learning Project

If you are a product manager who wants to build something with machine learning, here are the four most important things to keep in mind:

1. Prioritise engineering over data science

A machine learning project is first and foremost a software project. Many data scientists have little experience building well-architected, reliable and easy-to-deploy software. When you build a production system, this will become a problem.

As a rule of thumb, engineers can pick up data science skills faster than data scientists can pick up engineering experience. If in doubt, work with the Python engineer who has 5+ years of experience and a passion for AI, rather than the data science PhD who is having their first go at building business applications.

2. Go lean

It’s important to reduce risks early. Structure your project with concrete milestones:

Finished prototype: Find out whether your idea is promising (1 day to 2 weeks)
Offline tested system: Tune the model and rigorously test it on existing data (2–4 weeks)
Online tested system: Finalise the model and test it live (2–4 weeks)
Going live: Automate data updates, model training and code deployment (2–4 weeks)
Continuous improvement (optional): 12 months

Total timeline (excluding continuous improvement): roughly 6–14 weeks, or about 1.5–3 months
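To see where the total comes from, the per-milestone estimates can simply be summed. The sketch below is a minimal Python illustration; it leaves out the optional continuous-improvement phase, treats a week as seven calendar days, and uses hypothetical names (`MILESTONES`, `total_range`) that are not part of the original list.

```python
# Each milestone's duration as a (minimum, maximum) range in calendar days,
# using the estimates from the milestone list (1 week = 7 days).
# Continuous improvement is optional, so it is deliberately left out.
MILESTONES = {
    "Finished prototype": (1, 14),
    "Offline tested system": (14, 28),
    "Online tested system": (14, 28),
    "Going live": (14, 28),
}


def total_range(milestones):
    """Sum the minimum and maximum durations across all milestones."""
    low = sum(min_days for min_days, _ in milestones.values())
    high = sum(max_days for _, max_days in milestones.values())
    return low, high


low, high = total_range(MILESTONES)
print(f"{low}-{high} days, about {low / 7:.1f}-{high / 7:.1f} weeks")
# prints "43-98 days, about 6.1-14.0 weeks"
```

At roughly 4.3 weeks per month, that is about 1.4 to 3.2 months end to end, assuming the phases run back to back.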

An experienced team should be able to follow these timelines for almost any project. Focus the team on getting a live system up within that timeframe. Once it’s live, decide whether further improvements are worth the investment.

These temptations can prolong your project unnecessarily:

Waiting for the perfect data
Using the wrong tools (too complex or too slow)
Overengineering for scalability
Endlessly playing with the algorithms (see next point)

3. The algorithm doesn’t matter

Machine learning systems have lots of fascinating knobs you can play with. Don’t.

The improvements that are worth spending time on (in order of importance):

Get more (relevant) input data
Preprocess the data in a better way
Choose the right algorithm and tune it correctly

The algorithm is the least important factor. Simply choose an algorithm that works. Endlessly upgrading the algorithm is tempting, but it will probably not give you the results you expect.
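One pragmatic reading of "choose an algorithm that works" is: pick something simple and check that it clearly beats a trivial baseline on held-out data before moving on. The sketch below assumes a classification task. The majority-class baseline is a standard sanity check, but the 5-point `min_gain` threshold, the toy data and all function names are illustrative assumptions, not part of the original advice.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: most_common


def accuracy(predict, features, labels):
    """Fraction of examples where predict(x) equals the true label."""
    correct = sum(predict(x) == y for x, y in zip(features, labels))
    return correct / len(labels)


def works_well_enough(model, baseline, features, labels, min_gain=0.05):
    """Hypothetical acceptance rule: accept the model if its held-out
    accuracy beats the baseline's by at least min_gain."""
    gain = accuracy(model, features, labels) - accuracy(baseline, features, labels)
    return gain >= min_gain


# Toy example (invented data): one numeric feature, label 1 when x > 5.
train_labels = [0, 1, 1, 0, 1]
test_x = [1, 2, 3, 4, 6, 7, 8, 9, 10]
test_y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

baseline = majority_baseline(train_labels)  # always predicts 1
model = lambda x: int(x > 5)                # a deliberately simple "model"

print(works_well_enough(model, baseline, test_x, test_y))  # prints True
```

In practice `model` would be whatever off-the-shelf algorithm the team reaches for first. If it clears a check like this, the time is usually better spent on more relevant data and better preprocessing than on further algorithm tuning.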

4. Communicate, communicate, communicate

Share as much of the business context as possible.

Once the engineering team starts building, they have to make a lot of choices. The better they know your priorities, the more often they will make the right decisions. At a minimum, tell them about:

Strategic priorities

Is this fixing a critical issue? Will it need to work for millions of requests a day? Or is it research for a future product?

Problems with the current process

Does the current process take too long? Is it too inaccurate? Or is there a lot of data that simply can’t be taken into account without machine learning?

Inputs and outputs

Inputs: What data would you (as a human) use to make the right decisions?

Outputs: Who will consume the output? How frequently? Does it need to be delivered in real time?

Performance metrics

Which metrics matter most? Click-through rate? Sales? ROI? False positive rate?

Expected accuracy

If you are optimising conversion rates, another 2 weeks of tuning for 2% more accuracy might not be worth it.

If you are building a medical diagnostic system, even a 1% false negative rate can be unacceptable.
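Which accuracy number matters depends on which kind of error it counts. The following is a minimal Python sketch, assuming a binary classifier with labels 0 (negative) and 1 (positive). The screening numbers are invented purely for illustration, and `error_rates` is a hypothetical helper. It shows how a model can reach high overall accuracy while still missing a quarter of the positive cases.

```python
def error_rates(y_true, y_pred):
    """Return (false_positive_rate, false_negative_rate) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # e.g. healthy, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # e.g. sick, missed
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


# Invented screening example: 1,000 cases, 20 of them truly positive.
y_true = [1] * 20 + [0] * 980
# The model catches 15 positives, misses 5, and raises 10 false alarms.
y_pred = [1] * 15 + [0] * 5 + [1] * 10 + [0] * 970

acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
fpr, fnr = error_rates(y_true, y_pred)
print(f"accuracy={acc:.3f}, FPR={fpr:.3f}, FNR={fnr:.3f}")
# prints "accuracy=0.985, FPR=0.010, FNR=0.250"
```

Accuracy alone (98.5%) hides the fact that one in four positive cases goes undetected, which is exactly the kind of requirement the engineering team needs to hear up front.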

TL;DR

Prioritise engineering over data science.
Reduce risks by going lean.
Don’t get distracted by the algorithm.
Share all business requirements with your developers.