Machine learning is an extremely versatile tool. Some applications are very public, like Spotify’s Discover Weekly, Netflix’s Movie Recommender, or Google Translate. Many more are hidden, built behind the scenes by teams that used AI to solve some unique challenge for their business.
If your business has a large amount of data and you are asking yourself, “How can I use AI to build something smart from our data?” — keep reading. We’ve helped many businesses answer this question and discover the best use cases for them. This is our process:
1. Set up a meeting with the right people
2. Introduce everyone to machine learning
3. Absorb everything, assume little
4. Make a list of processes ripe for machine learning
5. Check feasibility
6. Prioritize
7. Research
8. Make a decision
1. Set up a meeting with the right people
Whatever use cases you want to discover have to be rooted in strategic priorities and available data. Neither management nor engineering can give you all the answers, so you need a cross-disciplinary meeting. Invite a product visionary (CEO, VP Product) and someone who knows every dataset (CTO, Head of Data Engineering), and plan at least half a day for it.
2. Introduce everyone to machine learning
Machine learning is a tool like any other: the more you understand it, the better you can put it to use. If people think it’s a magical black box, then they won’t be able to pitch in and help you with your search.
So lift the veil on machine learning. Make it practical, leave out the math, and cover these three basics:
- What is machine learning?
- When can you use it?
- What are the common misconceptions?
Find out more: The 3 Basics of Machine Learning
Once everyone has an understanding of what machine learning is, it’s time for you to learn from them.
3. Absorb everything, assume little
Every firm is unique. Even within narrow verticals, the overlap in what two different companies need is smaller than you might think. Don’t try to fit your business into a box.
Map the terrain.
Goals. What goals are driving the company right now? What are the challenges behind these goals?
History. What projects have been implemented in the past? What were the results, the challenges, and the lessons?
Data. What data exists? Where is it generated, and where is it saved? How much consistent history is there in each database? How exactly does each table look? Can the different datasets be merged on unique identifiers?
Infrastructure. What is the preferred infrastructure? Are there relevant policies or restrictions on which provider to use (on-premises, AWS, or Google Cloud)?
Data Science Strategy. What is your data science vision? Do you want to build up your own expert team, or do you want to find an experienced team to build you a solution? Or a combination of both?
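One of the Data questions above, how much consistent history each table holds, can be answered with a quick script before the meeting. The following is a minimal sketch, not a definitive method: the function name `history_summary`, the choice of monthly granularity, and the sample dates are all assumptions made for illustration.

```python
from datetime import date


def history_summary(record_dates):
    """Summarize the monthly coverage of a table's timestamps."""
    # Collapse every timestamp to a (year, month) pair.
    observed = sorted({(d.year, d.month) for d in record_dates})
    if not observed:
        return {"first": None, "last": None, "months_covered": 0, "missing": []}

    observed_set = set(observed)
    first, last = observed[0], observed[-1]

    # Walk month by month from the first to the last record and note gaps.
    missing = []
    year, month = first
    while (year, month) <= last:
        if (year, month) not in observed_set:
            missing.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

    return {
        "first": first,
        "last": last,
        "months_covered": len(observed),
        "missing": missing,
    }


# Hypothetical timestamps from one table; January 2024 has no records.
order_dates = [
    date(2023, 11, 3),
    date(2023, 11, 20),
    date(2023, 12, 14),
    date(2024, 2, 2),
    date(2024, 3, 9),
]
summary = history_summary(order_dates)
print(summary)
```

A table with many missing months may still be usable, but it is worth asking why the gaps exist before building on it.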
After you know what’s driving your business at the moment, you can get more concrete. Now it’s time to collect all the potential use cases.
4. Make a list of processes ripe for machine learning
Where is a lot of data being used to automate decision making?
Machine learning is just a tool to automate pattern discovery and then make smart predictions based on those patterns. Most of the time it’s about improving an existing process by making it a little bit smarter. Processes that are good candidates for machine learning are usually:
Data based: Decision making is already entirely based on data.
Large scale: Decision making happens over and over again, thousands or millions of times.
Automated: The process already uses software to some degree.
Already automated, large-scale, data-based decision-making processes are the perfect potential candidates for machine learning systems.
For example:
- Product recommendations
- Credit scoring
- Personalized marketing
- Fraud detection
- Image recognition
So think about where a lot of data is being used to automate decision making, and whether there is room for improvement.
The next best thing is to use machine learning to support a process currently implemented by human decision-makers. If it’s entirely data based, highly repetitive, tedious, and therefore slow, it might be ripe for improvement. Can you make it faster by training a machine to make some of these decisions?
5. Check feasibility
For each case, find out whether the necessary data is being captured. Specifically, check whether the different datasets relevant to the problem can be merged.
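The merge check can be done on a small export of each dataset before any modeling starts. The sketch below assumes each dataset is a list of records sharing an identifier column. The function `merge_coverage`, the `customer_id` key, and the sample rows are hypothetical.

```python
def merge_coverage(left_rows, right_rows, key):
    """Report how well two datasets line up on a shared identifier."""
    # Ignore rows where the identifier is missing altogether.
    left_ids = {row[key] for row in left_rows if row.get(key) is not None}
    right_ids = {row[key] for row in right_rows if row.get(key) is not None}
    matched = left_ids & right_ids
    return {
        "matched": len(matched),
        "left_only": len(left_ids - right_ids),
        "right_only": len(right_ids - left_ids),
        # Share of left-hand identifiers that also appear on the right.
        "left_match_rate": len(matched) / len(left_ids) if left_ids else 0.0,
    }


# Hypothetical sample data: a customer table and an order table.
customers = [
    {"customer_id": 1},
    {"customer_id": 2},
    {"customer_id": 3},
    {"customer_id": 4},
]
orders = [
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 1, "amount": 12.5},
    {"customer_id": 3, "amount": 99.0},
    {"customer_id": 9, "amount": 5.0},  # no matching customer record
]

report = merge_coverage(customers, orders, "customer_id")
print(report)
```

A low match rate, or many identifiers that exist on only one side, suggests the datasets cannot be joined reliably yet. If the data already lives in pandas, `DataFrame.merge` with `indicator=True` gives a similar breakdown.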
The more machine learning projects you’ve already implemented, the better you can pinpoint the right questions to ask this time. Build on your previous experience:
- What are the common pitfalls in projects like this one?
- Which datasets are the most important to have, and which ones are optional?
- What is a reasonable level of improvement to expect in this situation?
If you haven’t implemented a similar use case before, talk to a team that has.
6. Prioritize
Prune ideas early if they aren’t helping achieve the company’s most important priorities. Refocus the discussion: “This is more of a nice-to-have, so let’s leave it for now.”
Ask critical questions: If we did manage to automate and improve the accuracy or speed of this process by 20%, what would that mean in revenue per year?
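The revenue question is simple arithmetic, and it helps to run it for more than one scenario, since the 20% figure is itself a guess. This is a rough sketch under a strong assumption: revenue attributable to the process scales linearly with the improvement. The baseline figure and the scenario rates are invented for illustration.

```python
def annual_revenue_impact(baseline_annual_revenue, relative_improvement):
    """Extra revenue per year, assuming revenue scales linearly with the improvement."""
    return baseline_annual_revenue * relative_improvement


# Hypothetical: the process currently drives 2,000,000 in revenue per year.
baseline = 2_000_000

# Hypothetical scenarios around the 20% target.
scenarios = {"pessimistic": 0.05, "target": 0.20, "optimistic": 0.30}

impact = {
    name: annual_revenue_impact(baseline, rate)
    for name, rate in scenarios.items()
}

for name, value in impact.items():
    print(f"{name:>11}: {value:,.0f} per year")
```

If even the optimistic scenario does not move a critical business number, the idea is probably a nice-to-have.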
To compare the cases you have left, make an Excel sheet with the following columns:
Data availability: How easy is it to access the right data for this application? If you don’t have the data yet, give it a very low rating.
Potential gain: How big is the potential impact on a critical business priority if this goes very well?
Risk: Are there many unknown factors that could derail the project? What does your experience tell you?
Time to implement: How long until a first version is in production? It’s always good to present a successful case early. Then, with one solid success behind you, you can move on to the more complex projects.
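The four columns above can be turned into a simple ranking. The article does not prescribe how to combine them, so the scoring rule here is an assumption: each column is rated from 1 to 5, availability and gain count in favor, and risk and time count against. The ratings are hypothetical and the use case names come from the examples earlier in this article.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    data_availability: int  # 1 = no data yet, 5 = clean and accessible
    potential_gain: int     # 1 = marginal, 5 = moves a critical priority
    risk: int               # 1 = few unknowns, 5 = many unknowns
    time_to_implement: int  # 1 = very quick, 5 = very long

    def score(self) -> int:
        # Assumed rule: equal weights, with risk and time counted against.
        return (
            self.data_availability
            + self.potential_gain
            - self.risk
            - self.time_to_implement
        )


# Hypothetical ratings for three candidate use cases.
cases = [
    UseCase("Product recommendations", 4, 4, 2, 3),
    UseCase("Fraud detection", 2, 5, 4, 4),
    UseCase("Personalized marketing", 5, 3, 2, 2),
]

ranked = sorted(cases, key=lambda case: case.score(), reverse=True)
for case in ranked:
    print(f"{case.score():>3}  {case.name}")
```

Equal weights are only a starting point. If one column matters more to your business, weight it accordingly and check whether the ranking changes.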
7. Research
Once you’ve identified your top 1–3 cases, do a broad search through Google:
- Who has implemented similar systems before?
- What approaches did they try? Which ones did they settle on and why?
- What were the learnings and the final results?
In machine learning, you can patch together some of that information from published academic research. Don’t copy their approach, though: it may have worked well only on their specific dataset. Get inspired, and steal the best ideas. Use them to guide your further investigations.
Another good source is Kaggle competitions. If you find a competition for a similar use case, look at the kernels and the forum discussions. They will give very specific input on how to implement and tune an algorithm, and in most cases, you can even find full sample code.
8. Make a decision
Update your rankings with the additional information you learned. Then, based on your research and experience, make a rough project plan for each idea.
Together with your prioritization table and the project plans, present your findings to your team.
If you’ve done all this while keeping the bigger goals in mind, it should now be easy for your team to decide on the best project.
Time to get your hands dirty and implement.