The cold start problem: how to break into machine learning

It’s hard to get hired into your first machine learning role. That means I get asked lots of questions that look like: right now, I do X. I want to become a machine learning engineer. How do I do it?

Depending on what X is, that ranges from something that’s pretty hard to do (physics, software engineering) up to something that’s extremely hard to do (UI design, marketing).

So to help everyone at the same time, I’ve put together a progression that you can follow from any starting point to actually become a machine learning engineer. Every ML engineer I’ve met (who didn’t go to grad school at SAIL or somewhere similar) went through some form of this.

Before you start learning ML, there’s a set of basics you need first.

1. Learn calculus

The first thing you need is multivariable calculus (up to second-year undergrad).

Where to learn it: Khan Academy’s differential calculus course is pretty good. Make sure to do the practice problems. Otherwise you’ll just nod along with the course and won’t learn anything.

2. Learn linear algebra

The second thing you need is linear algebra (up to first-year undergrad).

Where to learn it: Rachel Thomas’s mini-course on computational linear algebra is targeted at people who want to learn ML.

NOTE: I’ve heard convincing arguments that you can skip calculus and linear algebra. A few people I know have jumped right into ML and learned most of what they needed by trial and error and intuition, and they turned out okay. Your mileage will vary, but whatever you do, don’t skip this next step:

3. Learn to code

The last thing you need is programming experience in Python. You can do ML in other languages, but these days Python is the gold standard.

Where to learn it: Follow the advice in the top answer of this Reddit thread. You should also pay close attention to the numpy and scipy packages. Those come up a lot.

There’s more to say about good programming practice than I have room for here. In one sentence: make your code legible and modular, with good tests and error handling.
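
To make that one sentence concrete, here is a minimal sketch of what it might look like in practice. The function `normalize` and its test are hypothetical examples made up for illustration, not taken from any particular project: one small function with a single job, explicit error handling for bad input, and a test that checks both the normal case and the failure case.

```python
def normalize(values):
    """Scale a list of numbers linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    if not values:
        # Fail loudly on bad input instead of returning something misleading.
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    if lo == hi:
        # Avoid a division by zero when every value is identical.
        raise ValueError("values must not all be equal")
    return [(v - lo) / (hi - lo) for v in values]


def test_normalize():
    # Normal case: evenly spaced inputs map to evenly spaced outputs.
    assert normalize([2, 4, 6]) == [0.0, 0.5, 1.0]
    # Failure case: empty input should raise, not return silently.
    try:
        normalize([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")


test_normalize()
```

In a real project you would probably run tests like this with a test runner such as pytest rather than calling them by hand, but the idea is the same.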

Pro tip: If you’re learning to code from scratch, don’t bother memorizing every command. Just learn how to look up questions online fast. And yes, this is what the pros do.

Also: learn the basics of git. It pays off fast.

Learn machine learning

Now you get to learn machine learning itself. In 2018, one of the best places to do that is Jeremy Howard’s fast.ai course, which teaches ML at the state of the art with an approachable curriculum. Go through at least Course 1, and ideally Course 2, do all the exercises, and you’ll be ahead of most industry practitioners on model-building (really).

Most of the progress in machine learning over the past 6 years has been in deep learning, but there’s much more to the field. There are also decision trees, support vector machines, linear regression, and a bunch of other techniques. You’ll run into these as you progress, but you can probably learn them as they come up. A great centralized place to learn and use them is Python’s scikit-learn package.
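
As a rough illustration of how scikit-learn puts these techniques behind one interface, here is a small sketch that fits a linear regression, a decision tree, and a support vector machine on the same data. The dataset is synthetic (generated with `make_regression`), and the sizes, noise level, and dictionary names are arbitrary choices for the example, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data, so the example runs without any downloads.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Three very different techniques, all sharing the same fit/score interface.
models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "support vector machine": SVR(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # For regressors, score() returns R^2 on the held-out data.
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

The models here use default settings, so the scores say little about which technique is better in general. The point is only that swapping one technique for another is a one-line change.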

Build personal projects

Everyone who applies to their first ML position has done personal projects in machine learning and data science, so you should too. But it’s important to do it well, and I’ll cover exactly how in a future post. For now, the only thing I’ll say is: the most common mistake I see when people showcase personal projects is that they apply well-known algorithms to well-known datasets.

This is a mistake because (1) machine learning hiring managers already know all the well-known datasets, and (2) they also know that if you showcase a project where you apply a well-known algorithm to a well-known dataset, you might not know how to do much of anything else.

Some things are hard to learn by yourself

The truth is that a lot of the things that make you stand out from the crowd are hard to learn by yourself. In machine learning, the three biggest ones are (1) data prep, (2) ML devops, and (3) professional networking.

Data prep is the set of hacks you use when you work with realistic data. That means dealing with outliers and missing values. But it also means collecting data yourself when there isn’t already a dataset for the problem you want to solve. In real life, you’ll spend something like 80% of your time cleaning and collecting data. Model-building is an afterthought in the real world, and engineering managers know that.
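
To give a flavor of what dealing with outliers and missing values can look like, here is a minimal sketch using numpy. The readings are made up, and the two techniques shown (filling gaps with the median, then clipping to 1.5 times the interquartile range) are just one common pair of choices, assumed here for illustration. The right approach depends on your data.

```python
import numpy as np

# Made-up sensor readings: two missing values (nan) and one obvious outlier.
raw = np.array([21.0, 22.5, np.nan, 23.1, 250.0, 22.8, np.nan, 21.9])

# Missing values: fill each gap with the median of the observed values.
# The median is less sensitive to the outlier than the mean would be.
median = np.nanmedian(raw)
filled = np.where(np.isnan(raw), median, raw)

# Outliers: clip anything beyond 1.5 * IQR from the quartiles.
q1, q3 = np.percentile(filled, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = np.clip(filled, lower, upper)

print("raw:    ", raw)
print("cleaned:", cleaned)
```

Clipping keeps every row but caps extreme values. Dropping the rows, or investigating why the outlier exists in the first place, are equally valid options.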

ML devops is what you do to run your model on the cloud. It costs money to rent compute time, so people sometimes don’t do this in their personal projects. But if you can afford it, it’s worth it to get familiar with the basics. Start with Paperspace or Floyd for an intro to running ML on the cloud.

Honest engineers often ignore networking, because they think they should get hired on their skills alone. The real world doesn’t work like that, even though it should. So talk to people. I’ll write more about this part in a future post.

Ask for help

Some steps are hard to take on your own. Schools aren’t good at teaching data prep, ML devops, or networking. Most people learn those things on the job, or from a mentor if they’re lucky. Many people never learn them at all.

But how do you bridge that gap in the general case? How do you get a job without experience when you need a job to get experience?