Back when I was studying physics, I’d often need to look up fancy-sounding technical terminology on Google. And when I did, more often than not, the first hit would be a Wikipedia article.
But as great as Wikipedia entries are, it never quite seemed as though they were written at my level. They almost always went way over my head or seemed far too simple for my use case. This happened so consistently that I’m tempted to call it a Law of the Internet: “No technical Wikipedia entry can be simultaneously comprehensible and informative”.
I think a lot of data science career advice (or job search/interview preparation advice) follows a similar law: there are posts aimed at complete beginners, posts aimed at veteran software engineers, and posts designed to help junior data scientists hone their skills. All of this noise makes it difficult for many aspiring data scientists to know where to invest their time as they look to transition into the field.
This is one of the main things I focus on when I work with mentees on SharpestMinds. And although there’s no one-size-fits-all solution for everyone, I’ve found that I consistently give the same advice to about 3 different categories of people.
Category 1: complete beginners
If you’re just breaking into data science, keep this in mind: the field is evolving very fast, so any advice that I give here will almost certainly be out of date by the time you’re job-ready. What got people hired back in 2017 doesn’t work today, and the disparity between data science hiring standards today and those that will apply one or two years from now will probably be even bigger.
With that out of the way, here are some pieces of advice if you’re looking to break into data science today, and you don’t already have a coding/STEM background:
- Before anything else, keep an open mind. If you’re a complete beginner, then by definition, you don’t actually know what data science is, so it’s entirely possible that it isn’t the job you want after all. Reach out to some data scientists on LinkedIn, and offer to buy them a coffee & chat. Follow a data science podcast. Becoming a data scientist involves a major commitment of time and effort, so thinking that self-driving cars are cool is *not* a good enough reason to take the plunge. Make sure you understand the less glorious aspects of data science, like data wrangling and building data pipelines, which account for the majority of a data scientist’s day-to-day.
- If you decide to move forward, that’s great! The first thing you’ll need to do is to learn Python. Take a MOOC and, as soon as possible, build a basic project. When you’re comfortable with your Python skills, learn how to work with Jupyter notebooks, and take a few data science MOOCs. If you’re looking for more specific instructions, this blog post lays out a great learning path.
- Targeting a full-on data science position isn’t necessarily the best idea if you’re truly starting from scratch. Instead, aim for lower-hanging fruit: data visualization and data analytics roles are in high demand, and are more accessible ways to break into the market. They often involve working alongside data scientists, and open up the possibility of a lateral move in that direction once you’ve picked up some experience.
How to brand yourself: if you get to the point where you’re ready to apply for jobs, you might be surprised to learn that building a personal brand is unusually important in data science. And you might worry that because you don’t have any professional experience, or a graduate degree in CS, branding might be a problem. But that can actually be your biggest brand advantage: you’re the self-made, self-taught developer/data scientist who companies can count on to learn fast and work hard™. The catch is that the burden is on you to live up to that image: it’s a steep hill to climb, but the reward can definitely be worth it.
Category 2: software engineers
Probably 20% of the aspiring data scientists I run into are software engineers. On one hand, having experience deploying code to production, and working with teams of developers can be a great asset. On the other, demand for fullstack developers is so high that companies sometimes end up nudging software engineers in that direction, even if the role they were hired for involved “data science” on paper. So you’ll want to avoid being pigeon-holed as a software engineer rather than a data scientist.
Some other thoughts:
- If you haven’t already, consider first shifting your current position in a more backend/database-focused direction. Getting more familiar with data pipelines is a good start, and can help you to build your core data manipulation skillset. It also allows you to rebrand, and frame yourself as an experienced data wrangler.
- Machine learning engineering is probably the closest adjacent data science-related role, which makes it an easier job to transition into. Target roles that emphasize deploying models, or integrating them into existing apps, since these will most effectively leverage your existing skillset. You can always double down on model development later, but this is a great way to get your foot in the door.
- You’re most likely going to have to build machine learning or data science projects to impress employers. Leverage your software engineering skills by integrating these into apps that you can show off to recruiters and technical leads. This can be particularly effective because it leaves nothing to the imagination, and emphasizes your potential as a full-stack data scientist.
- Something to keep in mind: you will almost certainly take a pay cut in your transition. Even senior software engineers generally have to transition to junior roles when they pivot to data science, but a surprising number of them don’t factor that into their decision off the bat, and are disappointed when the offers start coming in.
How to brand yourself: one of the easiest ways to brand yourself is by leveraging your experience in software development. You already know how to write clean, well-documented code, and how to collaborate with others, and that’s a strength that isn’t shared by most applicants to junior-level positions. But to effectively lean into your “clean production code” brand, you’ll have to understand the analogous best practices in data science too, so be sure to tick that box if you can.
Category 3: new CS, math or physics grads
If you’re a new undergraduate, Master’s or PhD STEM grad, you probably have a good foundation in statistics and math. But you’ve probably never applied for a job in tech, and you’re not sure how to prepare for interviews. Also, assuming you’ve been programming during your degree, you most likely can’t write clean, well-organized code.
A few things to keep in mind:
- No, the R you learned during your degree won’t be enough. And no, if you’re a physicist and you’re betting on your MATLAB or Mathematica skills to get you a job in industry, those won’t cut it either. Just learn Python.
- Things you probably don’t know that you need to learn as soon as possible: collaborative version control (learn how to work with other people with GitHub), containerization (learn how to use Docker), and devops (learn how to deploy models on the cloud with AWS or some similar service). SQL is also a must.
- Learn test-driven development in Python. Learn how to use docstrings. Learn how to modularize your code. If you haven’t already, learn how to work with Jupyter notebooks.
- If you’re in a particularly math-oriented field, deep learning *may* be a good direction to explore. But you might find it easier nonetheless to start with a more conventional “scikit-learn”-type data science role and migrate to deep learning later. The most important thing is for you to get into industry, and start working on production code as soon as possible.
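To make the advice above about docstrings, modular code and test-driven development a bit more concrete, here is a minimal sketch of what that habit can look like in Python. The function `normalize` and the test names are hypothetical examples made up for illustration, not taken from any particular project. In a test-first workflow you would write the two tests first, watch them fail, and then write the function until they pass.

```python
def normalize(values):
    """Scale a list of numbers to the range [0, 1].

    Args:
        values: a non-empty list of ints or floats.

    Returns:
        A new list of floats. If every value is identical,
        the result is a list of zeros.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid dividing by zero when the input is constant.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_normalize_scales_to_unit_range():
    # Smallest value maps to 0, largest to 1, midpoint to 0.5.
    assert normalize([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_normalize_handles_constant_input():
    # Constant input is an edge case worth pinning down in a test.
    assert normalize([3, 3, 3]) == [0.0, 0.0, 0.0]
```

If you have pytest installed, saving this in a file whose name starts with `test_` and running `pytest` should pick up both test functions automatically. The point is less the function itself than the shape: one small, documented unit of code, with its expected behavior written down as tests.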
How to brand yourself: especially if you’re a math or physics major, your best strategy is to cast yourself as someone with deep theoretical knowledge. To do that, you need to be able to confidently explain how various models work, and ideally to be familiar with the latest “hot” results in the literature (especially true if you’re aiming for a deep learning role).
****
Caveat: the advice I’ve provided isn’t going to map perfectly onto every situation. Some software engineers have further to go than others, and some total beginners have a knack for math and might be best suited to becoming deep learning researchers. But it should provide a good starting point for biasing the direction of your skills development.
At the end of the day, whether you’re a software engineer, a recent grad, or a complete beginner, a key question to ask yourself is what career trajectories are closest to you in parameter space. If a stint as a data analyst, or a data visualization specialist is necessary to get your foot in the door, that can often be the best way to launch you on the right long-term trajectory.