If there’s one thing I’ve learned from the data science mentorship I work at, it’s this: getting feedback on your data science job application or interview is virtually impossible.
There are good reasons that companies are cagey about giving feedback. For one, every piece of feedback a company gives to a rejected applicant is a potential lawsuit. Plus, there’s the fact that many people don’t respond well to negative feedback, and some get downright combative.
And just imagine the time it would take for a recruiter to send a thoughtful feedback email to you—and to the dozens (or hundreds) of other applicants they also have to consider. And there’s the fact that, at the end of the day, they get absolutely nothing out of issuing any kind of feedback, no matter how helpful or obvious it may be.
The tragic end result of all this is a huge number of confused, directionless aspiring data scientists. But here’s some good news: there aren’t actually that many reasons why applicants get turned down from data science roles, and there’s a lot you can do to cover those bases.
And those reasons — the technical and nontechnical skills that most applicants don’t have but that companies most badly want — are what this post is all about.
Reason 1: Python-for-data-science skills
The vast majority of data science roles are Python-based, so that’s what I’ll focus on here. A few tools distinguish novices from job-ready pros when it comes to Python for DS. They’re great differentiators if you want to build outstanding projects that get noticed by employers.
To force yourself to improve your data science theory and implementation game, use these in a few projects, if you haven’t already:
- Data exploration. You should have
pandas
functions like.corr()
,scatter_matrix()
,.hist()
and.bar()
on the tip of your tongue. You should always be looking for opportunities to visualize your data using PCA or t-SNE, usingsklearn
‘sPCA
andTSNE
functions. - Feature selection. 90% of the time, your dataset will have way more features than you need (which leads to excessive training time, and a heightened risk of overfitting). Get familiar with basic filter methods (look up scikit-learn’s
VarianceThreshold
andSelectKBest
functions), and more sophisticated model-based feature selection methods (look upSelectFromModel
). - Hyperparameter search for model optimization. You definitely should know what
GridSearchCV
does and how it works. Likewise forRandomSearchCV
. To really stand out, try experimenting withskopt
‘sBayesSearchCV
to learn how you can apply bayesian optimization to your hyperparameter search. - Pipelines. Use
sklearn
‘spipeline
library to wrap their preprocessing, feature selection and modeling steps together. Discomfort withpipeline
is a huge tell that a data scientist needs to get more familiar with their modeling toolkit.
Reason 2: probability and statistics knowledge
Probability and statistics don’t always come up explicitly during on the job, but they’re foundational to all data science work. As a result, it’s easy to bomb an interview if you haven’t read up on:
- Bayes’s theorem. It’s a foundational pillar of probability theory, and it comes up all the time in interviews. You should practice doing some basic Bayes theorem whiteboarding problems, and read the first chapter of this famous book to get a rock-solid understanding of the origin and meaning of the rule (bonus: it’s actually a fun read!).
- Basic probability. You should be able to answer questions like these.
- Model evaluation. In classification problems, for example, most n00bs default to using model accuracy as their metric, which is usually a terrible choice. Get comfortable with
sklearn
‘sprecision_score
,recall_score
,f1_score
, androc_auc_score
functions, and the theory behind them. For regression tasks, understanding why you would usemean_squared_error
rather thanmean_absolute_error
(and vice-versa) is also crucial. It’s really worth taking the time to check out all the model evaluation metrics listed insklearn
‘s official documentation.
Reason 3: software engineering know-how
Increasingly, data scientists are required to take on software engineering work. Many employers insist that applicants understand how to manage their code and keep clean notebooks and scripts. In particular:
- Version control. You should know how to use
git
, and interact with your remote GitHub repos using the command line. If you don’t, I suggest starting with this tutorial. - Web development. Some companies like their data scientists to be comfortable accessing data that’s stored on their web app, or via an API. Getting comfortable with the basics of web development is important, and the best way to do that is to learn a bit of Flask.
- Web scraping. Sort of related to web development: sometimes, you’ll need to automate data collection by scraping data from live websites. Two great tools to consider for this are
BeautifulSoup
andscrapy
. - Clean code. Learn how to use docstrings. Don’t overuse inline comments. Break your functions up into smaller functions. Way smaller. There shouldn’t be functions in your code longer than 10 lines of code. Give your functions good, descriptive names (
function_1
is not a good name). Follow pythonic convention and name your variables with underscoreslike_this
and notLikeThis
orlikeThis
. Don’t write python modules (.py
files) with more than 400 lines of code. Each module should have a clear purpose (e.g.data_processing.py
,predict.py
). Learn what anif name == '__main__':
code block does and why it’s important. Use list comprehension. Don’t over-usefor
loops. Add aREADME
file to your project.
Reason 4: business instinct
An alarming number of people seem to think that getting hired is about showing that you’re the most technically competent applicant to a role. It’s not. In reality, companies want to hire people who can help them make more money, faster.
In general that means moving beyond just technical ability, and building a number of additional skills:
- Making something people want. When most people are in “data science learning mode”, they follow a very predictable series of steps: import data, explore data, clean data, visualize data, model data, evaluate model. And that’s fine when you’re focused on learning a new library or technique, but going on autopilot is a really bad habit in a business environment, where everything you do costs the company time (money). You’ll want to get good at thinking like a business, and making good guesses as to how you can best leverage your time to make meaningful contributions to your team and company. A great way to do this is to decide on some questions that you want your data science projects to answer before you begin them (so that you don’t get carried away with irrelevant tasks that form part of the otherwise “standard” DS workflow). Make these questions as practical as possible, and after you’ve completed your project, reflect on how well you were able to answer them.
- Asking the right questions. Companies want to hire people who are able to keep the big picture in mind while they tune their models, and ask themselves questions like, “am I building this because it’s going to be legitimately helpful to my team and company, or because it’s a cool use case for an algorithm I really like?” and “what key business metric am I trying to optimize, and is there a better way to do that?”.
- Explaining your results. Management needs you to tell them what products are selling well, or which users are leaving for a competitor and why, but they have no idea (and don’t care about) what a precision/recall curve is, or how hard it was for you to avoid overfitting your model. For that reason, a key skill is the ability to convey your results and their implications to nontechnical audiences. Try building a project and explaining it to a friend who hasn’t taken math since high school (hint: your explanation shouldn’t involve any algorithm names, or refer to hyperparameter tuning. Simple words are better words.).
Of course no list of like this one can be exhaustive, but from what I’ve seen coaching hundreds of early career data scientists through the job application and interview process (and from talking to our hiring partners themselves), it probably accounts for 70% of the rejections people get.
Keep in mind that other, less well-defined things like personality fit can often be a factor, too. If you didn’t get along with your interviewer, or if the conversation felt strained or awkward, it’s always possible that your technical qualifications are solid, but that you didn’t hit check the culture fit box. Companies regularly turn down applicants who would have been amazing technical performers for exactly this reason, so don’t take a rejection or two too much to heart!