vs Data Analyst vs Research Scientist vs Applied Scientist vs…
Introduction
Man, this topic has been in the back of my mind for a long time. But because there are so many things to potentially cover, I couldn’t get myself to finish this daunting task. But, stuck in my room due to the shelter-in-place order and running out of things to waste time with, I finally decided to finish it.
Since its popularity exploded around 2013, the data science industry has been evolving wildly yet slowly converging into more specific roles. Inevitably, this growth caused confusion and inconsistent job functions. For example, there are many different titles with seemingly the exact same role, as well as identical titles with different roles:
Analytics Data Scientist, Machine Learning Data Scientist, Data Science Engineer, Data Analyst/Scientist, Machine Learning Engineer, Applied Scientist, Machine Learning Scientist…
The list goes on. Even recruiters who reach out to me pitch positions like data scientist, machine learning (ML) specialist, data engineer, and more. Clearly, the industry is confused. One of many reasons for such high variance is that companies have very different needs and uses for data science. Regardless of the reason, it appears that the field of data science is branching and merging into a few top categories: Analytics, Software Engineering, Data Engineering, and Research. No matter what the similar-sounding titles say, they usually fall into one of these categories. This specialization is most visible in larger tech companies that can afford it.
In this article, we will first look into the overall trend of the data science industry and then compare ML engineer and data scientist in more depth. I do not mean to provide an extensive history but rather narrate what I have seen and experienced while living in Silicon Valley as a data scientist. Even when I wrote my article How to Data Science Without a Degree in 2017, my perspective on data science was very different.
Last year, I covered this topic when I was invited to give a short talk to data science students at Metis Bootcamp. I want to use this opportunity to explain the differences and help you find the role that suits you best. Let’s find out if this industry is still booming or ending with data, because that is what data scientists do, right? (Maybe not). Regardless, I hope you find it useful and informative.
Trend of Data Science Industry
Before we dig deeper, take a look at the following two job descriptions that I found on LinkedIn. Try to guess what title these descriptions are for. I highlighted some key points in red:
Very different, right? Surprisingly, both are for a data scientist position. Left is for Facebook, right is for Etsy. I do not mean that one is better than the other. The main point is seeing how different they are.
Even at work, people have active discussions on trying to figure out what exactly defines a data scientist. I’ve seen people describe data scientists as computer science PhDs or new data analysts. This is because different companies use the term data scientist for very different positions. However, I believe the industry has been learning to be more specific and have more specialized roles, instead of bucketing everything into the broad scope of data science.
Then, what are the different roles that the title data scientist can imply? Broadly, I think they are software engineer, data analyst, data engineer, and applied/research scientist. I have friends who share the same data scientist title, yet each of them works in one of these four roles. Check out the diagram that I created below. In the early days of data science, a data scientist might have covered all four roles. Today, however, positions are becoming more specific and specialized.
Did Harvard Business Review see it coming?
Is this trend surprising? According to the famous article Data Scientist: The Sexiest Job of the 21st Century, not so much:
Data scientists’ most basic, universal skill is the ability to write code. This may be less true in five years’ time, when many more people will have the title “data scientist” on their business cards.
As the article suggests, you have fewer reasons to be a good coder today as a data scientist. Before, tools and methods to analyze big and nasty data were not as accessible or user friendly. This required data scientists to have relatively strong engineering skills on top of their other skills. But tools for ML and data science have developed quickly and are now more accessible than ever, such that you can access state of the art (SOTA) models with just a few lines of code. This makes it easier to separate roles into analytics or engineering. Now we do not have to learn all of analytics, engineering, and statistics to become a data scientist, which seemed to be the case before.
For example, Facebook led this trend, in which data analyst jobs evolved into data scientist jobs. This was a natural process: with increasing data size and more challenging data problems, more skills and training were needed to perform good analysis. Besides Facebook, many other companies like Apple and Airbnb have been drawing a clearer distinction between analytics/product data scientists and ML data scientists.
How company size affects the roles
It is worth mentioning that specialization occurs more in larger tech companies. Unlike software engineers, who are needed in tech companies of all sizes, not all of these companies need specialized research scientists or ML engineers. Having a few data scientists might be enough. So in smaller companies, there still are data scientists who might be functioning within all four roles.
As a rule of thumb today, data scientists in big companies (FANG) are often similar to advanced analysts, while data scientists in smaller companies are more similar to ML engineers. Both functions are important and needed. Going forward, I will stick to my new definitions by which data scientist implies an analytics function.
Different Data Scientists and How to Choose Them
In the chart below, I tried to show a similar picture as the above diagram but with a bit more detailed view of the four functions. The descriptions aren’t perfect but you can refer to it.
Job search — Which title to choose and how to prepare?
If you are trying to get into this field, whether as an ML engineer or a data scientist, you might wonder which one you should choose. Let me list a simplified (and stereotypical) description of the four main ML-related roles to help clarify. Though I have not personally worked in all of these roles, I have gathered insights from friends in each field. I also included potential interview content in parentheses (think of it as four rounds of interviews).
- Data Scientist: Do you want to analyze big data, design experiments and A/B tests, and build simple machine learning and statistical models (e.g. using sklearn) to drive business strategy? This role is less structured, with more uncertainty, and you will be driving the narrative of the project. (Interview: 1 Probability/Statistics, 1 Leetcode, 1 SQL, 1 ML).
- ML Engineer: Do you want to build and deploy up-to-date machine learning models (e.g. TensorFlow, PyTorch) into production? Your focus is not just building models but also the software required to run and support them. You are more of a software engineer (SWE). (Interview: 3 Leetcode, 1 ML).
- Research Scientist: Do you have a PhD in computer science with several ML publications at ICLR? Do you want to push the boundaries of ML research, and get excited when your paper is cited? These are the rare breeds and you already know who you are. Most of these people end up at Google or Facebook. Entering this role without a PhD is possible, but unfortunately rare. (Interview: 1 Leetcode, 3 ML/Research).
- Applied Scientist: You are a hybrid of ML engineer and research scientist. You care about the code but also about using and pushing state of the art (SOTA) machine learning models. (Interview: 2 Leetcode, 2 ML).
Obviously, these descriptions aren’t exhaustive. But when talking to my friends and looking at many job descriptions, I found these ideas to be common. If you are unsure about the role to which you are applying, here are a few tips to learn more:
- Read the job description: Title honestly does not matter as much. It might be called the same “data scientist,” but the job description may be vastly different.
- LinkedIn stalk: If you aren’t sure what data scientists are like at Apple, simply look through the backgrounds of Apple data scientists on LinkedIn. Are they mostly CS PhDs? Undergrads? What kind of training do they have? This will help you get a better idea.
- Interview: If you think you are applying for a technical role yet are not asked to code in the interview, you probably won’t be getting a technical role. The interview content reflects the nature of the job.
ML Engineer vs Data Scientist
Okay, that was long. Now back to our topic. In recent years, I started to hear people say more negative things about the data science job. One reason is that more and more data scientist jobs no longer seem to have that cool machine learning factor and seem easier to obtain. Perhaps five years ago most job descriptions required at least a Master’s degree for a data scientist job, but that is no longer the case. Whatever the reason people think data science (of the old days, at least) is over, let’s look at some data.
The data and chart below are from a world-renowned salary database engine, Salary Ninja. It searches the H-1B database, which covers foreign workers in the United States. You will see the average salary and the number of job positions that have either “Data Scientist” or “Machine Learning Engineer” in the job title between 2014 and 2019.
Are you surprised by the result? Even though the average salary is similar for both titles, you can see that the average decreased for data scientists in 2015 and 2016. Perhaps that is what people mean when they say the good days are over for data scientists. In terms of sheer quantity, data science is much bigger than ML engineering, but you can see that ML engineer positions are growing faster and pay more.
For your amusement, I included summary statistics that I gathered from Salary Ninja for the few roles we have discussed in this article. I summarized the past six years overall (first table) and the subset for the most recent year, 2019 (second table). Lastly, I included a table for just one company, Microsoft (third table).
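If you want to build this kind of summary yourself, here is a minimal sketch of the idea: filter records by a keyword in the job title, optionally restrict to one year, and compute the count, average, and maximum base salary. The rows, the `summarize` function, and the field layout are all my own placeholders for illustration. They are not real Salary Ninja or H-1B data, and this is not how Salary Ninja itself works.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows (title, year, base salary in USD).
# Made up for illustration only; NOT taken from Salary Ninja or the H-1B data.
records = [
    ("Data Scientist", 2018, 110_000),
    ("Data Scientist", 2019, 120_000),
    ("Data Scientist", 2019, 130_000),
    ("Senior Machine Learning Engineer", 2019, 150_000),
    ("Machine Learning Engineer", 2019, 140_000),
    ("Data Analyst", 2019, 80_000),
]

# Titles are matched by keyword, so "Senior Machine Learning Engineer"
# counts toward "Machine Learning Engineer".
KEYWORDS = ("Data Scientist", "Machine Learning Engineer")


def summarize(rows, keywords=KEYWORDS, year=None):
    """Group rows by the first keyword found in the title.

    Returns {keyword: {"count", "mean", "max"}} for base salary.
    If `year` is given, only rows from that year are included.
    """
    groups = defaultdict(list)
    for title, row_year, salary in rows:
        if year is not None and row_year != year:
            continue
        for keyword in keywords:
            if keyword.lower() in title.lower():
                groups[keyword].append(salary)
                break  # assign each row to at most one group
    return {
        keyword: {"count": len(s), "mean": mean(s), "max": max(s)}
        for keyword, s in groups.items()
    }


overall = summarize(records)               # all years
only_2019 = summarize(records, year=2019)  # most recent year only

for keyword, stats in only_2019.items():
    print(keyword, stats)
```

Matching on a keyword rather than the exact title is a deliberate choice here, since the same role shows up under many title variations. It also means rows with titles outside the keyword list, like the data analyst above, are simply dropped from the summary.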
I learned a few interesting insights:
- Overall, there are more data analysts than data scientists, but that flips in 2019! Could this be a sign that data analysts are being rebranded as data scientists?
- ML engineers have slightly higher pay than data scientists, but there are far fewer ML engineers in the data. This is likely because an ML engineer’s official title is often just software engineer.
- The average for research scientists was surprisingly low. I found out that this is because the database can include many other types of research scientists and not just those in tech ML research. That is why I included a table just for one tech company to reduce this noise. As anticipated, researchers took the throne for highest pay at Microsoft.
- I was surprised by the $1.3m base salary for the data engineer. That is crazy! Maybe you should consider that career.
- Keep in mind that this dataset only includes base salary, and stocks usually play a huge role in the tech world. Also, it does not paint a full picture of the job market. However, given how many foreign workers we have in the tech sector, this should still be a good proxy.
According to this data, I cannot say that the data science industry is a bust. It is still growing but possibly with more focus in analytics. From what I have observed, it seems to be true that there are more data science jobs that require fewer prerequisites, but that is not a bad thing.
Conclusion
I talked about a lot of things but I hope you stayed with me. I wrote this article because I myself was confused about all the changes that were going on in the industry. Also, it seemed like people have so many different opinions about what data science is. Regardless of who is right or wrong, I hope you can see the trend and decide for yourself.
In the end, do not choose a job or industry because it has the higher average salary or because of the buzz words. It does not matter if your title is data scientist or ML engineer or data analyst. It does not matter if someone says data scientist is an engineer or an analyst because both can be true.
Though it is easy to compare job titles based on pay, it is far more important to choose a role you enjoy and are good at. Focus on the actual work you do and make sure it suits you. Just because the average pay may be lower, it doesn’t have to mean that you will actually get paid less. As you saw earlier, all of the roles I discussed have a very high maximum pay.
Before I conclude, there are a few other resources that you can refer to for more information:
- Airbnb’s One Data Science Job Doesn’t Fit All article: I think Airbnb does one of the best jobs in organizing the data science job family, and this article explains it in detail. Instead of having one vague data scientist title, they have three tracks: Analytics, Algorithms, and Inference.
- What REALLY is Data Science? Told by a Data Scientist by Joma on YouTube: He does a good job of explaining different kinds of data scientists by company size. You will also have a better understanding of what Analytics Data Scientists do in big tech companies.
Thank you again for reading. My wish is that this article has given you some insights so that you won’t be lost while looking into the world of data science and machine learning. As always, comment below if you have any questions. I wish you the best during this difficult time, and I hope you find this article useful. Until next time.