Hard work won't make you a data scientist


A few weeks ago I went to a career fair at a local university to hire data scientists, and I left feeling discouraged. There was a lot of energy there: motivated students ready to interview, resumes in hand. You could tell they had worked really hard to get where they were, and they were willing to work even harder to get where they wanted to go. They all wanted full-time data science jobs at the best Utah companies, such as HireVue, Qualtrics, Health Catalyst, Savvy Sherpa, and Recursion Pharmaceuticals.

What makes a great company?

When looking for a company, I would encourage students to find one that will make them into the person they want to be 5 years from now. At HireVue we have full buy-in from the executive team on the data science vision. This buy-in has allowed us to build a great data science team capable of solving the hardest problems in HR analytics. The problems we work on are diverse and touch nearly every topic within data science, so nobody is limited to being an expert on just one problem type. Lastly, the most important part of a good job is work-life balance. HireVue gets it: we work to live, not the other way around. Being a Sequoia Capital company adds some magic garnish as well.

The Work Hard Fallacy:

This struck me because hard work has always been an important value to me. I remember my dad encouraging me to do some very hard jobs when I was young (roofing, construction, arborist work, etc.) to help teach me how to work hard. He also used that as a motivator for me to get an education so I could escape these types of jobs in the future.

I used to believe that those who work hard will be successful. After leaving the job fair I felt like I had seen a bunch of fish swimming in circles, some faster than others. If you told them to do X or Y, you could see them wanting to swim even faster... in circles.

Common Issues:

Chasing Big Data Stacks:

Some people think they need to know Hadoop, Spark, Pig, Hive, Cassandra, Kafka, etc., so they try to kick the tires on several of these, only to find out that the employers they are interested in don’t use these frameworks and don’t value them. In my opinion, being a full-stack developer who is comfortable setting up cloud services with multiple instances using Docker would be more valuable.

Industry Gap:

Despite the students' formal training in predictive analytics, there was still a noticeable gap between what they had studied and what industry wanted. Many students were less familiar with deep learning than they should have been. Another issue is that so many of their courses are intro courses, whereas ideally we want to hire people who know more than we do about these topics. I remember telling a student that I valued extracurricular projects in data science because, to me, they demonstrate intrinsic motivation for the topic. She pushed back, saying she was so overwhelmed by her ridiculous class load that she spent her waking hours just surviving. She was swimming in circles faster than the rest of them.

Another thing that surprised me was that the students were not familiar with any of the local data Meet-ups where industry folks go. They were swimming in silos, unaware of the direction or tools they needed to adopt to reach their desired destinations.

Steering:

I think any of these students, or anyone reading this article who wants to be a data scientist, can be a great one, but not by staying in a silo. Going to Meet-ups, reading Kaggle forums, reading recommended data science books, and following technical thought leaders can help ensure you are at least heading in the right direction. Finding an industry mentor can also be very helpful.

Silver Bullet:

Everyone wants a silver bullet solution, so today here is mine. (I use Python; if you are an R user you can find the equivalent.) The most popular data science libraries in Python are scikit-learn, pandas, and Keras. You could even consider SciPy and scikit-image. Clone these open source projects and read every line of code. Understand all the functions, classes, etc., until you know them better than anyone else. Doing this will also teach you core data science functions and methods you may not be familiar with yet.

Once you know these code bases inside and out, commit meaningful contributions to all of them. Blog about these commits and why you made them, and let the community know. Then list your commits to these libraries on your resume, along with what you did. Based on the significance and creativity of your contributions, and ultimately their acceptance into the main code base, you will stand out from everyone else. You would have the most compelling resume I have ever seen.

This recommendation is the best I could think of to get as close to industry as possible and to improve what industry is using today. It also seems like the most efficient use of time. You may need some foundation work first, like reading some data science books or taking courses, to be successful with this exercise.
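One way to start reading a library's code is to let Python show you where each function lives. The sketch below is only an illustration of that idea, not a prescribed workflow. The helper names `summarize_source` and `list_public_functions` are made up for this example, and the standard-library `statistics` module stands in for a data science library so the snippet runs without installing anything.

```python
import inspect
import statistics  # stand-in module; swap in a library you have installed


def summarize_source(obj):
    """Return the file, starting line, and source text of a Python object."""
    source_lines, start_line = inspect.getsourcelines(obj)
    return {
        "file": inspect.getsourcefile(obj),
        "start_line": start_line,
        "n_lines": len(source_lines),
        "source": "".join(source_lines),
    }


def list_public_functions(module):
    """List the public functions defined directly in a module."""
    return sorted(
        name
        for name, member in inspect.getmembers(module, inspect.isfunction)
        if not name.startswith("_") and member.__module__ == module.__name__
    )


# Find out where one function is defined, then read it.
info = summarize_source(statistics.median)
print(info["file"], info["start_line"], info["n_lines"])
print(info["source"])

# Build a reading list of every public function in the module.
print(list_public_functions(statistics))
```

The same two helpers should work on any pure-Python function or module, so a reading list for a larger library can be built one submodule at a time. Functions implemented in C have no Python source to show, and `inspect` will raise an error for them.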

Lastly, fall in love with it. Passion for the topic and intrinsic motivation will help you stand out from the school of fish in the market.

Have you noticed people spinning in place? Why is there still a gap between what industry wants and what people are taught?
