"How do I become a data scientist?" is one of the most common questions asked on the internet. It is a fair question for those who are deciding to pivot in that direction, because they want to eliminate the learning waste that traditional educations are full of. I understand that; I studied chemical engineering. There is so much content that, in hindsight, I would replace: O-chem=nope, P-chem=nope, thermo=nope, industrial design=nope, etc. Most of the successful data scientists today took a twisted path to where they are now because they didn't have any mentors to direct them; there were no "data scientists" before them.
I have written in detail what I think it takes to become a data scientist (math/stats/programming, etc.) here. Something I think might be more valuable for some is calling out the reasons why I expect you to never be a data scientist. That's right, I'm betting against you, and I have great odds BTW. Prove me wrong.
1. Passion
The number one reason I think you will never be a data scientist is not your lack of Theano, Sklearn, Spark, Julia, Python, SQL or other technical competency; it is your lack of passion. Your toe is in the water: you scan social feeds for data science content, you go to meet-ups regularly, and maybe you've even done a Kaggle competition or two and placed in the bottom 90%. The people who have no problem getting multiple data science job offers have run past you and are diving in head first. The ships have been burned and they are not looking back. Relative to you, their drive may even seem reckless.
They don't care. They don't care about what has been done, they don't care that you have your toe in the water. They don't care about this blog post. They only care about perfecting their craft.
Honestly, passion fixes everything. People who are passionate have great breadth. If I rattle off a list of 6 important emerging data science libraries, they have not only heard of them but have REAL experience with them, having used them in several all-night hackathons.
How to get passion?
This is hard if you don't already have it. I would suggest finding a project that you think is amazing. Something that makes you giddy. A few weeks ago I took a second pass on 3D MRI convolutional nets for fun:
Image processing is a great place to start because of the eye candy: you get feedback that motivates you. Another place is algorithmic trading; it is still a never-ending gold rush of excitement for many. Start scraping bitcoin and build a custom deep learner searching for alpha. In the end you will fail, but it will be fun and you will be better off for trying.
Do I have enough passion?
"I love data science, I go to all of the meet-ups, I'm part of all of the LinkedIn groups, and I read ALL of the content I can." Sure, everyone is excited, but do you really love the work? A quick litmus test might be asking yourself these three questions. If you are already a data scientist, I'm not saying you aren't one if you can't answer yes to these; I'm just saying these are three simple ideas for those who are wondering how to get there. I'm also not endorsing Kaggle.com as a necessary pathway; it is just a succinct and straightforward option for many.
A. Have you placed in the top 10% for a Kaggle competition?
If the answer is no, I would say it was not because you weren't smart enough; it was because you lacked the passion for the time drain. The passion would have led you to read all of the solutions to past competitions, and you would have realized that if you had run XGBoost on k-folds with genetic tuning and feature creation tricks, you might have had a chance. For the image classification challenges, you might have realized by reading online that if you had done batch scaling, rotating, fuzzing, and flipping in addition to very deep nets with boosting, you would have done well. Don't forget the rectifier tricks [leaky, shifted, PReLU, …] that all of the passionate people know about too.
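If the rectifier names are new to you, here is a rough sketch of what a few of them compute, written as plain scalar Python functions so nothing needs installing. The function names and default values are my own illustrative choices, and "shifted" is assumed here to mean a ReLU whose floor sits at -1 instead of 0; other write-ups may define it differently.

```python
# Illustrative scalar versions of a few rectifier variants.
# Names and default values are assumptions for this sketch, not tuned settings.

def relu(x: float) -> float:
    """Plain rectifier: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def leaky_relu(x: float, slope: float = 0.01) -> float:
    """Leaky rectifier: a small fixed slope for negative inputs."""
    return x if x > 0 else slope * x


def prelu(x: float, alpha: float) -> float:
    """Parametric rectifier: same form as leaky, but alpha is meant to be
    learned during training (here it is simply passed in by hand)."""
    return x if x > 0 else alpha * x


def shifted_relu(x: float, shift: float = 1.0) -> float:
    """Shifted rectifier, assumed here to mean max(x, -shift)."""
    return max(-shift, x)


# Print each variant side by side for a few sample inputs.
for x in [-2.0, -0.5, 0.0, 1.5]:
    print(x, relu(x), leaky_relu(x), prelu(x, alpha=0.25), shifted_relu(x))
```

In a real network these would be applied elementwise to whole tensors by your deep learning library; the point of the sketch is only that each variant changes what happens to negative inputs.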
B. Have you stayed up too late to hack on a project?
If you haven't stayed up past 4am, or woken up at 3am to hack on a project you care about before work, you are missing that spark. I'm not saying you have to do this weekly (that will kill you), but to say you have never had something with data so important that it caused you to do that is saying something. A few weeks ago I became obsessed with genetic deep nets and stayed up until 5am building one that converged. Find that project that pulls you out of your 9-5 work window.
C. What are you doing this weekend?
Many of us use the weekend to recharge emotionally. We go biking, climbing, boating, fishing, camping, running, veg, etc. Have you ever burned entire weekends, Friday through Sunday, data hacking? Again, I'm not saying you need to do this weekly, but to say you have never done it is saying something. The other person, the one diving into the pool, built an amazing data app while you were watching season 5 of The Walking Dead.
2. Breadth
The biggest flaw I see in most people attempting to break into data science is their lack of breadth. They are not familiar with state-of-the-art clustering or dimensionality reduction algorithms. Their experience with classification and regression problems is also limited. Their NLP knowledge is very basic: they are familiar with Bayesian n-grams but have never heard of LSTM. Nobody wants to babysit anybody; they want to hire a data scientist, say "this is your project #1, your project #2, your project #n," and have you run and succeed again and again unsupervised. If you are lacking breadth, your employer will have major concerns over your ability to deliver. How do I fix breadth? Passion.
3. Learning Courses
LOL, just kidding. Couldn't resist. So many people think that if they do a few machine learning courses, they will suddenly have a great foundation. It doesn't hurt, but honestly many don't need it. If you read every single winning Kaggle code solution and really understand the methods used, in addition to every sklearn/numpy/theano/pandas function and its use case, you would be better off than 99.999% of course takers. There are some AMAZING data science YouTube tutorials out there. In 30 minutes you could know more about gradient boosting, deep learning, or clustering than you do right now. If you start watching a YouTube tutorial and realize in a few minutes that it is boring, stop it and find another one. The majority of people teaching data science on YouTube suck at teaching, so make sure you don't waste time with them and find the videos like the ones I posted above. There is something to be said about having a crappy foundation; that is your uphill battle. Going to meetups and finding mentors can help you realize where your gaps are. How do you really fix your foundation gaps? Passion.