The Ultimate Roadmap for Starting Your Data Science Journey
I get asked quite often on my YouTube channel (Data Professor) the following questions about how to break into data science:
- How to become a Data Scientist?
- What is the roadmap to being a Data Scientist?
- What courses should I take to learn Data Science?
So I thought that it would be a great idea to write an article about it, and here it is. It should be noted that the 10 things that I wish I knew about learning data science are based on my personal journey as a self-taught data scientist. If I could turn back time and advise my 22-year-old self about learning data science, these are some of the things that I would say.
I started my data science journey back in 2004. It was a time when the term data science was in its infancy and the more widely used term was data mining. It was not until 2012 that the term data science started to gain traction and propelled itself to mainstream popularity, helped by the Harvard Business Review article entitled Data Scientist: The Sexiest Job of the 21st Century by Thomas Davenport and D.J. Patil.
What is Data Science?
In a nutshell, Data Science is a field that makes use of data to solve problems and bring impact, value and insights to companies and organisations. Data science has been applied to a wide range of disciplines and industries spanning education, finance, healthcare, geology, retail, travel and esports. The technical skill sets of data science involve data collection, data pre-processing, exploratory data analysis, data visualisation, statistical analysis, machine learning, programming and software engineering. Aside from the technical side, there are various soft skills that are desirable for a data scientist. A high-level overview of the essential skill sets of a Data Scientist is provided in the following infographic.
1. Your Data Science Journey is Personal
Your Data Science Journey is personal. Don’t compare yourself to others; remember that everyone is unique and that each of us is on a different journey. Why would we want to be on someone else’s journey? Focus on your own data science journey. It is okay to be delayed by setbacks, but don’t let these obstructions keep you from reaching your goal. It’s better to be late than never.
Embrace imposter syndrome and consider your insecurities as a guiding map that will help you in the grand scheme of your data science journey. In particular, they may lead you onto the path of self-improvement. Craft your own list of things to learn and do. Identify data science concepts and skills that you don’t yet know and jot down what you would like to know. Then, from this bucket list of data science concepts/skills, focus on learning just 1 new thing a day. Over the course of 1 year, you’ll be amazed at the compound effect and how many new concepts and skills you will have learned.
2. How to Learn Data Science?
How do we learn? Learning style* is popularly classified into 3 major types:
- Visual (See)
- Auditory (Hear)
- Kinaesthetic (Do)
*Disclaimer: It should be noted that there is no scientific proof for learning styles, and thus the term ‘popularly’ is used herein to reflect their mainstream popularity. Learning styles are used herein to illustrate the many forms and media of learning that exist. The advice presented herein is based solely on my own opinion and experience. Please refer to the published research on the learning style myth at: https://www.apa.org/news/press/releases/2019/05/learning-styles-myth
Knowledge is everywhere and the sources of learning come in many shapes and forms. For example, you can learn from books, blogs, videos, podcasts, audio books, lectures, teaching and, most important of all, by doing.
“The Best Way to Learn Data Science is By Doing Data Science.”
— Chanin Nantasenamat (AKA Data Professor)
As you learn new concepts or skills (i.e. from visual and auditory), you can reinforce what you have learned by immediately applying that newfound knowledge to your data science project (i.e. kinaesthetic). By constantly doing data science, you will gradually reinforce and hone the new concepts and skills that you had just learned. And over time you will have mastered them.
In addition, to further reinforce your understanding of these new concepts or skills, you can teach others (e.g. by writing a tutorial blog or making a video tutorial). By doing this, you can harness the above-mentioned 3 learning styles and thereby maximize your learning potential. It is also worth noting that teaching others helps you put the new concepts or skills into your own words, and in doing so helps you reorganize your thoughts and improve your understanding.
Learn How to Learn
This is just the tip of the iceberg of advice on how to learn. In fact, there is an online course on Coursera called Learning How to Learn by Dr. Barbara Oakley and Dr. Terrence Sejnowski, which is a great course that will teach you some of the learning techniques to help you learn more efficiently.
Another great read is a Medium article by Evernote entitled Learning From the Feynman Technique, which summarizes the learning technique devised by the Nobel laureate and physicist, Richard Feynman. Additionally, a YouTube video on The 25 Best Scientific Study Tips provides actionable study tips that you can also use in learning data science.
Moreover, Scott Young has written an excellent book on Ultralearning where he shares his self-education experience in learning MIT’s 4-year computer science curriculum in just 1 year. In addition, Josh Kaufman delivered a TED talk and described in his book The First 20 Hours that we can learn anything that we want in just 20 hours.
Mastering the art of learning will allow you to learn and study data science more effectively and in turn will make your learning experience much more enjoyable.
Strategies for Learning Data Science & Skill Sets Needed
Late last year, I released a YouTube video Strategies for Learning Data Science in 2020 where I share some practical tips and tricks to get started in your data science journey. You will also want to check out How to Become a Data Scientist (Learning Path and Skill Sets Needed) where I take you on a bird’s-eye look at the holistic landscape of data science and cover the 8 important skill sets that all Data Scientists should know about. Additional videos providing strategies and advice on learning data science can be found in the Data Science 101 playlist on the Data Professor YouTube channel.
Ken Jee has made an excellent Medium article and YouTube video on How to ULTRALEARN Data Science. Additionally, he also shares his tips in his YouTube video How I Would Learn Data Science (If I Had to Start Over).
4. Resources for Learning Data Science (Fee vs Free)
There is an abundance of resources out there for learning data science. In fact, there are so many that choosing among them may be overwhelming. I will break down the available learning resources into 2 major types: Fee vs Free.
In the following sections, I will be listing some of the resources for learning data science for fee and free.
Learning Resources for a Fee
- 365 Data Science
- O’Reilly Online Learning ($49/month or $499/year)
- Udemy data science courses: the following are top courses:
1. Machine Learning A-Z™: Hands-On Python & R In Data Science
2. Python for Data Science and Machine Learning Bootcamp
3. The Data Science Course 2020: Complete Data Science Bootcamp
4. Data Science A-Z™: Real-Life Data Science Exercises Included
5. R Programming A-Z™: R For Data Science With Real Exercises!
6. Artificial Intelligence A-Z™: Learn How To Build An AI
7. Machine Learning, Data Science and Deep Learning with Python
8. Python A-Z™: Python For Data Science With Real Exercises!
9. Statistics for Data Science and Business Analysis
10. Complete Machine Learning and Data Science: Zero to Mastery
Learning Resources for Free or for a Fee
- edX: aside from CS50, all others are for a Fee
1. CS50 (Free / Verifiable certificate for $90)
2. Professional Certificate in Data Science (Harvard University)
3. MicroMasters® Program in Statistics and Data Science (MIT)
4. MicroMasters® Program in Data Science (UC San Diego)
5. IBM’s Professional Certificate in IBM Data Science (IBM)
6. MicroMasters® Program in Analytics: Essential Tools and Methods (Georgia Tech)
7. Master of Science in Analytics (Georgia Tech)
- Coursera: free to audit or for a Fee to earn a certificate
1. Machine Learning (Andrew Ng / Stanford University)
2. Data Science Specialization (10 courses / Johns Hopkins University)
3. Executive Data Science Specialization (5 courses / Johns Hopkins University)
4. Data Mining Specialization (6 courses / University of Illinois)
5. Master of Computer Science in Data Science (8 courses / University of Illinois at Urbana-Champaign)
6. Master of Applied Data Science (University of Michigan)
- Free for selected introductory courses or for a Fee for Nanodegree programs available in Udacity’s School of Data Science and School of Artificial Intelligence.
1. Intro to Data Science (Free)
2. Intro to Data Analysis (Free)
3. Data Analysis and Visualization (Free)
4. SQL for Data Analysis (Free)
5. Intro to Inferential Statistics (Free)
6. Data Scientist Nanodegree Program (Fee)
7. Data Analyst Nanodegree Program (Fee)
8. Data Visualization Nanodegree Program (Fee)
9. Data Engineer Nanodegree Program (Fee)
10. Machine Learning Engineer Nanodegree Program (Fee)
There are also many Learning Resources available for Free
- Kaggle Micro-Courses — 14 micro-courses consisting of:
2. Intro to Machine Learning
3. Intermediate Machine Learning
4. Data Visualization
6. Feature Engineering
7. Deep Learning
8. Intro to SQL
9. Advanced SQL
10. Geospatial Analysis
12. Machine Learning Explainability
13. Natural Language Processing
14. Intro to Game AI and Reinforcement Learning
- YouTube — There are several excellent channels covering several important topics in data science.
1. Data Professor
2. Ken Jee
3. Krish Naik
6. StatQuest with Josh Starmer
8. Data School
9. Python Programmer
10. Lex Fridman
11. Abhishek Thakur
12. Two Minute Papers
13. Andreas Kretz
14. Corey Schafer
15. Siraj Raval
16. Story by Data (Kate Strachnyi)
18. Joma Tech (Data Science Playlist)
19. 365 Data Science
20. Data Science Dojo
21. DataCamp
22. Import Data
23. Data Science Jay
24. David Langer
25. Daniel Bourke
26. Python Engineer
Learn from Competitions and Hackathons
Another way to learn and grow your data science skills is to participate in data science competitions and hackathons. A well-known and popular platform where you can take part in data science competitions is Kaggle, while a platform that hosts machine learning hackathons is MachineHack.
The great part about taking part in these events is the improvisation and creativity required to tackle the problem at hand, owing to their impromptu nature. Because these events have a fixed submission deadline, you are subconsciously motivated to carry the project through to completion. In a learning scenario where there is no pressure to finish a data science project by a specific date, there is a good chance that you will delay its completion, whether through procrastination or the life events that are thrown at you. Recall the time when you were in school and had to study for an exam taking place on a specific date. You would prepare for it (i.e. by reading, recalling, understanding and memorising) so that you were ready to take the exam. Likewise, if you are determined to participate in and complete a competition or hackathon, you will need to prepare and analyze the dataset provided (i.e. data pre-processing, exploratory data analysis, feature engineering, model building and model interpretation).
Another great reason for participating in competitions and hackathons is the valuable tips and tricks that you may pick up while finding creative ways to improve model performance. In doing so, you may reach out to fellow data scientists to discuss novel ways of approaching the dataset, and learn a few new things along the way. Additionally, your search for ways to improve model performance may lead you to dig into the research literature and try out new things, libraries and/or approaches. None of this may have been possible through traditional, passive learning alone.
For more great data science competitions and hackathons please refer to the excellent post in Towards Data Science by Benedict Neo on 10 Data Science Competitions for you to hone your skills for 2020. Moreover, it is also recommended to take the Coursera course on How to Win a Data Science Competition: Learn from Top Kagglers and learn about some of the best practices for winning at these competitions. You may also want to check out Abhishek Thakur’s YouTube video on My Journey: How I Became The World’s First 4x (and 3x) Grand Master On Kaggle.
5. Why Data Science?
Having a clear purpose and reason for why you want to learn Data Science can help you to appreciate data science more. Take some time to think about this by exploring the following major questions.
Why Do I Want to Learn Data Science?
The most important question to ask yourself is simply: why do you want to learn data science? By answering this question you will better understand which area of data science you need to focus on learning first, because the field is vast and it is easy to get lost down the rabbit hole.
How Will I Use Data Science in My Projects?
It is important to determine how you will put data science to use in your projects. Some of the questions that you want to answer include:
- Will you be performing exploratory data analysis?
- Will you be developing a regression / classification / clustering model?
- Will you be developing a chat bot?
- Will you be developing a recommendation system?
What Value Can I Bring to My Work Through the Use of Data Science?
As Stephen Covey puts it in his 7 Habits of Highly Effective People, “Begin With the End in Mind.”
- So take a moment to think about the desired destination that you hope to reach with Data Science.
- With a clear goal in mind, you’ll be amazed at how committed you’ll become toward reaching that goal.
6. Keep Yourself Accountable and Be Productive!
First off, being accountable for your own learning progress will help keep you on track. I have been a part of an awesome online community of data scientists founded by Ken Jee. In this online community, there is a discussion board where members can publicly post what their aims for the week or month are. Doing so helps us stay committed to the goals we originally set.
Ken also shares more of his tips and tricks for staying motivated and productive in his Medium article on How to Stay Motivated and Productive When Learning Data Science. Additional tips can be found in 8 Habits of Highly Accountable People by Kevin Daum on Inc.
Here is some basic advice for being productive:
- Set aside dedicated time every day (preferably 1–2 hours, or at least 45 minutes) that you can spend learning and doing data science
- Avoid distractions (turn off your phone, avoid checking social media, etc.). If you cannot stop distractions from reaching you, it may be better to remove yourself from the distracting environment and find someplace quiet where you can give your work your undivided attention.
- Don’t procrastinate, don’t overthink, and just do it (like Nike)! To help with this, try applying the 2-minute rule (read this Medium article on How to Stop Procrastinating by Using the ‘2-Minute Rule’) to keep yourself in motion.
Because at the end of the day, if you’re not making progress, you’re not learning and you’re not getting ahead to meet the goals and be where you want to be in your career.
7. Embrace Failure and Learn to Love Debugging
Embrace failure. You’ll have to learn to get comfortable with the uncomfortable. Because simply put, there’s No Free Lunch. No pain, No gain. So when you encounter failure, don’t dwell on it, just get back up and keep on trying.
It is perfectly okay to get stuck, it is okay to not understand algorithm X, and it is okay to not know how to debug your failed code. You can take a break to refresh your mind before getting back to tackling your challenge. Sometimes your mind gets clogged and sluggish, so taking a break may help to rejuvenate it.
When you are stuck with a coding error in your data science project and are not sure how to proceed, ask a friend who is knowledgeable in coding if you have one. If not, search Stack Overflow to see whether there is already an answer to your question. If there isn’t, ask!
Learn to love debugging and treat it as a learning opportunity from which you can gain valuable insights and lessons from failures and mistakes. Because if you don’t fail, you don’t learn. But when you do fail, don’t be too harsh on yourself; get back up and start over. You want to be resilient to failure.
8. Don’t Worry About Trying to Learn Everything
A newcomer to the field may be stunned by all the fancy terminology, but try not to be intimidated. Remember that data science and machine learning are dynamic, growing and evolving fields, and therefore new technologies will always be introduced. Simply put, the only thing that will remain constant is change itself.
As mentioned above, don’t be intimidated; take the dive and start. It does not matter where you start. What matters most is that you actually start your data science journey.
Focus on the Basics
- Data wrangling (Python — pandas, R — dplyr)
- Read up on statistics so that you can apply it in your models, for example by using the proper statistical tests (parametric vs non-parametric) to compare models.
- Exploratory data analysis and descriptive statistics for gaining an overview of the data
- Start with building simple and interpretable machine learning models (linear regression, tree-based methods)
- Use machine learning approaches that you are confident in using (knowing the math behind it)
Focus on the Project and Not on the Technology
Don’t overthink. Overcome the “What language should I learn?” dilemma: choose one and move on.
Know that programming is a tool that should help you take your project’s idea forward to development and deployment.
The underlying concepts of programming are language agnostic, meaning that the core fundamentals apply across languages:
- Defining variables, arrays, data frames, etc.
- Flow control (e.g. for loops, if and else statements)
- Specific tasks in Data science
– Data wrangling / Data pre-processing
– Data visualization
– Model building
– Model deployment
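To make the list above concrete, here is a minimal Python sketch of those building blocks: variables, an array-like list, a for loop and an if/else branch. The names and the threshold are hypothetical; the same ideas carry over to R or any other language with different syntax.

```python
# Variables and an array-like list (hypothetical temperature readings in Celsius).
threshold = 30.0
readings = [28.5, 31.2, 29.9, 33.1]


def label_readings(values, limit):
    """Label each reading as 'hot' or 'normal' using a for loop and if/else."""
    labels = []
    for value in values:      # flow control: for loop
        if value > limit:     # flow control: if / else
            labels.append("hot")
        else:
            labels.append("normal")
    return labels


labels = label_readings(readings, threshold)
print(labels)  # ['normal', 'hot', 'normal', 'hot']
```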
9. Make Your Projects Reproducible
Some of the benefits of making your data science projects reproducible are as follows:
Others can help you
- When you are faced with a coding error, it is essential to make a minimal working example (MWE) as it will allow others to reproduce your errors so that they can help you.
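For illustration, a minimal working example might look like the sketch below: a few lines of self-contained code, with tiny made-up inline data, that reproduce one specific error. The function and data here are hypothetical and not taken from any real project.

```python
# Minimal working example: reproduces the error with tiny inline data,
# no external files and no unrelated code.
def average_age(records):
    """Average the 'age' field; fails if any age is stored as text."""
    return sum(r["age"] for r in records) / len(records)


records = [{"age": 31}, {"age": "42"}]  # second age is a string by mistake

try:
    average_age(records)
except TypeError as err:
    error_message = str(err)
    print("Reproduced:", error_message)
```

Sharing a snippet like this, together with the full error message and your library versions, gives others what they need to reproduce the problem.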
Save time for your future self and others
- Export your project as a Docker container or as a Python or Conda environment, because what works today may not work 6 months from now owing to the constantly changing versions of the underlying libraries installed in your coding environment. It is thus essential to use virtual environments, Docker containers or at least export the library versions (shown below for pip and conda).
Exporting environment in pip:
pip freeze > requirements.txt
Exporting environment in conda:
conda env export > environment.yml
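If you would like to record library versions from within a script or notebook, a sketch along these lines uses Python's standard `importlib.metadata` module (Python 3.8 or later) to list installed packages in a `name==version` form similar to `pip freeze`. The helper name `installed_versions` is hypothetical, and this is a complement to the commands above rather than a replacement for them.

```python
from importlib import metadata


def installed_versions():
    """Return sorted 'name==version' strings for installed distributions."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken or missing metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


versions = installed_versions()
print("\n".join(versions))
```

Saving this output next to your results gives your future self a record of the environment that produced them.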
10. Learning Success Starts from Within
This section explores the idea that the level of success of your data science journey starts from within. It is about preparing your mind for what is to come and for who you will become. These concepts include: Curiosity, Love the Process, Growth Mindset and Grit.
Curiosity can be considered one of the core and necessary skills for becoming a data scientist because it keeps us motivated and persistent in the pursuit of creative ways of solving problems. Albert Einstein once compared curiosity and knowledge.
“Curiosity is more important than knowledge.”
Eric Colson stressed the importance of curiosity in his Harvard Business Review article Curiosity-Driven Data Science.
“…think less about how data science will support and execute your plans and think more about how to create an environment to empower your data scientists to come up with things you never dreamed of.”
Loving the Process
Learning data science is not an easy endeavor nor an impossible feat. It is definitely possible for an individual from a non-technical background to break into data science as I did and discussed in my previous Medium article How a Biologist Became a Data Scientist.
When talking about loving the process, three names come to mind: Michael Jordan, Gary Vaynerchuk and Clément Mihailescu. These three individuals can be considered to be among the best at what they do, and their passion for what they do is relentless.
In signing his first professional basketball contract, Michael Jordan made sure that a special clause was included in the contract which would allow Jordan to play basketball whenever and wherever without restrictions.
As Gary Vaynerchuk (Chairman of VaynerX, CEO of VaynerMedia, 5-Time NYT Bestselling Author) said in a YouTube video when asked if he could delegate most of his job to spend less time at work:
“I love the process of the work, I love the grind, I love the climb.…I would suffocate if I couldn’t put out the work that’s needed to accomplish the things that I want.”
Clément Mihailescu (CEO of AlgoExpert, Ex-Facebook Software Engineer and Tech YouTuber) says in a YouTube video about how he doesn’t experience burnout:
“At the end of the day, you have to enjoy the process. Whatever it is that you’re doing, whatever endeavor you’re pursuing, you have to enjoy the day to day, you have to love the nitty gritty stuff. You have to live and breathe it.”
Growth Mindset and Grit
Based on several years of research, Angela Duckworth (Founder and CEO of Character Lab and Professor of Psychology at the University of Pennsylvania) defines the term grit in her best-selling book Grit: The Power of Passion and Perseverance (YouTube video) as the combination of passion and persistence. Particularly, an excerpt of her definition of grit is:
“Grit is the tendency to sustain interest in and effort toward very long-term goals.”
Carol Dweck described in her book Mindset: Changing The Way You Think to Fulfil Your Potential findings from her research on the two main mindsets guiding our lives: (1) growth mindset and (2) fixed mindset. The former has been associated with success while the latter will usually lead to self-doubt and an unfulfilled life. In her TED talk, Dweck proposes working outside your comfort zone as the key to improving your performance.
In data science, change is inevitable as there will always be new and challenging concepts that may overwrite or redefine prior concepts altogether. We will always be bombarded with complex challenges. Coping with these changes and challenges starts from within, particularly with having the right mindset to help steer your path to success.
Bonus: 11. Taking Full Responsibility
It is often easy to come up with excuses and blame countless things for the misfortunes of life. When we do this, “we have zero accountability” as Gary Vaynerchuk would always say (an excellent YouTube video on Stop Blaming Others & Take Full Responsibility).
Learning data science is no different from any other endeavor in our lives. The question is whether we will be accountable for the delays and obstacles that we encounter during our learning journey, or whether we will avoid taking full responsibility and put the blame elsewhere.
Consider the following quotes on taking full responsibility (watch these on YouTube for the first two quotes and the third quote)
“Take full responsibility for what happens to you, it is one of the highest forms of human maturity. Accepting full responsibility, it’s the day you know you have passed from childhood to adulthood.”
“Until you accept responsibility for your life, someone else runs your life.”
“Everything on you, everything’s your fault. You want to really win in life? You want to get real happy? Do you know why I’m really happy? Because I think that everything is my fault. If I don’t like it, I can change.”
Now, take a moment and reflect. Let’s start being accountable and taking full responsibility; you’ll be amazed at how much you can achieve in your data science journey. Only if we are objective and take full responsibility for our actions and lack of progress will we be empowered to do something about it. I’ll leave you with this quote by Jim Rohn.
Success is not something you pursue, success is something you become.
And there you have it, the 10 things that I wish I knew about learning data science if I could go back in time and tell my 22-year-old self. I hope that these are useful in getting you started on your data science journey or, if you have already started, that you can find something useful in them. Until next time, the best way to learn data science is to do data science, and please enjoy the journey!