Data Science, Machine Learning or AI: Where Should I Start?

Spoiler: the answer is often “none of the above”.

One can barely look at the news these days without coming across a reference to data science, machine learning or artificial intelligence. And for good reason: the explosion of data collected by connected sensors and devices, combined with breakthroughs in artificial intelligence fueled by drastic increases in computing power, has pushed these topics to the top of the priority list for nearly every technology company. The resulting demand for technical talent with these skills is well documented. According to LinkedIn’s 2019 “Emerging Jobs” report, hiring for roles requiring these skills has increased 74% annually over the past four years. Average salaries for Data Scientists and AI Engineers are well above $100k, and top experts in the field command salaries of several hundred thousand dollars.

As a result, a wide variety of university and private educational offerings has sprung up to meet the rapidly growing industry need for this talent, ranging from private bootcamps to master's and PhD programs in data science, machine learning or AI. Demand for these programs is booming as people of all ages are drawn to the field's exciting work opportunities and high pay. But one central question often remains unclear for folks looking to break into the field or expand their skill set:

Should I learn about data science, machine learning, or AI? Is there really a difference?

The short answer to the above question is “yes”. But the differences are not always clear-cut, and the boundaries between the fields are sometimes murky. Many universities offer graduate-level programs in “data science”, “machine learning” or “AI” (or some combination of the three), yet often do a poor job of explaining their focus and occasionally use the terms differently than their peer universities do. This leaves many interested learners unsure which of these areas to focus on. Without clarity on these terms, newcomers struggle to know where to start wading through the overwhelming array of articles, books, online courses, and videos available on these topics. Unfortunately, it’s easy to start in the wrong place, quickly get in over your head, and become discouraged by the complexity.

So what ARE the differences between these terms?

We’ll start with “data science”. Data science refers to the collection of methods, tools and practices for analyzing data to derive insights that support decision-making. Because data science is such a broad discipline, true data scientists must possess a wide skill set that includes programming, math/statistics, and domain knowledge of their chosen field of application.

Successful data scientists combine strong programming skills with math and statistics knowledge and sufficient depth of domain knowledge about the problem they are solving. Data scientists perform a range of activities for an organization, including data collection and processing, analytical modeling and machine learning, and data visualization. They also work on problems across a wide range of fields, from the social sciences to agriculture to consumer goods. As a result, the job market for data scientists spans companies, government, and non-profit organizations.

“Machine learning” is generally considered to be the ability of a computer program to “learn”, or improve its performance, from examples rather than from explicitly programmed rules. Machine learning is one of the key tools that data scientists use to analyze and interpret data. In turn, software engineers applying machine learning rely on the techniques and tools of data science to prepare data for use in ML. While some organizations have created dedicated ML Engineer roles, in many others the responsibility for creating ML models falls to software engineers or data science teams. Whether one is a dedicated ML Engineer or a software developer charged with implementing an ML model, this function requires a combination of strong mathematical foundations, an understanding of machine learning theory and algorithms, and reasonable programming proficiency to implement models in code. Although every industry needs these skills, ML Engineers are most commonly found at web/tech companies and industry-specific software companies.
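To make the difference between "explicitly programmed rules" and "learning from examples" concrete, here is a minimal sketch in Python. The scenario (flagging a message as spam by counting its links), the training data, and the function names are all hypothetical and chosen only for illustration. The "learned" version simply picks the cutoff that misclassifies the fewest labeled examples, a one-feature threshold rule similar to a decision stump, which is far simpler than the models used in practice.

```python
# Hypothetical example: flag a message as spam based on how many links it contains.
# The feature, data, and threshold are made up for illustration only.

def rule_based_is_spam(num_links: int) -> bool:
    """Explicitly programmed rule: a human picks the cutoff."""
    return num_links > 5


def learn_threshold(examples: list[tuple[int, bool]]) -> int:
    """Learn the cutoff t for the rule 'num_links > t' from labeled examples.

    Tries each candidate threshold and keeps the one with the fewest
    misclassified training examples (a one-feature threshold rule).
    """
    values = sorted({x for x, _ in examples})
    # Include a threshold below every value so "everything is spam" is representable.
    candidates = [values[0] - 1] + values
    best_t, best_errors = candidates[0], len(examples) + 1
    for t in candidates:
        errors = sum((x > t) != label for x, label in examples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical labeled training data: (number of links, is spam?)
training_data = [(0, False), (1, False), (2, False), (3, True), (4, True), (7, True)]

learned_t = learn_threshold(training_data)
print("learned threshold:", learned_t)                         # learned threshold: 2
print("hand-written rule on 3 links:", rule_based_is_spam(3))  # False
print("learned rule on 3 links:", 3 > learned_t)               # True
```

The point is not the algorithm itself: in the first function a person decides the rule, while in the second the rule is derived from data, so supplying different examples produces a different rule without any change to the code.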

“Artificial intelligence” is often used to describe machines capable of replicating the cognitive capabilities associated with the human mind. AI as an area of research dates back to the mid-1950s and comprises several sub-fields, such as computer vision and robotics. A distinction is generally made between “artificial general intelligence”, which replicates the human mind’s capabilities in a broad sense, and “narrow AI”, in which a machine learns to accomplish a very specific task. Many recent advances in AI have been achieved with machine learning techniques, although the field also includes areas such as expert systems and intelligent search.

Both data science and AI use machine learning as a central tool. In data science, machine learning is commonly used to analyze data, uncovering patterns and sometimes making predictions. In AI, machine learning is the key to creating intelligent agents. AI systems often draw their machine learning data from hardware or sensors and apply the resulting models in near real time so that machines can take action. The other element connecting all three fields is that data science tools are used to clean, process and analyze the data that serves as input. While the sources of data may differ, the same techniques and programming tools are most often used.

Which one should I learn?

The answer depends largely on your goals. For scientists and researchers who analyze data in diverse fields, a thorough understanding of the tools of data science is a great place to start. For engineers who want to build intelligence into software or hardware products, machine learning, or AI more generally, may be the logical path.

If you’re not exactly sure where you aspire to go and just want to get started, it’s hard to go wrong by starting with data science fundamentals (along with brushing up on your programming and calculus/linear algebra/statistics). Ultimately, data is the key to success in all of these fields, and so a strong set of skills in processing, cleaning, analyzing and visualizing data, along with the statistical knowledge required to do so, will serve you well no matter what direction you ultimately go.

Before you begin studying any of these areas, however, make sure you have a sufficiently strong foundation of domain knowledge in the field or industry where you aspire to apply these technologies. If you are entering college or just beginning your career, it’s important to build your domain knowledge before (or at least in parallel with) learning about data science, machine learning and AI. Unless you plan to make a career out of ML/AI research building new algorithms, you will be using these tools in applications within your own field of interest. Perhaps the single largest contributor to your success in doing so is having enough domain knowledge to thoroughly understand the problem you are trying to solve and how data science, ML or AI can help solve it.

We are fortunate to live at a time when a wide array of content is available on each of these fields. The challenge is how best to navigate it. Hopefully, this summary has given you a better sense of where to start. Good luck, and enjoy the journey!

Education For An Industry 4.0 World