What an ML Engineer needs to know

Are you interested in machine learning? Are you wondering what the key skills of this profession are? This blog post covers the most valuable skills in the field and what you really need in your arsenal to call yourself a machine learning engineer.

If you are interested in machine learning, you are not alone. In fact, machine learning is one of the hottest fields right now, and more people become interested in it every day. Being interested is one thing, though; actually pursuing such a complex topic is another. Don’t worry: this blog post explains which skills you really need to start a machine learning career. The first thing to know is that it is not enough to be only a software engineer or only a data scientist, because machine learning engineering is a mixture of both professions.

People often confuse the job of a data analyst with that of a machine learning engineer. Although the two have a lot in common, they are different professions. The key difference is that a data analyst’s final output is mostly an analysis or a set of visualizations that tell a story about the data. A machine learning engineer’s final output is working software that produces a prediction or a probability.

Understanding the domain you’re working in

As a machine learning engineer, you need to understand the whole ecosystem for which you are building your system. If you work at a company and need to build a machine learning model that predicts when customers will churn, you need to understand the company’s inventory, pricing, seasonality, customer behavior, marketing approach, and so on. Understanding the whole ecosystem actually matters more than knowing a lot of machine learning algorithms, because the software you write has to integrate and interface successfully with the business.

The Key Skills

1. Programming

Of course, as a machine learning engineer you need to be able to write software. Python is one of the most widely used programming languages in the field because of its many machine learning libraries, such as scikit-learn, TensorFlow, Theano, and Caffe. Other commonly used languages include R, Java, C, C++, Julia, Scala, Ruby, and Octave. I recommend choosing Python and learning the basics of the language, its libraries, and its data structures. Ideally, you also know the fundamentals of computer science, such as data structures, algorithms, computability, complexity, and computer architecture.

2. Statistics

Since machine learning is all about making predictions and understanding the data you are dealing with, you need to understand the basics of statistics to call yourself a machine learning engineer. Specifically, you need descriptive statistics to summarize the data you have and inferential statistics to draw conclusions that reach beyond it, so that you properly understand what the data is telling you.
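To make the distinction concrete, here is a minimal sketch using only Python’s standard library. The two customer groups and their monthly spend values are made up for illustration, and Welch’s t statistic is just one possible inferential tool, chosen here as an assumption. The first function describes a sample; the second asks whether the difference between two samples is large relative to its uncertainty.

```python
import math
import statistics

# Hypothetical data: monthly spend (in dollars) for two small customer groups.
group_a = [52.0, 48.5, 61.0, 55.5, 49.0, 58.0, 53.5, 50.5]
group_b = [44.0, 47.5, 41.0, 50.0, 43.5, 46.0, 42.5, 45.5]


def describe(values):
    """Descriptive statistics: summarize the sample itself."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


def welch_t(a, b):
    """Inferential statistics: Welch's t statistic for a difference in means."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


summary_a = describe(group_a)
summary_b = describe(group_b)
t_stat = welch_t(group_a, group_b)

print("Group A:", summary_a)
print("Group B:", summary_b)
print(f"Welch t statistic: {t_stat:.2f}")
```

A large absolute t statistic suggests that the gap between the group means is unlikely to be noise alone. Turning it into a p-value needs the t distribution, which the standard library does not provide.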

3. Mathematics

A thorough mathematical understanding of machine learning techniques is necessary to grasp the inner workings of the algorithms and to get good prediction results. This mathematical intuition helps you, for example, to select the right algorithm, choose the right parameter settings, and recognize overfitting and underfitting. Although it depends on the type of problem you are dealing with, the minimum level of mathematics a machine learning scientist or engineer should master includes linear algebra, multivariate calculus, algorithms, and complex optimization.
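As a small illustration of how calculus and optimization show up in practice, the sketch below fits a straight line by gradient descent on the mean squared error. The function name fit_line, the learning rate, the number of steps, and the toy data are all assumptions chosen for this example, not a recommended recipe.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current line on every data point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum((w*x + b - y)^2).
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so the fit should recover w = 2 and b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

The learning rate is exactly the kind of parameter setting where mathematical intuition helps: if it is too large for the curvature of the error surface, the updates diverge instead of converging.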

4. Feature Engineering

The quality of feature engineering separates good machine learning engineers from average ones. You will spend most of your time on this task, which is why it is so important. Feature engineering is the process of transforming raw data into features that better represent the underlying problem. In other words, you turn your inputs into things the algorithm can understand. If you prepare your data well, you can still get a satisfactory result even if you did not choose the best algorithm for the problem.
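Here is a minimal sketch of what that transformation can look like for a single customer record. The field names (signup_date, last_order_date, order_count, total_spent, plan) and the derived features are hypothetical, picked only to show the idea of turning dates and categories into numbers.

```python
from datetime import datetime


def engineer_features(record):
    """Turn one raw customer record into numeric features."""
    signup = datetime.strptime(record["signup_date"], "%Y-%m-%d")
    last_order = datetime.strptime(record["last_order_date"], "%Y-%m-%d")
    orders = record["order_count"]
    return {
        # Date strings become a duration the model can compare.
        "tenure_days": (last_order - signup).days,
        # weekday() returns 5 for Saturday and 6 for Sunday.
        "is_weekend_order": 1 if last_order.weekday() >= 5 else 0,
        # A ratio often carries more signal than the two raw totals.
        "avg_order_value": record["total_spent"] / orders if orders else 0.0,
        # A category becomes a 0/1 indicator.
        "plan_premium": 1 if record["plan"] == "premium" else 0,
    }


raw = {
    "signup_date": "2023-01-15",
    "last_order_date": "2023-03-18",
    "order_count": 4,
    "total_spent": 180.0,
    "plan": "premium",
}
features = engineer_features(raw)
print(features)
```

Which features are worth deriving depends on the domain, which is one reason understanding the business matters so much.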

5. Applying Machine Learning Algorithms & Tools

Of course, you need to be able to apply the standard machine learning tools (such as scikit-learn, Theano, and TensorFlow). You must also be able to apply algorithms effectively, which involves choosing the right model (such as support vector machines, decision trees, k-nearest neighbors, or neural networks) and understanding how the hyperparameters affect the learning of your model. You should also be familiar with the various pitfalls (such as overfitting and underfitting, missing data, and data leakage) and with the advantages and disadvantages of different machine learning approaches.
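To show how a single hyperparameter changes what a model learns, the sketch below implements a tiny k-nearest neighbors classifier in plain Python and compares training and validation accuracy for several values of k. It deliberately avoids scikit-learn so that it runs without extra dependencies. The synthetic data, the 15% label noise, and the chosen values of k are assumptions made for illustration.

```python
import random
from collections import Counter


def make_data(n, rng):
    """Hypothetical noisy 2-D data: label 1 when x + y > 1, with 15% label flips."""
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        label = 1 if x + y > 1 else 0
        if rng.random() < 0.15:
            label = 1 - label  # inject label noise
        data.append(((x, y), label))
    return data


def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(train, key=lambda item: squared_distance(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, data, k):
    hits = sum(knn_predict(train, point, k) == label for point, label in data)
    return hits / len(data)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
train = make_data(200, rng)
validation = make_data(100, rng)

results = {}
for k in (1, 5, 15, 51):  # odd values avoid tied votes between two classes
    results[k] = {
        "train": accuracy(train, train, k),
        "validation": accuracy(train, validation, k),
    }
    print(f"k={k}: {results[k]}")
```

With k = 1, every training point is its own nearest neighbor, so training accuracy is perfect by construction even though the labels are noisy. That is why a held-out validation set, and not the training score, should guide the choice of k.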