Untold Truths of being a Machine Learning Engineer

Roman Orac
September 22, 2020 AI & Machine Learning

I recently took part in an interesting Reddit discussion, and a few of my answers were highly upvoted. Its main topic was the untold truths of being a machine learning engineer. As one of the more active participants, I am sharing a curated set of the key takeaways.

What are the untold truths of being an ML engineer? on Reddit

1. Using Deep Learning

Many Machine Learning enthusiasts think that they will play with fancy Deep Learning models, tune Neural Network architectures and hyperparameters. Don’t get me wrong, some do, but not many.

The truth is that ML engineers spend most of the time working on “how to properly extract the training set that will resemble real-world problem distribution”. Once you have that, you can in most cases train a classical Machine Learning model and it will work well enough.
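As a minimal sketch of what "resembling the real-world distribution" can mean in practice, the snippet below resamples a labeled pool so that its class shares match the shares expected in production. It only looks at labels; matching feature distributions is a separate and harder problem. The data, the 90/10 production split, and the helper names (`label_proportions`, `total_variation`, `resample_to_match`) are all assumptions made up for illustration.

```python
import random
from collections import Counter, defaultdict


def label_proportions(labels):
    """Return the fraction of examples carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two label distributions (0 = identical)."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)


def resample_to_match(examples, target, n, seed=0):
    """Draw about n (features, label) pairs so label shares follow `target`.

    Samples with replacement inside each class, so rare classes may repeat.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for features, label in examples:
        by_label[label].append((features, label))
    sample = []
    for label, share in target.items():
        k = round(share * n)
        sample.extend(rng.choices(by_label[label], k=k))
    rng.shuffle(sample)
    return sample


# Hypothetical data: the collected pool is balanced, production is assumed 90/10.
pool = [((i,), "spam") for i in range(500)] + [((i,), "ham") for i in range(500)]
production = {"ham": 0.9, "spam": 0.1}

before = total_variation(label_proportions([y for _, y in pool]), production)
train = resample_to_match(pool, production, n=1000)
after = total_variation(label_proportions([y for _, y in train]), production)
print(f"distance before: {before:.2f}, after: {after:.2f}")
```

Sampling with replacement keeps the sketch short; with a real dataset you would more likely collect extra examples of the under-represented class than repeat the ones you already have.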

Just out of curiosity, which is the hardest problem being solved by any of these algorithms? And which one is being used to solve it?

Photo by Caleb George on Unsplash

Deep Learning has had the most success in Computer Vision (e.g., self-driving cars) and, more recently, in Natural Language Processing (GPT-3, etc.). So researchers and practitioners who work in those areas most likely use Deep Learning.

In my opinion, the all-time greatest achievement is DeepMind’s AlphaGo Zero. The self-driving car is probably the one that will have the most impact on society. The most recent achievement in Natural Language Processing is GPT-3.

Are Deep Learning models difficult to explain in comparison to classic ML models?

OP said it nicely:

Can’t see how explaining a Convolutional Neural Net would be any harder than explaining a whole classification framework based on SVMs, Random Forests or Gradient Boosting.

I feel like this statement has become less and less true over the years as NNs have seen more research into interpretability.

It clearly still holds when comparing NNs to good old traditional statistics like GLMs or Naive Bayes. But as soon as you move to CART-based methods or anything using the kernel trick, this fabled interpretability goes out the window.


2. Learning Machine Learning

Photo by NeONBRAND on Unsplash

When learning, you tend to go through a lot of papers on arxiv-sanity with some really cool algorithms. Then you enter the industry and all you see is relatively basic stuff like logistic regression, feedforward NNs, random forests (decision trees), bag-of-words instead of embeddings, and you feel like these models could be implemented by the average undergrad or even a smart high schooler. Maybe if you’re lucky you’ll see an SVM.

Infrastructure and data pipelines are where all the real engineering work happens.

I felt similar to the OP above at the beginning of my career. But why would you use a more complicated tool to solve a task when there’s no need for it? Many real-world problems don’t require a state-of-the-art NN architecture. Sometimes a simple logistic regression gets the job done.
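To make the "relatively basic stuff" concrete, here is a rough sketch of a bag-of-words text classifier backed by logistic regression, written with the Python standard library only. The toy texts, labels, and hyperparameters (`lr`, `epochs`) are invented for illustration; in a real project you would normally reach for an established library instead of hand-rolling gradient descent.

```python
import math
from collections import Counter


def bag_of_words(text, vocabulary):
    """Map a text to a list of word counts, one entry per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Fit weights and bias with plain batch gradient descent on log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            score = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(score) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(text, vocabulary, weights, bias):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    features = bag_of_words(text, vocabulary)
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical toy data: 1 = positive review, 0 = negative review.
texts = [
    "great product works well",
    "really great value",
    "works well love it",
    "terrible product broke fast",
    "really terrible waste",
    "broke fast hate it",
]
labels = [1, 1, 1, 0, 0, 0]

vocabulary = sorted({word for text in texts for word in text.lower().split()})
X = [bag_of_words(text, vocabulary) for text in texts]
weights, bias = train_logistic_regression(X, labels)

print(predict("great value works well", vocabulary, weights, bias))
print(predict("terrible waste broke fast", vocabulary, weights, bias))
```

The whole model is one weight per word plus a bias, which is part of why it is easy to debug and explain when it turns out to be good enough.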

The second part of the comment is true for smaller startups, where you usually have to take care of data pipelines yourself. In bigger companies, there are designated departments that deal with infrastructure. But there are no shortcuts: Data Scientists still need to be well informed about how data infrastructure works.

3. Learning Theory

Photo by 🇸🇮 Janko Ferlič on Unsplash

Learn as much fancy theory as you want, but at the end of the day, your job is going to be 99% data cleaning and infrastructure work.

99% is a bit of an exaggeration. To rephrase the OP: Machine Learning Engineers don’t just play with fancy models. Sometimes they need to get their hands dirty by cleaning and labeling the data.

Why don’t you use software and services to label data?

This is very true. So much so that I thought I was alone. I work mostly in NLP and 99% of my job is labelling data and making some infrastructure in Java.

Data labeling services are usually too expensive for the big datasets that are used in practice, and some datasets are not trivial to label. I once worked on invoice classification, where you would need professional accountants to label the data.

How does Machine Learning look in the real world?

Meme created with imgflip

I increasingly notice a gap in understanding of what Data Scientists do. Many aspiring Data Scientists are then disappointed when expectations don’t meet reality. Data Science is not just about tweaking the parameters of your favorite model and climbing the Kaggle leaderboard. What if I told you there is no leaderboard in the real world?

That is the reason I wrote the Your First Machine Learning Model in the Cloud Ebook: to show what working on an actual Data Science project looks like from start to finish. This Ebook is aimed at Data Science enthusiasts and Software Engineers who are thinking of pursuing a career in Data Science.
