The Surprising Truth About What it Takes to Build a Machine Learning Product

Josh Cogan
August 6, 2019 AI & Machine Learning

When I was in college, an ice cream shop opened nearby, and a few friends and I went to check it out. We walked in, and it looked completely normal — they had all the usual flavors like mint, chocolate, and the like. However, at the end of the counter, they had this flavor called “The Broccoli Surprise”. A naturally curious individual, I had to try it. I asked the attendant behind the counter for a sample. It was white with little green specks, and it tasted sweet, creamy, and rich. I was confused — there was no broccoli flavor in here. So I asked, “what’s the surprise?” “There’s no broccoli,” she replied with a smile.

Machine learning (ML) has a surprise, too. One of the biggest misconceptions about deploying ML within organizations concerns where the difficulty and the value actually lie.

Integrating ML into your business workflows can be broken down into five activities:

Defining KPIs — Key Performance Indicators allow us to measure and discuss what we are trying to improve. Common KPIs include customer retention, manufacturing yield, and employee turnover. Setting KPIs is a critical step in machine learning, since they ultimately drive optimization along the way to a performant model.

Collecting Data — Gathering the data that will be used to train your ML algorithms. Yes, you could use ML models others have produced if you lack data. However, the business considerations in that case are similar to those of other SaaS offerings, so let’s leave them out of scope here.

Infrastructure — ML infrastructure includes various pieces of software: data management, annotation tools, model training, and testing environments. This infrastructure is an upfront investment, but makes iterating and improving the model and data set much more efficient.

Optimizing ML Algorithm — Here we consider factors like which model to use based on a given data set/problem, the amount of necessary training data, the layers in your neural net, and hyperparameter tuning. There are a plethora of choices.

Integration — Getting an ML model working in a vacuum is a great achievement, but it is not until the model is integrated with a real workflow that it starts to create a tangible business impact. Integration is the process of building pipes and structure which seamlessly pass information and data between users and computers.
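To make the first activity concrete, a KPI such as customer retention can be written down as a small, testable function before any model is trained. The sketch below is only an illustration: the function name `retention_rate`, the customer IDs, and the definition used here (the share of start-of-period customers who are still active at the end of the period) are assumptions, not something taken from the article.

```python
from __future__ import annotations


def retention_rate(start_customers: set[str], end_customers: set[str]) -> float:
    """Share of customers active at the start of a period who are still active at its end."""
    if not start_customers:
        raise ValueError("start_customers must not be empty")
    # Customers acquired during the period do not count toward retention.
    retained = start_customers & end_customers
    return len(retained) / len(start_customers)


# Hypothetical customer IDs, invented for this example.
january = {"a", "b", "c", "d"}
february = {"b", "c", "d", "e"}

rate = retention_rate(january, february)
print(f"retention: {rate:.0%}")  # retention: 75%
```

Writing the KPI as code forces the definition to be explicit (for example, whether newly acquired customers count), which is the kind of agreement the KPI step is meant to produce.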

Based on many conversations with companies interested in deploying machine learning, the common perception is that optimizing the ML algorithm demands the most effort and delivers the biggest payoff.

There are a few possible reasons for this:

  • For most practitioners, optimizing ML models is the biggest “unknown” in the stack, so it’s easy to imagine it being more complicated and time-consuming than it really is.
  • Availability Heuristic — since ML algorithms and optimization are talked about more in literature and media, it is common for people to assume that they play larger roles than they do in the actual implementation process.

The Surprise

When I talk to practitioners who have extensive experience building and scaling these ML systems inside Google, I hear a very different story. Based on these conversations, optimizing an ML algorithm takes much less relative effort than expected, while collecting data, building infrastructure, and integration each take much more. The differences between expectations and reality are profound.

Defining KPIs — Once we deploy data-driven systems, we spend less time and fewer organizational resources selecting KPIs, since there are constant streams of data feedback. This obviates the need for proxy KPIs. Since good ML is predicated on good data, we must have a great collection pipeline already in place.

Collecting Data — Collecting data is almost always an underestimated component of spinning up an ML project. Some factors to consider when building a data collection and processing strategy are described in a previous post.

Infrastructure — Infrastructure building, which is mostly a software engineering task as opposed to an “ML task,” is one of the most time-consuming parts of most projects.

Optimizing ML Algorithm — The task of training and optimizing ML models almost always takes less time and effort than anticipated, for two reasons. First, performance is a strong function of the data you possess. Tweaking algorithms yields benefits, but those benefits pale in comparison to cleaning up your data. Second, tools for optimizing ML algorithms (like AutoML) make it much easier and faster to train and optimize models based on labeled or unlabeled data.

Integration — Integration is another underestimated part of the ML deployment process. Error and exception handling, redundancies, and the challenge of moving from a static product to one of continuous iteration present a host of software, product, and engineering challenges. Just think of all the technical debt hidden inside your training data!
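As a rough illustration of the error handling and redundancy that integration involves, the sketch below wraps a model call so that a failure degrades to a default value instead of breaking the surrounding workflow. Everything here is hypothetical: `predict_with_fallback`, `flaky_model`, and the fallback value are assumptions made for the example, not part of any particular library or of the systems described above.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("ml_integration")


def predict_with_fallback(
    model_predict: Callable[[Sequence[float]], float],
    features: Sequence[float],
    fallback: float,
) -> float:
    """Return the model's prediction, or `fallback` if the model call raises."""
    try:
        return model_predict(features)
    except Exception:
        # Record the failure (with traceback) for later investigation,
        # then degrade gracefully instead of failing the whole workflow.
        logger.exception("model call failed; using fallback value")
        return fallback


def flaky_model(features: Sequence[float]) -> float:
    """Hypothetical stand-in for a real model: it rejects empty input."""
    if not features:
        raise ValueError("no features supplied")
    return sum(features) / len(features)


print(predict_with_fallback(flaky_model, [1.0, 2.0, 3.0], fallback=0.0))  # 2.0
print(predict_with_fallback(flaky_model, [], fallback=0.0))  # 0.0
```

Note that nothing in this wrapper is ML-specific; it is ordinary software engineering, which is the point the practitioners make about where the effort goes.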


ML actually has two surprises.

First, many companies are wrong about which parts of the ML implementation process will be difficult. Tools and technical advances are dramatically changing ML optimization at a rate unmatched by software infrastructure for brute-force data collection and management. As with the broccoli ice cream, there is usually not that much ML in an end-to-end ML system.

Second, the path to implementing ML (asking questions about your customers, building infrastructure to collect, interpret, and act upon that data, etc.) is valuable regardless of whether ML is actually implemented in the end. Not every problem has an ML-powered solution, but many do, and even those that do not will benefit from this journey.

This article first appeared on Medium.
