Why Production Machine Learning Fails And How To Fix It

Machine learning has emerged as a must-have tool for any serious data team: augmenting processes, generating smarter and more accurate predictions, and generally improving our ability to make use of data.

However, discussing applications of ML in theory is much different than actually applying ML models at scale in production. In this article, we walk through common challenges and corresponding solutions to making ML a force multiplier for your data organization.

From generating your weekend bike route on Google Maps to helping you discover your next binge-worthy show on Netflix, machine learning (ML) has evolved well beyond a theoretical buzzword into a powerful technology that most of us use every day.

For the modern business, the appetite for ML has never been stronger. But while certain industries have been transformed by the automation made possible by ML—think fraud detection in finance and personalized product recommendations in e-commerce—the hard truth is that many ML projects fail before they ever see the light of day.

In October 2020, Gartner reported that only 53% of projects make it from prototype to production—and that’s at organizations with some level of AI experience. For companies still working to develop a data-driven culture, that number is likely far higher, with some failure-rate estimates soaring to nearly 90%.

Data-first tech companies like Google, Facebook, and Amazon are transforming our daily lives with ML, while many other well-funded and highly talented teams are still struggling to get their initiatives off the ground. But why does this happen? And how can we fix it?

We share the four biggest challenges modern data teams face when adopting ML at scale— and how to overcome them.

Misalignment between actual business needs and ML objectives

The first challenge is strategic, not technical: starting with a solution instead of a clearly defined problem.

As companies race to incorporate ML into their organizations, leaders may hire data scientists and ML practitioners to automate or improve processes without a mature understanding of which problems are actually suitable for ML to solve. And even when the business problem is a good fit for ML, without a shared definition of what success looks like, projects can languish in experimentation mode for months while stakeholders wait for an idealized level of machine-like perfection that can never be reached.

Machine learning is not magic, it will not solve every problem perfectly, and should, by nature, continue to evolve over time. Sometimes, a model merely achieving the same results as humans is a worthy project—errors and all.

Before starting any project, ask your team or your stakeholders: What business problem are we trying to solve? Why do we believe that ML is the right path? What is the measurable threshold of business value this project is trying to reach? What does “good enough” look like?

Without these clear, shared definitions articulated at the outset, many worthy ML projects will never reach production and valuable resources will be wasted. Solve a business problem using ML and not just embark on a ML project for checking off the ML box.

Model training that doesn’t generalize

With a clearly defined business problem and targeted success metrics, your potential pitfalls get more technical. During the model training stage, issues related to your training data or model fit are the likeliest culprit for future failure.

The goal of model training is to develop a model that can generalize, or make accurate predictions when given new data because it understands the relationships between data points and can identify trends. Your training dataset should be clean, sizable, and representative of the real-time data your model is expected to process once in production. No where has one seen clean data in a production environment. Expect to spend considerable time cleaning, labeling and feature engineering just to get the data ready.

Representative training data is also key: If your training data doesn’t reflect the actual datasets your model will encounter, you may end up with a model that won’t perform once you’ve reached testing or production.

Another issue that can occur during training is overfitting and underfitting. Overfitting happens when a model learns too much and produces outputs that fit too closely with your training data.

Underfitting is simply the opposite—your model doesn’t learn enough to make useful predictions on even the training data itself, let alone new data it will encounter in testing or production.

Testing and validation issues

As you test and validate your models, new challenges can arise from merging multiple data streams and making updates to improve performance. Changes to data sources, model parameters, and feature engineering all introduce room for error.

This may also be the stage when you detect overfitting or underfitting in your model—a model that performed wonderfully during training but fails to produce useful results during testing may be overfit.

Even at companies like Google, where ML engineers abound, surprises in your product models can—and will—arise.

Deployment and serving hurdles

Deploying ML projects is rarely simple, and teams typically can’t use consistent workflows to do so—since ML projects solve a wide range of business problems, there’s a similarly wide range of ways to host and deploy them. For example, some projects require batched predictions on a regular basis, while others need to generate and deliver predictions on-demand when an application makes an API request to predict using the model. (This is part of why it’s challenging to make models apply to different use cases, no matter how appealing it may sound to executives who may view ML models as more magic than narrowly focused functions.)

Additionally, some ML projects can require a lot of resources, and cross-functional teams need to agree upon priorities of deployment. Engineers only have so many things they can productionalize, and as we’ve discussed, ML projects are much more than models and algorithms: most will need infrastructure, alerting, maintenance, and more to be successfully deployed.

This is why it’s so important to articulate the business problem clearly at the outset, agree upon what success looks like, design an end-to-end solution, and have a shared understanding of your ML project’s value compared to other priorities. Without this strategic plan, your project may never receive the engineering resources it needs to finally reach production.

For just one example, Netflix never productionalized its million-dollar prize-winning recommendation algorithm due to the winning model’s complexity to implement—instead choosing another submission that was simpler to integrate.

Tactics for scalable ML in production

Beyond strategic planning and staffing, there are concrete steps you can take to help scale your ML production.

Lean into the cloud

If your teams are working locally instead of in the cloud, it’s time to shift. Working in the cloud is the “glue” that keeps model training and deployment workflows in lockstep. Most vendors and open-source tools are developed for the cloud, and once there, it’s much easier to automate processes. Testing, training, validation and model deployment needs to be a repeatable process, it should not go from local Python code to a production environment.

Leverage a DevOps approach

Just as we’ve talked about applying DevOps practices to data, like setting data SLAs and measuring data health along observability pillars, ML teams can follow in DevOps’ footsteps by implementing the Continuous Integration (CI) and Continuous Delivery (CD) model, while introducing Continuous Training (CT). By setting up agile build cycles and tools, ML teams can more rapidly deliver changes into the codebase and improve overall performance.

Similar to DevOps best practices, ML teams should use containerization to consistently run software across any type of device and make it simpler for engineers to productionalize their work. Keeping a consistent and visible build process that deploys smaller changes more frequently allows the team to work more smoothly and have more insight into what’s working well, and what’s not. Visibility also helps would-be code “gatekeepers” trust the build process and speed up the ML team’s workflow.

Investing time to build a strategic MLOps team and processes will help by reducing the likelihood of a project stalling out before production, and making continuous improvements feasible—setting every project up for long-term success.

Invest in observability and monitoring

Finally, the primary rule of machine learning is that your outcomes will only be as good as your inputs. Healthy data is absolutely essential to ML. Without clean data and working pipelines, models won’t be able to perform to their fullest potential and may fail to make accurate predictions.

And when you’re relying on ML to make important business decisions, you don’t want to find out about a broken pipeline or inaccurate data after those outputs have already been delivered.

That’s why data observability, which provides a full understanding and comprehensive monitoring of data health—and can prevent bad data from reaching your ML models in the first place—is well worth the investment.

Achieving ML production at scale

Machine learning isn’t magic, but it is powerful—and often misunderstood.

Still, with the right mix of strategy, processes, and technology, ML projects can deliver competitive advantages and fuel growth across every industry.

Even if you’re not building the next fraud detection algorithm or virtual personal assistant, we hope these best practices help you on your journey to (successfully!) deploying ML at scale.

Co-Author: Manu Mukerji

Manu Mukerji is the VP of Engineering, AI, Data & Analytics, at 8×8.

Why Production Machine Learning Fails — And How To Fix It