April 2019

Volume 34 Number 4

[Machine Learning]

Closed-Loop Intelligence: A Design Pattern for Machine Learning

By Geoff Hulten | April 2019

There are many great articles on using machine learning to build models and deploy them. These articles are similar to ones that teach programming techniques—they give valuable core skills in great detail. But to go beyond building toy examples you need another set of skills. In traditional systems these skills are called things like software engineering, software architecture or design patterns—approaches to organizing large software systems and the teams of people building them to achieve your desired impact.

This article introduces some of the things you’ll need to think about when adding machine learning to your traditional software engineering process, including:

Connecting machine learning to users: What it means to close the loop between users and machine learning.

Picking the right objective: Knowing what part of your system to address with machine learning, and how to evolve this over time.

Implementing with machine learning: The systems you’ll need to build to support a long-lived machine learning-based solution that you wouldn’t need to build for a traditional system.

Operating machine learning systems: What to expect when running a machine learning-based system over time.

Of course, the first question is determining when you need to use machine learning. One key factor in the decision is how often you think you’ll need to update an application before you have it right. If the number is small, for example five or 10 times, then machine learning is probably not right. But if that number is large, say an update every hour for as long as the system exists, then you might need machine learning.

There are four situations that clearly require many updates to get right:

  • Big Problems: Some problems are big. They have so many variables and conditions that need to be addressed that they can’t be completed in a single shot.
  • Open-Ended Problems: Many problems lack a single, fixed solution, and require services that live and grow over long periods of time.
  • Time-Changing Problems: If your domain changes in ways that are unpredictable, drastic or frequent, machine learning might be worth considering.
  • Intrinsically Hard Problems: Tough problems like speech recognition, weather simulation and weather prediction can benefit from machine learning, but often only after years of effort spent gathering training data, understanding the problems and developing intelligence.

If your problem has one of these properties, machine learning might be right. If not, you might be better off starting with a more traditional approach. If you can achieve your goal with a traditional approach, it will often be cheaper and simpler.

Connecting Machine Learning to Users

Closing the loop is about creating a virtuous cycle between the intelligence of a system and the usage of the system. As the intelligence gets better, users get more benefit from the system (and presumably use it more) and as more users use the system, they generate more data to make the intelligence better.

Consider a search engine. A user types a query and gets some answers. If she finds one useful, she clicks it and is happy. But the search engine is getting value from this interaction, too. When users click answers, the search engine gets to see which pages get clicked in response to which queries, and can use this information to adapt and improve. The more users who use the system, the more opportunities there are to improve.
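The search example can be made concrete with a small sketch. It assumes a hypothetical ClickLog class (not part of any real search engine) that counts which pages were shown and clicked for each query, then reorders results by a smoothed click rate. A production ranker would use far richer signals; this only illustrates how usage turns into data that improves the next answer.

```python
from collections import defaultdict


class ClickLog:
    """Hypothetical store of impressions and clicks per (query, page) pair."""

    def __init__(self):
        self.shown = defaultdict(int)
        self.clicked = defaultdict(int)

    def record(self, query, pages_shown, page_clicked=None):
        # Every page shown counts as an impression; at most one click is logged.
        for page in pages_shown:
            self.shown[(query, page)] += 1
        if page_clicked is not None:
            self.clicked[(query, page_clicked)] += 1

    def click_rate(self, query, page):
        # Add-one smoothing so pages with few impressions are not ranked on noise.
        return (self.clicked[(query, page)] + 1) / (self.shown[(query, page)] + 2)

    def rerank(self, query, pages):
        # Pages that users clicked more often for this query move up.
        return sorted(pages, key=lambda p: self.click_rate(query, p), reverse=True)


log = ClickLog()
for _ in range(3):
    log.record("closed loop", ["a.example", "b.example"], page_clicked="b.example")
ranking = log.rerank("closed loop", ["a.example", "b.example"])
```

After three users click the second result, it moves to the top for that query: more usage produces more data, and the data changes what the next user sees.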

But a successful closed loop doesn’t happen by accident. In order to make one work you need to design a UX that shapes the interactions between your users and your intelligence, so they produce useful training data. Good interactions have the following properties:

The components of the interaction are clear and easy to connect. Good interactions make it possible to capture the context the user and application were in at the time of the interaction, the action the user took and the outcome of the interaction. For example, a book recommender must know what books the user owns and how much they like them (the context); what books were recommended to the user and if they bought any of them (the action); and if the user ended up happy with the purchase or not (the outcome).

The outcome is implicit and direct. A good experience lets you interpret the outcome of interactions implicitly, by watching the user use your system naturally (instead of requiring them to provide ratings or feedback). Also, there won’t be too much time or too many extraneous interactions between the user taking the action and getting the outcome.

There are no (or few) biases. A good experience will be conscious of how users experience the various possible outcomes and won’t systematically or unconsciously drive users to under-report or over-report categories of outcomes. For example, every user will look at their inbox in an e-mail program, but many will never look at their junk folder. So the bad outcome of having a spam message in the inbox will be reported at a much higher rate than the bad outcome of having a legitimate message in the junk folder.

There are no compounding feedback loops. A closed loop can suffer from feedback that compounds mistakes. For example, if the model makes a mistake that suppresses a popular action, users will stop selecting the action (because it’s suppressed) and the model may learn that it was right to suppress the action (because people stopped using it). To address feedback loops, an experience should provide alternate ways to get to suppressed actions and consider adding a bit of randomization to model output.
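One way to add a bit of randomization is sketched below. The function name choose_actions, the explore_rate parameter and the example action names are all assumptions made for illustration. With a small probability, the last visible slot is given to an action the model ranked lower, so a wrongly suppressed action still gets occasional exposure and can generate evidence in its favor.

```python
import random


def choose_actions(scored_actions, k, explore_rate=0.1, rng=random):
    """Return (actions, explored): the top-k actions, with an occasional random swap."""
    ranked = sorted(scored_actions, key=scored_actions.get, reverse=True)
    chosen, rest = ranked[:k], ranked[k:]
    explored = False
    if chosen and rest and rng.random() < explore_rate:
        # Replace the lowest-ranked visible action with a randomly chosen hidden one.
        chosen[-1] = rng.choice(rest)
        explored = True
    return chosen, explored


scores = {"reply": 0.9, "archive": 0.7, "snooze": 0.1}
usual, usual_explored = choose_actions(scores, k=2, explore_rate=0.0)
forced, forced_explored = choose_actions(scores, k=2, explore_rate=1.0, rng=random.Random(0))
```

Returning the explored flag lets telemetry record which suggestions were random, so model builders can treat those interactions differently from ones the model chose on its own.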

These are some of the basics of connecting machine learning to users. Machine learning will almost always be more effective when the UX and the machine learning are designed to support one another. Doing this well can enable all sorts of systems that would be prohibitively expensive to build any other way.
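As a rough illustration of an interaction whose components are easy to connect, the sketch below stores context, action and outcome in one record for the book recommender example. The Interaction class, its field names and the sample values are assumptions, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Interaction:
    """Hypothetical record tying together one recommendation and its result."""
    interaction_id: str
    context: dict                    # what the user owned and how they rated it
    action: dict                     # what was recommended and what was bought
    outcome: Optional[dict] = None   # filled in later, once it can be observed


record = Interaction(
    interaction_id="example-1",
    context={"owned": {"Book A": 5, "Book B": 2}},
    action={"recommended": ["Book C", "Book D"], "bought": ["Book C"]},
)

# The outcome is observed implicitly: here, the user did not return the book.
record.outcome = {"returned": False}

# One line of telemetry that a training pipeline could later read back.
line = json.dumps(asdict(record), sort_keys=True)
```

Keeping all three parts under a single identifier avoids having to join separate logs after the fact, which is where context and outcome tend to get disconnected.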

Picking the Right Objective

One interesting property of systems built with machine learning is this: They perform worst on the day you ship them. Once you close the loop between users and models, your system will get better and better over time. That means you might want to start with an easy objective, and rebalance toward more difficult objectives as your system improves.

Imagine designing an autonomous car. You could work on this until you get it totally perfect, and then ship it. Or you could start with an easier sub-problem—say, forward collision avoidance. You could actually build the exact same car for forward collision avoidance that you would build for fully autonomous driving—all the controls, all the sensors, everything. But instead of setting an objective of full automation, which is extremely difficult, you set an objective of reducing collisions, which is more manageable.

Because avoiding collisions is valuable in its own right, some people will buy your car and use it—yielding data that you can leverage with machine learning to build better and better models. When you’re ready, you can set a slightly harder objective, say lane following, which provides even more value to users and establishes a virtuous cycle as you ultimately work toward an autonomous vehicle.

This process might take months. It might take years. But it will almost certainly be cheaper than trying to build an autonomous car without a closed loop between users and your machine learning.

You can usually find ways to scale your objectives as your models get better. For instance, a spam filter that initially moves spam messages to a junk folder could later improve to delete spam messages outright. And a manufacturing defect detection system might flag objects for further inspection as a first objective, and later discard defective objects automatically as models improve.
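The spam filter example might look something like the following sketch, in which the objective is expressed as thresholds on a model score rather than hard-coded behavior. SpamPolicy and the threshold values are illustrative assumptions. Moving to the more aggressive objective then becomes a configuration change instead of a rewrite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpamPolicy:
    """Hypothetical mapping from a spam score in [0, 1] to a user-facing action."""
    junk_threshold: float
    delete_threshold: float = float("inf")  # infinity disables outright deletion

    def action_for(self, spam_score: float) -> str:
        if spam_score >= self.delete_threshold:
            return "delete"
        if spam_score >= self.junk_threshold:
            return "junk"
        return "inbox"


# Early objective: only move very likely spam to the junk folder.
launch_policy = SpamPolicy(junk_threshold=0.9)

# Later objective, once models have improved: also delete near-certain spam.
later_policy = SpamPolicy(junk_threshold=0.8, delete_threshold=0.99)
```

The same score of 0.995 leads to "junk" under the launch policy and "delete" under the later one, so the experience can grow bolder as trust in the model grows.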

It’s important to set an objective that you can achieve with the models you can build today—and it’s great when you can grow your machine learning process to achieve more and more interesting objectives over time.

Implementing with Machine Learning

Systems built to address big, open-ended, time-changing or intrinsically hard problems require many updates during their lifetimes. The implementation of the system can make these updates cheap and safe, or it can make them expensive and risky. There are many options for making a system based on machine learning more flexible and efficient over time. Common investments include:

The Intelligence Runtime: To use machine learning you need to do the basics, like implement a runtime that loads and executes models, featurizes the application context and gives users the right experiences based on what the models say. A runtime can be simple, like linking a library into your client, or it can be complex, supporting things like:

  • Changes to the types of models used over time, moving from simple rules toward more complex machine learning approaches as you learn more about your problem.
  • Combining models that run on the client, in the service, and in the back end, and allowing models to migrate between these locations over time based on cost and performance needs.
  • Supporting reversion when deployments go wrong, and ways to rapidly override specific costly mistakes that machine learning will almost certainly make.
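A minimal sketch of a runtime with these capabilities follows. Everything here is hypothetical: the IntelligenceRuntime class, its method names and the toy models are stand-ins chosen for illustration. Models are plain callables, so a hand-written rule and a learned model can be swapped freely; deployments are kept in a history for reversion; and an override table handles specific costly mistakes.

```python
from typing import Callable, Dict, List

Model = Callable[[dict], str]  # takes features, returns a decision


class IntelligenceRuntime:
    """Hypothetical runtime: featurize context, run the current model, allow fixes."""

    def __init__(self, featurize: Callable[[dict], dict]):
        self._featurize = featurize
        self._history: List[Model] = []
        self._overrides: Dict[str, str] = {}

    def deploy(self, model: Model) -> None:
        self._history.append(model)

    def revert(self) -> None:
        # Drop the latest model, but never remove the last remaining one.
        if len(self._history) > 1:
            self._history.pop()

    def override(self, key: str, decision: str) -> None:
        self._overrides[key] = decision

    def decide(self, key: str, context: dict) -> str:
        if key in self._overrides:
            return self._overrides[key]
        if not self._history:
            raise RuntimeError("no model has been deployed")
        return self._history[-1](self._featurize(context))


def featurize(context: dict) -> dict:
    return {"exclamations": context["subject"].count("!")}


def rule_model(features: dict) -> str:
    return "junk" if features["exclamations"] >= 3 else "inbox"


def faulty_model(features: dict) -> str:
    return "junk"  # stand-in for a deployment that went wrong


runtime = IntelligenceRuntime(featurize)
runtime.deploy(rule_model)
runtime.deploy(faulty_model)
before_revert = runtime.decide("msg-1", {"subject": "Lunch?"})
runtime.revert()
after_revert = runtime.decide("msg-1", {"subject": "Lunch?"})
runtime.override("msg-2", "inbox")
overridden = runtime.decide("msg-2", {"subject": "Sale!!!"})
```

Here the faulty deployment sends a normal message to junk, reverting restores the rule-based behavior, and the override rescues one specific message without touching the model.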

Intelligence Management: As new models become available, they must be ingested and delivered to where they’re needed. For example, models may be created in a lab at corporate headquarters, but must execute on clients across the world. Or maybe the models run partially in a back end and partially in a service. You can rely on the people producing the models to do all the deployments, perform the verification and keep everything in sync, or you could build systems to support this.
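A system that supports deployment and verification could start with two automated checks, sketched below under stated assumptions: a fingerprint to confirm that the model delivered to a client is the one that was produced, and a quality gate against a small set of labeled examples. The function names, the sample payload and the accuracy bar are all hypothetical.

```python
import hashlib


def fingerprint(model_bytes: bytes) -> str:
    """SHA-256 digest of the serialized model."""
    return hashlib.sha256(model_bytes).hexdigest()


def verify_delivery(model_bytes: bytes, expected_fingerprint: str) -> bool:
    """True if the bytes that arrived match what the model producer published."""
    return fingerprint(model_bytes) == expected_fingerprint


def passes_checks(model, labeled_examples, min_accuracy: float) -> bool:
    """True if the model meets an accuracy bar on known examples."""
    if not labeled_examples:
        return False
    correct = sum(1 for features, label in labeled_examples if model(features) == label)
    return correct / len(labeled_examples) >= min_accuracy


def candidate(features: dict) -> str:
    return "junk" if features["exclamations"] >= 3 else "inbox"


payload = b"serialized candidate model"  # placeholder for real model bytes
published = fingerprint(payload)
golden = [
    ({"exclamations": 4}, "junk"),
    ({"exclamations": 0}, "inbox"),
    ({"exclamations": 1}, "junk"),
]
intact = verify_delivery(payload, published)
corrupted = verify_delivery(payload + b"x", published)
good_enough = passes_checks(candidate, golden, min_accuracy=0.6)
```

Checks like these are cheap to run on every deployment, which takes routine verification off the plate of the people producing the models.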

Intelligence Telemetry: An effective telemetry system for machine learning collects data to create increasingly better models over time. The intelligence implementation must decide what to observe, what to sample, and how to digest and summarize the information to enable intelligence creation, and how to preserve user privacy in the process. Telemetry can be very expensive and telemetry needs will change during the lifetime of a machine learning-based system, so it often makes sense to implement tools to allow adaptability while controlling costs.
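One simple tool for controlling telemetry cost is an adjustable sampling rate. The sketch below assumes a hypothetical in_sample function that hashes an interaction identifier into a value between 0 and 1 and keeps the interaction if that value falls below the configured rate. Because the decision depends only on the identifier, every component agrees on which interactions are sampled, and raising the rate only ever adds interactions to the sample.

```python
import hashlib


def in_sample(interaction_id: str, rate: float) -> bool:
    """Deterministically decide whether to collect telemetry for this interaction."""
    digest = hashlib.sha256(interaction_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


ids = [f"interaction-{i}" for i in range(10000)]
small_sample = {i for i in ids if in_sample(i, 0.1)}
large_sample = {i for i in ids if in_sample(i, 0.5)}
```

Changing the rate is a configuration update rather than a code change, which fits a system whose telemetry needs shift over its lifetime.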

The Intelligence Creation Environment: For machine learning-based systems to succeed, there needs to be a great deal of coordination between the runtime, delivery, monitoring and creation of your models. For example, in order to produce accurate models, the model creator must be able to recreate exactly what happens at runtime, even though the model creator’s data comes from telemetry and runs in a lab, while the runtime data comes from the application and runs in context of the application.

Mismatches between model creation and runtime are a common source of bugs, and machine learning professionals often aren’t the best people to track these issues down. Because of this, an implementation can make machine learning professionals much more productive by providing a consistent intelligence creation experience.
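One common way to reduce such mismatches, sketched here as an assumption rather than a prescription, is to keep a single featurization function that both the runtime and the creation environment import, and to check telemetry for rows where the features logged at runtime differ from what that function produces in the lab. The function names, feature names and row layout are hypothetical.

```python
def featurize(raw: dict) -> dict:
    """Single source of truth for features, shared by runtime and training code."""
    text = raw.get("subject", "")
    return {
        "length": len(text),
        "exclamations": text.count("!"),
        "all_caps": text.isupper(),
    }


def parity_mismatches(telemetry_rows):
    """Return ids of rows whose runtime features differ from the lab's features."""
    mismatches = []
    for row in telemetry_rows:
        if featurize(row["raw"]) != row["runtime_features"]:
            mismatches.append(row["id"])
    return mismatches


rows = [
    {
        "id": "r1",
        "raw": {"subject": "SALE!!!"},
        "runtime_features": {"length": 7, "exclamations": 3, "all_caps": True},
    },
    {
        "id": "r2",
        "raw": {"subject": "Lunch?"},
        # Simulates a client still running an older featurizer.
        "runtime_features": {"length": 5, "exclamations": 0, "all_caps": False},
    },
]
suspect_rows = parity_mismatches(rows)
```

A check like this surfaces the disagreement as an engineering problem with a row identifier attached, instead of leaving it to show up later as an unexplained drop in model accuracy.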

For all of these components (the runtime, the intelligence management, intelligence telemetry and intelligence creation) you might implement something bare bones that does the basics and relies on ongoing engineering investments to adapt over time. Or you might create something flexible with slick tools for non-engineers so they can rebalance toward new objectives cheaply, quickly and with confidence that they won’t mess anything up.

Intelligence Orchestration

Intelligence orchestration is a bit like car racing. A whole team of people builds a car, puts all the latest technology into it, and gets every aerodynamic wing, ballast, gear-ratio, and intake valve set perfectly. They make an awesome machine that can do things no other machine can do.

And then someone needs to get behind the wheel, take it on the track and win!

Intelligence orchestrators are those drivers. They take control of the Intelligent System and do what it takes to make it achieve its objectives. They use the intelligence creation and management systems to produce the right intelligence at the right time and combine it in the most useful ways. They control the telemetry system, gathering the data needed to make their models better. And they deal with all the mistakes and problems, balancing everything so that the application produces the most value it can for users and for your business.

Right about now you might be saying, “Wait, I thought machine learning was supposed to tune the system throughout its lifecycle. What is this? Some kind of joke?” Unfortunately, no. Artificial intelligence and machine learning will only get you so far. Orchestration is about taking those tools and putting them in the best situations so they can produce value—highlighting their strengths and compensating for their weaknesses—while also reacting as things change over time. Orchestration might be needed because:

Your objective changes: As you work on something, you’ll come to understand it better. You might realize that you set the wrong objective to begin with, and want to adapt. Heck, maybe the closed loop between your users and your models turns out to be so successful that you want to aim higher.

Your users change: New users will come (and you will cheer) and old users will leave (and you might cry), but these users will bring new contexts, new behavior, and new opportunities to adapt your models.

The problem changes: The approaches and decisions you made in the past might not be right for the future. Sometimes a problem might be easy (like when all the spammers are on vacation). At other times it might get very hard (like near the holidays). As a problem changes, there’s almost always opportunity to adapt and achieve better outcomes through orchestration.

The quality of your models changes: Data unlocks possibilities. Some of the most powerful machine learning techniques aren’t effective with “small” data, but become viable as users come to your system and you get lots and lots of data. These types of changes can unlock all sorts of potential to try new experiences or target more aggressive objectives.

The cost of running your system changes: Big systems will constantly need to balance costs and value. You might be able to change your experience or models in ways that save a lot of money, while only reducing value to users or your business by a little.

Someone tries to abuse your system: Unfortunately, the Internet is full of trolls. Some will want to abuse your service because they think that’s fun. Most will want to abuse your service (and your users) to make money—or to make it harder for you to make money. Left unmitigated, abuse can ruin your system, making it such a cesspool of spam and risk that users abandon it.

One or more of these will almost certainly happen during the lifecycle of your machine learning-based system. By learning to identify them and adapt, you can turn these potential problems into opportunities.
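Identifying these changes usually starts with watching a few numbers over time. As a deliberately simple sketch, the hypothetical has_shifted function below compares the rate of some event (for example, user-reported mistakes) in a recent window against a baseline window and flags a change larger than a chosen tolerance. The tolerance and the sample data are assumptions; a real system would likely use a proper statistical test.

```python
def rate(events):
    """Fraction of events that are 1 (for example, reported mistakes)."""
    return sum(events) / len(events)


def has_shifted(baseline, recent, tolerance=0.05):
    """True if the recent rate differs from the baseline rate by more than tolerance."""
    return abs(rate(recent) - rate(baseline)) > tolerance


baseline_week = [1] * 5 + [0] * 95    # 5 percent of interactions reported as mistakes
quiet_week = [1] * 6 + [0] * 94       # 6 percent: within tolerance
holiday_week = [1] * 20 + [0] * 80    # 20 percent: worth an orchestrator's attention

quiet_alert = has_shifted(baseline_week, quiet_week)
holiday_alert = has_shifted(baseline_week, holiday_week)
```

A flag like this does not say which of the changes above occurred, only that something did and that someone should look.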

Implementing machine learning-based systems and orchestrating them are very different activities. They require very different mindsets. And they’re both absolutely critical to achieving success. Good orchestrators will:

  • Be domain experts in the business of your system so they understand your users’ objectives instinctively.
  • Understand user experience and be able to look at interactions and make effective adaptations in how model outputs are presented to users.
  • Understand the implementation so they know how to trace problems and have some ability to make small improvements.
  • Be able to ask questions of data and understand and communicate the results.
  • Know applied machine learning and be able to control your model creation processes and inject new models into the system.
  • Get satisfaction from making a system execute effectively day in and day out.

Wrapping Up

Machine learning is a fantastic tool. But getting the most out of machine learning requires a lot more than building a model and making a few predictions. It requires adding the machine learning skills to the other techniques you use for organizing large software systems and the teams of people building them.

This article gave a very brief overview of one design pattern for using machine learning at scale, the Closed-Loop Intelligence System pattern. This includes knowing when you need machine learning; what it means to close the loop between users and machine learning; how to rebalance the system to achieve more meaningful objectives over time; what implementations can make it more efficient and safer to adapt; and some of the things that might happen as you run the system over time.

Artificial intelligence and machine learning are changing the world, and it’s an exciting time to be involved.


Geoff Hulten is the author of “Building Intelligent Systems” (intelligentsystem.io/book/). He’s managed applied machine learning projects for more than a decade and taught the master’s level machine learning course at the University of Washington. His research has appeared in top international conferences, received thousands of citations, and won a SIGKDD Test of Time award for influential contributions to the data mining research community.

Thanks to the following Microsoft technical expert for reviewing this article: Dr. James McCaffrey

