Jeremy York

Content Selection Optimization in Data-Driven Marketing

A common problem in data-driven marketing systems can be summarized as follows:

1. We have a set of possible content that we could show

2. We have information about the shopper and the context for the content

3. We want to choose content that has the best chance to get a response from the shopper

4. We want our system to learn from the current choice

In this post and a follow-up next week, I'll discuss some applications and give a non-technical overview of the challenges, opportunities, and solutions involved in designing such a system. These posts are intended as companions to a series of three technical posts by Sergey Feldman. My posts will provide the business setting, some of the real-world problems, and design considerations, while Sergey will provide the algorithmic details.

In the academic literature, this class of problems is referred to as the "multi-armed bandit" problem. The name comes from the following thought experiment: you step into a casino with money to spend, and there are a number of slot machines (one-armed bandits) with different odds and payouts. To maximize your payout, you need to collect data about the odds of the different slot machines. But if all you do is experiment and collect data, by the time you have a good idea of the best odds, you've spent much of your money.
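The thought experiment can be sketched with a tiny epsilon-greedy simulation. The payout probabilities and the epsilon value below are invented for illustration; this is a textbook approach, not a description of any production algorithm:

```python
import random

random.seed(0)

# Hypothetical slot machines with unknown payout probabilities.
true_odds = [0.02, 0.05, 0.10]

counts = [0] * len(true_odds)    # pulls per machine
values = [0.0] * len(true_odds)  # estimated payout rate per machine
epsilon = 0.1                    # fraction of pulls spent exploring

def pull(arm):
    """Return 1 if the machine pays out on this pull, else 0."""
    return 1 if random.random() < true_odds[arm] else 0

for _ in range(10000):
    if random.random() < epsilon:
        arm = random.randrange(len(true_odds))  # explore: try a random machine
    else:
        # Exploit: play the machine with the best estimate so far.
        arm = max(range(len(true_odds)), key=lambda a: values[a])
    reward = pull(arm)
    counts[arm] += 1
    # Incremental update of the running-average payout estimate.
    values[arm] += (reward - values[arm]) / counts[arm]

best = max(range(len(true_odds)), key=lambda a: values[a])
```

The `epsilon` knob is exactly the tension described above: every exploratory pull costs money on average, but without those pulls the estimates never get good enough to trust.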


Here are a couple of areas where RichRelevance uses multi-armed bandit approaches; the broader applications should be obvious.

Recommendations: RichRelevance has a wide variety of recommendation strategies that can be used to show items to the customer. Examples of strategies are "customers who looked at this item also looked at..." or "customers who looked at this item went on to purchase..." When choosing content, we know things about the page (what type of page or product, what position on the page), and we also have information about the shopper (recognized or not, recent page views, purchases, etc). If we determine what metric we're most interested in (click through, add to cart, likelihood of purchase, order value, revenue per session), we can capture data on how the shopper responds, and optimize for the desired metric.

Promotions and Content: In our RichPromo product, our client can provide a set of different promotions or advertisements; similarly for RichContent, there may be a set of articles, videos, or other content available. We then have the task of choosing which promotion or content to show in a variety of contexts, and optimizing the choice in order to drive response. In contrast to the recommendations application, promotions often have a short shelf life. We have to learn quickly which promotions are the best, because they may no longer be applicable if we wait.

Note that these applications apply to websites as well as other channels (content in a mobile application, information provided in a sales associate application).


Fundamental to this problem is that two of the goals listed above are in conflict:

  • maximizing our metrics for the current request
  • maximizing the information learned for future requests

When we experiment with the choice of content, we will often pick content that our data tells us is not the best possible choice. However, if we just pick the thing that appears to be the best, we might be responding to random noise, and we might miss out on developing trends.
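One standard way to balance these two goals, drawn from the bandit literature rather than from any particular product, is to score each option by its estimated value plus an uncertainty bonus, as in the UCB family of algorithms. A minimal sketch:

```python
import math

def ucb_score(value, pulls, total_pulls, c=2.0):
    """Upper confidence bound: the estimated value plus an exploration
    bonus that shrinks as a choice accumulates data. Options we know
    little about get inflated scores, so they still get tried; options
    with lots of data are judged almost purely on their estimate."""
    if pulls == 0:
        return float("inf")  # always try untested content at least once
    return value + math.sqrt(c * math.log(total_pulls) / pulls)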

In addition, whatever approach we choose has to be scalable, have low latency, and be fault tolerant. A major challenge for the practitioner is that much of the academic research fails to address these operational requirements.


Stationarity is a data science term for a process whose behavior doesn't change over time; a non-stationary process, by contrast, fluctuates. For example, a website redesign can drastically change which strategies are most effective on a given page. During holiday periods, shopping behavior changes because people browse for gifts as well as for themselves. With promotions and content, the response rate can decline as the content goes stale, but it can later spike because of news or viral activity. This makes it imperative in many applications that the system respond very quickly to changing circumstances. Sergey's first article largely assumes stationarity, but most of the algorithms he covers can be adjusted to be useful in a non-stationary situation. Their performance in such a situation is primarily a function of how quickly they can learn.
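One common generic adjustment for non-stationarity is to replace the running average with an exponentially weighted one, so older observations fade out. The learning rate and the drifting response rates below are arbitrary illustration values:

```python
def discounted_update(estimate, reward, rate=0.05):
    """Exponentially weighted moving average: each new observation pulls
    the estimate toward it by a fixed fraction, so recent behavior
    dominates and old data gradually stops mattering."""
    return estimate + rate * (reward - estimate)

# A response rate that drifts from 10% down to 2% mid-stream:
est = 0.0
for observed_rate in [0.10] * 500 + [0.02] * 500:
    est = discounted_update(est, observed_rate)
# est has tracked the drift and now sits near 0.02, whereas a plain
# running average over all 1000 observations would sit near 0.06.
```

The price of this responsiveness is higher variance: a smaller effective window means noisier estimates, which is the same explore/exploit tension in another form.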

Optimization metrics, and the level of uncertainty around them, are important to consider. For example, focusing too much on driving click through rate will not necessarily drive conversion. Driving conversion will not necessarily improve order value. And simultaneously optimizing for multiple metrics is difficult to do. Compounding the problem is the fact that some of the most important metrics (e.g. revenue per session) are much more variable than other options such as click-through or conversion rate. Essentially, any revenue-based metric is substantially more variable than a categorical metric (such as conversion). This, in turn, makes it much more difficult to learn quickly. Applying a naive approach to a revenue-based metric will lead to a long period of instability, which will in turn negatively impact the results.
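To see why revenue-based metrics are harder to learn from, compare the noise-to-signal ratio of a conversion metric against a revenue metric under made-up numbers (a 5% conversion rate and an arbitrary order-value distribution):

```python
import random

random.seed(1)
n = 100_000

# Conversion: 1 with probability 5%, else 0 (a categorical metric).
conversions = [1 if random.random() < 0.05 else 0 for _ in range(n)]
# Revenue per session: usually 0, occasionally a sizable order.
revenues = [random.expovariate(1 / 80.0) if c else 0.0 for c in conversions]

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Coefficient of variation: standard deviation relative to the mean.
# The larger it is, the more samples you need before a difference
# between two pieces of content stops looking like noise.
cv_conversion = variance(conversions) ** 0.5 / mean(conversions)
cv_revenue = variance(revenues) ** 0.5 / mean(revenues)
```

Even with this mild order-value distribution, the revenue metric's relative noise comes out well above the conversion metric's; real order values are heavier-tailed, which widens the gap further.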

Operational issues have a significant impact on which approaches will perform well. When considering disaster recovery and run-time latency, batch approaches will be most attractive. They allow you to decouple run-time services from longer, more complex data processing jobs. It also becomes more feasible to have multiple instances of a service running (perhaps in multiple data centers) without worrying about synchronization. However, much of the academic literature focuses on systems that update in real time, that have full access to all of the data observed so far. It is possible to have your cake and eat it too, but it's at the cost of significantly more engineering and operations investment. You'll see in Sergey's second post how these factors have influenced some of our algorithm choices at RichRelevance.

Complexity can ramp up quickly when you have many types of content to pick from and a wide variety of contexts to consider. If the context has a large impact on which choice is optimal, there are two paths forward. The simpler approach is to split your data by context and treat each context as a separate optimization problem; this misses some learning opportunities. The alternative, introducing a modeling layer that lets you borrow strength across contexts, complicates things considerably. Watch Sergey's blog posts next week for some insight into the opportunities and challenges that come from using context.
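The first path, splitting by context, amounts to keeping one independent estimator per context. This is a generic illustration; the context keys and reward handling are invented:

```python
from collections import defaultdict

class ContextSplitBandit:
    """One independent set of per-arm estimates for each context.
    Simple and easy to reason about, but nothing learned in one
    context helps any other, so rare contexts learn slowly."""

    def __init__(self, num_arms):
        self.num_arms = num_arms
        self.counts = defaultdict(lambda: [0] * num_arms)
        self.values = defaultdict(lambda: [0.0] * num_arms)

    def update(self, context, arm, reward):
        """Fold one observed reward into the given context's estimates."""
        c, v = self.counts[context], self.values[context]
        c[arm] += 1
        v[arm] += (reward - v[arm]) / c[arm]

    def best(self, context):
        """Return the arm with the highest estimate in this context."""
        v = self.values[context]
        return max(range(self.num_arms), key=lambda a: v[a])
```

The weakness is visible in the data structure itself: an unseen context starts from zero, even if a near-identical context has already accumulated thousands of observations; that is the learning opportunity a shared model recovers.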


There are some great advantages to establishing a tight feedback loop for displaying content, measuring response, learning, and acting upon what we learn. It's applicable to website automation, mobile content delivery, promotions, targeted advertising, and more. The reason the problem is hard is that when we experiment and learn, we fail to use the content that we know or suspect is the best; but when we simply go for the choices that seem to be the best, we're at risk of not knowing as much as we think we do. Meanwhile, the correct choice of metric is crucial, and the most important metrics are the most difficult ones to work with. Add to that the various complications that are often overlooked in the academic literature, and it can be daunting to try to build such a system.

There's a lot about this problem that is difficult, but the applications are important and valuable. Watch this space for a follow-up post next week with a high-level overview of some of the solutions.

About:

Jeremy leads the data science and analytics team at RichRelevance, with a focus on expanding and improving product capabilities through the application of statistical and machine learning techniques.

Previously, Jeremy was Principal Scientist at DS-IQ, where he managed the design and development of its shopper marketing solutions, including a system for optimizing the sales impact of in-store video content, and development of cross-channel targeting and recommendations systems. Prior to DS-IQ, he built and led Amazon's Website Experimentation Software team, and made significant contributions to Amazon’s personalization capabilities by pioneering automated, scientific approaches to evaluating and managing content. His efforts resulted in many millions of dollars in increased sales; he is listed as an inventor on seven patent applications.

Jeremy earned his MS and PhD in Statistics at the University of Washington, where his doctoral thesis was awarded the international Leonard J. Savage Award. He also holds a BS in Statistics, with highest honors, from the University of Illinois.
