Updating Autobidding Models in Near-Real Time

Designed and implemented a new algorithm for making hourly updates to models used for autobidding in ad auctions. Worked with multitask Gaussian processes.

Introduction

  • This is work done during my internship in the Autobidding team at Microsoft Ads, under Ajith Moparthi and Yannis Exarchos. The team is a part of the Advanced Reasoning Group at Microsoft Ads, led by Patrick Jordan.
  • Whenever you click on a page or engage with content that has advertising potential, a small auction is triggered in an “ad exchange,” where automated agents from advertisers bid for your ad impression. The winner of the auction gets to place their ad on your page or in sync with your content.
  • Autobidding teams bid for ad impressions on behalf of advertisers. Better bids mean better performance, which means happier advertisers.
  • The best bid is determined by solving an optimization problem established in this paper. This optimization problem gives rise to a dual variable, and the optimal dual determines the best bid.
  • How do we get the optimal dual? Teams need to model the curve or map from the chosen dual to the observed KPI. These are our “models”. The KPIs here could be advertiser spend, clicks, conversions, etc.
  • The current models used at Microsoft are created using ad campaign features and calibrated by replaying past auctions.
  • However, these models have high update latency (about 4 days), so they cannot support the hourly updates needed for adaptive bidding within a day. They also tend to perform poorly on “cold-start” ad campaigns (campaigns that started recently and do not have much data).

What I did

  • Aim: Design a method to update priors generated by current models, at an hourly granularity. Important considerations for us:
    • Cross-time correlation: Want data obtained at time T to affect predictions at time T + X.
    • Adjust faith in prior vs data: Want a data-dependent way to adjust faith in the existing curve vs faith in noisy auction data, when making updates.
    • Monotonicity option: In many cases, we might want the curves to be monotonic – higher bid typically means more spend, for example.
    • Lower latency: Want to support hourly as well as daily adjustments in models.
    • Low data regime: Only have a few datapoints per map.
  • Represented curves linearly in function space: I worked with linear combinations of non-linear “basis functions” to represent hourly dual-KPI curves. This can be thought of as a linear representation in basis-function space. I then modeled the vector of basis coefficients as a multitask GP indexed by hour.
  • Punchline: KPI(t, d) = Σ_i coeff_i(t) · basis_func_i(d) + noise, where the coefficients vary by hour t and the basis functions depend on the dual d. The noise level controls faith in the model. (A minimal sketch of this representation appears after this list.)
  • Derived update rule: I derived the update rule for the representations given real auction data. The subtlety is that while the GP is over the coefficient vector coeffs(t), we don’t get noisy samples of that vector itself; we get samples of the KPI, which is its dot product with the vector basis_funcs(d). I derived an exact inference rule for multitask GPs given dot-product samples (sketched in the second code block after this list).
  • Designed and implemented a new KPI prediction pipeline:
    • Fit a prior using Excalibur curves (the curves produced by the current models)
    • Make posterior updates using real auction data
    • Use posterior mean of linear representation to predict KPI
  • Tuned a hyperparameter to calibrate faith in the signal: A noise hyperparameter trades off faith in the prior vs faith in the signal; it can be tuned using any performance metric of choice (e.g. MSE prediction error in my experiments). A tuning sketch appears after the code blocks below.
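
The bullets above reference a few sketches. First, a minimal sketch of the basis-function representation of a dual-KPI curve. The specific basis functions, the `basis_funcs` / `kpi_model` names, and all numbers here are illustrative assumptions, not the actual basis set or code used in the project.

```python
import numpy as np

# Hypothetical basis functions of the dual d. The real basis set is not
# specified here; a small monotone-ish family is used purely for illustration.
def basis_funcs(d):
    """Return phi(d), the vector of basis-function values at dual d."""
    return np.array([1.0, d, np.log1p(d), 1.0 - np.exp(-d)])

def kpi_model(coeffs_t, d, noise_std=0.0, rng=None):
    """KPI(t, d) = coeffs(t) . phi(d) + noise: a linear combination of
    non-linear basis functions, with hour-dependent coefficients."""
    if rng is None:
        rng = np.random.default_rng(0)
    noise = rng.normal(0.0, noise_std) if noise_std > 0 else 0.0
    return float(coeffs_t @ basis_funcs(d)) + noise

# Example: coefficients for one hour (made-up numbers).
coeffs_hour_10 = np.array([0.5, 1.2, 3.0, 2.0])
print(kpi_model(coeffs_hour_10, d=0.8))
```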
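Second, a sketch of the kind of exact inference the update-rule bullet describes: a Gaussian prior over the stacked per-hour coefficient vector (cross-hour correlation from an RBF kernel over hours), conditioned on scalar KPI observations that are dot products of the coefficients with the basis vector. This is standard Gaussian conditioning written out for that observation model; the function names, the simplified Kronecker prior, and the toy numbers are my assumptions, not the team’s actual implementation.

```python
import numpy as np

def rbf_kernel(hours, lengthscale=3.0, variance=1.0):
    """Cross-hour covariance: nearby hours get correlated coefficients."""
    h = np.asarray(hours, dtype=float)[:, None]
    return variance * np.exp(-0.5 * (h - h.T) ** 2 / lengthscale ** 2)

def posterior_over_coeffs(prior_mean, hours, obs, noise_var, basis, lengthscale=3.0):
    """Exact Gaussian conditioning for dot-product observations.

    prior_mean : (H, B) array of prior coefficients per hour (e.g. fit to the
                 existing prior curves).
    hours      : length-H sequence of hour indices.
    obs        : list of (hour_index, dual, observed_kpi) tuples.
    noise_var  : observation-noise variance (the "faith" hyperparameter).
    basis      : function mapping a dual d to a length-B vector phi(d).

    Observation model: kpi = phi(d)^T c_hour + eps, eps ~ N(0, noise_var),
    with all c_hour stacked into one Gaussian vector. The prior covariance
    below is kron(K_hours, I_B), i.e. coefficients independent across basis
    index; a full multitask kernel would replace I_B.
    """
    H, B = prior_mean.shape
    mu0 = prior_mean.reshape(-1)                       # stacked (H*B,) vector
    K = np.kron(rbf_kernel(hours, lengthscale), np.eye(B))

    # Design matrix: each observation selects one hour's coefficient block
    # and dots it with phi(d).
    Phi = np.zeros((len(obs), H * B))
    y = np.zeros(len(obs))
    for j, (h_idx, d, kpi) in enumerate(obs):
        Phi[j, h_idx * B:(h_idx + 1) * B] = basis(d)
        y[j] = kpi

    # Standard Gaussian conditioning, in the form that only inverts an
    # (num observations) x (num observations) matrix.
    S = Phi @ K @ Phi.T + noise_var * np.eye(len(obs))
    gain = K @ Phi.T @ np.linalg.inv(S)
    post_mean = mu0 + gain @ (y - Phi @ mu0)
    post_cov = K - gain @ Phi @ K
    return post_mean.reshape(H, B), post_cov

def predict_kpi(post_mean, hour_idx, d, basis):
    """Posterior-mean KPI prediction at hour `hour_idx` and dual d."""
    return float(post_mean[hour_idx] @ basis(d))

# Tiny usage example with made-up numbers: 24 hours, 3 basis functions.
basis = lambda d: np.array([1.0, d, np.log1p(d)])
prior = np.tile(np.array([0.5, 1.0, 2.0]), (24, 1))
observations = [(9, 0.8, 2.9), (10, 1.1, 3.4)]        # (hour, dual, KPI)
post_mean, _ = posterior_over_coeffs(prior, range(24), observations,
                                     noise_var=0.1, basis=basis)
print(predict_kpi(post_mean, hour_idx=11, d=1.0))
```

Because the conditioning is written in the Kalman-style form, only a matrix the size of the number of observations gets inverted, which fits the low-data regime noted above. This mirrors the pipeline bullets: fit the prior mean from the existing curves, condition on real auction data, and predict KPIs with the posterior mean.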
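Finally, a sketch of tuning the noise hyperparameter against held-out MSE. To keep it self-contained, this version does plain Bayesian linear regression in basis space for a single hour rather than the full multitask-GP update; the grid of candidate noise variances and all data are made up.

```python
import numpy as np

def fit_posterior_mean(Phi, y, prior_mean, prior_cov, noise_var):
    """Posterior mean of basis coefficients for y = Phi c + noise.
    (Single-hour Bayesian linear regression; the full pipeline would use the
    multitask-GP version of this update.)"""
    S = Phi @ prior_cov @ Phi.T + noise_var * np.eye(len(y))
    gain = prior_cov @ Phi.T @ np.linalg.inv(S)
    return prior_mean + gain @ (y - Phi @ prior_mean)

def tune_noise_var(Phi_train, y_train, Phi_val, y_val, prior_mean, prior_cov,
                   grid=(1e-3, 1e-2, 1e-1, 1.0, 10.0)):
    """Pick the noise variance minimising held-out MSE: small values trust the
    auction data, large values trust the prior curve."""
    best = None
    for nv in grid:
        coeffs = fit_posterior_mean(Phi_train, y_train, prior_mean, prior_cov, nv)
        mse = float(np.mean((Phi_val @ coeffs - y_val) ** 2))
        if best is None or mse < best[1]:
            best = (nv, mse)
    return best  # (noise_var, validation MSE)

# Made-up example: 3 basis functions, 6 training and 3 validation points.
rng = np.random.default_rng(0)
basis = lambda d: np.array([1.0, d, np.log1p(d)])
duals = rng.uniform(0.1, 2.0, size=9)
Phi = np.stack([basis(d) for d in duals])
y = Phi @ np.array([0.3, 1.5, 2.0]) + rng.normal(0.0, 0.2, size=9)
print(tune_noise_var(Phi[:6], y[:6], Phi[6:], y[6:],
                     prior_mean=np.zeros(3), prior_cov=np.eye(3)))
```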

Code and Presentation

Link to repository. You can find a version of my final presentation with sensitive data removed at this link.