Lipschitz Contextual Bandits

Lipschitz contextual bandits are contextual bandit problems where contexts and actions live in metric spaces, and similar context-action pairs are assumed to have similar expected rewards.
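Concretely, the similarity assumption is usually stated as a joint Lipschitz condition. The notation below (expected payoff \(\mu\), metric \(d_X\) on contexts, metric \(d_Y\) on actions) is an assumed convention, not taken verbatim from the source:

```latex
% Joint Lipschitz condition on the expected payoff \mu
% (assumed notation: d_X is a metric on contexts, d_Y a metric on actions)
\left| \mu(x, y) - \mu(x', y') \right| \;\le\; d_X(x, x') + d_Y(y, y')
```

Under this condition, an observed reward for one context-action pair is informative about all nearby pairs, which is what makes discretization-based algorithms work.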

Key points

  • Lu, Pál, and Pál model sponsored search as a Lipschitz contextual multi-armed bandit: search queries are contexts, ads are actions, and click-through rate is the unknown payoff function [src-020].
  • The model assumes a metric over queries and a metric over ads. The expected payoff is Lipschitz continuous with respect to each metric, so similar queries for the same ad, and similar ads for the same query, have close expected rewards [src-020].
  • This makes the problem richer than a context-free bandit. The learner must generalise across both context similarity and action similarity instead of treating every query-ad pair as unrelated [src-020].
  • The paper allows query sequences to be fixed in advance by an oblivious adversary, so the setting is not merely IID personalisation [src-020].
  • Regret depends on the geometric complexity of the query and ad spaces: upper bounds use covering dimensions; lower bounds use packing dimensions [src-020].
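The uniform-discretization idea behind these bounds can be sketched in code. The following is a minimal illustration, not the paper's algorithm: contexts and actions are assumed to live in [0, 1], the expected payoff is assumed 1-Lipschitz, and a standard UCB1 rule is run over grid cells so that statistics are shared across similar query-ad pairs.

```python
import math
import random


class GridUCB:
    """UCB1 over a uniform discretization of context and action spaces.

    A minimal sketch, assuming contexts and actions in [0, 1] and a
    1-Lipschitz expected payoff: rounding to grid cells of width eps
    lets nearby (context, action) pairs share reward statistics.
    """

    def __init__(self, eps=0.25):
        self.k = int(round(1.0 / eps))  # cells per dimension
        self.counts = {}                # (ctx_cell, act_cell) -> pulls
        self.means = {}                 # (ctx_cell, act_cell) -> mean reward
        self.t = 0                      # total rounds so far

    def _cell(self, x):
        return min(self.k - 1, int(x * self.k))

    def select(self, context):
        """Return an action (a cell centre) for the given context."""
        self.t += 1
        c = self._cell(context)
        best_a, best_ucb = 0, -float("inf")
        for a in range(self.k):
            n = self.counts.get((c, a), 0)
            if n == 0:                  # play every untried cell once first
                best_a = a
                break
            ucb = self.means[(c, a)] + math.sqrt(math.log(self.t) / (2 * n))
            if ucb > best_ucb:
                best_a, best_ucb = a, ucb
        return (best_a + 0.5) / self.k  # centre of the chosen action cell

    def update(self, context, action, reward):
        key = (self._cell(context), self._cell(action))
        n = self.counts.get(key, 0)
        self.counts[key] = n + 1
        self.means[key] = (self.means.get(key, 0.0) * n + reward) / (n + 1)
```

A quick simulation with the hypothetical Lipschitz payoff `1 - |x - y|` (click-through highest when the ad "matches" the query) shows the learner concentrating play on near-matching cells. Finer grids (smaller `eps`) reduce discretization error but increase the number of cells to explore, which is exactly the dimension-dependent trade-off behind the regret bounds.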

Related entities

_(none yet)_

Related concepts

_(none yet)_

Source references

  • [src-020] Tyler Lu, Dávid Pál, Martin Pál — “Contextual Multi-Armed Bandits” (AISTATS 2010)