Bandits with Knapsacks

Bandits with knapsacks extend bandit learning to settings where actions consume limited resources, so the algorithm must learn rewards while respecting budgets or supply constraints.

Key points

  • Slivkins treats bandits with knapsacks as one of the book’s economics-oriented chapters, alongside bandits in repeated games and incentivized exploration [src-019].
  • The model is relevant when exploration is constrained by finite inventory, money, time, impressions, or other budgets rather than an unlimited sequence of identical trials [src-019].
  • The “knapsack” framing changes the objective: a good policy must balance expected reward against resource consumption, not just identify the best arm in isolation [src-019].
  • For product systems, this is the bandit variant to consider when recommendations, ads, procurement, marketplace allocation, or experimentation compete for scarce operational resources [src-019].
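To make the resource-aware objective concrete, here is a minimal sketch of a budget-constrained bandit loop. It is not an algorithm from the book: the epsilon-greedy rule, the reward-per-unit-cost index, and all function names (`budgeted_epsilon_greedy`, the per-arm `reward_fn`/`cost_fn` pairs) are illustrative assumptions. The point it demonstrates is the key points above: each pull both earns reward and consumes a shared budget, and the policy ranks arms by empirical reward per unit of resource rather than by reward alone.

```python
import random


def budgeted_epsilon_greedy(arms, budget, epsilon=0.1, seed=0):
    """Illustrative sketch (not from Slivkins): pull arms until the
    resource budget runs out.

    `arms` is a list of (reward_fn, cost_fn) pairs; each pull draws a
    random reward and consumes a random amount of resource. Selection
    is greedy on the empirical reward-per-unit-cost ratio, exploring
    with probability `epsilon`. Costs are assumed strictly positive.
    """
    rng = random.Random(seed)
    n = len(arms)
    pulls = [0] * n
    reward_sum = [0.0] * n
    cost_sum = [0.0] * n
    total_reward = 0.0
    remaining = float(budget)
    while remaining > 0:
        if 0 in pulls:                      # pull each arm once first
            a = pulls.index(0)
        elif rng.random() < epsilon:        # occasional exploration
            a = rng.randrange(n)
        else:                               # greedy on reward per unit cost
            a = max(range(n), key=lambda i: reward_sum[i] / cost_sum[i])
        reward_fn, cost_fn = arms[a]
        r, c = reward_fn(rng), cost_fn(rng)
        if c > remaining:                   # cannot afford this pull; stop
            break
        pulls[a] += 1
        reward_sum[a] += r
        cost_sum[a] += c
        total_reward += r
        remaining -= c
    return total_reward, pulls, remaining


# Usage: two hypothetical arms. Arm 0 pays more per pull but also costs
# more, so the better arm depends on reward per unit of budget.
arms = [
    (lambda rng: 1.0 if rng.random() < 0.6 else 0.0, lambda rng: 2.0),
    (lambda rng: 1.0 if rng.random() < 0.4 else 0.0, lambda rng: 1.0),
]
total, pulls, left = budgeted_epsilon_greedy(arms, budget=100.0)
```

Note the stopping condition: unlike a classic bandit with a fixed horizon of pulls, the run ends when the budget is exhausted, so an arm with modest reward but low cost can dominate a high-reward, high-cost arm.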

Source references

  • [src-019] Aleksandrs Slivkins — “Introduction to Multi-Armed Bandits” (2019-04-15; revised 2024-04-03)