Bandits with Knapsacks
Bandits with knapsacks extend bandit learning to settings where actions consume limited resources, so the algorithm must learn rewards while respecting budgets or supply constraints.
Key points
- Slivkins covers bandits with knapsacks in one of the book’s economics-oriented chapters, alongside bandits in repeated games and incentivized exploration [src-019].
- The model is relevant when exploration is constrained by finite inventory, money, time, impressions, or other budgets rather than an unlimited sequence of identical trials [src-019].
- The “knapsack” framing changes the objective: a good policy must balance expected reward against resource consumption, not just identify the best arm in isolation [src-019].
- For product systems, this is the bandit variant to consider when recommendations, ads, procurement, marketplace allocation, or experimentation compete for scarce operational resources [src-019].
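To make the reward-versus-consumption trade-off concrete, here is a minimal sketch of a budget-constrained bandit loop. It is an illustration, not an algorithm from the book: the arm parameters, the Bernoulli rewards with fixed per-pull costs, and the epsilon-greedy rule that ranks arms by empirical reward per unit of resource are all assumptions made for this example.

```python
import random

def run_budgeted_bandit(arms, budget, horizon, eps=0.1, seed=0):
    """Epsilon-greedy bandit with a hard resource budget (illustrative sketch).

    `arms` is a list of hypothetical (reward_prob, cost) pairs: each pull of
    arm i yields a Bernoulli(reward_prob) reward and consumes `cost` units of
    a single shared resource. The policy explores with probability `eps`,
    otherwise pulls the feasible arm with the best empirical reward per unit
    of resource, and stops when the budget or the horizon runs out.
    """
    rng = random.Random(seed)
    pulls = [0] * len(arms)
    reward_sum = [0.0] * len(arms)
    total_reward, spent = 0.0, 0.0

    for _ in range(horizon):
        # Only arms whose cost still fits in the remaining budget are playable.
        feasible = [i for i, (_, c) in enumerate(arms) if spent + c <= budget]
        if not feasible:
            break
        if rng.random() < eps:
            i = rng.choice(feasible)
        else:
            def ratio(j):
                if pulls[j] == 0:
                    return float("inf")  # force at least one pull of each arm
                # Empirical mean reward divided by per-pull cost: the quantity
                # a knapsack-aware policy cares about, not raw mean reward.
                return (reward_sum[j] / pulls[j]) / arms[j][1]
            i = max(feasible, key=ratio)
        p, cost = arms[i]
        r = 1.0 if rng.random() < p else 0.0
        pulls[i] += 1
        reward_sum[i] += r
        total_reward += r
        spent += cost
    return total_reward, spent

# Hypothetical arms: arm 0 has the higher mean reward (0.8) but costs 4 units
# per pull; arm 1 earns less per pull (0.5) but costs only 1 unit, so its
# reward-per-unit-cost (0.5) beats arm 0's (0.2) under a tight budget.
arms = [(0.8, 4.0), (0.5, 1.0)]
reward, spent = run_budgeted_bandit(arms, budget=100.0, horizon=10_000)
```

The example shows why the knapsack framing changes the objective: a policy that simply chased the highest-reward arm would exhaust the budget quickly, while ranking by reward per unit of resource stretches the budget over many more pulls.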
Source references
- [src-019] Aleksandrs Slivkins — “Introduction to Multi-Armed Bandits” (2019-04-15; revised 2024-04-03)