Bandits with Knapsacks
Bandits with knapsacks extend bandit learning to settings where actions consume limited resources, so the algorithm must learn rewards while respecting budgets or supply constraints.
Key points
- Slivkins covers bandits with knapsacks in one of the book’s economics-oriented chapters, alongside bandits in repeated games and incentivized exploration [src-019].
- The model is relevant when exploration is constrained by finite inventory, money, time, impressions, or other budgets rather than an unlimited sequence of identical trials [src-019].
- The “knapsack” framing changes the objective: a good policy must balance expected reward against resource consumption, not just identify the best arm in isolation [src-019].
- For product systems, this is the bandit variant to consider when recommendations, ads, procurement, marketplace allocation, or experimentation compete for scarce operational resources [src-019].
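To make the reward-versus-consumption trade-off concrete, here is a minimal sketch of a budget-constrained bandit loop. It is an illustration, not an algorithm from the book: the arm parameters, the Bernoulli rewards with fixed per-pull costs, and the epsilon-greedy rule that ranks arms by empirical reward per unit of resource are all assumptions made for this example.

```python
import random

def run_budgeted_bandit(arms, budget, horizon, eps=0.1, seed=0):
    """Epsilon-greedy bandit with a hard resource budget (illustrative sketch).

    `arms` is a list of hypothetical (reward_prob, cost) pairs: each pull of
    arm i yields a Bernoulli(reward_prob) reward and consumes `cost` units of
    a single shared resource. The policy explores with probability `eps`,
    otherwise pulls the feasible arm with the best empirical reward per unit
    of resource, and stops when the budget or the horizon runs out.
    """
    rng = random.Random(seed)
    pulls = [0] * len(arms)
    reward_sum = [0.0] * len(arms)
    total_reward, spent = 0.0, 0.0

    for _ in range(horizon):
        # Only arms whose cost still fits in the remaining budget are playable.
        feasible = [i for i, (_, c) in enumerate(arms) if spent + c <= budget]
        if not feasible:
            break
        if rng.random() < eps:
            i = rng.choice(feasible)
        else:
            def ratio(j):
                if pulls[j] == 0:
                    return float("inf")  # force at least one pull of each arm
                # Empirical mean reward divided by per-pull cost: the quantity
                # a knapsack-aware policy cares about, not raw mean reward.
                return (reward_sum[j] / pulls[j]) / arms[j][1]
            i = max(feasible, key=ratio)
        p, cost = arms[i]
        r = 1.0 if rng.random() < p else 0.0
        pulls[i] += 1
        reward_sum[i] += r
        total_reward += r
        spent += cost
    return total_reward, spent

# Hypothetical arms: arm 0 has the higher mean reward (0.8) but costs 4 units
# per pull; arm 1 earns less per pull (0.5) but costs only 1 unit, so its
# reward-per-unit-cost (0.5) beats arm 0's (0.2) under a tight budget.
arms = [(0.8, 4.0), (0.5, 1.0)]
reward, spent = run_budgeted_bandit(arms, budget=100.0, horizon=10_000)
```

The example shows why the knapsack framing changes the objective: a policy that simply chased the highest-reward arm would exhaust the budget quickly, while ranking by reward per unit of resource stretches the budget over many more pulls.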
Source references
- [src-019] Aleksandrs Slivkins — “Introduction to Multi-Armed Bandits” (2019-04-15; revised 2024-04-03)