Stochastic Bandits
Stochastic bandits are bandit problems in which each arm has a fixed reward distribution and each pull draws an independent sample from that distribution, making this the basic IID setting for adaptive exploration.
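A minimal sketch of this model, assuming Bernoulli reward distributions for concreteness (the class and parameter names below are illustrative, not from the source): each arm has a fixed mean, every pull draws an IID reward, and the per-pull regret of an arm is the gap between the best mean and that arm's mean.

```python
import random

class BernoulliBandit:
    """Hypothetical sketch of a stochastic bandit instance:
    K arms with fixed Bernoulli reward distributions, sampled IID."""

    def __init__(self, means, seed=0):
        self.means = list(means)          # fixed per-arm reward probabilities
        self.rng = random.Random(seed)
        self.best_mean = max(self.means)  # benchmark arm for regret

    def pull(self, arm):
        # IID draw: reward 1 with probability means[arm], else 0
        return 1 if self.rng.random() < self.means[arm] else 0

    def regret(self, arm):
        # expected regret of one pull of `arm` relative to the best arm
        return self.best_mean - self.means[arm]
```

Because the distributions never change, an algorithm's cumulative regret is just the sum of these per-pull gaps over the arms it chose, which is the quantity learning algorithms try to keep small.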
Key points
- Slivkins uses stochastic bandits as the starting point for the book, moving from the basic model and examples to simple uniform exploration and then adaptive exploration algorithms [src-019].
- The model assumes each arm's reward distribution is fixed over time, so the algorithm's job is to identify and exploit the better arms while controlling the regret incurred during learning [src-019].
- Adaptive exploration methods use accumulated evidence to concentrate future pulls on promising arms instead of spreading trials uniformly forever [src-019].
- Stochastic bandits provide the base case for later variants: Bayesian bandits, similarity-aware bandits, and Thompson Sampling all extend or reinterpret this foundation [src-019].
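The adaptive-exploration idea in the points above can be made concrete with UCB1, a standard algorithm for this setting (used here as a representative example, not necessarily the book's exact presentation): pull each arm once, then always pull the arm with the highest upper confidence bound, so accumulated evidence concentrates future pulls on promising arms.

```python
import math

def ucb1(pull, n_arms, horizon):
    """Sketch of UCB1 adaptive exploration. `pull(arm)` is assumed to
    return an IID reward in [0, 1] from that arm's fixed distribution."""
    counts = [0] * n_arms   # pulls per arm
    sums = [0.0] * n_arms   # total reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1     # initialization: pull each arm once
        else:
            # index = empirical mean + confidence radius sqrt(2 ln t / n_a);
            # rarely-pulled arms get a large radius, so they are retried,
            # while clearly worse arms are pulled only rarely
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                                    + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
    return counts, sums
```

Unlike uniform exploration, which spreads pulls evenly forever, the confidence radius shrinks as an arm is sampled, so suboptimal arms receive only logarithmically many pulls over the horizon.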
Source references
- [src-019] Aleksandrs Slivkins — “Introduction to Multi-Armed Bandits” (2019-04-15; revised 2024-04-03)