Stochastic Bandits

Stochastic bandits are bandit problems in which each arm has a fixed reward distribution and rewards are sampled independently over time; they form the basic IID setting for adaptive exploration.
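The IID model can be sketched in a few lines. This is an illustrative example, not from the source: the class name, the choice of Bernoulli rewards, and the specific arm means are assumptions made for the sketch.

```python
import random

class StochasticBandit:
    """K-armed stochastic bandit (illustrative sketch): each arm has a
    fixed Bernoulli reward distribution, and every pull is sampled
    independently of all previous pulls (the IID assumption)."""

    def __init__(self, means, seed=0):
        self.means = means              # fixed per-arm mean rewards
        self.rng = random.Random(seed)

    def pull(self, arm):
        # Reward is 1 with probability means[arm], else 0.
        return 1 if self.rng.random() < self.means[arm] else 0

bandit = StochasticBandit([0.3, 0.5, 0.7])
rewards = [bandit.pull(2) for _ in range(1000)]
print(sum(rewards) / len(rewards))  # empirical mean close to 0.7
```

Because the distributions never change, repeated pulls of an arm give an ever-sharper estimate of its mean, which is what exploration algorithms exploit.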

Key points

  • Slivkins uses stochastic bandits as the starting point for the book, moving from the basic model and examples to simple uniform exploration and then adaptive exploration algorithms [src-019].
  • The model assumes stationary reward distributions, so the algorithm’s job is to identify and exploit the better arms while controlling the regret incurred during learning [src-019].
  • Adaptive exploration methods use accumulated evidence to concentrate future pulls on promising arms instead of spreading trials uniformly forever [src-019].
  • Stochastic bandits provide the base case for later variants: Bayesian bandits, similarity-aware bandits, and Thompson Sampling all extend or reinterpret this foundation [src-019].
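As a concrete instance of adaptive exploration, a UCB-style index rule pulls the arm with the highest optimistic estimate, so trials concentrate on promising arms rather than being spread uniformly. The sketch below is a standard UCB1 variant, not code from the source; the Bernoulli rewards, arm means, and horizon are assumptions for illustration.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """UCB1 sketch: pull the arm maximizing empirical mean plus a
    confidence bonus; arms that look worse are pulled less and less."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # pulls per arm
    sums = [0.0] * k    # total reward per arm
    for t in range(horizon):
        if t < k:
            arm = t     # initialize: pull each arm once
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t + 1) / counts[a]))
        reward = 1 if rng.random() < means[arm] else 0  # Bernoulli draw
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1([0.3, 0.5, 0.7], horizon=2000)
print(counts)  # the best arm (index 2) receives most of the pulls
```

The shrinking confidence bonus is what makes the exploration adaptive: suboptimal arms keep being sampled occasionally, but at a rate that vanishes as evidence accumulates.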

Source references

  • [src-019] Aleksandrs Slivkins — “Introduction to Multi-Armed Bandits” (2019-04-15; revised 2024-04-03)