Stochastic Bandits
Stochastic bandits are bandit problems in which each arm has a fixed reward distribution and each pull draws an independent sample from that distribution, making this the basic IID setting for adaptive exploration.
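A minimal sketch of this model, assuming Bernoulli reward distributions for concreteness (the class and parameter names below are illustrative, not from the source): each arm has a fixed mean, every pull draws an IID reward, and the per-pull regret of an arm is the gap between the best mean and that arm's mean.

```python
import random

class BernoulliBandit:
    """Hypothetical sketch of a stochastic bandit instance:
    K arms with fixed Bernoulli reward distributions, sampled IID."""

    def __init__(self, means, seed=0):
        self.means = list(means)          # fixed per-arm reward probabilities
        self.rng = random.Random(seed)
        self.best_mean = max(self.means)  # benchmark arm for regret

    def pull(self, arm):
        # IID draw: reward 1 with probability means[arm], else 0
        return 1 if self.rng.random() < self.means[arm] else 0

    def regret(self, arm):
        # expected regret of one pull of `arm` relative to the best arm
        return self.best_mean - self.means[arm]
```

Because the distributions never change, an algorithm's cumulative regret is just the sum of these per-pull gaps over the arms it chose, which is the quantity learning algorithms try to keep small.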
Key points
- Slivkins uses stochastic bandits as the starting point for the book, moving from the basic model and examples to simple uniform exploration and then adaptive exploration algorithms [src-019].
- The model assumes each arm's reward distribution is fixed over time, so the algorithm's job is to identify and exploit the better arms while controlling the regret incurred during learning [src-019].
- Adaptive exploration methods use accumulated evidence to concentrate future pulls on promising arms instead of spreading trials uniformly forever [src-019].
- Stochastic bandits provide the base case for later variants: Bayesian bandits, similarity-aware bandits, and Thompson Sampling all extend or reinterpret this foundation [src-019].
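The adaptive-exploration idea in the points above can be made concrete with UCB1, a standard algorithm for this setting (used here as a representative example, not necessarily the book's exact presentation): pull each arm once, then always pull the arm with the highest upper confidence bound, so accumulated evidence concentrates future pulls on promising arms.

```python
import math

def ucb1(pull, n_arms, horizon):
    """Sketch of UCB1 adaptive exploration. `pull(arm)` is assumed to
    return an IID reward in [0, 1] from that arm's fixed distribution."""
    counts = [0] * n_arms   # pulls per arm
    sums = [0.0] * n_arms   # total reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1     # initialization: pull each arm once
        else:
            # index = empirical mean + confidence radius sqrt(2 ln t / n_a);
            # rarely-pulled arms get a large radius, so they are retried,
            # while clearly worse arms are pulled only rarely
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                                    + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
    return counts, sums
```

Unlike uniform exploration, which spreads pulls evenly forever, the confidence radius shrinks as an arm is sampled, so suboptimal arms receive only logarithmically many pulls over the horizon.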
Source references
- [src-019] Aleksandrs Slivkins — “Introduction to Multi-Armed Bandits” (2019-04-15; revised 2024-04-03)