Adversarial Bandits
Adversarial bandits are bandit problems in which rewards or costs are not assumed to be drawn from fixed IID distributions, so an algorithm must perform well even when the environment changes in hostile or worst-case ways.
Key points
- Slivkins separates the adversarial line of work from IID reward models, covering full-feedback adversarial costs, adversarial bandits, and extensions with linear rewards and combinatorially structured actions [src-019].
- The adversarial setting weakens assumptions about the world: instead of estimating stable arm means, algorithms must guard against changing or strategically selected reward sequences [src-019].
- This matters for AI/product systems when the environment is non-stationary, competitors react, user populations shift, or incentives cause participants to adapt to the algorithm [src-019].
- The contrast with Stochastic Bandits is the modelling trade-off: weaker assumptions buy robustness, but typically at the cost of weaker regret guarantees and more conservative exploration (see the EXP3 sketch after this list) [src-019].
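A canonical algorithm for this setting is EXP3 (Auer et al.), which achieves expected regret on the order of √(K T log K) against any reward sequence by combining exponential weights with forced uniform exploration. Below is a minimal sketch, not pseudocode from the source: the function name `exp3`, its parameters, and the toy shifting environment are illustrative assumptions.

```python
import math
import random

def exp3(n_arms, horizon, reward_fn, gamma=0.1, seed=0):
    """Minimal EXP3 sketch for adversarial bandits.

    reward_fn(t, arm) must return a reward in [0, 1]; the sequence it
    encodes may be arbitrary (non-stationary or adversarially chosen).
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    total = 0.0
    for t in range(horizon):
        w_sum = sum(weights)
        # Mix exponential weights with uniform exploration at rate gamma.
        probs = [(1 - gamma) * w / w_sum + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        x = reward_fn(t, arm)
        total += x
        # Importance-weighted estimate: unbiased even though only the
        # chosen arm's reward is observed (bandit feedback).
        x_hat = x / probs[arm]
        weights[arm] *= math.exp(gamma * x_hat / n_arms)
    return total

# Hypothetical non-stationary environment: the best arm switches halfway
# through the horizon, which would mislead a fixed-mean IID estimate.
def shifting_rewards(t, arm):
    best = 0 if t < 500 else 1
    return 1.0 if arm == best else 0.0

print(exp3(n_arms=2, horizon=1000, reward_fn=shifting_rewards))
```

The uniform-exploration term γ/K keeps every arm's sampling probability bounded away from zero, which controls the variance of the importance-weighted estimates; this is the "more conservative exploration" the trade-off above refers to.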
Source references
- [src-019] Aleksandrs Slivkins — “Introduction to Multi-Armed Bandits” (2019-04-15; revised 2024-04-03)