Adversarial Bandits
Adversarial bandits are bandit problems in which rewards or costs are not assumed to be drawn from fixed IID distributions, so an algorithm must perform well even when the environment changes in hostile or worst-case ways.
Key points
- Slivkins separates the adversarial line of work from IID reward models, covering full-feedback adversarial costs, adversarial bandits, and extensions with linear rewards and combinatorially structured actions [src-019].
- The adversarial setting weakens assumptions about the world: instead of estimating stable arm means, algorithms must guard against changing or strategically selected reward sequences [src-019].
- This matters for AI/product systems when the environment is non-stationary, competitors react, user populations shift, or incentives cause participants to adapt to the algorithm [src-019].
- The contrast with Stochastic Bandits is the modelling trade-off: weaker assumptions buy robustness, but typically at the cost of weaker regret guarantees and more conservative exploration (see the EXP3 sketch after this list) [src-019].
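A canonical algorithm for this setting is EXP3 (Auer et al.), which achieves expected regret on the order of √(K T log K) against any reward sequence by combining exponential weights with forced uniform exploration. Below is a minimal sketch, not pseudocode from the source: the function name `exp3`, its parameters, and the toy shifting environment are illustrative assumptions.

```python
import math
import random

def exp3(n_arms, horizon, reward_fn, gamma=0.1, seed=0):
    """Minimal EXP3 sketch for adversarial bandits.

    reward_fn(t, arm) must return a reward in [0, 1]; the sequence it
    encodes may be arbitrary (non-stationary or adversarially chosen).
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    total = 0.0
    for t in range(horizon):
        w_sum = sum(weights)
        # Mix exponential weights with uniform exploration at rate gamma.
        probs = [(1 - gamma) * w / w_sum + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        x = reward_fn(t, arm)
        total += x
        # Importance-weighted estimate: unbiased even though only the
        # chosen arm's reward is observed (bandit feedback).
        x_hat = x / probs[arm]
        weights[arm] *= math.exp(gamma * x_hat / n_arms)
    return total

# Hypothetical non-stationary environment: the best arm switches halfway
# through the horizon, which would mislead a fixed-mean IID estimate.
def shifting_rewards(t, arm):
    best = 0 if t < 500 else 1
    return 1.0 if arm == best else 0.0

print(exp3(n_arms=2, horizon=1000, reward_fn=shifting_rewards))
```

The uniform-exploration term γ/K keeps every arm's sampling probability bounded away from zero, which controls the variance of the importance-weighted estimates; this is the "more conservative exploration" the trade-off above refers to.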
Source references
- [src-019] Aleksandrs Slivkins — “Introduction to Multi-Armed Bandits” (2019-04-15; revised 2024-04-03)