Epsilon-Greedy
Epsilon-greedy is a simple exploration strategy where an algorithm usually chooses the currently best-looking action, but with a small probability chooses randomly to keep learning.
Key points
- In Yildirim’s contextual-bandit overview, epsilon-greedy first estimates which treatment is performing best for a given context, serves that treatment with probability
1 - epsilon, and explores other treatments with probability epsilon [src-021].
- The value of epsilon can be scheduled: larger early in an experiment when uncertainty is high, smaller later as more evidence accumulates [src-021].
- For high-dimensional contexts, the article recommends using a model to estimate outcome values instead of trying to compute averages for every exact context value [src-021].
- Epsilon-greedy is easy to explain and implement, but it explores randomly rather than directing exploration toward actions with high uncertainty or high posterior promise [src-021].
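The mechanics described above can be sketched in a few lines. The function and parameter names below (`epsilon_greedy_choice`, `decayed_epsilon`, `update_estimate`, and the schedule endpoints) are illustrative assumptions, not taken from the article:

```python
import random

def epsilon_greedy_choice(estimates, epsilon, rng=random):
    """Explore a uniformly random arm with probability epsilon,
    otherwise exploit the arm with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])

def decayed_epsilon(t, start=0.5, end=0.05, horizon=10_000):
    """Linear schedule (hypothetical values): explore heavily early,
    less as evidence accumulates."""
    frac = min(t / horizon, 1.0)
    return start + frac * (end - start)

def update_estimate(estimates, counts, arm, reward):
    """Incremental running average of observed rewards for one arm."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
```

In a contextual setting, the flat `estimates` list would be replaced by a model that predicts outcome values from the context, as the article recommends for high-dimensional contexts.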
Related entities
_(none yet)_
Related concepts
_(none yet)_
Source references
- [src-021] Ugur Yildirim — “An Overview of Contextual Bandits” (2024-02-02)