Epsilon-Greedy

Epsilon-greedy is a simple exploration strategy where an algorithm usually chooses the currently best-looking action, but with a small probability chooses randomly to keep learning.

Key points

  • In Yildirim’s contextual-bandit overview, epsilon-greedy first estimates which treatment is performing best for a given context, serves that treatment with probability 1 - epsilon, and explores other treatments with probability epsilon [src-021].
  • The value of epsilon can be scheduled: larger early in an experiment when uncertainty is high, smaller later as more evidence accumulates [src-021].
  • For high-dimensional contexts, the article recommends using a model to estimate outcome values instead of trying to compute averages for every exact context value [src-021].
  • Epsilon-greedy is easy to explain and implement, but it explores randomly rather than directing exploration toward actions with high uncertainty or high posterior promise [src-021].
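The selection rule and the schedule described above can be sketched as follows. This is a minimal illustration, not code from the article; the function names, the linear decay schedule, and the default parameter values are assumptions chosen for the example.

```python
import random

def epsilon_greedy(estimates, epsilon, rng=random):
    """Choose the best-looking arm with probability 1 - epsilon,
    otherwise choose uniformly at random to keep exploring.

    estimates: list of estimated values, one per action/treatment.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore: uniform random action
    # exploit: action with the highest current estimate
    return max(range(len(estimates)), key=lambda a: estimates[a])

def decayed_epsilon(step, start=0.3, end=0.01, decay_steps=10_000):
    """Linear epsilon schedule (an assumed form): large epsilon early,
    when uncertainty is high, shrinking toward `end` as evidence accumulates."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)
```

For example, `epsilon_greedy(estimates, decayed_epsilon(t))` at step `t` explores roughly 30% of the time at the start of an experiment and about 1% of the time after 10,000 steps. Note the exploration draw is uniform over all actions, which is exactly the limitation the last bullet describes: no extra probability goes to highly uncertain or promising actions.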

Related entities

_(none yet)_

Related concepts

_(none yet)_

Source references

  • [src-021] Ugur Yildirim — “An Overview of Contextual Bandits” (2024-02-02)