Epsilon-Greedy
Epsilon-greedy is a simple exploration strategy where an algorithm usually chooses the currently best-looking action, but with a small probability chooses randomly to keep learning.
Key points
- In Yildirim’s contextual-bandit overview, epsilon-greedy first estimates which treatment is performing best for a given context, serves that treatment with probability
1 - epsilon, and explores other treatments with probability epsilon [src-021].
- The value of epsilon can be scheduled: larger early in an experiment when uncertainty is high, smaller later as more evidence accumulates [src-021].
- For high-dimensional contexts, the article recommends using a model to estimate outcome values instead of trying to compute averages for every exact context value [src-021].
- Epsilon-greedy is easy to explain and implement, but it explores randomly rather than directing exploration toward actions with high uncertainty or high posterior promise [src-021].
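The mechanics described above can be sketched in a few lines. The function and parameter names below (`epsilon_greedy_choice`, `decayed_epsilon`, `update_estimate`, and the schedule endpoints) are illustrative assumptions, not taken from the article:

```python
import random

def epsilon_greedy_choice(estimates, epsilon, rng=random):
    """Explore a uniformly random arm with probability epsilon,
    otherwise exploit the arm with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])

def decayed_epsilon(t, start=0.5, end=0.05, horizon=10_000):
    """Linear schedule (hypothetical values): explore heavily early,
    less as evidence accumulates."""
    frac = min(t / horizon, 1.0)
    return start + frac * (end - start)

def update_estimate(estimates, counts, arm, reward):
    """Incremental running average of observed rewards for one arm."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
```

In a contextual setting, the flat `estimates` list would be replaced by a model that predicts outcome values from the context, as the article recommends for high-dimensional contexts.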
Related entities
_(none yet)_
Related concepts
_(none yet)_
Source references
- [src-021] Ugur Yildirim — “An Overview of Contextual Bandits” (2024-02-02)