Epsilon-Greedy

Epsilon-greedy is a simple exploration strategy where an algorithm usually chooses the currently best-looking action, but with a small probability chooses randomly to keep learning.

Key points

  • In Yildirim’s contextual-bandit overview, epsilon-greedy first estimates which treatment is performing best for a given context, serves that treatment with probability 1 - epsilon, and explores other treatments with probability epsilon [src-021].
  • The value of epsilon can be scheduled: larger early in an experiment when uncertainty is high, smaller later as more evidence accumulates [src-021].
  • For high-dimensional contexts, the article recommends using a model to estimate outcome values instead of trying to compute averages for every exact context value [src-021].
  • Epsilon-greedy is easy to explain and implement, but it explores randomly rather than directing exploration toward actions with high uncertainty or high posterior promise [src-021].
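The selection rule and the schedule described above can be sketched as follows. This is a minimal illustration, not code from the article; the function names, the linear decay schedule, and the default parameter values are assumptions chosen for the example.

```python
import random

def epsilon_greedy(estimates, epsilon, rng=random):
    """Choose the best-looking arm with probability 1 - epsilon,
    otherwise choose uniformly at random to keep exploring.

    estimates: list of estimated values, one per action/treatment.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore: uniform random action
    # exploit: action with the highest current estimate
    return max(range(len(estimates)), key=lambda a: estimates[a])

def decayed_epsilon(step, start=0.3, end=0.01, decay_steps=10_000):
    """Linear epsilon schedule (an assumed form): large epsilon early,
    when uncertainty is high, shrinking toward `end` as evidence accumulates."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)
```

For example, `epsilon_greedy(estimates, decayed_epsilon(t))` at step `t` explores roughly 30% of the time at the start of an experiment and about 1% of the time after 10,000 steps. Note the exploration draw is uniform over all actions, which is exactly the limitation the last bullet describes: no extra probability goes to highly uncertain or promising actions.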

Related entities

_(none yet)_

Related concepts

_(none yet)_

Source references

  • [src-021] Ugur Yildirim — “An Overview of Contextual Bandits” (2024-02-02)