Upper Confidence Bound
Upper Confidence Bound (UCB) is a bandit exploration strategy that scores each action by combining its estimated reward with an uncertainty bonus, favouring actions that either look good or have not been tried often enough to judge.
Key points
- Yildirim describes UCB as choosing actions based on both the predicted outcome and the confidence in that estimate, so under-sampled actions remain eligible for exploration [src-021].
- In simple bandits, uncertainty can be tied to how often an action has been served; in contextual bandits, uncertainty must account for how often actions have been served in relevant contexts [src-021].
- With high-dimensional context, UCB usually needs a model-based form. The article points to LinUCB as a popular version that formalises contextual UCB in a linear model framework [src-021].
- UCB is more directed than Epsilon-Greedy because exploration is focused where uncertainty is high rather than allocated as undifferentiated random traffic [src-021].
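The count-based scoring described above can be illustrated with a minimal UCB1-style sketch (not from the article; arm payoffs, the bonus constant `c`, and the simulation loop are illustrative assumptions). Each arm's score is its empirical mean plus a bonus that shrinks as the arm is pulled more, so untried or under-sampled arms keep getting selected:

```python
import math
import random

def ucb1_select(counts, rewards, t, c=2.0):
    """Pick the arm with the highest UCB1 score: empirical mean reward
    plus an uncertainty bonus sqrt(c * ln(t) / n) that shrinks with pulls."""
    best_arm, best_score = None, float("-inf")
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # untried arms are explored first
        mean = rewards[arm] / n
        bonus = math.sqrt(c * math.log(t) / n)
        if mean + bonus > best_score:
            best_arm, best_score = arm, mean + bonus
    return best_arm

# Toy simulation with assumed Bernoulli payoffs; arm 1 is truly best.
true_p = [0.2, 0.7, 0.5]
counts = [0, 0, 0]
rewards = [0.0, 0.0, 0.0]
random.seed(0)
for t in range(1, 2001):
    arm = ucb1_select(counts, rewards, t)
    counts[arm] += 1
    rewards[arm] += 1.0 if random.random() < true_p[arm] else 0.0

print(counts)  # traffic concentrates on the best arm, but no arm is starved
```

Note how exploration is directed: the bonus term, not undifferentiated random traffic, decides which non-greedy arm gets the next pull, which is the contrast with Epsilon-Greedy drawn above.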
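For the model-based contextual form, a minimal sketch of disjoint LinUCB follows (the dimension `d`, context vector, and `alpha` are illustrative assumptions, not values from the article). Each arm keeps ridge-regression statistics `A = I + Σ x xᵀ` and `b = Σ r x`; the score is the predicted reward `θᵀx` plus `alpha` times the width of the confidence interval around that linear estimate:

```python
import numpy as np

def linucb_score(A, b, x, alpha=1.0):
    """Disjoint LinUCB score for one arm: predicted reward theta^T x
    plus alpha * sqrt(x^T A^{-1} x), the confidence width for context x."""
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b                      # ridge estimate of the arm's weights
    return float(theta @ x) + alpha * float(np.sqrt(x @ A_inv @ x))

def linucb_update(A, b, x, reward):
    """Rank-one update of the per-arm sufficient statistics."""
    A += np.outer(x, x)
    b += reward * x

d, n_arms = 3, 2                           # assumed context dimension and arm count
A = [np.eye(d) for _ in range(n_arms)]     # A_a starts as the identity (ridge prior)
b = [np.zeros(d) for _ in range(n_arms)]

x = np.array([1.0, 0.5, -0.2])             # hypothetical context features
scores = [linucb_score(A[a], b[a], x) for a in range(n_arms)]
arm = int(np.argmax(scores))
linucb_update(A[arm], b[arm], x, reward=1.0)
```

Because uncertainty is measured through `A⁻¹`, the bonus reflects how often an arm has been served in contexts similar to `x`, not just its raw pull count, which is the distinction the contextual bullet above draws.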
Related entities
_(none yet)_
Related concepts
- Contextual Bandits
- Exploration-Exploitation Trade-off
- Epsilon-Greedy
- Query-Ad-Clustering
- Thompson Sampling
Source references
- [src-021] Ugur Yildirim — “An Overview of Contextual Bandits” (2024-02-02)