Upper Confidence Bound

Upper Confidence Bound is a bandit exploration strategy that scores actions by combining estimated reward with an uncertainty bonus, favouring actions that look good or have not been tried enough.

Key points

  • Yildirim describes UCB as choosing actions based on both the predicted outcome and the confidence in that prediction, so under-sampled actions remain eligible for exploration [src-021].
  • In simple bandits, uncertainty can be tied to how often an action has been served; in contextual bandits, uncertainty must account for how often actions have been served in relevant contexts [src-021].
  • With high-dimensional context, UCB usually needs a model-based form. The article points to LinUCB as a popular version that formalises contextual UCB in a linear model framework [src-021].
  • UCB is more directed than Epsilon-Greedy because exploration is focused where uncertainty is high rather than allocated as undifferentiated random traffic [src-021].
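The count-based uncertainty bonus described above can be sketched with the standard UCB1 rule (the article does not name a specific formula, so the bonus term and the exploration constant `c` here are assumptions): each arm is scored by its empirical mean reward plus a term that shrinks as the arm's play count grows.

```python
import math

def ucb1_select(counts, rewards, t, c=2.0):
    """Pick an arm by UCB1: empirical mean plus an uncertainty bonus.

    counts[i]  -- times arm i has been played so far
    rewards[i] -- total reward observed for arm i
    t          -- current round (1-indexed)
    c          -- exploration constant (an assumed default, not from the article)
    """
    # Play each arm once before the formula applies (count of 0 would divide by zero).
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # Mean reward + bonus that grows with total rounds and shrinks with the arm's count,
    # so rarely served arms keep getting reconsidered.
    scores = [
        rewards[i] / counts[i] + math.sqrt(c * math.log(t) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

An unplayed arm is always selected first; among equally sampled arms, the one with the higher mean wins, which is what makes the exploration directed rather than uniformly random.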

Related entities

_(none yet)_

Related concepts

Source references

  • [src-021] Ugur Yildirim — “An Overview of Contextual Bandits” (2024-02-02)